123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264 |
- [[breaking_50_mapping_changes]]
- === Mapping changes
- ==== `string` fields replaced by `text`/`keyword` fields
- The `string` field datatype has been replaced by the `text` field for full
- text analyzed content, and the `keyword` field for not-analyzed exact string
- values. For backwards compatibility purposes, during the 5.x series:
- * `string` fields on pre-5.0 indices will function as before.
- * New `string` fields can be added to pre-5.0 indices as before.
- * `text` and `keyword` fields can also be added to pre-5.0 indices.
- * When adding a `string` field to a new index, the field mapping will be
- rewritten as a `text` or `keyword` field if possible, otherwise
- an exception will be thrown. Certain configurations that were possible
- with `string` fields are no longer possible with `text`/`keyword` fields
- such as enabling `term_vectors` on a not-analyzed `keyword` field.
- ==== Default string mappings
- String mappings now have the following default mappings:
- [source,js]
- ---------------
- {
- "type": "text",
- "fields": {
- "keyword": {
- "type": "keyword",
- "ignore_above": 256
- }
- }
- }
- ---------------
- This allows to perform full-text search on the original field name and to sort
- and run aggregations on the sub keyword field.
- ==== Numeric fields
- Numeric fields are now indexed with a completely different data-structure, called
- BKD tree, that is expected to require less disk space and be faster for range
- queries than the previous way that numerics were indexed.
- Term queries will return constant scores now, while they used to return higher
- scores for rare terms due to the contribution of the document frequency, which
- this new BKD structure does not record. If scoring is needed, then it is advised
- to map the numeric fields as <<keyword,`keyword`s>> too.
- Note that this <<keyword,`keyword`>> mapping do not need to replace the numeric
- mapping. For instance if you need both sorting and scoring on your numeric field,
- you could map it both as a number and a `keyword` using <<multi-fields>>:
- [source,js]
- --------------------------------------------------
- PUT my_index
- {
- "mappings": {
- "my_type": {
- "properties": {
- "my_number": {
- "type": "long",
- "fields": {
- "keyword": {
- "type": "keyword"
- }
- }
- }
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- Also the `precision_step` parameter is now irrelevant and will be rejected on
- indices that are created on or after 5.0.
- ==== `_timestamp` and `_ttl`
- The `_timestamp` and `_ttl` fields were deprecated and are now removed. As a
- replacement for `_timestamp`, you should populate a regular date field with the
- current timestamp on application side. For `_ttl`, you should either use
- time-based indices when applicable, or cron a delete-by-query with a range
- query on a timestamp field
- ==== `index` property
- On all field datatypes (except for the deprecated `string` field), the `index`
- property now only accepts `true`/`false` instead of `not_analyzed`/`no`. The
- `string` field still accepts `analyzed`/`not_analyzed`/`no`.
- ==== Doc values on unindexed fields
- Previously, setting a field to `index:no` would also disable doc-values. Now,
- doc-values are always enabled on numeric and boolean fields unless
- `doc_values` is set to `false`.
- ==== Floating points use `float` instead of `double`
- When dynamically mapping a field containing a floating point number, the field
- now defaults to using `float` instead of `double`. The reasoning is that
- floats should be more than enough for most cases but would decrease storage
- requirements significantly.
- ==== `norms`
- `norms` now take a boolean instead of an object. This boolean is the replacement
- for `norms.enabled`. There is no replacement for `norms.loading` since eager
- loading of norms is not useful anymore now that norms are disk-based.
- ==== `fielddata.format`
- Setting `fielddata.format: doc_values` in the mappings used to implicitly
- enable doc-values on a field. This no longer works: the only way to enable or
- disable doc-values is by using the `doc_values` property of mappings.
- ==== `fielddata.filter.regex`
- Regex filters are not supported anymore and will be dropped on upgrade.
- ==== Source-transform removed
- The source `transform` feature has been removed. Instead, use an ingest pipeline
- ==== `_parent` field no longer indexed
- The join between parent and child documents no longer relies on indexed fields
- and therefore from 5.0.0 onwards the `_parent` field is no longer indexed. In
- order to find documents that refer to a specific parent id, the new
- `parent_id` query can be used. The GET response and hits inside the search
- response still include the parent id under the `_parent` key.
- ==== Source `format` option
- The `_source` mapping no longer supports the `format` option. It will still be
- accepted for indices created before the upgrade to 5.0 for backwards
- compatibility, but it will have no effect. Indices created on or after 5.0
- will reject this option.
- ==== Object notation
- Core types no longer support the object notation, which was used to provide
- per document boosts as follows:
- [source,js]
- ---------------
- {
- "value": "field_value",
- "boost": 42
- }
- ---------------
- ==== Boost accuracy for queries on `_all`
- Per-field boosts on the `_all` are now compressed into a single byte instead
- of the 4 bytes used previously. While this will make the index much more
- space-efficient, it also means that index time boosts will be less accurately
- encoded.
- ==== `_ttl` and `_timestamp` cannot be created
- You can no longer create indexes with `_ttl` or `_timestamp` enabled. Indexes
- with them enabled created before 5.0 will continue to work.
- You should replace `_timestamp` in new indexes by adding a field to your source
- either in the application producing the data or with an ingest pipline like
- this one:
- [source,js]
- ---------------
- PUT _ingest/pipeline/timestamp
- {
- "description" : "Adds a timestamp field at the current time",
- "processors" : [ {
- "set" : {
- "field": "timestamp",
- "value": "{{_ingest.timestamp}}"
- }
- } ]
- }
- PUT newindex/type/1?pipeline=timestamp
- {
- "example": "data"
- }
- GET newindex/type/1
- ---------------
- // CONSOLE
- Which produces
- [source,js]
- ---------------
- {
- "_source": {
- "example": "data",
- "timestamp": "2016-06-21T18:48:55.560+0000"
- },
- ...
- }
- ---------------
- // TESTRESPONSE[s/\.\.\./"found": true, "_id": "1", "_index": "newindex", "_type": "type", "_version": 1/]
- // TESTRESPONSE[s/"2016-06-21T18:48:55.560\+0000"/"$body._source.timestamp"/]
- If you have an old index created with 2.x that has `_timestamp` enabled then
- you can migrate it to a new index with the a `timestamp` field in the source
- with reindex:
- [source,js]
- ---------------
- POST _reindex
- {
- "source": {
- "index": "oldindex"
- },
- "dest": {
- "index": "newindex"
- },
- "script": {
- "lang": "painless",
- "inline": "ctx._source.timestamp = ctx._timestamp; ctx._timestamp = null"
- }
- }
- ---------------
- // CONSOLE
- // TEST[s/^/PUT oldindex\nGET _cluster\/health?wait_for_status=yellow\n/]
- You can replace `_ttl` with time based index names (preferred) or by adding a
- cron job which runs a delete-by-query on a timestamp field in the source
- document. If you had documents like this:
- [source,js]
- ---------------
- POST index/type/_bulk
- {"index":{"_id":1}}
- {"example": "data", "timestamp": "2016-06-21T18:48:55.560+0000" }
- {"index":{"_id":2}}
- {"example": "data", "timestamp": "2016-04-21T18:48:55.560+0000" }
- ---------------
- // CONSOLE
- Then you could delete all of the documents from before June 1st with:
- [source,js]
- ---------------
- POST index/type/_delete_by_query
- {
- "query": {
- "range" : {
- "timestamp" : {
- "lt" : "2016-05-01"
- }
- }
- }
- }
- ---------------
- // CONSOLE
- // TEST[continued]
- IMPORTANT: Keep in mind that deleting documents from an index is very expensive
- compared to deleting whole indexes. That is why time based indexes are
- recommended over this sort of thing and why `_ttl` was deprecated in the first
- place.
|