[[fielddata]]
=== `fielddata`

Most fields are <<mapping-index,indexed>> by default, which makes them
searchable. Sorting, aggregations, and accessing field values in scripts,
however, require a different access pattern from search.

Search needs to answer the question _"Which documents contain this term?"_,
while sorting and aggregations need to answer a different question: _"What is
the value of this field for **this** document?"_.

Most fields can use index-time, on-disk <<doc-values,`doc_values`>> for this
data access pattern, but <<text,`text`>> fields do not support `doc_values`.

Instead, `text` fields use a query-time *in-memory* data structure called
`fielddata`. This data structure is built on demand the first time that a
field is used for aggregations, sorting, or in a script. It is built by
reading the entire inverted index for each segment from disk, inverting the
term ↔︎ document relationship, and storing the result in memory, in the JVM
heap.

[[fielddata-disabled-text-fields]]
==== Fielddata is disabled on `text` fields by default

Fielddata can consume a *lot* of heap space, especially when loading
high-cardinality `text` fields. Once fielddata has been loaded into the heap, it
remains there for the lifetime of the segment. Loading fielddata is also an
expensive process that can cause noticeable latency spikes for users. This is
why fielddata is disabled by default.

If you try to sort, aggregate, or access values from a script on a `text`
field, you will see this exception:

[literal]
Fielddata is disabled on text fields by default.  Set `fielddata=true` on
[`your_field_name`] in order to load  fielddata in memory by uninverting the
inverted index. Note that this can however use significant memory.

[[before-enabling-fielddata]]
==== Before enabling fielddata

Before you enable fielddata, consider why you are using a `text` field for
aggregations, sorting, or in a script. It usually doesn't make sense to do
so.

A `text` field is analyzed before indexing so that a value like
`New York` can be found by searching for `new` or for `york`. A `terms`
aggregation on this field will return a `new` bucket and a `york` bucket, when
you probably want a single bucket called `New York`.

Instead, you should have a `text` field for full text searches, and an
unanalyzed <<keyword,`keyword`>> field with <<doc-values,`doc_values`>>
enabled for aggregations, as follows:

[source,console]
---------------------------------
PUT my_index
{
  "mappings": {
    "properties": {
      "my_field": { <1>
        "type": "text",
        "fields": {
          "keyword": { <2>
            "type": "keyword"
          }
        }
      }
    }
  }
}
---------------------------------

<1> Use the `my_field` field for searches.
<2> Use the `my_field.keyword` field for aggregations, sorting, or in scripts.

[[enable-fielddata-text-fields]]
==== Enabling fielddata on `text` fields

You can enable fielddata on an existing `text` field using the
<<indices-put-mapping,PUT mapping API>> as follows:

[source,console]
-----------------------------------
PUT my_index/_mapping
{
  "properties": {
    "my_field": { <1>
      "type":     "text",
      "fielddata": true
    }
  }
}
-----------------------------------
// TEST[continued]

<1> The mapping that you specify for `my_field` should consist of the existing
    mapping for that field, plus the `fielddata` parameter.

[[field-data-filtering]]
==== `fielddata_frequency_filter`

Fielddata filtering can be used to reduce the number of terms loaded into
memory, and thus reduce memory usage. Terms can be filtered by _frequency_:

The frequency filter allows you to load only terms whose document frequency
falls between a `min` and `max` value, which can be expressed as an absolute
number (when the number is bigger than 1.0) or as a percentage
(e.g. `0.01` is `1%` and `1.0` is `100%`). Frequency is calculated
*per segment*. Percentages are based on the number of docs which have a
value for the field, as opposed to all docs in the segment.

Small segments can be excluded completely by specifying the minimum
number of docs that the segment should contain with `min_segment_size`:

[source,console]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "properties": {
      "tag": {
        "type": "text",
        "fielddata": true,
        "fielddata_frequency_filter": {
          "min": 0.001,
          "max": 0.1,
          "min_segment_size": 500
        }
      }
    }
  }
}
--------------------------------------------------
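
To make the `min`, `max`, and `min_segment_size` rules above more concrete, the
following Python sketch models how a frequency filter might decide which terms
to load for a single segment. It is an illustration only, not the Elasticsearch
implementation: the function names (`resolve_bound`, `terms_to_load`) and their
parameters are hypothetical. It also assumes that both bounds are inclusive and
that a segment smaller than `min_segment_size` skips the filter, so that all of
its terms are loaded.

```python
def resolve_bound(value, docs_with_value):
    """Turn a `min` or `max` setting into an absolute document count."""
    # Values bigger than 1.0 are absolute counts; values up to 1.0 are
    # fractions of the docs that have a value for the field.
    if value > 1.0:
        return int(value)
    return int(docs_with_value * value)


def terms_to_load(doc_freqs, docs_with_value, segment_docs,
                  min_freq, max_freq, min_segment_size=0):
    """Return the terms kept for one segment (simplified model)."""
    # Assumption: segments below min_segment_size are not filtered at all.
    if segment_docs < min_segment_size:
        return set(doc_freqs)
    low = resolve_bound(min_freq, docs_with_value)
    high = resolve_bound(max_freq, docs_with_value)
    # Assumption: both bounds are inclusive.
    return {term for term, df in doc_freqs.items() if low <= df <= high}


# Hypothetical per-segment document frequencies for the `tag` field.
segment = {"the": 9000, "search": 250, "xqzv": 2}

kept = terms_to_load(segment, docs_with_value=10000, segment_docs=12000,
                     min_freq=0.001, max_freq=0.1, min_segment_size=500)
print(sorted(kept))
```

With the settings from the mapping above and 10,000 docs that have a value,
the bounds resolve to 10 and 1,000 documents, so in this model the very common
term `the` and the very rare term `xqzv` are both left out of memory.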