[[index-modules-fielddata]]
== Field data

The field data cache is used mainly when sorting on or faceting on a
field. It loads all the field values into memory in order to provide fast
document-based access to those values. The field data cache can be
expensive to build for a field, so it is recommended to have enough memory
to allocate it, and to keep it loaded.

The amount of memory used for the field data cache can be controlled using
`indices.fielddata.cache.size`. Note: reloading field data that does not fit
into your cache is expensive and performs poorly.

[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`indices.fielddata.cache.size` |The max size of the field data cache,
e.g. `30%` of node heap space, or an absolute value, e.g. `12GB`. Defaults
to unbounded.

|`indices.fielddata.cache.expire` |A time-based setting that expires
field data after a certain time of inactivity. Defaults to `-1`. For
example, it can be set to `5m` for a 5 minute expiry.
|=======================================================================

[float]
[[fielddata-circuit-breaker]]
=== Field data circuit breaker

The field data circuit breaker allows Elasticsearch to estimate the amount of
memory a field will require when loaded into memory. It can then prevent the
field data from being loaded by raising an exception. By default the limit is
configured to 80% of the maximum JVM heap. It can be configured with the
following parameters:

[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`indices.fielddata.breaker.limit` |Maximum size of estimated field data
to allow loading. Defaults to 80% of the maximum JVM heap.

|`indices.fielddata.breaker.overhead` |A constant that all field data
estimations are multiplied by to determine a final estimation. Defaults to
`1.03`.
|=======================================================================

Both `indices.fielddata.breaker.limit` and
`indices.fielddata.breaker.overhead` can be changed dynamically using the
cluster update settings API.

[float]
[[fielddata-monitoring]]
=== Monitoring field data

You can monitor memory usage for field data, as well as the field data circuit
breaker, using the
<<cluster-nodes-stats,Nodes Stats API>>.

[[fielddata-formats]]
== Field data formats

The field data format controls how field data should be stored.

Depending on the field type, there might be several field data types
available. In particular, string and numeric types support the `doc_values`
format, which allows for computing the field data data-structures at indexing
time and storing them on disk. Although it will make the index larger and
possibly slightly slower, this implementation is more near-realtime-friendly
and requires much less memory from the JVM than other implementations.

Here is an example of how to configure the `tag` field to use the `fst` field
data format.

[source,js]
--------------------------------------------------
{
    tag: {
        type:      "string",
        fielddata: {
            format: "fst"
        }
    }
}
--------------------------------------------------

It is possible to change the field data format (and the field data settings
in general) on a live index by using the update mapping API. When doing so,
field data which had already been loaded for existing segments will remain
alive while new segments will use the new field data configuration. Thanks to
the background merging process, all segments will eventually use the new
field data format.

[float]
==== String field data types

`paged_bytes` (default)::
    Stores unique terms sequentially in a large buffer and maps documents to
    the indices of the terms they contain in this large buffer.

`fst`::
    Stores terms in an FST. Slower to build than `paged_bytes` but can help lower
    memory usage if many terms share common prefixes and/or suffixes.

`doc_values`::
    Computes and stores field data data-structures on disk at indexing time.
    Lowers memory usage but only works on non-analyzed strings (`index`: `no` or
    `not_analyzed`) and doesn't support filtering.

[float]
==== Numeric field data types

`array` (default)::
    Stores field values in memory using arrays.

`doc_values`::
    Computes and stores field data data-structures on disk at indexing time.
    Doesn't support filtering.

[float]
==== Geo point field data types

`array` (default)::
    Stores latitudes and longitudes in arrays.

`doc_values`::
    Computes and stores field data data-structures on disk at indexing time.

[float]
=== Fielddata loading

By default, field data is loaded lazily, i.e. the first time that a query that
requires it is executed. However, this can make the first requests that
follow a merge operation quite slow, since fielddata loading is a heavy
operation.

It is possible to force field data to be loaded and cached eagerly through the
`loading` setting of fielddata:

[source,js]
--------------------------------------------------
{
    category: {
        type:      "string",
        fielddata: {
            loading: "eager"
        }
    }
}
--------------------------------------------------

[float]
==== Disabling field data loading

Field data can take a lot of RAM, so it makes sense to disable field data
loading on the fields that don't need field data, for example those that are
used for full-text search only. In order to disable field data loading, just
change the field data format to `disabled`. When disabled, all requests that
try to load field data, e.g. when they include aggregations and/or sorting,
will return an error.

[source,js]
--------------------------------------------------
{
    text: {
        type:      "string",
        fielddata: {
            format: "disabled"
        }
    }
}
--------------------------------------------------

The `disabled` format is supported by all field types.

[float]
[[field-data-filtering]]
=== Filtering fielddata

It is possible to control which field values are loaded into memory,
which is particularly useful for string fields. When specifying the
<<mapping-core-types,mapping>> for a field, you
can also specify a fielddata filter.

Fielddata filters can be changed using the
<<indices-put-mapping,PUT mapping>>
API. After changing the filters, use the
<<indices-clearcache,Clear Cache>> API
to reload the fielddata using the new filters.

[float]
==== Filtering by frequency

The frequency filter allows you to only load terms whose frequency falls
between a `min` and `max` value, which can be expressed as an absolute
number or as a percentage (e.g. `0.01` is `1%`). Frequency is calculated
*per segment*. Percentages are based on the number of docs which have a
value for the field, as opposed to all docs in the segment.

Small segments can be excluded completely by specifying the minimum
number of docs that the segment should contain with `min_segment_size`:

[source,js]
--------------------------------------------------
{
    tag: {
        type:      "string",
        fielddata: {
            filter: {
                frequency: {
                    min:              0.001,
                    max:              0.1,
                    min_segment_size: 500
                }
            }
        }
    }
}
--------------------------------------------------

[float]
==== Filtering by regex

Terms can also be filtered by regular expression: only values which
match the regular expression are loaded. Note: the regular expression is
applied to each term in the field, not to the whole field value. For
instance, to only load hashtags from a tweet, we can use a regular
expression which matches terms beginning with `#`:

[source,js]
--------------------------------------------------
{
    tweet: {
        type:      "string",
        analyzer:  "whitespace",
        fielddata: {
            filter: {
                regex: {
                    pattern: "^#.*"
                }
            }
        }
    }
}
--------------------------------------------------

[float]
==== Combining filters

The `frequency` and `regex` filters can be combined:

[source,js]
--------------------------------------------------
{
    tweet: {
        type:      "string",
        analyzer:  "whitespace",
        fielddata: {
            filter: {
                regex: {
                    pattern:          "^#.*"
                },
                frequency: {
                    min:              0.001,
                    max:              0.1,
                    min_segment_size: 500
                }
            }
        }
    }
}
--------------------------------------------------
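
To make the combined behaviour easier to picture, here is a rough Python sketch of how a term could be selected within a single segment. It is an illustration only, not Elasticsearch's implementation. The function `terms_to_load` and its parameters are hypothetical names, and the sketch rests on several assumptions: frequency is treated as the number of documents containing the term, bounds below `1` are read as fractions and anything else as an absolute count, both bounds are inclusive, a term must pass every configured filter, Python's `re` module stands in for the Java regular expression engine, and `min_segment_size` is not modelled.

```python
import re
from collections import Counter


def terms_to_load(segment_docs, min_freq=None, max_freq=None, pattern=None):
    """Return the terms one segment would load under the assumptions above.

    segment_docs holds one list of terms per document; an empty list means
    the document has no value for the field.
    """
    # Percentages are relative to documents that have a value for the field.
    docs_with_value = [set(terms) for terms in segment_docs if terms]
    # Count in how many documents of this segment each term appears.
    doc_freq = Counter(term for terms in docs_with_value for term in terms)

    def absolute(bound):
        # Assumption: bounds below 1 are fractions, anything else is a count.
        return bound * len(docs_with_value) if bound < 1 else bound

    regex = re.compile(pattern) if pattern is not None else None
    loaded = set()
    for term, freq in doc_freq.items():
        if min_freq is not None and freq < absolute(min_freq):
            continue  # too rare in this segment
        if max_freq is not None and freq > absolute(max_freq):
            continue  # too common in this segment
        if regex is not None and not regex.match(term):
            continue  # the regex is applied per term, not per field value
        loaded.add(term)
    return loaded


segment = [
    ["#es", "rocks"],
    ["#es", "search"],
    ["#lucene", "search"],
    ["search"],
    [],  # no value for the field: ignored when computing percentages
]
print(sorted(terms_to_load(segment, min_freq=0.5, pattern="^#.*")))
```

In this toy segment four documents have a value, so a `min_freq` of `0.5` corresponds to two documents. Only `#es` both appears in at least two documents and matches the hashtag pattern, which is why it is the only term kept when both filters are active.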