- [[index-modules-fielddata]]
- == Field data
- The field data cache is used mainly when sorting or faceting on a
- field. It loads all the field values into memory to provide fast
- document-based access to those values. Building the field data cache for
- a field can be expensive, so it is recommended to have enough memory to
- hold it and to keep it loaded.
- The amount of memory used for the field
- data cache can be controlled using `indices.fielddata.cache.size`. Note:
- reloading field data that does not fit into your cache is expensive
- and performs poorly.
- [cols="<,<",options="header",]
- |=======================================================================
- |Setting |Description
- |`indices.fielddata.cache.size` |The maximum size of the field data cache,
- e.g. `30%` of node heap space, or an absolute value, e.g. `12GB`. Defaults
- to unbounded.
- |`indices.fielddata.cache.expire` |A time-based setting that expires
- field data after a certain period of inactivity. Defaults to `-1`. For
- example, it can be set to `5m` for a 5 minute expiry.
- |=======================================================================
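To make the two value forms in the table concrete, here is a minimal sketch of how a size such as `30%` or `12GB` could be resolved to a byte count. The function name `resolveCacheSize`, the `heapBytes` argument, and the binary (1024-based) unit multipliers are assumptions made for illustration. This is not Elasticsearch's own parsing code.

```javascript
// Hypothetical helper: resolve a fielddata cache size setting to bytes.
// Assumes binary units (1KB = 1024 bytes); not Elasticsearch's parser.
function resolveCacheSize(setting, heapBytes) {
  const UNITS = { B: 1, KB: 1024, MB: 1024 ** 2, GB: 1024 ** 3 };

  // Percentage of the node heap, e.g. "30%".
  const percent = /^(\d+(?:\.\d+)?)%$/.exec(setting);
  if (percent) {
    return Math.floor((heapBytes * Number(percent[1])) / 100);
  }

  // Absolute value with a unit suffix, e.g. "12GB".
  const absolute = /^(\d+(?:\.\d+)?)(B|KB|MB|GB)$/i.exec(setting);
  if (absolute) {
    return Math.floor(Number(absolute[1]) * UNITS[absolute[2].toUpperCase()]);
  }

  throw new Error("Unrecognised size: " + setting);
}
```

For example, under these assumptions `resolveCacheSize("30%", heapBytes)` yields 30% of whatever heap size is passed in, while `resolveCacheSize("12GB", heapBytes)` ignores the heap size entirely.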
- [float]
- === Fielddata loading
- By default, field data is loaded lazily, the first time that a query
- requiring field data is run. However, this can make the first requests that
- follow a merge operation quite slow, since fielddata loading is a heavy
- operation.
- It is possible to force field data to be loaded and cached eagerly through the
- `loading` setting of fielddata:
- [source,js]
- --------------------------------------------------
- {
-     category: {
-         type: "string",
-         fielddata: {
-             loading: "eager"
-         }
-     }
- }
- --------------------------------------------------
- [float]
- [[field-data-filtering]]
- === Filtering fielddata
- It is possible to control which field values are loaded into memory,
- which is particularly useful for string fields. When specifying the
- <<mapping-core-types,mapping>> for a field, you
- can also specify a fielddata filter.
- Fielddata filters can be changed using the
- <<indices-put-mapping,PUT mapping>>
- API. After changing the filters, use the
- <<indices-clearcache,Clear Cache>> API
- to reload the fielddata using the new filters.
- [float]
- ==== Filtering by frequency
- The frequency filter allows you to load only terms whose frequency falls
- between a `min` and `max` value, which can be expressed as an absolute
- number or as a percentage (e.g. `0.01` is `1%`). Frequency is calculated
- *per segment*. Percentages are based on the number of docs which have a
- value for the field, as opposed to all docs in the segment.
- Small segments can be excluded completely by specifying the minimum
- number of docs that the segment should contain with `min_segment_size`:
- [source,js]
- --------------------------------------------------
- {
-     tag: {
-         type: "string",
-         fielddata: {
-             filter: {
-                 frequency: {
-                     min: 0.001,
-                     max: 0.1,
-                     min_segment_size: 500
-                 }
-             }
-         }
-     }
- }
- --------------------------------------------------
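The per-segment check described above can be sketched as follows. This is an illustration under stated assumptions, not Elasticsearch's implementation: the function names `frequencyBounds` and `termsToLoad` are hypothetical, values up to `1` are assumed to be fractions of the docs that have a value for the field, larger values are assumed to be absolute counts, and both bounds are assumed to be inclusive. The sketch deliberately leaves out `min_segment_size`.

```javascript
// Illustrative sketch only; names and threshold rules are assumptions.
// docsWithValue: number of docs in the segment that have a value for the field.
function frequencyBounds(filter, docsWithValue) {
  // Assumption: values up to 1 are fractions of docsWithValue,
  // larger values are absolute document counts.
  const toCount = (v) =>
    v > 1 ? Math.floor(v) : Math.floor(docsWithValue * v);
  const min = filter.min === undefined ? 0 : toCount(filter.min);
  const max = filter.max === undefined ? docsWithValue : toCount(filter.max);
  return { min, max };
}

// termDocFreqs: map of term -> number of docs in the segment containing it.
function termsToLoad(termDocFreqs, filter, docsWithValue) {
  const { min, max } = frequencyBounds(filter, docsWithValue);
  return Object.keys(termDocFreqs).filter(
    (term) => termDocFreqs[term] >= min && termDocFreqs[term] <= max
  );
}

// A segment in which 10000 docs have a value for the field.
const segmentTerms = { rare: 1, common: 40, ubiquitous: 5000 };
const loadedTerms = termsToLoad(segmentTerms, { min: 0.001, max: 0.1 }, 10000);
```

With `min: 0.001` and `max: 0.1` over 10000 docs, the bounds work out to 10 and 1000 docs, so only `common` is kept in this sketch. The very rare and the very frequent terms are skipped.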
- [float]
- ==== Filtering by regex
- Terms can also be filtered by regular expression: only terms which
- match the regular expression are loaded. Note: the regular expression is
- applied to each term in the field, not to the whole field value. For
- instance, to only load hashtags from a tweet, we can use a regular
- expression which matches terms beginning with `#`:
- [source,js]
- --------------------------------------------------
- {
-     tweet: {
-         type: "string",
-         analyzer: "whitespace",
-         fielddata: {
-             filter: {
-                 regex: {
-                     pattern: "^#.*"
-                 }
-             }
-         }
-     }
- }
- --------------------------------------------------
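The difference between matching each term and matching the whole field value can be seen in a small sketch. The helper names `whitespaceTerms` and `filterTerms` are hypothetical. Splitting on whitespace stands in for the `whitespace` analyzer, and a JavaScript `RegExp` stands in for the regular expression engine used by Elasticsearch, so edge-case behaviour may differ.

```javascript
// Stand-in for the whitespace analyzer: split the field value into terms.
function whitespaceTerms(text) {
  return text.split(/\s+/).filter((t) => t.length > 0);
}

// Keep only the terms that match the filter pattern.
function filterTerms(terms, pattern) {
  const re = new RegExp(pattern);
  return terms.filter((term) => re.test(term));
}

const tweet = "Reindexing again #elasticsearch #lucene";
const hashtags = filterTerms(whitespaceTerms(tweet), "^#.*");
// The tweet as a whole does not start with "#", but two of its terms do.
```

Here `hashtags` contains only `#elasticsearch` and `#lucene`. Had the pattern been applied to the whole field value, nothing would have matched.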
- [float]
- ==== Combining filters
- The `frequency` and `regex` filters can be combined:
- [source,js]
- --------------------------------------------------
- {
-     tweet: {
-         type: "string",
-         analyzer: "whitespace",
-         fielddata: {
-             filter: {
-                 regex: {
-                     pattern: "^#.*"
-                 },
-                 frequency: {
-                     min: 0.001,
-                     max: 0.1,
-                     min_segment_size: 500
-                 }
-             }
-         }
-     }
- }
- --------------------------------------------------
- [float]
- [[field-data-monitoring]]
- === Monitoring field data
- You can monitor memory usage for field data using the
- <<cluster-nodes-stats,Nodes Stats API>>.