[[index-modules-fielddata]]
== Field data

The field data cache is used mainly when sorting on or faceting on a
field. It loads all the field values into memory in order to provide fast
document-based access to those values. The field data cache can be
expensive to build for a field, so it is recommended to have enough memory
to allocate it, and to keep it loaded.

The amount of memory used for the field
data cache can be controlled using `indices.fielddata.cache.size`. Note:
reloading field data that does not fit into your cache is expensive
and performs poorly.

[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`indices.fielddata.cache.size` |The maximum size of the field data cache,
e.g. `30%` of node heap space, or an absolute value, e.g. `12GB`. Defaults
to unbounded.

|`indices.fielddata.cache.expire` |A time-based setting that expires
field data after a certain time of inactivity. Defaults to `-1`. For
example, it can be set to `5m` for a 5 minute expiry.
|=======================================================================

[float]
=== Filtering fielddata

It is possible to control which field values are loaded into memory,
which is particularly useful for string fields. When specifying the
<<mapping-core-types,mapping>> for a field, you
can also specify a fielddata filter.

Fielddata filters can be changed using the
<<indices-put-mapping,PUT mapping>>
API. After changing the filters, use the
<<indices-clearcache,Clear Cache>> API
to reload the fielddata using the new filters.
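
The two steps above (update the mapping, then clear the cache) can be sketched as a pair of request descriptions. This is only an illustration: the index name `twitter`, the type name `tweet`, the helper function names, and the exact URL layout are assumptions, and the URL layout in particular differs between Elasticsearch versions. No request is actually sent.

```javascript
// Hypothetical index and type names, used only for illustration.
const INDEX = "twitter";
const TYPE = "tweet";

// Step 1: describe a PUT mapping request that sets a new fielddata filter.
// Assumption: the /{index}/_mapping/{type} URL layout; check your version.
function putMappingRequest(field, filter) {
  const properties = {};
  properties[field] = { type: "string", fielddata: { filter: filter } };
  const body = {};
  body[TYPE] = { properties: properties };
  return { method: "PUT", path: "/" + INDEX + "/_mapping/" + TYPE, body: body };
}

// Step 2: describe a clear cache request, so that fielddata is reloaded
// with the new filter the next time it is needed.
function clearCacheRequest() {
  return { method: "POST", path: "/" + INDEX + "/_cache/clear" };
}

const steps = [
  putMappingRequest("tag", { regex: { pattern: "^#.*" } }),
  clearCacheRequest()
];
```

Each entry in `steps` can be handed to whatever HTTP client you already use. The order matters: clearing the cache before the mapping change would reload fielddata with the old filter.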

[float]
==== Filtering by frequency

The frequency filter allows you to only load terms whose frequency falls
between a `min` and `max` value, which can be expressed as an absolute
number or as a percentage (e.g. `0.01` is `1%`). Frequency is calculated
*per segment*. Percentages are based on the number of docs which have a
value for the field, as opposed to all docs in the segment.

Small segments can be excluded completely by specifying the minimum
number of docs that the segment should contain with `min_segment_size`:

[source,js]
--------------------------------------------------
{
    "tag": {
        "type":      "string",
        "fielddata": {
            "filter": {
                "frequency": {
                    "min":              0.001,
                    "max":              0.1,
                    "min_segment_size": 500
                }
            }
        }
    }
}
--------------------------------------------------
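
To make the per-segment percentage calculation concrete, here is a minimal sketch of how `min` and `max` might be applied within a single segment. It is not the actual implementation: the function names are hypothetical, and it assumes that values below `1` are fractions of the docs that have a value for the field, that values of `1` or more are absolute counts, and that both bounds are inclusive. It does not model `min_segment_size`.

```javascript
// Hypothetical helper: turn a min/max setting into an absolute doc count.
// Assumption: settings below 1 are fractions of docsWithField,
// settings of 1 or more are already absolute counts.
function toDocCount(setting, docsWithField) {
  return setting < 1 ? Math.floor(setting * docsWithField) : Math.floor(setting);
}

// termDocFreqs:  { term: number of docs in this segment containing the term }
// docsWithField: number of docs in this segment with a value for the field
// Returns the terms that would be kept (bounds assumed inclusive).
function filterByFrequency(termDocFreqs, docsWithField, min, max) {
  const minCount = toDocCount(min, docsWithField);
  const maxCount = toDocCount(max, docsWithField);
  return Object.keys(termDocFreqs).filter(function (term) {
    const freq = termDocFreqs[term];
    return freq >= minCount && freq <= maxCount;
  });
}

// One segment in which 1000 docs have a value for the field.
const segmentTerms = { elasticsearch: 40, lucene: 5, the: 900 };
const loaded = filterByFrequency(segmentTerms, 1000, 0.01, 0.1);
// loaded is ["elasticsearch"]: "lucene" is too rare, "the" is too common
```

Because the thresholds are derived from the docs in each segment, the same term can be loaded from one segment and skipped in another.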

[float]
==== Filtering by regex

Terms can also be filtered by regular expression: only values which
match the regular expression are loaded. Note: the regular expression is
applied to each term in the field, not to the whole field value. For
instance, to only load hashtags from a tweet, we can use a regular
expression which matches terms beginning with `#`:

[source,js]
--------------------------------------------------
{
    "tweet": {
        "type":      "string",
        "analyzer":  "whitespace",
        "fielddata": {
            "filter": {
                "regex": {
                    "pattern": "^#.*"
                }
            }
        }
    }
}
--------------------------------------------------
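
The difference between matching each term and matching the whole field value can be seen in a small sketch. The function names are hypothetical, splitting on whitespace is only a rough stand-in for the `whitespace` analyzer, and JavaScript's regex engine is used here purely for illustration.

```javascript
// Rough stand-in for the `whitespace` analyzer: split on runs of whitespace.
function whitespaceTerms(text) {
  return text.split(/\s+/).filter(function (term) {
    return term.length > 0;
  });
}

// Keep only the terms that match the fielddata regex pattern.
function filterByRegex(terms, pattern) {
  const re = new RegExp(pattern);
  return terms.filter(function (term) {
    return re.test(term);
  });
}

const tweet = "Loving #elasticsearch and #lucene today";
const hashtags = filterByRegex(whitespaceTerms(tweet), "^#.*");
// hashtags is ["#elasticsearch", "#lucene"]

// Applied to the whole field value instead, the pattern matches nothing,
// because the tweet as a whole does not begin with "#".
const wholeValueMatches = new RegExp("^#.*").test(tweet); // false
```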

[float]
==== Combining filters

The `frequency` and `regex` filters can be combined:

[source,js]
--------------------------------------------------
{
    "tweet": {
        "type":      "string",
        "analyzer":  "whitespace",
        "fielddata": {
            "filter": {
                "regex": {
                    "pattern": "^#.*"
                },
                "frequency": {
                    "min":              0.001,
                    "max":              0.1,
                    "min_segment_size": 500
                }
            }
        }
    }
}
--------------------------------------------------

[float]
=== Monitoring field data

You can monitor memory usage for field data using the
<<cluster-nodes-stats,Nodes Stats API>>.