[[index-modules-fielddata]]
== Field data

The field data cache is used mainly when sorting or faceting on a
field. It loads all the values of the field into memory to provide fast
document-based access to those values. Building the field data cache for
a field can be expensive, so it is recommended to have enough memory
to hold it and to keep it loaded.

The amount of memory used for the field data cache can be controlled
with `indices.fielddata.cache.size`. Note: reloading field data that does
not fit into the cache is expensive and performs poorly.

[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`indices.fielddata.cache.size` |The max size of the field data cache,
e.g. `30%` of node heap space, or an absolute value, e.g. `12GB`. Defaults
to unbounded.
|`indices.fielddata.cache.expire` |A time-based setting that expires
field data after a certain time of inactivity. Defaults to `-1`. For
example, it can be set to `5m` for a 5 minute expiry.
|=======================================================================
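
To make the two forms of `indices.fielddata.cache.size` concrete, the sketch below turns a percentage or an absolute value into a byte count. It is illustrative only: `resolveCacheSize` is a hypothetical helper, not part of Elasticsearch, and it assumes binary units (1GB = 1024^3 bytes) and a known heap size.

```javascript
// Hypothetical helper, for illustration only (not an Elasticsearch API).
// Assumption: size units are binary, so 1GB = 1024^3 bytes.
const UNITS = { b: 1, kb: 1024, mb: 1024 ** 2, gb: 1024 ** 3 };

function resolveCacheSize(setting, heapBytes) {
  const value = String(setting).trim().toLowerCase();

  // Percentage form, e.g. "30%": a share of the node heap.
  const percent = /^(\d+(?:\.\d+)?)%$/.exec(value);
  if (percent) {
    return Math.floor((parseFloat(percent[1]) / 100) * heapBytes);
  }

  // Absolute form, e.g. "12GB".
  const absolute = /^(\d+(?:\.\d+)?)(b|kb|mb|gb)$/.exec(value);
  if (absolute) {
    return Math.floor(parseFloat(absolute[1]) * UNITS[absolute[2]]);
  }

  throw new Error(`Unrecognised size setting: ${setting}`);
}

const heapBytes = 16 * UNITS.gb; // assumed 16GB node heap
console.log(resolveCacheSize("30%", heapBytes));
console.log(resolveCacheSize("12GB", heapBytes));
```

With a percentage the limit scales with the heap of each node, while an absolute value stays the same on every node.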

[float]
=== Fielddata loading

By default, field data is loaded lazily, the first time that a query that
requires field data is run. However, this can make the first requests that
follow a merge operation quite slow, since fielddata loading is a heavy
operation.

It is possible to force field data to be loaded and cached eagerly through the
`loading` setting of fielddata:

[source,js]
--------------------------------------------------
{
    "category": {
        "type": "string",
        "fielddata": {
            "loading": "eager"
        }
    }
}
--------------------------------------------------

[float]
[[field-data-filtering]]
=== Filtering fielddata

It is possible to control which field values are loaded into memory,
which is particularly useful for string fields. When specifying the
<<mapping-core-types,mapping>> for a field, you
can also specify a fielddata filter.

Fielddata filters can be changed using the
<<indices-put-mapping,PUT mapping>>
API. After changing the filters, use the
<<indices-clearcache,Clear Cache>> API
to reload the fielddata using the new filters.
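
The two-step sequence above (update the mapping, then clear the cache) can be sketched as a pair of request descriptions. This is a sketch under assumptions: `buildFilterUpdateRequests` is a hypothetical helper that only builds the requests and sends nothing, the index, type and field names are made up, and the endpoint paths follow the Elasticsearch 1.x REST layout, so check them against the API pages for your version.

```javascript
// Hypothetical helper: describes the two requests without sending them.
// Assumption: 1.x-style paths, PUT /{index}/_mapping/{type} and
// POST /{index}/_cache/clear.
function buildFilterUpdateRequests(index, type, field, fieldMapping) {
  return [
    {
      // Step 1: change the fielddata filter through the PUT mapping API.
      method: "PUT",
      path: `/${index}/_mapping/${type}`,
      body: { [type]: { properties: { [field]: fieldMapping } } },
    },
    {
      // Step 2: clear the cache so fielddata is reloaded with the new filter.
      method: "POST",
      path: `/${index}/_cache/clear`,
      body: null,
    },
  ];
}

// Example with made-up names: index "twitter", type "tweet", field "tag".
const requests = buildFilterUpdateRequests("twitter", "tweet", "tag", {
  type: "string",
  fielddata: { filter: { frequency: { min: 0.001, max: 0.1 } } },
});
console.log(JSON.stringify(requests, null, 2));
```

The order matters: clearing the cache before the mapping change would only reload fielddata with the old filter.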

[float]
==== Filtering by frequency

The frequency filter allows you to only load terms whose frequency falls
between a `min` and `max` value, which can be expressed as an absolute
number or as a percentage (e.g. `0.01` is `1%`). Frequency is calculated
*per segment*. Percentages are based on the number of docs which have a
value for the field, as opposed to all docs in the segment.

Small segments can be excluded completely by specifying the minimum
number of docs that the segment should contain with `min_segment_size`:

[source,js]
--------------------------------------------------
{
    "tag": {
        "type": "string",
        "fielddata": {
            "filter": {
                "frequency": {
                    "min": 0.001,
                    "max": 0.1,
                    "min_segment_size": 500
                }
            }
        }
    }
}
--------------------------------------------------
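
The per-segment arithmetic behind the frequency filter can be sketched as follows. This is a simplified model, not the Elasticsearch implementation: `filterByFrequency` and the shape of the `segment` object are hypothetical, bounds below `1` are assumed to be fractions and bounds of `1` or more absolute counts, and both bounds are assumed to be inclusive.

```javascript
// Simplified model of the frequency filter for ONE segment.
// segment = { docCount, docsWithValue, termDocCounts: { term: docsContainingTerm } }
// Assumptions (not confirmed by the text): bounds < 1 are fractions,
// bounds >= 1 are absolute doc counts, and both bounds are inclusive.
function toDocCount(bound, docsWithValue) {
  // Fractions are relative to docs that HAVE a value for the field.
  return bound < 1 ? bound * docsWithValue : bound;
}

function filterByFrequency(segment, { min = 0, max = Infinity, min_segment_size = 0 } = {}) {
  // Segments with too few docs are excluded completely.
  if (segment.docCount < min_segment_size) return [];

  const lower = toDocCount(min, segment.docsWithValue);
  const upper = toDocCount(max, segment.docsWithValue);

  return Object.entries(segment.termDocCounts)
    .filter(([, count]) => count >= lower && count <= upper)
    .map(([term]) => term);
}

const segment = {
  docCount: 2500,      // all docs in the segment
  docsWithValue: 2000, // docs that have a value for the field
  termDocCounts: { stopword: 1500, hashtag: 50, typo: 1 },
};

// With min 0.001 and max 0.1 the bounds are 2 and 200 docs.
console.log(filterByFrequency(segment, { min: 0.001, max: 0.1, min_segment_size: 500 }));
// → ["hashtag"]
```

Here `stopword` is dropped for being too frequent and `typo` for being too rare. Because the calculation runs per segment, the same term may be loaded from one segment and skipped in another.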

[float]
==== Filtering by regex

Terms can also be filtered by regular expression: only values which
match the regular expression are loaded. Note: the regular expression is
applied to each term in the field, not to the whole field value. For
instance, to only load hashtags from a tweet, we can use a regular
expression which matches terms beginning with `#`:

[source,js]
--------------------------------------------------
{
    "tweet": {
        "type": "string",
        "analyzer": "whitespace",
        "fielddata": {
            "filter": {
                "regex": {
                    "pattern": "^#.*"
                }
            }
        }
    }
}
--------------------------------------------------
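
The point that the pattern is tested against each term, not the whole field value, can be seen in a small sketch. It is an approximation: `termsToLoad` is a hypothetical function, splitting on whitespace stands in for the `whitespace` analyzer, and a JavaScript `RegExp` stands in for whatever regex engine Elasticsearch applies.

```javascript
// Approximation of the regex filter on a whitespace-analyzed field.
// Assumptions: split(/\s+/) mimics the `whitespace` analyzer, and a JavaScript
// RegExp behaves like the server-side engine for this simple pattern.
function termsToLoad(fieldValue, pattern) {
  const regex = new RegExp(pattern);
  const terms = fieldValue.split(/\s+/).filter((term) => term.length > 0);
  // The pattern is tested against each term separately; duplicates collapse.
  return [...new Set(terms.filter((term) => regex.test(term)))];
}

console.log(termsToLoad("Loving the new #elasticsearch release #search", "^#.*"));
// → ["#elasticsearch", "#search"]
```

The whole tweet does not start with `#`, so matching against the full field value would load nothing. Matching term by term keeps exactly the hashtags.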

[float]
==== Combining filters

The `frequency` and `regex` filters can be combined:

[source,js]
--------------------------------------------------
{
    "tweet": {
        "type": "string",
        "analyzer": "whitespace",
        "fielddata": {
            "filter": {
                "regex": {
                    "pattern": "^#.*"
                },
                "frequency": {
                    "min": 0.001,
                    "max": 0.1,
                    "min_segment_size": 500
                }
            }
        }
    }
}
--------------------------------------------------

[float]
[[field-data-monitoring]]
=== Monitoring field data

You can monitor memory usage for field data using the
<<cluster-nodes-stats,Nodes Stats API>>.
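
As a rough illustration of reading fielddata memory out of a nodes stats response, the sketch below summarises bytes used per node. The response shape (`nodes.<id>.name` and `nodes.<id>.indices.fielddata.memory_size_in_bytes`) is an assumption, and both `fielddataBytesPerNode` and the sample data are made up. Verify the field names against the Nodes Stats API output of your cluster.

```javascript
// Hypothetical helper: summarises fielddata memory per node.
// Assumed response shape: nodes.<id>.indices.fielddata.memory_size_in_bytes
function fielddataBytesPerNode(statsResponse) {
  const result = {};
  for (const [nodeId, node] of Object.entries(statsResponse.nodes || {})) {
    const fielddata = node.indices && node.indices.fielddata;
    // Fall back to the node id when no name is present, and to 0 when stats are missing.
    result[node.name || nodeId] = fielddata ? fielddata.memory_size_in_bytes : 0;
  }
  return result;
}

// Made-up sample response, trimmed to the assumed fields.
const sampleResponse = {
  nodes: {
    abc123: { name: "node-1", indices: { fielddata: { memory_size_in_bytes: 1048576 } } },
    def456: { name: "node-2", indices: {} },
  },
};

console.log(fielddataBytesPerNode(sampleResponse));
```

Comparing these figures with the configured `indices.fielddata.cache.size` shows how close each node is to its limit.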