fielddata.asciidoc 3.9 KB

[[index-modules-fielddata]]
== Field data

The field data cache is used mainly when sorting on or faceting on a
field. It loads all the field values into memory in order to provide fast
document-based access to those values. The field data cache can be
expensive to build for a field, so it is recommended to have enough memory
to allocate it, and to keep it loaded.

The amount of memory used for the field data cache can be controlled using
`indices.fielddata.cache.size`. Note: reloading field data that does not
fit into your cache is expensive and will perform poorly.

[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`indices.fielddata.cache.size` |The maximum size of the field data cache,
e.g. `30%` of node heap space, or an absolute value, e.g. `12GB`. Defaults
to unbounded.

|`indices.fielddata.cache.expire` |A time-based setting that expires
field data after a certain time of inactivity. Defaults to `-1`. For
example, can be set to `5m` for a 5 minute expiry.
|=======================================================================

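To make the two forms of `indices.fielddata.cache.size` concrete, the following is a minimal sketch of how a percentage or an absolute value could resolve to a byte count. The helper `resolveCacheSize`, the 1024-based units, and the 8 GB heap are all assumptions made for illustration; this is not Elasticsearch's own parser.

```javascript
// Hypothetical helper, not part of Elasticsearch: resolves a cache size
// setting such as "30%" or "12GB" to a number of bytes.
// Assumption: units are 1024-based.
const UNITS = { b: 1, kb: 1024, mb: 1024 ** 2, gb: 1024 ** 3 };

function resolveCacheSize(setting, heapBytes) {
  const text = String(setting).trim().toLowerCase();

  // Percentage form: a share of the node heap.
  const percent = text.match(/^(\d+(?:\.\d+)?)%$/);
  if (percent) {
    return Math.floor((heapBytes * Number(percent[1])) / 100);
  }

  // Absolute form: a number followed by a unit.
  const absolute = text.match(/^(\d+(?:\.\d+)?)(b|kb|mb|gb)$/);
  if (absolute) {
    return Math.floor(Number(absolute[1]) * UNITS[absolute[2]]);
  }

  throw new Error(`Unrecognised size setting: ${setting}`);
}

const heapBytes = 8 * UNITS.gb; // assumed node heap of 8 GB
console.log(resolveCacheSize("30%", heapBytes));
console.log(resolveCacheSize("12GB", heapBytes));
```

Note that in this sketch an absolute value can exceed the heap, as `12GB` does against an 8 GB heap; the sketch does not guard against that.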

[float]
[[filtering]]
=== Filtering fielddata

It is possible to control which field values are loaded into memory,
which is particularly useful for string fields. When specifying the
<<mapping-core-types,mapping>> for a field, you
can also specify a fielddata filter.

Fielddata filters can be changed using the
<<indices-put-mapping,PUT mapping>>
API. After changing the filters, use the
<<indices-clearcache,Clear Cache>> API
to reload the fielddata using the new filters.

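The two-step sequence described above (update the mapping, then clear the cache) can be sketched as a pair of request descriptions. The function `fielddataFilterUpdate`, the index `twitter`, the type `tweet`, and the field `message` are hypothetical, and the request paths are assumptions that should be checked against the linked PUT mapping and Clear Cache pages. Nothing is sent over the network.

```javascript
// Builds (but does not send) the two requests: first the mapping update that
// carries the new fielddata filter, then the cache clear that forces a reload.
// Assumption: the paths follow the forms /{index}/{type}/_mapping and
// /{index}/_cache/clear.
function fielddataFilterUpdate(index, type, field, filter) {
  return [
    {
      method: "PUT",
      path: `/${index}/${type}/_mapping`,
      body: {
        [type]: {
          properties: {
            [field]: { type: "string", fielddata: { filter } },
          },
        },
      },
    },
    { method: "POST", path: `/${index}/_cache/clear` },
  ];
}

// Hypothetical index, type, and field names.
const requests = fielddataFilterUpdate("twitter", "tweet", "message", {
  regex: { pattern: "^#.*" },
});
for (const request of requests) {
  console.log(request.method, request.path, JSON.stringify(request.body || {}));
}
```

The order matters: clearing the cache before the mapping change would reload fielddata with the old filters.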

[float]
==== Filtering by frequency

The frequency filter allows you to only load terms whose frequency falls
between a `min` and `max` value, which can be expressed as an absolute
number or as a percentage (e.g. `0.01` is `1%`). Frequency is calculated
*per segment*. Percentages are based on the number of docs which have a
value for the field, as opposed to all docs in the segment.

Small segments can be excluded completely by specifying the minimum
number of docs that the segment should contain with `min_segment_size`:

[source,js]
--------------------------------------------------
{
    "tag": {
        "type": "string",
        "fielddata": {
            "filter": {
                "frequency": {
                    "min": 0.001,
                    "max": 0.1,
                    "min_segment_size": 500
                }
            }
        }
    }
}
--------------------------------------------------

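As a rough illustration of the `min` and `max` rule for a single segment, the sketch below keeps only the terms whose document frequency falls inside the range. The function names and sample data are hypothetical. It assumes that thresholds below 1 are fractions of the docs that have a value for the field, that other thresholds are absolute doc counts, and that both bounds are inclusive. It does not model `min_segment_size`.

```javascript
// Illustrative only: decides which terms of ONE segment pass the frequency
// filter. `docFreqs` maps each term to the number of docs containing it;
// `docsWithValue` is the number of docs in the segment that have a value
// for the field (not the total number of docs in the segment).
function toAbsolute(threshold, docsWithValue) {
  // Assumption: thresholds below 1 are fractions, anything else is a count.
  return threshold < 1 ? threshold * docsWithValue : threshold;
}

function filterByFrequency(docFreqs, docsWithValue, { min = 0, max = Infinity }) {
  const lowest = toAbsolute(min, docsWithValue);
  const highest = toAbsolute(max, docsWithValue);
  // Assumption: both bounds are inclusive.
  return Object.keys(docFreqs).filter(
    (term) => docFreqs[term] >= lowest && docFreqs[term] <= highest
  );
}

// Hypothetical segment: 1000 docs have a value for the field.
const segment = { the: 900, elasticsearch: 40, typo: 1 };
console.log(filterByFrequency(segment, 1000, { min: 0.01, max: 0.1 }));
```

With these sample numbers the range is 10 to 100 docs, so the very common term and the one-off term are both dropped.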

[float]
==== Filtering by regex

Terms can also be filtered by regular expression: only values which
match the regular expression are loaded. Note: the regular expression is
applied to each term in the field, not to the whole field value. For
instance, to only load hashtags from a tweet, we can use a regular
expression which matches terms beginning with `#`:

[source,js]
--------------------------------------------------
{
    "tweet": {
        "type": "string",
        "analyzer": "whitespace",
        "fielddata": {
            "filter": {
                "regex": {
                    "pattern": "^#.*"
                }
            }
        }
    }
}
--------------------------------------------------

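The difference between matching each term and matching the whole field value can be seen in a small sketch. It approximates the `whitespace` analyzer by splitting on whitespace and uses JavaScript's regular expression engine, both of which are simplifying assumptions; the function `loadedTerms` and the sample tweet are hypothetical.

```javascript
// Illustrative only: returns the terms of a field value that would pass a
// regex fielddata filter.
// Assumption: the whitespace analyzer is approximated by splitting on
// whitespace, and JavaScript regex syntax stands in for the real engine.
function loadedTerms(fieldValue, pattern) {
  const regex = new RegExp(pattern);
  return fieldValue
    .split(/\s+/)
    .filter((term) => term.length > 0 && regex.test(term));
}

const tweet = "Indexing tweets with #elasticsearch and #lucene";

// The pattern is tested against each term, so both hashtags are kept.
console.log(loadedTerms(tweet, "^#.*"));

// Tested against the whole value, the same pattern would not match at all.
console.log(new RegExp("^#.*").test(tweet));
```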

[float]
==== Combining filters

The `frequency` and `regex` filters can be combined:

[source,js]
--------------------------------------------------
{
    "tweet": {
        "type": "string",
        "analyzer": "whitespace",
        "fielddata": {
            "filter": {
                "regex": {
                    "pattern": "^#.*"
                },
                "frequency": {
                    "min": 0.001,
                    "max": 0.1,
                    "min_segment_size": 500
                }
            }
        }
    }
}
--------------------------------------------------


[float]
[[monitoring]]
=== Monitoring field data

You can monitor memory usage for field data using the
<<cluster-nodes-stats,Nodes Stats API>>.
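
As a sketch of what such monitoring might look like, the function below sums field data memory across nodes from an already-fetched stats response. The response shape used here (`nodes.<id>.indices.fielddata.memory_size_in_bytes`) is an assumption to verify against the Nodes Stats API page, and the sample object and node ids are made up.

```javascript
// Sums field data memory over all nodes in a stats response.
// Assumption: each node reports indices.fielddata.memory_size_in_bytes.
// Nodes that lack the value contribute 0.
function totalFielddataBytes(nodesStats) {
  const nodes = Object.values((nodesStats && nodesStats.nodes) || {});
  return nodes.reduce((sum, node) => {
    const fielddata = node && node.indices && node.indices.fielddata;
    const bytes = fielddata ? fielddata.memory_size_in_bytes : 0;
    return sum + (typeof bytes === "number" ? bytes : 0);
  }, 0);
}

// Made-up sample response with hypothetical node ids.
const sampleStats = {
  nodes: {
    nodeA: { indices: { fielddata: { memory_size_in_bytes: 1048576 } } },
    nodeB: { indices: { fielddata: { memory_size_in_bytes: 524288 } } },
    nodeC: { indices: {} },
  },
};
console.log(totalFielddataBytes(sampleStats));
```

Comparing this total with the configured `indices.fielddata.cache.size` gives a rough idea of how close the cache is to its limit.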