[role="xpack"]
[[rollup-understanding-groups]]
=== Understanding groups

experimental[]

NOTE: For version 8.5 and above we recommend <<downsampling,downsampling>> over
rollups as a way to reduce your storage costs for time series data.
To preserve flexibility, Rollup Jobs are defined based on how future queries may need to use the data. Traditionally, systems force
the admin to decide ahead of time which metrics to roll up and on what interval, e.g. the average of `cpu_time` on an hourly basis. This
is limiting; if the admin later wishes to see the average of `cpu_time` on an hourly basis _and_ partitioned by `host_name`,
they are out of luck.

Of course, the admin can decide to rollup the `[hour, host]` tuple on an hourly basis, but as the number of grouping keys grows, so does
the number of tuples the admin needs to configure. Furthermore, these `[hour, host]` tuples are only useful for hourly rollups... daily,
weekly, or monthly rollups all require new configurations.

Rather than force the admin to decide ahead of time which individual tuples should be rolled up, Elasticsearch's Rollup jobs are configured
based on which groups are potentially useful to future queries. For example, this configuration:
[source,js]
--------------------------------------------------
"groups" : {
  "date_histogram": {
    "field": "timestamp",
    "fixed_interval": "1h",
    "delay": "7d"
  },
  "terms": {
    "fields": ["hostname", "datacenter"]
  },
  "histogram": {
    "fields": ["load", "net_in", "net_out"],
    "interval": 5
  }
}
--------------------------------------------------
// NOTCONSOLE
Allows `date_histogram` to be used on the `"timestamp"` field, `terms` aggregations to be used on the `"hostname"` and `"datacenter"`
fields, and `histograms` to be used on any of the `"load"`, `"net_in"`, or `"net_out"` fields.

Importantly, these aggs/fields can be used in any combination. This aggregation:
[source,js]
--------------------------------------------------
"aggs" : {
  "hourly": {
    "date_histogram": {
      "field": "timestamp",
      "fixed_interval": "1h"
    },
    "aggs": {
      "host_names": {
        "terms": {
          "field": "hostname"
        }
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE
is just as valid as this aggregation:
[source,js]
--------------------------------------------------
"aggs" : {
  "hourly": {
    "date_histogram": {
      "field": "timestamp",
      "fixed_interval": "1h"
    },
    "aggs": {
      "data_center": {
        "terms": {
          "field": "datacenter"
        },
        "aggs": {
          "host_names": {
            "terms": {
              "field": "hostname"
            },
            "aggs": {
              "load_values": {
                "histogram": {
                  "field": "load",
                  "interval": 5
                }
              }
            }
          }
        }
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE
You'll notice that the second aggregation is not only substantially larger, it also swapped the position of the terms aggregation on
`"hostname"`, illustrating how the order of aggregations does not matter to rollups. Similarly, while the `date_histogram` is required
for rolling up data, it isn't required while querying (although it is often used). For example, this is a valid aggregation for
Rollup Search to execute:
[source,js]
--------------------------------------------------
"aggs" : {
  "host_names": {
    "terms": {
      "field": "hostname"
    }
  }
}
--------------------------------------------------
// NOTCONSOLE
Ultimately, when configuring `groups` for a job, think in terms of how you might wish to partition data in a query at a future date...
then include those groups in the config. Because Rollup Search allows any order or combination of the grouped fields, you just need to decide
if a field is useful for aggregating later, and how you might wish to use it (terms, histogram, etc).
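
For instance, against a hypothetical rollup index built from the `groups` configuration above (the index name `sensor_rollup` is illustrative, not from this page), a Rollup Search could freely combine a `terms` grouping on `datacenter` with a `histogram` on `load`, even though the job never named that particular tuple:

[source,js]
--------------------------------------------------
GET /sensor_rollup/_rollup_search
{
  "size": 0,
  "aggregations": {
    "data_center": {
      "terms": {
        "field": "datacenter"
      },
      "aggs": {
        "load_values": {
          "histogram": {
            "field": "load",
            "interval": 5
          }
        }
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE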
[[rollup-understanding-group-intervals]]
==== Calendar vs fixed time intervals

Each rollup job must have a date histogram group with a defined interval. {es}
understands both
<<calendar_and_fixed_intervals,calendar and fixed time intervals>>. Fixed time
intervals are fairly easy to understand; `60s` means sixty seconds. But what
does `1M` mean? One month of time depends on which month we are talking about:
some months are longer or shorter than others. This is an example of calendar
time, where the duration of the unit depends on context. Calendar units are also
affected by leap-seconds, leap-years, etc.
This is important because the buckets generated by rollup are in either calendar
or fixed intervals, and this limits how you can query them later. See
<<rollup-search-limitations-intervals>>.

We recommend sticking with fixed time intervals, since they are easier to
understand and are more flexible at query time. This approach introduces some
drift in your data during leap-events, and you will have to think about months
as a fixed quantity (e.g. 30 days) instead of the actual calendar length.
However, it is often easier than dealing with calendar units at query time.
Multiples of units are always "fixed". For example, `2h` is always the fixed
quantity `7200` seconds. Single units can be fixed or calendar depending on the
unit:
[options="header"]
|=======
|Unit        |Calendar |Fixed
|millisecond |NA       |`1ms`, `10ms`, etc
|second      |NA       |`1s`, `10s`, etc
|minute      |`1m`     |`2m`, `10m`, etc
|hour        |`1h`     |`2h`, `10h`, etc
|day         |`1d`     |`2d`, `10d`, etc
|week        |`1w`     |NA
|month       |`1M`     |NA
|quarter     |`1q`     |NA
|year        |`1y`     |NA
|=======
For units that have both a fixed and a calendar form, you may need to express
the quantity in terms of the next smaller unit. For example, if you want a fixed
day (not a calendar day), you should specify `24h` instead of `1d`. Similarly,
if you want fixed hours, specify `60m` instead of `1h`. This is because the
single-quantity form denotes calendar time, and limits you to querying by
calendar time in the future.
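
As a sketch, a date histogram group that buckets by fixed, 86,400-second days (the field name `timestamp` follows the examples above) would specify `24h` rather than `1d`:

[source,js]
--------------------------------------------------
"groups" : {
  "date_histogram": {
    "field": "timestamp",
    "fixed_interval": "24h",
    "delay": "7d"
  }
}
--------------------------------------------------
// NOTCONSOLE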
==== Grouping limitations with heterogeneous indices

There was previously a limitation in how Rollup could handle indices that had heterogeneous mappings (multiple, unrelated/non-overlapping
mappings). The recommendation at the time was to configure a separate job per data "type". For example, you might configure a separate
job for each Beats module that you had enabled (one for `process`, another for `filesystem`, etc).

This recommendation was driven by internal implementation details that caused document counts to be potentially incorrect if a single
"merged" job was used.

This limitation has since been alleviated. As of 6.4.0, it is now considered best practice to combine all rollup configurations
into a single job.
As an example, if your index has two types of documents:

[source,js]
--------------------------------------------------
{
  "timestamp": 1516729294000,
  "temperature": 200,
  "voltage": 5.2,
  "node": "a"
}
--------------------------------------------------
// NOTCONSOLE

and

[source,js]
--------------------------------------------------
{
  "timestamp": 1516729294000,
  "price": 123,
  "title": "Foo"
}
--------------------------------------------------
// NOTCONSOLE

the best practice is to combine them into a single rollup job which covers both of these document types, like this:
[source,js]
--------------------------------------------------
PUT _rollup/job/combined
{
  "index_pattern": "data-*",
  "rollup_index": "data_rollup",
  "cron": "*/30 * * * * ?",
  "page_size": 1000,
  "groups": {
    "date_histogram": {
      "field": "timestamp",
      "fixed_interval": "1h",
      "delay": "7d"
    },
    "terms": {
      "fields": [ "node", "title" ]
    }
  },
  "metrics": [
    {
      "field": "temperature",
      "metrics": [ "min", "max", "sum" ]
    },
    {
      "field": "price",
      "metrics": [ "avg" ]
    }
  ]
}
--------------------------------------------------
// NOTCONSOLE
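
Once the combined job has run, either kind of document can be queried through the same rollup index. A minimal sketch of such a query (the aggregation names are illustrative; `data_rollup` is the rollup index defined by the job above) might group by `node` and report the maximum `temperature`:

[source,js]
--------------------------------------------------
GET /data_rollup/_rollup_search
{
  "size": 0,
  "aggregations": {
    "nodes": {
      "terms": {
        "field": "node"
      },
      "aggs": {
        "max_temperature": {
          "max": {
            "field": "temperature"
          }
        }
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE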
==== Doc counts and overlapping jobs

There was previously an issue with document counts on "overlapping" job configurations, driven by the same internal implementation detail.
If there were two Rollup jobs saving to the same index, where one job was a "subset" of another job, it was possible that document counts
could be incorrect for certain aggregation arrangements.

This issue has also been eliminated as of 6.4.0.