[role="xpack"]
[[rollup-getting-started]]
=== Getting started with {rollups}
++++
<titleabbrev>Getting started</titleabbrev>
++++

experimental[]

NOTE: For version 8.5 and above we recommend <<downsampling,downsampling>> over
rollups as a way to reduce your storage costs for time series data.

To use the Rollup feature, you need to create one or more "Rollup Jobs". These jobs run continuously in the background
and roll up the index or indices that you specify, placing the rolled documents in a secondary index (also of your choosing).

Imagine you have a series of daily indices that hold sensor data (`sensor-2017-01-01`, `sensor-2017-01-02`, etc.). A sample document might
look like this:

[source,js]
--------------------------------------------------
{
  "timestamp": 1516729294000,
  "temperature": 200,
  "voltage": 5.2,
  "node": "a"
}
--------------------------------------------------
// NOTCONSOLE

[discrete]
==== Creating a rollup job

We'd like to roll up these documents into hourly summaries, which will allow us to generate reports and dashboards at any time interval
of one hour or greater. A rollup job might look like this:

[source,console]
--------------------------------------------------
PUT _rollup/job/sensor
{
  "index_pattern": "sensor-*",
  "rollup_index": "sensor_rollup",
  "cron": "*/30 * * * * ?",
  "page_size": 1000,
  "groups": {
    "date_histogram": {
      "field": "timestamp",
      "fixed_interval": "60m"
    },
    "terms": {
      "fields": [ "node" ]
    }
  },
  "metrics": [
    {
      "field": "temperature",
      "metrics": [ "min", "max", "sum" ]
    },
    {
      "field": "voltage",
      "metrics": [ "avg" ]
    }
  ]
}
--------------------------------------------------
// TEST[setup:sensor_index]

We give the job the ID of "sensor" (in the url: `PUT _rollup/job/sensor`), and tell it to roll up the index pattern `"sensor-*"`.
This job will find and roll up any index that matches that pattern. Rollup summaries are then stored in the `"sensor_rollup"` index.

The `cron` parameter controls when and how often the job activates. When a rollup job's cron schedule triggers, it will begin rolling up
from where it left off after the last activation. So if you configure the cron to run every 30 seconds, the job will process the last 30
seconds' worth of data that was indexed into the `sensor-*` indices.

If instead the cron was configured to run once a day at midnight, the job would process the last 24 hours' worth of data. The choice is largely
a matter of preference, based on how "realtime" you want the rollups, and whether you wish to process continuously or move the work to off-peak hours.
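
For example, if you preferred the once-a-day approach, the job above could use a cron value (in the same seconds-first format as the original) that fires at midnight. This is an illustrative value, not part of the job as defined above:

[source,js]
--------------------------------------------------
"cron": "0 0 0 * * ?"
--------------------------------------------------
// NOTCONSOLE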

Next, we define a set of `groups`. Essentially, we are defining the dimensions
that we wish to pivot on at a later date when querying the data. The grouping in
this job allows us to use `date_histogram` aggregations on the `timestamp` field,
rolled up at hourly intervals. It also allows us to run terms aggregations on
the `node` field.

.Date histogram interval vs cron schedule
**********************************
You'll note that the job's cron is configured to run every 30 seconds, but the date_histogram is configured to
roll up at 60-minute intervals. How do these relate?

The date_histogram controls the granularity of the saved data. Data will be rolled up into hourly intervals, and you will be unable
to query with finer granularity. The cron simply controls when the process looks for new data to roll up. Every 30 seconds it will see
if there is a new hour's worth of data and roll it up. If not, the job goes back to sleep.

Often, it doesn't make sense to define such a small cron (30s) on a large interval (1h), because the majority of the activations will
simply go back to sleep. But there's nothing wrong with it either; the job will do the right thing.
**********************************

After defining which groups should be generated for the data, you next configure
which metrics should be collected. By default, only the `doc_counts` are
collected for each group. To make rollup useful, you will often add metrics
like averages, mins, maxes, etc. In this example, the metrics are fairly
straightforward: we want to save the min/max/sum of the `temperature`
field, and the average of the `voltage` field.

.Averages aren't composable?!
**********************************
If you've worked with rollups before, you may be cautious around averages. If an
average is saved for a 10 minute interval, it usually isn't useful for larger
intervals. You cannot average six 10-minute averages to find an hourly average;
the average of averages is not equal to the total average.

For this reason, other systems tend to either omit the ability to average or
store the average at multiple intervals to support more flexible querying.

Instead, the {rollup-features} save the `count` and `sum` for the defined time
interval. This allows us to reconstruct the average at any interval greater than
or equal to the defined interval. This gives maximum flexibility for minimal
storage costs... and you don't have to worry about average accuracies (no
average of averages here!)
**********************************
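
To make the arithmetic concrete (with made-up numbers): if one hourly bucket stores a `sum` of 31.2 from 6 documents, and the next stores a `sum` of 20.4 from 4 documents, a two-hour average is reconstructed as (31.2 + 20.4) / (6 + 4) = 5.16, which is exactly the average of the original ten values.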

For more details about the job syntax, see <<rollup-put-job>>.

After you execute the above command and create the job, you'll receive the following response:

[source,console-result]
----
{
  "acknowledged": true
}
----
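
At this point you can also sanity-check the configuration with the get rollup caps API, which reports the groups and metrics that rollup jobs have made available for an index pattern:

[source,console]
--------------------------------------------------
GET _rollup/data/sensor-*
--------------------------------------------------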

[discrete]
==== Starting the job

After the job is created, it will be sitting in an inactive state. Jobs need to be started before they begin processing data (this allows
you to stop them later as a way to temporarily pause, without deleting the configuration).
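
You can confirm the job's current state at any time with the get rollup jobs API; the response includes a `status` section indicating whether the job is started or stopped:

[source,console]
--------------------------------------------------
GET _rollup/job/sensor
--------------------------------------------------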

To start the job, execute this command:

[source,console]
--------------------------------------------------
POST _rollup/job/sensor/_start
--------------------------------------------------
// TEST[setup:sensor_rollup_job]
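
Should you later need to pause the job without losing its configuration, the stop API is the mirror image:

[source,console]
--------------------------------------------------
POST _rollup/job/sensor/_stop
--------------------------------------------------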

[discrete]
==== Searching the rolled results

After the job has run and processed some data, we can use the <<rollup-search>> endpoint to do some searching. The Rollup feature is designed
so that you can use the same Query DSL syntax that you are accustomed to... it just happens to run on the rolled up data instead.

For example, take this query:

[source,console]
--------------------------------------------------
GET /sensor_rollup/_rollup_search
{
  "size": 0,
  "aggregations": {
    "max_temperature": {
      "max": {
        "field": "temperature"
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:sensor_prefab_data]

It's a simple aggregation that calculates the maximum of the `temperature` field. But you'll notice that it is being sent to the `sensor_rollup`
index instead of the raw `sensor-*` indices. And you'll also notice that it is using the `_rollup_search` endpoint. Otherwise the syntax
is exactly as you'd expect.

If you were to execute that query, you'd receive a result that looks like a normal aggregation response:

[source,console-result]
----
{
  "took" : 102,
  "timed_out" : false,
  "terminated_early" : false,
  "_shards" : ... ,
  "hits" : {
    "total" : {
      "value": 0,
      "relation": "eq"
    },
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "max_temperature" : {
      "value" : 202.0
    }
  }
}
----
// TESTRESPONSE[s/"took" : 102/"took" : $body.$_path/]
// TESTRESPONSE[s/"_shards" : \.\.\. /"_shards" : $body.$_path/]

The only notable difference is that Rollup search results have zero `hits`, because we aren't really searching the original, live data any
more. Otherwise it's identical syntax.
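
Incidentally, the `_rollup_search` endpoint can also query live indices and a rollup index together, merging results from both. A sketch, assuming a live index named `sensor-1` exists alongside the rollup index:

[source,js]
--------------------------------------------------
GET sensor-1,sensor_rollup/_rollup_search
{
  "size": 0,
  "aggregations": {
    "max_temperature": {
      "max": {
        "field": "temperature"
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE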

There are a few interesting takeaways here. First, even though the data was rolled up with hourly intervals and partitioned by
node name, the query we ran is just calculating the max temperature across all documents. The `groups` that were configured in the job
are not mandatory elements of a query; they are just extra dimensions you can partition on. Second, the request and response syntax
is nearly identical to normal DSL, making it easy to integrate into dashboards and applications.

Finally, we can use those grouping fields we defined to construct a more complicated query:

[source,console]
--------------------------------------------------
GET /sensor_rollup/_rollup_search
{
  "size": 0,
  "aggregations": {
    "timeline": {
      "date_histogram": {
        "field": "timestamp",
        "fixed_interval": "7d"
      },
      "aggs": {
        "nodes": {
          "terms": {
            "field": "node"
          },
          "aggs": {
            "max_temperature": {
              "max": {
                "field": "temperature"
              }
            },
            "avg_voltage": {
              "avg": {
                "field": "voltage"
              }
            }
          }
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:sensor_prefab_data]

Which returns a corresponding response:

[source,console-result]
----
{
  "took" : 93,
  "timed_out" : false,
  "terminated_early" : false,
  "_shards" : ... ,
  "hits" : {
    "total" : {
      "value": 0,
      "relation": "eq"
    },
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "timeline" : {
      "buckets" : [
        {
          "key_as_string" : "2018-01-18T00:00:00.000Z",
          "key" : 1516233600000,
          "doc_count" : 6,
          "nodes" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "a",
                "doc_count" : 2,
                "max_temperature" : {
                  "value" : 202.0
                },
                "avg_voltage" : {
                  "value" : 5.1499998569488525
                }
              },
              {
                "key" : "b",
                "doc_count" : 2,
                "max_temperature" : {
                  "value" : 201.0
                },
                "avg_voltage" : {
                  "value" : 5.700000047683716
                }
              },
              {
                "key" : "c",
                "doc_count" : 2,
                "max_temperature" : {
                  "value" : 202.0
                },
                "avg_voltage" : {
                  "value" : 4.099999904632568
                }
              }
            ]
          }
        }
      ]
    }
  }
}
----
// TESTRESPONSE[s/"took" : 93/"took" : $body.$_path/]
// TESTRESPONSE[s/"_shards" : \.\.\. /"_shards" : $body.$_path/]

In addition to being more complicated (a date histogram and a terms aggregation, plus an additional average metric), you'll notice
the date_histogram uses a `7d` interval instead of `60m`. Because the data was rolled up at hourly granularity, you are free to query at any
interval of one hour or greater.

[discrete]
==== Conclusion

This quickstart should have provided a concise overview of the core functionality that Rollup exposes. There are more tips and things
to consider when setting up Rollups, which you can find throughout the rest of this section. You may also explore the <<rollup-api-quickref,REST API>>
for an overview of what is available.