[[search-aggregations-bucket-autodatehistogram-aggregation]]
=== Auto-interval Date Histogram Aggregation

A multi-bucket aggregation similar to the <<search-aggregations-bucket-datehistogram-aggregation>>.
Instead of specifying an interval to use as the width of each bucket, you provide a target number of
buckets, and the bucket interval is chosen automatically to best achieve that target. The number of
buckets returned is always less than or equal to the target.

The `buckets` field is optional and defaults to 10 if not specified.

The following request targets 10 buckets:

[source,console,id=autodatehistogram-aggregation-example]
--------------------------------------------------
POST /sales/_search?size=0
{
  "aggs": {
    "sales_over_time": {
      "auto_date_histogram": {
        "field": "date",
        "buckets": 10
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:sales]

==== Keys

Internally, a date is represented as a 64-bit number representing a timestamp
in milliseconds-since-the-epoch. These timestamps are returned as the bucket
++key++s. The `key_as_string` is the same timestamp converted to a formatted
date string using the format specified with the `format` parameter:

TIP: If no `format` is specified, the first date
<<mapping-date-format,format>> specified in the field mapping is used.

[source,console,id=autodatehistogram-aggregation-format-example]
--------------------------------------------------
POST /sales/_search?size=0
{
  "aggs": {
    "sales_over_time": {
      "auto_date_histogram": {
        "field": "date",
        "buckets": 5,
        "format": "yyyy-MM-dd" <1>
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:sales]

<1> Supports expressive date <<date-format-pattern,format pattern>>.

Response:

[source,console-result]
--------------------------------------------------
{
  ...
  "aggregations": {
    "sales_over_time": {
      "buckets": [
        {
          "key_as_string": "2015-01-01",
          "key": 1420070400000,
          "doc_count": 3
        },
        {
          "key_as_string": "2015-02-01",
          "key": 1422748800000,
          "doc_count": 2
        },
        {
          "key_as_string": "2015-03-01",
          "key": 1425168000000,
          "doc_count": 2
        }
      ],
      "interval": "1M"
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
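
As a rough cross-check of the response above, the following Python sketch converts each `key` (milliseconds since the epoch) to a date string. It is not part of Elasticsearch: the helper name `key_to_string` is hypothetical, and `%Y-%m-%d` is assumed to be the `strftime` counterpart of the `yyyy-MM-dd` pattern used in the request.

```python
from datetime import datetime, timezone


def key_to_string(key_millis: int, fmt: str = "%Y-%m-%d") -> str:
    """Format a bucket key (milliseconds since the epoch) as a UTC date string."""
    moment = datetime.fromtimestamp(key_millis / 1000, tz=timezone.utc)
    return moment.strftime(fmt)


# Keys copied from the example response above.
keys = [1420070400000, 1422748800000, 1425168000000]
formatted = [key_to_string(k) for k in keys]
print(formatted)
```

Each formatted value should match the `key_as_string` of the corresponding bucket.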

==== Intervals

The interval of the returned buckets is selected based on the data collected by the
aggregation so that the number of buckets returned is less than or equal to the number
requested. The possible intervals returned are:

[horizontal]
seconds:: In multiples of 1, 5, 10 and 30
minutes:: In multiples of 1, 5, 10 and 30
hours:: In multiples of 1, 3 and 12
days:: In multiples of 1 and 7
months:: In multiples of 1 and 3
years:: In multiples of 1, 5, 10, 20, 50 and 100

In the worst case, where the number of daily buckets is too large for the requested
number of buckets, the next coarser interval is 7 days, so the number of buckets
returned will be 1/7th of the number of buckets requested.
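
The selection rule can be illustrated with a simplified Python sketch. This is an assumption-laden model, not the actual Elasticsearch implementation: `pick_interval` and `CANDIDATES` are hypothetical names, only the fixed-width units from the list above are modelled (months and years are omitted because their length varies), and the bucket count is estimated without regard to how bucket boundaries align with the data.

```python
import math

# Fixed-width candidates from the list above, in seconds. Months and years are
# left out because calendar units do not have a constant length.
CANDIDATES = [
    ("1s", 1), ("5s", 5), ("10s", 10), ("30s", 30),
    ("1m", 60), ("5m", 300), ("10m", 600), ("30m", 1800),
    ("1h", 3600), ("3h", 10800), ("12h", 43200),
    ("1d", 86400), ("7d", 604800),
]


def pick_interval(span_seconds, target_buckets):
    """Return (label, bucket_count) for the finest candidate within the target.

    Simplification: the bucket count is estimated as ceil(span / width),
    ignoring how bucket boundaries align with the data.
    """
    for label, width in CANDIDATES:
        count = max(1, math.ceil(span_seconds / width))
        if count <= target_buckets:
            return label, count
    return None  # the span would need month or year intervals


# Eleven days of data with a target of 10: daily buckets are one too many.
worst_case = pick_interval(11 * 86400, 10)
print(worst_case)
```

In this sketch, eleven days of data with a target of 10 falls through to 7-day buckets and yields only 2 buckets, which shows how the returned count can drop well below the target once daily buckets no longer fit.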

==== Time Zone

Date-times are stored in Elasticsearch in UTC. By default, all bucketing and
rounding is also done in UTC. The `time_zone` parameter can be used to indicate
that bucketing should use a different time zone.

Time zones may be specified either as an ISO 8601 UTC offset (e.g. `+01:00` or
`-08:00`) or as a time zone ID, an identifier used in the TZ database such as
`America/Los_Angeles`.

Consider the following example:

[source,console,id=autodatehistogram-aggregation-timezone-example]
---------------------------------
PUT my_index/_doc/1?refresh
{
  "date": "2015-10-01T00:30:00Z"
}

PUT my_index/_doc/2?refresh
{
  "date": "2015-10-01T01:30:00Z"
}

PUT my_index/_doc/3?refresh
{
  "date": "2015-10-01T02:30:00Z"
}

GET my_index/_search?size=0
{
  "aggs": {
    "by_day": {
      "auto_date_histogram": {
        "field": "date",
        "buckets" : 3
      }
    }
  }
}
---------------------------------

If no time zone is specified, UTC is used. Three 1-hour buckets are returned,
starting at midnight UTC on 1 October 2015:

[source,console-result]
---------------------------------
{
  ...
  "aggregations": {
    "by_day": {
      "buckets": [
        {
          "key_as_string": "2015-10-01T00:00:00.000Z",
          "key": 1443657600000,
          "doc_count": 1
        },
        {
          "key_as_string": "2015-10-01T01:00:00.000Z",
          "key": 1443661200000,
          "doc_count": 1
        },
        {
          "key_as_string": "2015-10-01T02:00:00.000Z",
          "key": 1443664800000,
          "doc_count": 1
        }
      ],
      "interval": "1h"
    }
  }
}
---------------------------------
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]

If a `time_zone` of `-01:00` is specified, bucketing uses a local time that is
one hour behind UTC:

[source,console]
---------------------------------
GET my_index/_search?size=0
{
  "aggs": {
    "by_day": {
      "auto_date_histogram": {
        "field": "date",
        "buckets" : 3,
        "time_zone": "-01:00"
      }
    }
  }
}
---------------------------------
// TEST[continued]

Three 1-hour buckets are still returned, but the first bucket now starts at
11:00pm on 30 September 2015, since that is the local time for the bucket in
the specified time zone.

[source,console-result]
---------------------------------
{
  ...
  "aggregations": {
    "by_day": {
      "buckets": [
        {
          "key_as_string": "2015-09-30T23:00:00.000-01:00", <1>
          "key": 1443657600000,
          "doc_count": 1
        },
        {
          "key_as_string": "2015-10-01T00:00:00.000-01:00",
          "key": 1443661200000,
          "doc_count": 1
        },
        {
          "key_as_string": "2015-10-01T01:00:00.000-01:00",
          "key": 1443664800000,
          "doc_count": 1
        }
      ],
      "interval": "1h"
    }
  }
}
---------------------------------
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]

<1> The `key_as_string` value represents the start of each bucket
in the specified time zone.
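
Note that the `key` values are identical in both responses; only `key_as_string` changes. The Python sketch below, which is illustrative only and uses a hypothetical helper named `render_key`, renders the first bucket key at a fixed UTC offset to show the same instant in both zones. Python's `isoformat` writes UTC as `+00:00` rather than `Z`, so the UTC text differs cosmetically from the response.

```python
from datetime import datetime, timedelta, timezone


def render_key(key_millis, offset_hours=0):
    """Render a bucket key (epoch milliseconds) at a fixed UTC offset."""
    zone = timezone(timedelta(hours=offset_hours))
    moment = datetime.fromtimestamp(key_millis / 1000, tz=zone)
    return moment.isoformat(timespec="milliseconds")


first_key = 1443657600000  # first bucket key in both responses above
utc_text = render_key(first_key)
local_text = render_key(first_key, offset_hours=-1)
print(utc_text)
print(local_text)
```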

WARNING: When using time zones that follow DST (daylight saving time) changes,
buckets close to the moment when those changes happen can have slightly different
sizes than neighbouring buckets.
For example, consider a DST start in the `CET` time zone: on 27 March 2016 at 2am,
clocks were turned forward 1 hour to 3am local time. If the aggregation produces
daily buckets, the bucket covering that day holds data for only 23 hours
instead of the usual 24 hours for other buckets. The same is true for shorter intervals
such as 12h: the bucket covering the morning of 27 March, when the DST shift happens,
spans only 11 hours.
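
The 23-hour and 11-hour figures can be checked with a short Python sketch. It assumes Python 3.9 or later with time zone data available to `zoneinfo`, and it uses `Europe/Berlin` as an assumed stand-in for a zone that follows the CET/CEST rules; `elapsed_hours` is a hypothetical helper.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def elapsed_hours(start_local, end_local):
    """Real elapsed hours between two timezone-aware datetimes."""
    # Convert to UTC first: subtracting datetimes that share a tzinfo
    # compares wall-clock values and would hide the DST shift.
    delta = end_local.astimezone(timezone.utc) - start_local.astimezone(timezone.utc)
    return delta.total_seconds() / 3600


zone = ZoneInfo("Europe/Berlin")  # assumption: stand-in for CET/CEST
midnight = datetime(2016, 3, 27, tzinfo=zone)

# Daily bucket: local midnight to the next local midnight.
day_hours = elapsed_hours(midnight, datetime(2016, 3, 28, tzinfo=zone))
# 12h bucket: local midnight to local noon on the day of the shift.
morning_hours = elapsed_hours(midnight, datetime(2016, 3, 27, 12, tzinfo=zone))
print(day_hours, morning_hours)
```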

==== Scripts

As with the normal <<search-aggregations-bucket-datehistogram-aggregation, `date_histogram`>>, both document-level
scripts and value-level scripts are supported. However, this aggregation does not support the `min_doc_count`,
`extended_bounds`, `hard_bounds` and `order` parameters.

==== Minimum Interval parameter

The `minimum_interval` parameter allows the caller to specify the minimum rounding interval that should be used.
This can make the collection process more efficient, as the aggregation will not attempt to round at
any interval lower than `minimum_interval`.

The accepted units for `minimum_interval` are:

* year
* month
* day
* hour
* minute
* second

[source,console,id=autodatehistogram-aggregation-minimum-interval-example]
--------------------------------------------------
POST /sales/_search?size=0
{
  "aggs": {
    "sale_date": {
      "auto_date_histogram": {
        "field": "date",
        "buckets": 10,
        "minimum_interval": "minute"
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:sales]
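
When request bodies are assembled in client code, it can help to reject an unsupported unit before the request is sent. The following Python sketch builds the same body as the request above. The helper `auto_date_histogram_body` and the constant `ACCEPTED_UNITS` are hypothetical names, not part of any Elasticsearch client, and the validation only mirrors the list of units given above.

```python
import json

# Units listed above as accepted values for minimum_interval.
ACCEPTED_UNITS = {"year", "month", "day", "hour", "minute", "second"}


def auto_date_histogram_body(name, field, buckets=10, minimum_interval=None):
    """Build a search request body with one auto_date_histogram aggregation."""
    if minimum_interval is not None and minimum_interval not in ACCEPTED_UNITS:
        raise ValueError(f"unsupported minimum_interval: {minimum_interval!r}")
    params = {"field": field, "buckets": buckets}
    if minimum_interval is not None:
        params["minimum_interval"] = minimum_interval
    return {"aggs": {name: {"auto_date_histogram": params}}}


body = auto_date_histogram_body("sale_date", "date", minimum_interval="minute")
print(json.dumps(body, indent=2))
```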

==== Missing value

The `missing` parameter defines how documents that are missing a value should be treated.
By default they are ignored, but it is also possible to treat them as if they
had a value.

[source,console,id=autodatehistogram-aggregation-missing-example]
--------------------------------------------------
POST /sales/_search?size=0
{
  "aggs": {
    "sale_date": {
      "auto_date_histogram": {
        "field": "date",
        "buckets": 10,
        "missing": "2000/01/01" <1>
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:sales]

<1> Documents without a value in the `date` field will fall into the same bucket as documents that have the value `2000-01-01`.