[[search-aggregations-bucket-datehistogram-aggregation]]
=== Date Histogram Aggregation

A multi-bucket aggregation similar to the <<search-aggregations-bucket-histogram-aggregation,histogram>> except that it can
only be applied to date values. Since dates are represented internally in Elasticsearch as long values, it is possible
to use the normal `histogram` on dates as well, though accuracy will be compromised. The reason is that time-based
intervals are not fixed (think of leap years and the varying number of days in a month), so fixed-width numeric buckets
cannot line up with calendar units. For this reason, we need special support for time-based data. From a functionality
perspective, this histogram supports the same features as the normal <<search-aggregations-bucket-histogram-aggregation,histogram>>.
The main difference is that the interval can be specified by date/time expressions.
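
To make the difference concrete, the following JavaScript sketch (not Elasticsearch code; the function names are hypothetical) rounds the same timestamp down to its calendar month and to a naive fixed 30-day bucket. The calendar-aware rounding lands on 1 March 2012, while the fixed-width rounding lands on 20 February 2012, because months do not share a common length.

[source,js]
--------------------------------------------------
// Sketch only (not Elasticsearch code): contrasts calendar-aware month
// rounding with naive fixed-width rounding. All times are epoch
// milliseconds in UTC.
const DAY_MS = 24 * 60 * 60 * 1000;
const THIRTY_DAYS_MS = 30 * DAY_MS;

// Round down to the first instant of the calendar month (UTC).
function roundToMonthUTC(ms) {
  const d = new Date(ms);
  return Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), 1);
}

// Round down to a multiple of a fixed width, counted from the epoch.
function roundToFixedWidth(ms, widthMs) {
  return Math.floor(ms / widthMs) * widthMs;
}

const march1 = Date.UTC(2012, 2, 1); // months are 0-based: 2 = March
console.log(new Date(roundToMonthUTC(march1)).toISOString());
// 2012-03-01T00:00:00.000Z
console.log(new Date(roundToFixedWidth(march1, THIRTY_DAYS_MS)).toISOString());
// 2012-02-20T00:00:00.000Z

// February 2012 (a leap year) is 29 days long, March is 31.
console.log((Date.UTC(2012, 2, 1) - Date.UTC(2012, 1, 1)) / DAY_MS); // 29
console.log((Date.UTC(2012, 3, 1) - Date.UTC(2012, 2, 1)) / DAY_MS); // 31
--------------------------------------------------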

Requesting buckets with an interval of one month:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "month"
            }
        }
    }
}
--------------------------------------------------

Available expressions for `interval`: `year`, `quarter`, `month`, `week`, `day`, `hour`, `minute`, `second`.

Time values can also be specified via abbreviations supported by <<time-units,time units>> parsing.
Note that fractional time values are not supported, but you can address this by shifting to another
time unit (e.g., `1.5h` could instead be specified as `90m`).

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "90m"
            }
        }
    }
}
--------------------------------------------------
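
A minimal sketch, assuming only the fixed units `ms`, `s`, `m`, `h` and `d`, of how such an expression could be turned into a millisecond width while rejecting fractional values. The helper `fixedIntervalToMillis` is hypothetical and does not reproduce Elasticsearch's own parser. Calendar units such as `1M` (a month) are intentionally left out because they have no fixed length.

[source,js]
--------------------------------------------------
// Hypothetical helper: converts a fixed time-unit string such as "90m"
// into milliseconds. Only whole numbers are accepted, so "1.5h" is
// rejected and must be rewritten in a smaller unit ("90m").
const UNIT_MS = { ms: 1, s: 1000, m: 60000, h: 3600000, d: 86400000 };

function fixedIntervalToMillis(expr) {
  // "ms" is listed before "m" and "s" so "250ms" is not misread.
  const match = /^(\d+)(ms|s|m|h|d)$/.exec(expr);
  if (match === null) {
    throw new Error(`unsupported or fractional interval: ${expr}`);
  }
  return Number(match[1]) * UNIT_MS[match[2]];
}

console.log(fixedIntervalToMillis("90m")); // 5400000
--------------------------------------------------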

==== Keys

Internally, a date is represented as a 64-bit number representing a timestamp
in milliseconds-since-the-epoch. These timestamps are returned as the bucket
++key++s. The `key_as_string` is the same timestamp converted to a formatted
date string using the format specified with the `format` parameter:

TIP: If no `format` is specified, the first date
<<mapping-date-format,format>> specified in the field mapping is used.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "1M",
                "format" : "yyyy-MM-dd" <1>
            }
        }
    }
}
--------------------------------------------------
<1> Supports expressive date <<date-format-pattern,format pattern>>.

Response:

[source,js]
--------------------------------------------------
{
    "aggregations": {
        "articles_over_time": {
            "buckets": [
                {
                    "key_as_string": "2013-02-01",
                    "key": 1359676800000,
                    "doc_count": 1
                },
                {
                    "key_as_string": "2013-03-01",
                    "key": 1362096000000,
                    "doc_count": 2
                },
                ...
            ]
        }
    }
}
--------------------------------------------------
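
Because the `key` is plain epoch milliseconds, a client can always reconstruct a date from it. The sketch below is a hypothetical helper that assumes UTC and produces the same shape as the `yyyy-MM-dd` format used above. It is not how Elasticsearch itself renders `key_as_string`.

[source,js]
--------------------------------------------------
// Hypothetical helper: renders an epoch-millisecond bucket key as
// "yyyy-MM-dd", assuming the UTC time zone.
function formatKeyAsDate(key) {
  // toISOString() always renders UTC, e.g. "2015-10-01T00:00:00.000Z",
  // so the first 10 characters are the calendar date.
  return new Date(key).toISOString().slice(0, 10);
}

console.log(formatKeyAsDate(1443657600000)); // 2015-10-01
--------------------------------------------------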

==== Time Zone

Date-times are stored in Elasticsearch in UTC. By default, all bucketing and
rounding is also done in UTC. The `time_zone` parameter can be used to indicate
that bucketing should use a different time zone.

Time zones may either be specified as an ISO 8601 UTC offset (e.g. `+01:00` or
`-08:00`) or as a time zone ID, an identifier used in the TZ database such as
`America/Los_Angeles`.

Consider the following example:

[source,js]
---------------------------------
PUT my_index/log/1
{
  "date": "2015-10-01T00:30:00Z"
}

PUT my_index/log/2
{
  "date": "2015-10-01T01:30:00Z"
}

GET my_index/_search?size=0
{
  "aggs": {
    "by_day": {
      "date_histogram": {
        "field": "date",
        "interval": "day"
      }
    }
  }
}
---------------------------------

UTC is used if no time zone is specified, so both of these documents are
placed into the same day bucket, which starts at midnight UTC on
1 October 2015:

[source,js]
---------------------------------
"aggregations": {
  "by_day": {
    "buckets": [
      {
        "key_as_string": "2015-10-01T00:00:00.000Z",
        "key": 1443657600000,
        "doc_count": 2
      }
    ]
  }
}
---------------------------------

If a `time_zone` of `-01:00` is specified, then each day starts at midnight in
that zone, which is one hour after midnight UTC:

[source,js]
---------------------------------
GET my_index/_search?size=0
{
  "aggs": {
    "by_day": {
      "date_histogram": {
        "field": "date",
        "interval": "day",
        "time_zone": "-01:00"
      }
    }
  }
}
---------------------------------

Now the first document falls into the bucket for 30 September 2015, while the
second document falls into the bucket for 1 October 2015:

[source,js]
---------------------------------
"aggregations": {
  "by_day": {
    "buckets": [
      {
        "key_as_string": "2015-09-30T00:00:00.000-01:00", <1>
        "key": 1443574800000,
        "doc_count": 1
      },
      {
        "key_as_string": "2015-10-01T00:00:00.000-01:00", <1>
        "key": 1443661200000,
        "doc_count": 1
      }
    ]
  }
}
---------------------------------
<1> The `key_as_string` value represents midnight on each day
in the specified time zone. The corresponding `key` is the same instant,
01:00 UTC, in epoch milliseconds.
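
The bucketing above can be sketched as three steps: shift the timestamp into local wall-clock time, round down to midnight, then shift back to UTC to obtain the `key`. The JavaScript below is a simplified, hypothetical illustration that handles only fixed ISO 8601 offsets. Named zones such as `America/Los_Angeles` require daylight-saving rules and are not covered. Run against the two example documents, it yields keys that mark local midnight at `-01:00`.

[source,js]
--------------------------------------------------
// Simplified sketch: day buckets in a fixed UTC offset only. The helper
// names are hypothetical; DST-aware zone IDs are not handled.
const DAY_MS = 86400000;

// Parse an ISO 8601 offset such as "+01:00" or "-08:00" into milliseconds.
function parseOffset(offset) {
  const match = /^([+-])(\d{2}):(\d{2})$/.exec(offset);
  if (match === null) throw new Error(`bad offset: ${offset}`);
  const sign = match[1] === "-" ? -1 : 1;
  return sign * (Number(match[2]) * 3600000 + Number(match[3]) * 60000);
}

// Return the UTC epoch-millisecond key of local midnight for the day
// that contains `ms` in the given offset.
function dayBucketKey(ms, offset) {
  const offsetMs = parseOffset(offset);
  const localMs = ms + offsetMs;                           // to wall-clock time
  const localMidnight = Math.floor(localMs / DAY_MS) * DAY_MS;
  return localMidnight - offsetMs;                         // back to UTC
}

const doc1 = Date.parse("2015-10-01T00:30:00Z");
const doc2 = Date.parse("2015-10-01T01:30:00Z");
console.log(dayBucketKey(doc1, "-01:00")); // 1443574800000
console.log(dayBucketKey(doc2, "-01:00")); // 1443661200000
--------------------------------------------------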

==== Offset

The `offset` parameter is used to change the start value of each bucket by the
specified positive (`+`) or negative (`-`) duration, such as `1h` for
an hour, or `1M` for a month. See <<time-units>> for more possible time
duration options.

For instance, when using an interval of `day`, each bucket runs from midnight
to midnight. Setting the `offset` parameter to `+6h` would change each bucket
to run from 6am to 6am:

[source,js]
-----------------------------
PUT my_index/log/1
{
  "date": "2015-10-01T05:30:00Z"
}

PUT my_index/log/2
{
  "date": "2015-10-01T06:30:00Z"
}

GET my_index/_search?size=0
{
  "aggs": {
    "by_day": {
      "date_histogram": {
        "field": "date",
        "interval": "day",
        "offset": "+6h"
      }
    }
  }
}
-----------------------------

Instead of a single bucket starting at midnight, the above request groups the
documents into buckets starting at 6am:

[source,js]
-----------------------------
"aggregations": {
  "by_day": {
    "buckets": [
      {
        "key_as_string": "2015-09-30T06:00:00.000Z",
        "key": 1443592800000,
        "doc_count": 1
      },
      {
        "key_as_string": "2015-10-01T06:00:00.000Z",
        "key": 1443679200000,
        "doc_count": 1
      }
    ]
  }
}
-----------------------------

NOTE: The start `offset` of each bucket is calculated after the `time_zone`
adjustments have been made.
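
A minimal sketch of the offset rule, assuming day-length buckets and a fixed time zone offset (both in milliseconds). The helper `offsetDayBucketKey` is hypothetical. It applies the time zone shift first and the `offset` second, following the note above, and reproduces the two keys from the `+6h` example when the time zone is UTC.

[source,js]
--------------------------------------------------
// Hypothetical sketch: day buckets moved by an offset, with the offset
// applied after a fixed time zone shift.
const DAY_MS = 86400000;
const HOUR_MS = 3600000;

// tzOffsetMs: fixed UTC offset of the time zone (0 for UTC).
// offsetMs:   the bucket `offset`, e.g. 6 * HOUR_MS for "+6h".
function offsetDayBucketKey(ms, tzOffsetMs, offsetMs) {
  const localMs = ms + tzOffsetMs;                  // 1. apply the time zone
  const shifted = localMs - offsetMs;               // 2. remove the offset
  const dayStart = Math.floor(shifted / DAY_MS) * DAY_MS;
  return dayStart + offsetMs - tzOffsetMs;          // 3. re-add offset, back to UTC
}

const first = Date.parse("2015-10-01T05:30:00Z");
const second = Date.parse("2015-10-01T06:30:00Z");
console.log(offsetDayBucketKey(first, 0, 6 * HOUR_MS));  // 1443592800000
console.log(offsetDayBucketKey(second, 0, 6 * HOUR_MS)); // 1443679200000
--------------------------------------------------

With a `time_zone` of `-01:00` (`-HOUR_MS`) the same `+6h` offset makes buckets start at 6am local time, which is 07:00 UTC.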

==== Scripts

As with the normal <<search-aggregations-bucket-histogram-aggregation,histogram>>, both document-level scripts and
value-level scripts are supported. It is also possible to control the order of the returned buckets using the `order`
setting and to filter the returned buckets based on a `min_doc_count` setting (by default, all buckets between the first
bucket that matches documents and the last one are returned). This histogram also supports the `extended_bounds`
setting, which enables extending the bounds of the histogram beyond the data itself (to read more on why you'd want to
do that, please refer to the explanation <<search-aggregations-bucket-histogram-aggregation-extended-bounds,here>>).
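
The empty-bucket behaviour described above can be sketched for a daily interval in UTC. This is a hypothetical JavaScript helper, not Elasticsearch's implementation. It emits every day between the first and last populated bucket, optionally widens that range with `extended_bounds`-style `min`/`max` timestamps (rounded down to the start of their day), and then drops buckets below `min_doc_count`.

[source,js]
--------------------------------------------------
// Hypothetical sketch: fill gaps in daily UTC buckets. `buckets` is an
// array of {key, doc_count} objects whose keys are day starts.
const DAY_MS = 86400000;

function fillDailyBuckets(buckets, minDocCount = 0, bounds = null) {
  const counts = new Map(buckets.map((b) => [b.key, b.doc_count]));
  const keys = buckets.map((b) => b.key);
  if (bounds !== null) {
    // Bounds are rounded down to the start of their day before use.
    keys.push(Math.floor(bounds.min / DAY_MS) * DAY_MS);
    keys.push(Math.floor(bounds.max / DAY_MS) * DAY_MS);
  }
  if (keys.length === 0) return [];
  const first = Math.min(...keys);
  const last = Math.max(...keys);
  const result = [];
  for (let key = first; key <= last; key += DAY_MS) {
    const docCount = counts.get(key) ?? 0;    // empty days count as 0
    if (docCount >= minDocCount) result.push({ key, doc_count: docCount });
  }
  return result;
}

const sparse = [
  { key: Date.parse("2015-10-01T00:00:00Z"), doc_count: 2 },
  { key: Date.parse("2015-10-03T00:00:00Z"), doc_count: 1 },
];
// Doc counts: 2, 0, 1 (the empty 2 October bucket is filled in).
console.log(fillDailyBuckets(sparse).map((b) => b.doc_count));
--------------------------------------------------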

==== Missing value

The `missing` parameter defines how documents that are missing a value should be treated.
By default they will be ignored but it is also possible to treat them as if they
had a value.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "publish_date" : {
             "date_histogram" : {
                 "field" : "publish_date",
                 "interval": "year",
                 "missing": "2000-01-01" <1>
             }
         }
    }
}
--------------------------------------------------
<1> Documents without a value in the `publish_date` field will fall into the same bucket as documents that have the value `2000-01-01`.
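
A sketch of the substitution, assuming UTC and yearly buckets as in the request above. A document with no `publish_date` takes the `missing` value before it is rounded to a bucket, and is skipped when no `missing` value is given. The helper names and document shapes are hypothetical.

[source,js]
--------------------------------------------------
// Hypothetical sketch: count documents per calendar year (UTC),
// substituting `missing` for documents that lack the field.
function yearBucketKey(ms) {
  return Date.UTC(new Date(ms).getUTCFullYear(), 0, 1);
}

function countByYear(docs, field, missing) {
  const fallback = missing === undefined ? undefined : Date.parse(missing);
  const counts = new Map();
  for (const doc of docs) {
    const value = doc[field] !== undefined ? Date.parse(doc[field]) : fallback;
    if (value === undefined) continue; // no value and no `missing`: ignored
    const key = yearBucketKey(value);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

const docs = [
  { publish_date: "2000-06-15" },
  {},                               // no publish_date
  { publish_date: "2001-03-01" },
];
const withMissing = countByYear(docs, "publish_date", "2000-01-01");
console.log(withMissing.get(Date.UTC(2000, 0, 1))); // 2
--------------------------------------------------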