[[search-aggregations-bucket-datehistogram-aggregation]]
=== Date Histogram Aggregation

A multi-bucket aggregation similar to the <<search-aggregations-bucket-histogram-aggregation,histogram>> except it can
only be applied on date values. Since dates are represented in elasticsearch internally as long values, it is possible
to use the normal `histogram` on dates as well, though accuracy will be compromised. The reason is that time-based
intervals are not fixed in length (think of leap years and of the varying number of days in a month), so time-based
data needs special support. Functionally, this histogram supports the same features
as the normal <<search-aggregations-bucket-histogram-aggregation,histogram>>. The main difference is that the interval can be specified by date/time expressions.

Requesting bucket intervals of a month:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "month"
            }
        }
    }
}
--------------------------------------------------

Available expressions for interval: `year`, `quarter`, `month`, `week`, `day`, `hour`, `minute`, `second`

Fractional values are allowed for seconds, minutes, hours, days and weeks. For example, 1.5 hours:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "1.5h"
            }
        }
    }
}
--------------------------------------------------

See <<time-units>> for accepted abbreviations.

==== Time Zone

By default, times are stored as UTC milliseconds since the epoch. Thus, all computation and "bucketing" / "rounding" is
done in UTC. It is possible to provide a time zone value, which will cause all bucket
computations to take place in the specified zone. The time returned for each bucket/entry is still milliseconds since the
epoch in UTC. The parameter is called `time_zone`. It accepts either an ISO 8601 UTC offset or a time zone id.
A UTC offset has the form of a `+` or `-`, followed by a two-digit hour, followed by `:`, followed by two-digit minutes.
For example, `+01:00` represents 1 hour ahead of UTC. A time zone id is an identifier from the TZ database. For example,
Pacific time is represented as `America/Los_Angeles`.

Let's take an example: the value `2012-04-01T04:15:30Z` (UTC) with a `time_zone` of `"-08:00"`. For a day interval, applying
the time zone gives `2012-03-31T20:15:30`, which rounds down to the start of `2012-03-31` in that zone, so the returned value
will be (in millis) `2012-03-31T08:00:00Z` (UTC). For an hour interval, internally applying the time zone also results in
`2012-03-31T20:15:30`, and rounding it in the time zone results in `2012-03-31T20:00:00`. To be consistent, that rounded value
is converted back to UTC and returned as `2012-04-01T04:00:00Z` (UTC).

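As an illustration, the day-interval case from the example above could be requested roughly as follows. This is a sketch only: the aggregation name `articles_over_time` and the `date` field are carried over from the earlier examples and stand in for your own names.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "day",
                "time_zone" : "-08:00" <1>
            }
        }
    }
}
--------------------------------------------------

<1> Buckets are computed in the `-08:00` zone, but each returned key is still expressed as UTC milliseconds since the epoch.
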
==== Offset

The `offset` option shifts the boundaries of the date bucket intervals. It is applied after any other shifts caused by
time zones. This makes it possible, for example, for daily buckets to run from 6AM to 6AM the next day instead of starting at 12AM,
or for monthly buckets to run from the 10th of the month to the 10th of the next month instead of starting on the 1st.

The `offset` option accepts positive or negative time durations like "1h" for an hour or "1M" for a month. See <<time-units>> for more
possible time duration options.

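The 6AM-to-6AM case mentioned above might be expressed as in the following sketch. The aggregation name and the `date` field are again borrowed from the earlier examples, and the value `6h` is an assumed duration written in the same style as the "1h" example.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "day",
                "offset" : "6h" <1>
            }
        }
    }
}
--------------------------------------------------

<1> Each daily bucket is intended to start at 6AM rather than at midnight.
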
==== Keys

Since dates are represented internally as 64-bit numbers, these numbers are returned as the bucket keys (each key
representing a date as milliseconds since the epoch). It is also possible to define a date format, which will result in
returning the dates as formatted strings next to the numeric key values:

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "1M",
                "format" : "yyyy-MM-dd" <1>
            }
        }
    }
}
--------------------------------------------------

<1> Supports expressive date <<date-format-pattern,format pattern>>

Response:

[source,js]
--------------------------------------------------
{
    "aggregations": {
        "articles_over_time": {
            "buckets": [
                {
                    "key_as_string": "2012-02-02",
                    "key": 1328140800000,
                    "doc_count": 1
                },
                {
                    "key_as_string": "2012-03-02",
                    "key": 1330646400000,
                    "doc_count": 2
                },
                ...
            ]
        }
    }
}
--------------------------------------------------

Like with the normal <<search-aggregations-bucket-histogram-aggregation,histogram>>, both document-level scripts and
value-level scripts are supported. It is also possible to control the order of the returned buckets using the `order`
setting and to filter the returned buckets based on a `min_doc_count` setting (by default, all buckets between the first
bucket that matches documents and the last one are returned). This histogram also supports the `extended_bounds`
setting, which enables extending the bounds of the histogram beyond the data itself (for why you would want to
do that, refer to the explanation <<search-aggregations-bucket-histogram-aggregation-extended-bounds,here>>).

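A request combining these settings might look like the sketch below. The option shapes follow those described for the normal histogram. The bounds `2014-01-01` and `2014-12-31` are made-up values, and writing them as date strings assumes they are parsed with the given `format`.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "month",
                "format" : "yyyy-MM-dd",
                "order" : { "_count" : "desc" }, <1>
                "min_doc_count" : 0, <2>
                "extended_bounds" : { <3>
                    "min" : "2014-01-01",
                    "max" : "2014-12-31"
                }
            }
        }
    }
}
--------------------------------------------------

<1> Return the buckets with the most documents first.
<2> Also return buckets that contain no documents.
<3> Illustrative bounds that extend the histogram beyond the range covered by the data.
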
==== Missing value

The `missing` parameter defines how documents that are missing a value should be treated.
By default they are ignored, but it is also possible to treat them as if they
had a value.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "publish_date" : {
            "date_histogram" : {
                "field" : "publish_date",
                "interval": "year",
                "missing": "2000-01-01" <1>
            }
        }
    }
}
--------------------------------------------------

<1> Documents without a value in the `publish_date` field will fall into the same bucket as documents that have the value `2000-01-01`.