
[[search-aggregations-bucket-datehistogram-aggregation]]
=== Date Histogram

A multi-bucket aggregation similar to the <<search-aggregations-bucket-histogram-aggregation,histogram>> except it can
only be applied on date values. Since dates are represented internally in Elasticsearch as long values, it is possible
to use the normal `histogram` on dates as well, though accuracy will be compromised. The reason is that time-based
intervals are not fixed (think of leap years, or the varying number of days in a month), so time-based data needs
special support. From a functionality perspective, this histogram supports the same features as the normal
<<search-aggregations-bucket-histogram-aggregation,histogram>>; the main difference is that the interval can be
specified by date/time expressions.

Requesting bucket intervals of a month:
[source,js]
--------------------------------------------------
{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "month"
            }
        }
    }
}
--------------------------------------------------
Fractional values are allowed, for example 1.5 hours:
[source,js]
--------------------------------------------------
{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "1.5h"
            }
        }
    }
}
--------------------------------------------------
Available expressions for `interval`: `year`, `quarter`, `month`, `week`, `day`, `hour`, `minute`, `second`

==== Time Zone

By default, times are stored as UTC milliseconds since the epoch, so all computation and "bucketing" / "rounding" is
done in UTC. It is possible to provide a time zone value (both pre-rounding and post-rounding), which will cause all
computations to take the relevant zone into account. The time returned for each bucket/entry is milliseconds since the
epoch of the provided time zone.

The parameters are `pre_zone` (pre-rounding based on interval) and `post_zone` (post-rounding based on interval). The
`time_zone` parameter simply sets the `pre_zone` parameter. By default, both are set to `UTC`.
The zone value accepts either a numeric value for the hours offset, for example `"time_zone" : -2`. It also accepts a
format of hours and minutes, like `"time_zone" : "-02:30"`. Another option is to provide a time zone accepted as one of
the values listed here.
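
For instance, bucketing by day while shifting values into a zone two and a half hours behind UTC might look like this
(a sketch reusing the `articles_over_time` aggregation and `date` field from the earlier examples):

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "day",
                "time_zone" : "-02:30"
            }
        }
    }
}
--------------------------------------------------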
Let's take an example. For `2012-04-01T04:15:30Z` with a `pre_zone` of `-08:00`: for a day interval, the actual time
after applying the time zone and rounding falls on `2012-03-31`, so the returned value will be (in millis)
`2012-03-31T00:00:00Z` (UTC). For an hour interval, applying the time zone results in `2012-03-31T20:15:30`; rounding
it results in `2012-03-31T20:00:00`, but we want to return it in UTC (`post_zone` is not set), so we convert it back
to UTC: `2012-04-01T04:00:00Z`. Note that we are consistent in the results, returning the rounded value in UTC.

`post_zone` simply takes the result and adds the relevant offset.
Sometimes we want to apply the same conversion to UTC that we did above for the hour interval to day (and larger)
intervals as well. We can set `pre_zone_adjust_large_interval` to `true`, which applies the same conversion done for
the hour interval in the example to day and larger intervals (it can be set regardless of the interval, but it only
kicks in when using day and higher intervals).
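
Putting the settings above together, a day-interval request that buckets in the `-08:00` zone and converts the rounded
values back to UTC might look like this (a sketch reusing the earlier request):

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "day",
                "pre_zone" : "-08:00",
                "pre_zone_adjust_large_interval" : true
            }
        }
    }
}
--------------------------------------------------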

==== Factor

The date histogram works on numeric values (since time is stored as milliseconds since the epoch, in UTC). However,
systems will sometimes store a different resolution (like seconds since the epoch) in a numeric field. The `factor`
parameter can be used to convert the value in the field to milliseconds (so the relevant rounding can actually be
done), and is then applied in reverse to get back to the original unit. For example, when a numeric field stores a
seconds resolution, the factor can be set to 1000.
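
A sketch of such a request, assuming a hypothetical numeric field named `seconds_since_epoch` that holds seconds since
the epoch (the field name is illustrative, not from the examples above):

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "seconds_since_epoch",
                "interval" : "month",
                "factor" : 1000
            }
        }
    }
}
--------------------------------------------------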

==== Pre/Post Offset

Specific offsets can be provided for pre-rounding and post-rounding: `pre_offset` for pre-rounding, and `post_offset`
for post-rounding. The format is the date time format (`1h`, `1d`, etc.).
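
For example, applying a one-hour offset before the rounding takes place (a sketch reusing the earlier request):

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "day",
                "pre_offset" : "1h"
            }
        }
    }
}
--------------------------------------------------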

==== Keys

Since dates are represented internally as 64-bit numbers, these numbers are returned as the bucket keys (each key
representing a date, in milliseconds since the epoch). It is also possible to define a date format, which will result
in the dates being returned as formatted strings next to the numeric key values:
[source,js]
--------------------------------------------------
{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "1M",
                "format" : "yyyy-MM-dd" <1>
            }
        }
    }
}
--------------------------------------------------
<1> Supports expressive date <<date-format-pattern,format pattern>>

Response:
[source,js]
--------------------------------------------------
{
    "aggregations": {
        "articles_over_time": {
            "buckets": [
                {
                    "key_as_string": "2013-02-02",
                    "key": 1328140800000,
                    "doc_count": 1
                },
                {
                    "key_as_string": "2013-03-02",
                    "key": 1330646400000,
                    "doc_count": 2
                },
                ...
            ]
        }
    }
}
--------------------------------------------------
Like with the normal <<search-aggregations-bucket-histogram-aggregation,histogram>>, both document-level scripts and
value-level scripts are supported. It is also possible to control the order of the returned buckets using the `order`
settings, and to filter the returned buckets based on a `min_doc_count` setting (by default, all buckets with at least
`min_doc_count` documents are returned).
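
For example, ordering buckets by descending document count and dropping empty buckets might look like this (a sketch;
the `order` and `min_doc_count` settings follow the normal histogram's syntax):

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "articles_over_time" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "month",
                "order" : { "_count" : "desc" },
                "min_doc_count" : 1
            }
        }
    }
}
--------------------------------------------------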