- [[search-aggregations-bucket-datehistogram-aggregation]]
- === Date Histogram
- A multi-bucket aggregation similar to the <<search-aggregations-bucket-histogram-aggregation,histogram>>, except that it can
- only be applied to date values. Since Elasticsearch represents dates internally as long values, it is possible
- to use the normal `histogram` on dates as well, though accuracy will be compromised. The reason is
- that time-based intervals are not of fixed length (think of leap years and the varying number of days in a month), so
- time-based data needs dedicated support. Functionally, this histogram supports the same features
- as the normal <<search-aggregations-bucket-histogram-aggregation,histogram>>. The main difference is that the interval can be specified by date/time expressions.
- The following request asks for one bucket per month:
- [source,js]
- --------------------------------------------------
- {
-     "aggs" : {
-         "articles_over_time" : {
-             "date_histogram" : {
-                 "field" : "date",
-                 "interval" : "month"
-             }
-         }
-     }
- }
- --------------------------------------------------
- Fractional values are allowed, for example 1.5 hours:
- [source,js]
- --------------------------------------------------
- {
-     "aggs" : {
-         "articles_over_time" : {
-             "date_histogram" : {
-                 "field" : "date",
-                 "interval" : "1.5h"
-             }
-         }
-     }
- }
- --------------------------------------------------
- Available expressions for interval: `year`, `quarter`, `month`, `week`, `day`, `hour`, `minute`, `second`
- ==== Time Zone
- By default, times are stored as UTC milliseconds since the epoch, so all computation, bucketing, and rounding is
- done in UTC. It is possible to provide a time zone value (both for pre rounding and for post rounding), which causes all
- computations to take the relevant zone into account. The time returned for each bucket/entry is milliseconds since the
- epoch of the provided time zone.
- The parameters are `pre_zone` (applied before rounding to the interval) and `post_zone` (applied after rounding to the
- interval). The `time_zone` parameter simply sets the `pre_zone` parameter. By default, both are set to `UTC`.
- The zone value accepts either a numeric value for the hours offset, for example: `"time_zone" : -2`. It also accepts a
- format of hours and minutes, like `"time_zone" : "-02:30"`. Another option is to provide a time zone accepted as one of
- the values listed here.
- Let's take an example: the value `2012-04-01T04:15:30Z` with a `pre_zone` of `-08:00`. For a day interval, applying
- the time zone and rounding places the value in `2012-03-31`, so the returned value is (in millis)
- `2012-03-31T00:00:00Z` (UTC). For an hour interval, applying the time zone gives `2012-03-31T20:15:30`, and rounding it
- gives `2012-03-31T20:00:00`; since we want to return it in UTC (`post_zone` is not set), we convert it back to
- UTC: `2012-04-01T04:00:00Z`. Note that the results are consistent: the rounded value is always returned in UTC.
- `post_zone` simply takes the result, and adds the relevant offset.
- Sometimes we want to apply the conversion back to UTC, shown above for the hour interval, to day (and larger) intervals
- as well. Setting `pre_zone_adjust_large_interval` to `true` applies the same conversion done for the hour interval in
- the example to day and larger intervals (it can be set regardless of the interval, but only takes effect when using day
- and larger intervals).
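- As a rough sketch of where these parameters sit in a request, the body below applies a `pre_zone` of `-08:00` to a day
- interval and enables `pre_zone_adjust_large_interval`. It reuses the `articles_over_time` aggregation and `date` field
- from the earlier examples and is meant as an illustration only, not as a recommended configuration:
- [source,js]
- --------------------------------------------------
- {
-     "aggs" : {
-         "articles_over_time" : {
-             "date_histogram" : {
-                 "field" : "date",
-                 "interval" : "day",
-                 "pre_zone" : "-08:00",
-                 "pre_zone_adjust_large_interval" : true
-             }
-         }
-     }
- }
- --------------------------------------------------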
- ==== Factor
- The date histogram works on numeric values (since time is stored in milliseconds since the epoch in UTC). Sometimes,
- however, systems store a different resolution (such as seconds since the epoch) in a numeric field. The `factor`
- parameter can be used to convert the value in the field to milliseconds in order to do the relevant rounding, and it is
- then applied again to get back to the original unit. For example, when a numeric field stores seconds resolution, the
- factor can be set to 1000.
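- A minimal sketch of the seconds-resolution case described above follows. The field name `created_at_seconds` is
- hypothetical and stands for any numeric field holding seconds since the epoch:
- [source,js]
- --------------------------------------------------
- {
-     "aggs" : {
-         "articles_over_time" : {
-             "date_histogram" : {
-                 "field" : "created_at_seconds",
-                 "interval" : "day",
-                 "factor" : 1000
-             }
-         }
-     }
- }
- --------------------------------------------------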
- ==== Pre/Post Offset
- Specific offsets can be provided for pre rounding and post rounding: `pre_offset` for pre rounding, and
- `post_offset` for post rounding. The values use the date time format (`1h`, `1d`, etc.).
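- The sketch below only shows where the two offset parameters go in a request. The `1h` values are arbitrary
- placeholders in the format mentioned above, and the aggregation name and field are carried over from the earlier
- examples:
- [source,js]
- --------------------------------------------------
- {
-     "aggs" : {
-         "articles_over_time" : {
-             "date_histogram" : {
-                 "field" : "date",
-                 "interval" : "day",
-                 "pre_offset" : "1h",
-                 "post_offset" : "1h"
-             }
-         }
-     }
- }
- --------------------------------------------------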
- ==== Keys
- Since dates are represented internally as 64-bit numbers, these numbers are returned as the bucket keys (each key
- representing a date in milliseconds since the epoch). It is also possible to define a date format, which results in
- the dates being returned as formatted strings next to the numeric key values:
- [source,js]
- --------------------------------------------------
- {
-     "aggs" : {
-         "articles_over_time" : {
-             "date_histogram" : {
-                 "field" : "date",
-                 "interval" : "1M",
-                 "format" : "yyyy-MM-dd" <1>
-             }
-         }
-     }
- }
- --------------------------------------------------
- <1> Supports expressive date <<date-format-pattern,format pattern>>
- Response:
- [source,js]
- --------------------------------------------------
- {
-     "aggregations": {
-         "articles_over_time": {
-             "buckets": [
-                 {
-                     "key_as_string": "2012-02-02",
-                     "key": 1328140800000,
-                     "doc_count": 1
-                 },
-                 {
-                     "key_as_string": "2012-03-02",
-                     "key": 1330646400000,
-                     "doc_count": 2
-                 },
-                 ...
-             ]
-         }
-     }
- }
- --------------------------------------------------
- As with the normal <<search-aggregations-bucket-histogram-aggregation,histogram>>, both document level scripts and
- value level scripts are supported. It is also possible to control the order of the returned buckets using the `order`
- setting and to filter the returned buckets based on a `min_doc_count` setting (by default, all buckets with
- `min_doc_count > 1` will be returned).
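- For illustration, a request combining both settings might look like the sketch below. It assumes that `order` takes
- the same form as in the normal histogram (here, descending by `_count`), and the `min_doc_count` of `2` is an
- arbitrary value chosen for the example:
- [source,js]
- --------------------------------------------------
- {
-     "aggs" : {
-         "articles_over_time" : {
-             "date_histogram" : {
-                 "field" : "date",
-                 "interval" : "month",
-                 "order" : { "_count" : "desc" },
-                 "min_doc_count" : 2
-             }
-         }
-     }
- }
- --------------------------------------------------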