- [[search-aggregations-bucket-datehistogram-aggregation]]
- === Date Histogram Aggregation
- This multi-bucket aggregation is similar to the normal
- <<search-aggregations-bucket-histogram-aggregation,histogram>>, but it can
- only be used with date values. Because dates are represented internally in
- Elasticsearch as long values, it is possible, but not as accurate, to use the
- normal `histogram` on dates as well. The main difference between the two APIs is
- that here the interval can be specified using date/time expressions. Time-based
- data requires special support because time-based intervals are not always a
- fixed length.
- ==== Setting intervals
- There seems to be no limit to the creativity we humans apply to setting our
- clocks and calendars. We've invented leap years and leap seconds, standard and
- daylight savings times, and timezone offsets of 30 or 45 minutes rather than a
- full hour. While these creations help keep us in sync with the cosmos and our
- environment, they can make specifying time intervals accurately a real challenge.
- The only universal truth our researchers have yet to disprove is that a
- millisecond is always the same duration, and a second is always 1000 milliseconds.
- Beyond that, things get complicated.
- Generally speaking, when you specify a single time unit, such as 1 hour or 1 day, you
- are working with a _calendar interval_, but multiples, such as 6 hours or 3 days, are
- _fixed-length intervals_.
- For example, a specification of 1 day (1d) from now is a calendar interval that
- means "at
- this exact time tomorrow" no matter the length of the day. A change to or from
- daylight savings time that results in a 23 or 25 hour day is compensated for and the
- specification of "this exact time tomorrow" is maintained. But if you specify 2 or
- more days, each day must be of the same fixed duration (24 hours). In this case, if
- the specified interval includes the change to or from daylight savings time, the
- interval will end an hour sooner or later than you expect.
- There are similar differences to consider when you specify single versus multiple
- minutes or hours. Multiple time periods longer than a day are not supported.
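The difference between the two interval types can be sketched outside Elasticsearch with Python's `zoneinfo` (the timezone and dates below are only illustrative, not part of the API): a calendar-style "1d" lands at the same wall-clock time the next day, while a fixed 24-hour interval drifts across a DST change.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo  # Python 3.9+

tz = ZoneInfo("Europe/Berlin")
# On 2016-03-27 the CET clocks jumped from 02:00 to 03:00 (a 23-hour day).
start = datetime(2016, 3, 26, 12, 0, tzinfo=tz)

# Calendar-style "1d": the same wall-clock time on the following day.
calendar_next = start + timedelta(days=1)

# Fixed-length interval: exactly 24x60x60x1000 milliseconds later.
fixed_next = (start.astimezone(ZoneInfo("UTC"))
              + timedelta(hours=24)).astimezone(tz)

print(calendar_next.isoformat())  # 2016-03-27T12:00:00+02:00
print(fixed_next.isoformat())     # 2016-03-27T13:00:00+02:00 (an hour "late")
```

The calendar interval spans only 23 real hours here; the fixed interval keeps its length but ends an hour later than "this time tomorrow".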
- Here are the valid time specifications and their meanings:
- milliseconds (ms) ::
- Fixed length interval; supports multiples.
- seconds (s) ::
- 1000 milliseconds; fixed length interval (except for the last second of a
- minute that contains a leap-second, which is 2000ms long); supports multiples.
- minutes (m) ::
- All minutes begin at 00 seconds.
- * One minute (1m) is the interval between 00 seconds of the first minute and 00
- seconds of the following minute in the specified timezone, compensating for any
- intervening leap seconds, so that the number of minutes and seconds past the
- hour is the same at the start and end.
- * Multiple minutes (__n__m) are intervals of exactly 60x1000=60,000 milliseconds
- each.
- hours (h) ::
- All hours begin at 00 minutes and 00 seconds.
- * One hour (1h) is the interval between 00:00 minutes of the first hour and 00:00
- minutes of the following hour in the specified timezone, compensating for any
- intervening leap seconds, so that the number of minutes and seconds past the hour
- is the same at the start and end.
- * Multiple hours (__n__h) are intervals of exactly 60x60x1000=3,600,000 milliseconds
- each.
- days (d) ::
- All days begin at the earliest possible time, which is usually 00:00:00
- (midnight).
- * One day (1d) is the interval between the start of the day and the start of
- the following day in the specified timezone, compensating for any intervening
- time changes.
- * Multiple days (__n__d) are intervals of exactly 24x60x60x1000=86,400,000
- milliseconds each.
- weeks (w) ::
- * One week (1w) is the interval between the start of one week
- (day_of_week:hour:minute:second) and the same day of the week and time of the
- following week in the specified timezone.
- * Multiple weeks (__n__w) are not supported.
- months (M) ::
- * One month (1M) is the interval between the start day of the month and time of
- day and the same day of the month and time of the following month in the specified
- timezone, so that the day of the month and time of day are the same at the start
- and end.
- * Multiple months (__n__M) are not supported.
- quarters (q) ::
- * One quarter (1q) is the interval between the start day of the month and
- time of day and the same day of the month and time of day three months later,
- so that the day of the month and time of day are the same at the start and end. +
- * Multiple quarters (__n__q) are not supported.
- years (y) ::
- * One year (1y) is the interval between the start day of the month and time of
- day and the same day of the month and time of day the following year in the
- specified timezone, so that the date and time are the same at the start and end. +
- * Multiple years (__n__y) are not supported.
- NOTE: In all cases, when the specified end time does not exist, the actual end
- time is the closest available time after the specified end.
- Widely distributed applications must also consider vagaries such as countries that
- start and stop daylight savings time at 12:01 A.M., so end up with one minute of
- Sunday followed by an additional 59 minutes of Saturday once a year, and countries
- that decide to move across the international date line. Situations like
- that can make irregular timezone offsets seem easy.
- As always, rigorous testing, especially around time-change events, will ensure
- that your time interval specification is
- what you intend it to be.
- WARNING: To avoid unexpected results, all connected servers and clients must
- sync to a reliable network time service.
- ==== Examples
- Requesting bucket intervals of a month.
- [source,js]
- --------------------------------------------------
- POST /sales/_search?size=0
- {
- "aggs" : {
- "sales_over_time" : {
- "date_histogram" : {
- "field" : "date",
- "interval" : "month"
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- // TEST[setup:sales]
- You can also specify time values using abbreviations supported by
- <<time-units,time units>> parsing.
- Note that fractional time values are not supported, but you can address this by
- shifting to another
- time unit (e.g., `1.5h` could instead be specified as `90m`).
- [source,js]
- --------------------------------------------------
- POST /sales/_search?size=0
- {
- "aggs" : {
- "sales_over_time" : {
- "date_histogram" : {
- "field" : "date",
- "interval" : "90m"
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- // TEST[setup:sales]
- ===== Keys
- Internally, a date is represented as a 64-bit number: a timestamp in
- milliseconds-since-the-epoch (midnight of 1 January 1970, UTC). These
- timestamps are returned as the ++key++ name of the bucket. The `key_as_string`
- is the same timestamp converted to a formatted date string using the `format`
- parameter specification:
- TIP: If you don't specify `format`, the first date
- <<mapping-date-format,format>> specified in the field mapping is used.
- [source,js]
- --------------------------------------------------
- POST /sales/_search?size=0
- {
- "aggs" : {
- "sales_over_time" : {
- "date_histogram" : {
- "field" : "date",
- "interval" : "1M",
- "format" : "yyyy-MM-dd" <1>
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- // TEST[setup:sales]
- <1> Supports expressive date <<date-format-pattern,format pattern>>
- Response:
- [source,js]
- --------------------------------------------------
- {
- ...
- "aggregations": {
- "sales_over_time": {
- "buckets": [
- {
- "key_as_string": "2015-01-01",
- "key": 1420070400000,
- "doc_count": 3
- },
- {
- "key_as_string": "2015-02-01",
- "key": 1422748800000,
- "doc_count": 2
- },
- {
- "key_as_string": "2015-03-01",
- "key": 1425168000000,
- "doc_count": 2
- }
- ]
- }
- }
- }
- --------------------------------------------------
- // TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
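The `key` values in this response can be cross-checked by converting epoch milliseconds back to UTC dates, for example with Python (a standalone sketch, not part of the API):

```python
from datetime import datetime, timezone

# Bucket keys from the response above, in milliseconds since the epoch.
for key in (1420070400000, 1422748800000, 1425168000000):
    dt = datetime.fromtimestamp(key / 1000, tz=timezone.utc)
    print(key, "->", dt.strftime("%Y-%m-%d"))
# 1420070400000 -> 2015-01-01
# 1422748800000 -> 2015-02-01
# 1425168000000 -> 2015-03-01
```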
- ===== Timezone
- Date-times are stored in Elasticsearch in UTC. By default, all bucketing and
- rounding is also done in UTC. Use the `time_zone` parameter to indicate
- that bucketing should use a different timezone.
- You can specify timezones as either an ISO 8601 UTC offset (e.g. `+01:00` or
- `-08:00`) or as a timezone ID as specified in the IANA timezone database,
- such as `America/Los_Angeles`.
- Consider the following example:
- [source,js]
- ---------------------------------
- PUT my_index/_doc/1?refresh
- {
- "date": "2015-10-01T00:30:00Z"
- }
- PUT my_index/_doc/2?refresh
- {
- "date": "2015-10-01T01:30:00Z"
- }
- GET my_index/_search?size=0
- {
- "aggs": {
- "by_day": {
- "date_histogram": {
- "field": "date",
- "interval": "day"
- }
- }
- }
- }
- ---------------------------------
- // CONSOLE
- If you don't specify a timezone, UTC is used. This would result in both of these
- documents being placed into the same day bucket, which starts at midnight UTC
- on 1 October 2015:
- [source,js]
- ---------------------------------
- {
- ...
- "aggregations": {
- "by_day": {
- "buckets": [
- {
- "key_as_string": "2015-10-01T00:00:00.000Z",
- "key": 1443657600000,
- "doc_count": 2
- }
- ]
- }
- }
- }
- ---------------------------------
- // TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
- If you specify a `time_zone` of `-01:00`, midnight in that timezone is one hour
- before midnight UTC:
- [source,js]
- ---------------------------------
- GET my_index/_search?size=0
- {
- "aggs": {
- "by_day": {
- "date_histogram": {
- "field": "date",
- "interval": "day",
- "time_zone": "-01:00"
- }
- }
- }
- }
- ---------------------------------
- // CONSOLE
- // TEST[continued]
- Now the first document falls into the bucket for 30 September 2015, while the
- second document falls into the bucket for 1 October 2015:
- [source,js]
- ---------------------------------
- {
- ...
- "aggregations": {
- "by_day": {
- "buckets": [
- {
- "key_as_string": "2015-09-30T00:00:00.000-01:00", <1>
- "key": 1443574800000,
- "doc_count": 1
- },
- {
- "key_as_string": "2015-10-01T00:00:00.000-01:00", <1>
- "key": 1443661200000,
- "doc_count": 1
- }
- ]
- }
- }
- }
- ---------------------------------
- // TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
- <1> The `key_as_string` value represents midnight on each day
- in the specified timezone.
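Note that the `key` values are still UTC epoch milliseconds; only the bucket boundaries moved. Rendering them in a `-01:00` offset recovers the midnight boundaries (a quick standalone check in Python):

```python
from datetime import datetime, timezone, timedelta

minus_one = timezone(timedelta(hours=-1))   # the example's time_zone of -01:00
for key in (1443574800000, 1443661200000):  # bucket keys from the response
    print(datetime.fromtimestamp(key / 1000, tz=minus_one).isoformat())
# 2015-09-30T00:00:00-01:00
# 2015-10-01T00:00:00-01:00
```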
- WARNING: When using time zones that follow DST (daylight savings time) changes,
- buckets close to the moment when those changes happen can have slightly
- different sizes than the specified `interval` would suggest.
- For example, consider a DST start in the `CET` time zone: on 27 March 2016 at 2am,
- clocks were turned forward 1 hour to 3am local time. If you use `day` as `interval`,
- the bucket covering that day will only hold data for 23 hours instead of the usual
- 24 hours for other buckets. The same is true for shorter intervals, like `12h`,
- where you'll have only an 11h bucket on the morning of 27 March when the DST
- shift happens.
- ===== Offset
- Use the `offset` parameter to change the start value of each bucket by the
- specified positive (`+`) or negative offset (`-`) duration, such as `1h` for
- an hour, or `1d` for a day. See <<time-units>> for more possible time
- duration options.
- For example, when using an interval of `day`, each bucket runs from midnight
- to midnight. Setting the `offset` parameter to `+6h` changes each bucket
- to run from 6am to 6am:
- [source,js]
- -----------------------------
- PUT my_index/_doc/1?refresh
- {
- "date": "2015-10-01T05:30:00Z"
- }
- PUT my_index/_doc/2?refresh
- {
- "date": "2015-10-01T06:30:00Z"
- }
- GET my_index/_search?size=0
- {
- "aggs": {
- "by_day": {
- "date_histogram": {
- "field": "date",
- "interval": "day",
- "offset": "+6h"
- }
- }
- }
- }
- -----------------------------
- // CONSOLE
- Instead of a single bucket starting at midnight, the above request groups the
- documents into buckets starting at 6am:
- [source,js]
- -----------------------------
- {
- ...
- "aggregations": {
- "by_day": {
- "buckets": [
- {
- "key_as_string": "2015-09-30T06:00:00.000Z",
- "key": 1443592800000,
- "doc_count": 1
- },
- {
- "key_as_string": "2015-10-01T06:00:00.000Z",
- "key": 1443679200000,
- "doc_count": 1
- }
- ]
- }
- }
- }
- -----------------------------
- // TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
- NOTE: The start `offset` of each bucket is calculated after `time_zone`
- adjustments have been made.
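The arithmetic behind these keys can be sketched as rounding down to the start of an offset-shifted day. This is a simplified model of the fixed-length case in UTC, not Elasticsearch's actual implementation:

```python
from datetime import datetime, timezone

DAY_MS = 24 * 60 * 60 * 1000
OFFSET_MS = 6 * 60 * 60 * 1000  # "offset": "+6h"

def day_bucket_key(epoch_ms):
    # Shift by the offset, round down to a day boundary, then shift back.
    return (epoch_ms - OFFSET_MS) // DAY_MS * DAY_MS + OFFSET_MS

def ms(*args):
    # Helper: UTC date/time components -> epoch milliseconds.
    return int(datetime(*args, tzinfo=timezone.utc).timestamp() * 1000)

print(day_bucket_key(ms(2015, 10, 1, 5, 30)))  # 1443592800000 (2015-09-30T06:00Z)
print(day_bucket_key(ms(2015, 10, 1, 6, 30)))  # 1443679200000 (2015-10-01T06:00Z)
```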
- ===== Keyed Response
- Setting the `keyed` flag to `true` associates a unique string key with each
- bucket and returns the buckets as a hash rather than an array:
- [source,js]
- --------------------------------------------------
- POST /sales/_search?size=0
- {
- "aggs" : {
- "sales_over_time" : {
- "date_histogram" : {
- "field" : "date",
- "interval" : "1M",
- "format" : "yyyy-MM-dd",
- "keyed": true
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- // TEST[setup:sales]
- Response:
- [source,js]
- --------------------------------------------------
- {
- ...
- "aggregations": {
- "sales_over_time": {
- "buckets": {
- "2015-01-01": {
- "key_as_string": "2015-01-01",
- "key": 1420070400000,
- "doc_count": 3
- },
- "2015-02-01": {
- "key_as_string": "2015-02-01",
- "key": 1422748800000,
- "doc_count": 2
- },
- "2015-03-01": {
- "key_as_string": "2015-03-01",
- "key": 1425168000000,
- "doc_count": 2
- }
- }
- }
- }
- }
- --------------------------------------------------
- // TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
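Client code that consumes both shapes needs to handle a list as well as an object of buckets. One way to normalize the two, shown here as a hypothetical Python helper (not part of any client library), using abbreviated bucket data from the examples above:

```python
def bucket_counts(buckets):
    """Map key_as_string -> doc_count for keyed (dict) or unkeyed (list) buckets."""
    items = buckets.values() if isinstance(buckets, dict) else buckets
    return {b["key_as_string"]: b["doc_count"] for b in items}

unkeyed = [{"key_as_string": "2015-01-01", "key": 1420070400000, "doc_count": 3}]
keyed = {"2015-01-01": {"key_as_string": "2015-01-01",
                        "key": 1420070400000, "doc_count": 3}}

# Both shapes normalize to the same mapping.
assert bucket_counts(unkeyed) == bucket_counts(keyed) == {"2015-01-01": 3}
```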
- ===== Scripts
- As with the normal <<search-aggregations-bucket-histogram-aggregation,histogram>>,
- both document-level scripts and
- value-level scripts are supported. You can control the order of the returned
- buckets using the `order`
- settings and filter the returned buckets based on a `min_doc_count` setting
- (by default all buckets between the first
- bucket that matches documents and the last one are returned). This histogram
- also supports the `extended_bounds`
- setting, which enables extending the bounds of the histogram beyond the data
- itself. For more information, see
- <<search-aggregations-bucket-histogram-aggregation-extended-bounds,`Extended Bounds`>>.
- ===== Missing value
- The `missing` parameter defines how to treat documents that are missing a value.
- By default, they are ignored, but it is also possible to treat them as if they
- have a value.
- [source,js]
- --------------------------------------------------
- POST /sales/_search?size=0
- {
- "aggs" : {
- "sale_date" : {
- "date_histogram" : {
- "field" : "date",
- "interval": "year",
- "missing": "2000/01/01" <1>
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- // TEST[setup:sales]
- <1> Documents without a value in the `date` field will fall into the
- same bucket as documents that have the value `2000-01-01`.
- ===== Order
- By default the returned buckets are sorted by their `key` ascending, but you can
- control the order using
- the `order` setting. This setting supports the same `order` functionality as
- <<search-aggregations-bucket-terms-aggregation-order,`Terms Aggregation`>>.
- deprecated[6.0.0, Use `_key` instead of `_time` to order buckets by their dates/keys]
- ===== Using a script to aggregate by day of the week
- When you need to aggregate the results by day of the week, use a script that
- returns the day of the week:
- [source,js]
- --------------------------------------------------
- POST /sales/_search?size=0
- {
- "aggs": {
- "dayOfWeek": {
- "terms": {
- "script": {
- "lang": "painless",
- "source": "doc['date'].value.dayOfWeekEnum.value"
- }
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- // TEST[setup:sales]
- Response:
- [source,js]
- --------------------------------------------------
- {
- ...
- "aggregations": {
- "dayOfWeek": {
- "doc_count_error_upper_bound": 0,
- "sum_other_doc_count": 0,
- "buckets": [
- {
- "key": "7",
- "doc_count": 4
- },
- {
- "key": "4",
- "doc_count": 3
- }
- ]
- }
- }
- }
- --------------------------------------------------
- // TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
- The response will contain all the buckets having as key the relative day of
- the week: 1 for Monday, 2 for Tuesday... 7 for Sunday.
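This numbering follows ISO-8601 (Painless's `dayOfWeekEnum` is backed by `java.time.DayOfWeek`), which is also the convention Python's standard library uses, so translating bucket keys back to names is straightforward:

```python
import calendar

# ISO-8601 day numbers: 1 = Monday ... 7 = Sunday.
iso_names = {i: calendar.day_name[i - 1] for i in range(1, 8)}

for key in ("7", "4"):  # bucket keys from the response above
    print(key, "->", iso_names[int(key)])
# 7 -> Sunday
# 4 -> Thursday
```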