- [[search-aggregations-bucket-datehistogram-aggregation]]
- === Date histogram aggregation
- ++++
- <titleabbrev>Date histogram</titleabbrev>
- ++++
- This multi-bucket aggregation is similar to the normal
- <<search-aggregations-bucket-histogram-aggregation,histogram>>, but it can
- only be used with date or date range values. Because dates are represented internally in
- Elasticsearch as long values, it is possible, but not as accurate, to use the
normal `histogram` on dates as well. The main difference between the two APIs is
that the date histogram's interval can be specified using date/time expressions. Time-based
- data requires special support because time-based intervals are not always a
- fixed length.
- Like the histogram, values are rounded *down* into the closest bucket. For
- example, if the interval is a calendar day, `2020-01-03T07:00:01Z` is rounded to
- `2020-01-03T00:00:00Z`. Values are rounded as follows:
- [source,java]
- ----
- bucket_key = Math.floor(value / interval) * interval
- ----
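For a fixed interval, you can reproduce this rounding with plain epoch arithmetic. The
following Java sketch is purely illustrative (it is not Elasticsearch code) and assumes a
one-hour interval in UTC:

[source,java]
----
import java.time.Duration;
import java.time.Instant;

public class FixedIntervalRounding {
    public static void main(String[] args) {
        // A fixed one-hour interval, expressed in milliseconds
        long interval = Duration.ofHours(1).toMillis();
        long value = Instant.parse("2020-01-03T07:42:13Z").toEpochMilli();

        // The same floor-based rounding as the formula above, applied to epoch milliseconds
        long bucketKey = Math.floorDiv(value, interval) * interval;

        System.out.println(Instant.ofEpochMilli(bucketKey)); // 2020-01-03T07:00:00Z
    }
}
----

Calendar intervals are not always a fixed length, so they cannot be reduced to a single
division like this; the next section describes how the two kinds of interval are configured.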
- [[calendar_and_fixed_intervals]]
- ==== Calendar and fixed intervals
When configuring a date histogram aggregation, the interval can be specified
in two ways: calendar-aware time intervals and fixed time intervals.
Calendar-aware intervals understand that daylight saving time changes the length
of specific days, that months have different numbers of days, and that leap
seconds can be tacked onto a particular year.
- Fixed intervals are, by contrast, always multiples of SI units and do not change
- based on calendaring context.
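The difference is easiest to see across a daylight saving transition. The following Java
sketch is illustrative only (it uses `Europe/Paris` as an example CET zone and is not
Elasticsearch code):

[source,java]
----
import java.time.Duration;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class CalendarVsFixed {
    public static void main(String[] args) {
        // Local midnight before the spring-forward on 27 March 2016 (CET -> CEST)
        ZonedDateTime start = ZonedDateTime.of(2016, 3, 27, 0, 0, 0, 0, ZoneId.of("Europe/Paris"));

        // A calendar-aware "one day" follows the local calendar, so this particular day is 23 hours long
        ZonedDateTime calendarDay = start.plusDays(1);

        // A fixed "24h" is always exactly 86,400,000 ms, regardless of the calendar
        ZonedDateTime fixedDay = start.plus(Duration.ofHours(24));

        System.out.println(Duration.between(start, calendarDay)); // PT23H
        System.out.println(Duration.between(start, fixedDay));    // PT24H
    }
}
----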
- [NOTE]
- .Combined `interval` field is deprecated
- ==================================
- deprecated[7.2, `interval` field is deprecated] Historically both calendar and fixed
- intervals were configured in a single `interval` field, which led to confusing
semantics. Specifying `1d` was interpreted as a calendar-aware interval,
whereas `2d` was interpreted as fixed time. To get "one day" of fixed time,
the user had to specify the next smaller unit (in this case, `24h`).
This combined behavior was often unknown to users, and even users who knew
about it found it confusing and difficult to use.
This behavior has been deprecated in favor of two new, explicit fields: `calendar_interval`
and `fixed_interval`.
By forcing a choice between calendar and fixed intervals up front, the semantics of the
interval are immediately clear to the user and there is no ambiguity. The old `interval`
field will be removed in the future.
- ==================================
- [[calendar_intervals]]
- ==== Calendar intervals
- Calendar-aware intervals are configured with the `calendar_interval` parameter.
- You can specify calendar intervals using the unit name, such as `month`, or as a
- single unit quantity, such as `1M`. For example, `day` and `1d` are equivalent.
- Multiple quantities, such as `2d`, are not supported.
- The accepted calendar intervals are:
- `minute`, `1m` ::
- All minutes begin at 00 seconds.
- One minute is the interval between 00 seconds of the first minute and 00
- seconds of the following minute in the specified time zone, compensating for any
- intervening leap seconds, so that the number of minutes and seconds past the
- hour is the same at the start and end.
- `hour`, `1h` ::
- All hours begin at 00 minutes and 00 seconds.
- One hour (1h) is the interval between 00:00 minutes of the first hour and 00:00
- minutes of the following hour in the specified time zone, compensating for any
- intervening leap seconds, so that the number of minutes and seconds past the hour
- is the same at the start and end.
- `day`, `1d` ::
- All days begin at the earliest possible time, which is usually 00:00:00
- (midnight).
- One day (1d) is the interval between the start of the day and the start of
- the following day in the specified time zone, compensating for any intervening
- time changes.
- `week`, `1w` ::
- One week is the interval between the start day_of_week:hour:minute:second
- and the same day of the week and time of the following week in the specified
- time zone.
- `month`, `1M` ::
- One month is the interval between the start day of the month and time of
- day and the same day of the month and time of the following month in the specified
- time zone, so that the day of the month and time of day are the same at the start
- and end.
- `quarter`, `1q` ::
- One quarter is the interval between the start day of the month and
- time of day and the same day of the month and time of day three months later,
- so that the day of the month and time of day are the same at the start and end. +
- `year`, `1y` ::
- One year is the interval between the start day of the month and time of
- day and the same day of the month and time of day the following year in the
- specified time zone, so that the date and time are the same at the start and end. +
- [[calendar_interval_examples]]
- ===== Calendar interval examples
- As an example, here is an aggregation requesting bucket intervals of a month in calendar time:
- [source,console,id=datehistogram-aggregation-calendar-interval-example]
- --------------------------------------------------
- POST /sales/_search?size=0
- {
- "aggs": {
- "sales_over_time": {
- "date_histogram": {
- "field": "date",
- "calendar_interval": "month"
- }
- }
- }
- }
- --------------------------------------------------
- // TEST[setup:sales]
- If you attempt to use multiples of calendar units, the aggregation will fail because only
- singular calendar units are supported:
- [source,console,id=datehistogram-aggregation-calendar-interval-multiples-example]
- --------------------------------------------------
- POST /sales/_search?size=0
- {
- "aggs": {
- "sales_over_time": {
- "date_histogram": {
- "field": "date",
- "calendar_interval": "2d"
- }
- }
- }
- }
- --------------------------------------------------
- // TEST[setup:sales]
- // TEST[catch:bad_request]
- [source,js]
- --------------------------------------------------
- {
- "error" : {
- "root_cause" : [...],
- "type" : "x_content_parse_exception",
- "reason" : "[1:82] [date_histogram] failed to parse field [calendar_interval]",
- "caused_by" : {
- "type" : "illegal_argument_exception",
- "reason" : "The supplied interval [2d] could not be parsed as a calendar interval.",
- "stack_trace" : "java.lang.IllegalArgumentException: The supplied interval [2d] could not be parsed as a calendar interval."
- }
- }
- }
- --------------------------------------------------
- // NOTCONSOLE
- [[fixed_intervals]]
- ==== Fixed intervals
- Fixed intervals are configured with the `fixed_interval` parameter.
- In contrast to calendar-aware intervals, fixed intervals are a fixed number of SI
- units and never deviate, regardless of where they fall on the calendar. One second
- is always composed of `1000ms`. This allows fixed intervals to be specified in
- any multiple of the supported units.
- However, it means fixed intervals cannot express other units such as months,
- since the duration of a month is not a fixed quantity. Attempting to specify
- a calendar interval like month or quarter will throw an exception.
- The accepted units for fixed intervals are:
- milliseconds (`ms`) ::
- A single millisecond. This is a very, very small interval.
- seconds (`s`) ::
- Defined as 1000 milliseconds each.
- minutes (`m`) ::
- Defined as 60 seconds each (60,000 milliseconds).
- All minutes begin at 00 seconds.
- hours (`h`) ::
- Defined as 60 minutes each (3,600,000 milliseconds).
- All hours begin at 00 minutes and 00 seconds.
- days (`d`) ::
- Defined as 24 hours (86,400,000 milliseconds).
- All days begin at the earliest possible time, which is usually 00:00:00
- (midnight).
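Because each of these units has a fixed millisecond length, any `fixed_interval` expression
can be reduced to a single number of milliseconds. The following Java sketch is a rough
illustration of that conversion, not Elasticsearch's actual parser:

[source,java]
----
import java.util.Map;

public class FixedIntervalMillis {
    // Millisecond length of each supported fixed-interval unit
    private static final Map<String, Long> UNIT_MILLIS = Map.of(
        "ms", 1L,
        "s", 1_000L,
        "m", 60_000L,
        "h", 3_600_000L,
        "d", 86_400_000L);

    // Turn an expression such as "30d" or "90m" into a millisecond length
    static long toMillis(String fixedInterval) {
        String unit = fixedInterval.replaceAll("^\\d+", "");
        long quantity = Long.parseLong(
            fixedInterval.substring(0, fixedInterval.length() - unit.length()));
        return quantity * UNIT_MILLIS.get(unit);
    }

    public static void main(String[] args) {
        System.out.println(toMillis("30d")); // 2592000000
        System.out.println(toMillis("90m")); // 5400000
    }
}
----

This is also why a fractional value such as `1.5h` can always be rewritten as a whole number
of a smaller unit (`90m`), as noted later on this page.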
- [[fixed_interval_examples]]
- ===== Fixed interval examples
- If we try to recreate the "month" `calendar_interval` from earlier, we can approximate that with
- 30 fixed days:
- [source,console,id=datehistogram-aggregation-fixed-interval-example]
- --------------------------------------------------
- POST /sales/_search?size=0
- {
- "aggs": {
- "sales_over_time": {
- "date_histogram": {
- "field": "date",
- "fixed_interval": "30d"
- }
- }
- }
- }
- --------------------------------------------------
- // TEST[setup:sales]
- But if we try to use a calendar unit that is not supported, such as weeks, we'll get an exception:
- [source,console,id=datehistogram-aggregation-fixed-interval-unsupported-example]
- --------------------------------------------------
- POST /sales/_search?size=0
- {
- "aggs": {
- "sales_over_time": {
- "date_histogram": {
- "field": "date",
- "fixed_interval": "2w"
- }
- }
- }
- }
- --------------------------------------------------
- // TEST[setup:sales]
- // TEST[catch:bad_request]
- [source,js]
- --------------------------------------------------
- {
- "error" : {
- "root_cause" : [...],
- "type" : "x_content_parse_exception",
- "reason" : "[1:82] [date_histogram] failed to parse field [fixed_interval]",
- "caused_by" : {
- "type" : "illegal_argument_exception",
- "reason" : "failed to parse setting [date_histogram.fixedInterval] with value [2w] as a time value: unit is missing or unrecognized",
- "stack_trace" : "java.lang.IllegalArgumentException: failed to parse setting [date_histogram.fixedInterval] with value [2w] as a time value: unit is missing or unrecognized"
- }
- }
- }
- --------------------------------------------------
- // NOTCONSOLE
- [[datehistogram-aggregation-notes]]
- ==== Date histogram usage notes
- In all cases, when the specified end time does not exist, the actual end time is
- the closest available time after the specified end.
- Widely distributed applications must also consider vagaries such as countries that
- start and stop daylight savings time at 12:01 A.M., so end up with one minute of
- Sunday followed by an additional 59 minutes of Saturday once a year, and countries
- that decide to move across the international date line. Situations like
- that can make irregular time zone offsets seem easy.
- As always, rigorous testing, especially around time-change events, will ensure
- that your time interval specification is
- what you intend it to be.
- WARNING: To avoid unexpected results, all connected servers and clients must
- sync to a reliable network time service.
- NOTE: Fractional time values are not supported, but you can address this by
- shifting to another time unit (e.g., `1.5h` could instead be specified as `90m`).
- NOTE: You can also specify time values using abbreviations supported by
- <<time-units,time units>> parsing.
- [[datehistogram-aggregation-keys]]
- ==== Keys
Internally, a date is represented as a 64-bit number: a timestamp in milliseconds
since the epoch (midnight on 1 January 1970, UTC). These timestamps are
returned as the bucket's ++key++. The `key_as_string` is the same
timestamp converted to a formatted
date string using the `format` parameter:
- TIP: If you don't specify `format`, the first date
- <<mapping-date-format,format>> specified in the field mapping is used.
- [source,console,id=datehistogram-aggregation-format-example]
- --------------------------------------------------
- POST /sales/_search?size=0
- {
- "aggs": {
- "sales_over_time": {
- "date_histogram": {
- "field": "date",
- "calendar_interval": "1M",
- "format": "yyyy-MM-dd" <1>
- }
- }
- }
- }
- --------------------------------------------------
- // TEST[setup:sales]
- <1> Supports expressive date <<date-format-pattern,format pattern>>
- Response:
- [source,console-result]
- --------------------------------------------------
- {
- ...
- "aggregations": {
- "sales_over_time": {
- "buckets": [
- {
- "key_as_string": "2015-01-01",
- "key": 1420070400000,
- "doc_count": 3
- },
- {
- "key_as_string": "2015-02-01",
- "key": 1422748800000,
- "doc_count": 2
- },
- {
- "key_as_string": "2015-03-01",
- "key": 1425168000000,
- "doc_count": 2
- }
- ]
- }
- }
- }
- --------------------------------------------------
- // TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
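The `key` and `key_as_string` values above are two representations of the same instant. The
relationship can be sketched on the client side in Java (the class name and code are
illustrative only):

[source,java]
----
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class BucketKeyFormatting {
    public static void main(String[] args) {
        // Numeric bucket key from the response above: milliseconds since the epoch, in UTC
        long key = 1420070400000L;

        // The same pattern that was passed as the aggregation's "format" parameter
        DateTimeFormatter format = DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);

        System.out.println(format.format(Instant.ofEpochMilli(key))); // 2015-01-01
    }
}
----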
- [[datehistogram-aggregation-time-zone]]
- ==== Time zone
- {es} stores date-times in Coordinated Universal Time (UTC). By default, all bucketing and
- rounding is also done in UTC. Use the `time_zone` parameter to indicate
- that bucketing should use a different time zone.
For example, if the interval is a calendar day and the time zone is
`America/New_York`, then `2020-01-03T01:00:01Z` is:

. Converted to `2020-01-02T20:00:01`
. Rounded down to `2020-01-02T00:00:00`
. Then converted back to UTC to produce `2020-01-02T05:00:00Z`
. Finally, when the bucket is turned into a string key, it is printed in
  `America/New_York`, so it displays as `"2020-01-02T00:00:00"`.
- It looks like:
- [source,java]
- ----
bucket_key = localToUtc(Math.floor(utcToLocal(value) / interval) * interval)
- ----
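The walkthrough above can be reproduced with `java.time`. The sketch below is illustrative
only; it handles this particular example, while Elasticsearch's own rounding also covers
edge cases such as days that have no local midnight:

[source,java]
----
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

public class TimeZoneRounding {
    public static void main(String[] args) {
        ZoneId tz = ZoneId.of("America/New_York");
        Instant value = Instant.parse("2020-01-03T01:00:01Z");

        // utcToLocal: view the UTC instant in the bucketing time zone
        ZonedDateTime local = value.atZone(tz);                  // 2020-01-02T20:00:01-05:00

        // Floor to the calendar-day interval in local time
        ZonedDateTime bucketStartLocal = local.truncatedTo(ChronoUnit.DAYS);

        // localToUtc: the bucket key is stored as a UTC timestamp
        Instant bucketKey = bucketStartLocal.toInstant();

        System.out.println(bucketKey);                           // 2020-01-02T05:00:00Z
        System.out.println(bucketKey.toEpochMilli());            // 1577941200000
        System.out.println(bucketStartLocal.toLocalDateTime());  // 2020-01-02T00:00, the key_as_string view
    }
}
----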
- You can specify time zones as an ISO 8601 UTC offset (e.g. `+01:00` or
- `-08:00`) or as an IANA time zone ID,
- such as `America/Los_Angeles`.
- Consider the following example:
- [source,console,id=datehistogram-aggregation-timezone-example]
- ---------------------------------
- PUT my-index-000001/_doc/1?refresh
- {
- "date": "2015-10-01T00:30:00Z"
- }
- PUT my-index-000001/_doc/2?refresh
- {
- "date": "2015-10-01T01:30:00Z"
- }
- GET my-index-000001/_search?size=0
- {
- "aggs": {
- "by_day": {
- "date_histogram": {
- "field": "date",
- "calendar_interval": "day"
- }
- }
- }
- }
- ---------------------------------
- If you don't specify a time zone, UTC is used. This would result in both of these
- documents being placed into the same day bucket, which starts at midnight UTC
- on 1 October 2015:
- [source,console-result]
- ---------------------------------
- {
- ...
- "aggregations": {
- "by_day": {
- "buckets": [
- {
- "key_as_string": "2015-10-01T00:00:00.000Z",
- "key": 1443657600000,
- "doc_count": 2
- }
- ]
- }
- }
- }
- ---------------------------------
- // TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
- If you specify a `time_zone` of `-01:00`, midnight in that time zone is one hour
- before midnight UTC:
- [source,console]
- ---------------------------------
- GET my-index-000001/_search?size=0
- {
- "aggs": {
- "by_day": {
- "date_histogram": {
- "field": "date",
- "calendar_interval": "day",
- "time_zone": "-01:00"
- }
- }
- }
- }
- ---------------------------------
- // TEST[continued]
- Now the first document falls into the bucket for 30 September 2015, while the
- second document falls into the bucket for 1 October 2015:
- [source,console-result]
- ---------------------------------
- {
- ...
- "aggregations": {
- "by_day": {
- "buckets": [
- {
- "key_as_string": "2015-09-30T00:00:00.000-01:00", <1>
- "key": 1443574800000,
- "doc_count": 1
- },
- {
- "key_as_string": "2015-10-01T00:00:00.000-01:00", <1>
- "key": 1443661200000,
- "doc_count": 1
- }
- ]
- }
- }
- }
- ---------------------------------
- // TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
- <1> The `key_as_string` value represents midnight on each day
- in the specified time zone.
- WARNING: Many time zones shift their clocks for daylight savings time. Buckets
- close to the moment when those changes happen can have slightly different sizes
- than you would expect from the `calendar_interval` or `fixed_interval`.
- For example, consider a DST start in the `CET` time zone: on 27 March 2016 at 2am,
- clocks were turned forward 1 hour to 3am local time. If you use `day` as the
- `calendar_interval`, the bucket covering that day will only hold data for 23
- hours instead of the usual 24 hours for other buckets. The same is true for
shorter intervals, like a `fixed_interval` of `12h`, where you'll have only an 11h
bucket on the morning of 27 March when the DST shift happens.
- [[search-aggregations-bucket-datehistogram-offset]]
- ==== Offset
- // tag::offset-explanation[]
- Use the `offset` parameter to change the start value of each bucket by the
- specified positive (`+`) or negative offset (`-`) duration, such as `1h` for
- an hour, or `1d` for a day. See <<time-units>> for more possible time
- duration options.
- For example, when using an interval of `day`, each bucket runs from midnight
- to midnight. Setting the `offset` parameter to `+6h` changes each bucket
- to run from 6am to 6am:
- // end::offset-explanation[]
- [source,console,id=datehistogram-aggregation-offset-example]
- -----------------------------
- PUT my-index-000001/_doc/1?refresh
- {
- "date": "2015-10-01T05:30:00Z"
- }
- PUT my-index-000001/_doc/2?refresh
- {
- "date": "2015-10-01T06:30:00Z"
- }
- GET my-index-000001/_search?size=0
- {
- "aggs": {
- "by_day": {
- "date_histogram": {
- "field": "date",
- "calendar_interval": "day",
- "offset": "+6h"
- }
- }
- }
- }
- -----------------------------
- // tag::offset-result-intro[]
- Instead of a single bucket starting at midnight, the above request groups the
- documents into buckets starting at 6am:
- // end::offset-result-intro[]
- [source,console-result]
- -----------------------------
- {
- ...
- "aggregations": {
- "by_day": {
- "buckets": [
- {
- "key_as_string": "2015-09-30T06:00:00.000Z",
- "key": 1443592800000,
- "doc_count": 1
- },
- {
- "key_as_string": "2015-10-01T06:00:00.000Z",
- "key": 1443679200000,
- "doc_count": 1
- }
- ]
- }
- }
- }
- -----------------------------
- // TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
- // tag::offset-note[]
- NOTE: The start `offset` of each bucket is calculated after `time_zone`
- adjustments have been made.
- // end::offset-note[]
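A rough Java sketch of how the `+6h` offset shifts the bucket boundary for the first
document above (illustrative logic only, not Elasticsearch's implementation):

[source,java]
----
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

public class OffsetBuckets {
    public static void main(String[] args) {
        // First document from the example above
        Instant value = Instant.parse("2015-10-01T05:30:00Z");
        Duration offset = Duration.ofHours(6); // the "+6h" offset

        // Round down to the calendar day first (in UTC, since no time_zone is set) ...
        ZonedDateTime day = value.atZone(ZoneOffset.UTC).truncatedTo(ChronoUnit.DAYS);

        // ... then shift the bucket boundary by the offset. Values earlier than the
        // shifted boundary belong to the previous bucket.
        ZonedDateTime boundary = day.plus(offset);               // 2015-10-01T06:00Z
        ZonedDateTime bucketStart = value.isBefore(boundary.toInstant())
            ? boundary.minusDays(1)
            : boundary;

        System.out.println(bucketStart.toInstant());             // 2015-09-30T06:00:00Z
    }
}
----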
- [[date-histogram-keyed-response]]
==== Keyed response
Setting the `keyed` flag to `true` associates a unique string key with each
bucket and returns the buckets as a hash rather than an array:
- [source,console,id=datehistogram-aggregation-keyed-example]
- --------------------------------------------------
- POST /sales/_search?size=0
- {
- "aggs": {
- "sales_over_time": {
- "date_histogram": {
- "field": "date",
- "calendar_interval": "1M",
- "format": "yyyy-MM-dd",
- "keyed": true
- }
- }
- }
- }
- --------------------------------------------------
- // TEST[setup:sales]
- Response:
- [source,console-result]
- --------------------------------------------------
- {
- ...
- "aggregations": {
- "sales_over_time": {
- "buckets": {
- "2015-01-01": {
- "key_as_string": "2015-01-01",
- "key": 1420070400000,
- "doc_count": 3
- },
- "2015-02-01": {
- "key_as_string": "2015-02-01",
- "key": 1422748800000,
- "doc_count": 2
- },
- "2015-03-01": {
- "key_as_string": "2015-03-01",
- "key": 1425168000000,
- "doc_count": 2
- }
- }
- }
- }
- }
- --------------------------------------------------
- // TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
- [[date-histogram-scripts]]
- ==== Scripts
- If the data in your documents doesn't exactly match what you'd like to aggregate,
use a <<runtime,runtime field>>. For example, if the revenue
for promoted sales should be recognized a day after the sale date:
- [source,console,id=datehistogram-aggregation-runtime-field]
- ----
- POST /sales/_search?size=0
- {
- "runtime_mappings": {
- "date.promoted_is_tomorrow": {
- "type": "date",
- "script": """
- long date = doc['date'].value.toInstant().toEpochMilli();
- if (doc['promoted'].value) {
date += 86400000; // one day in milliseconds
- }
- emit(date);
- """
- }
- },
- "aggs": {
- "sales_over_time": {
- "date_histogram": {
- "field": "date.promoted_is_tomorrow",
- "calendar_interval": "1M"
- }
- }
- }
- }
- ----
- // TEST[setup:sales]
- ////
- [source,console-result]
- ----
- {
- ...
- "aggregations": {
- "sales_over_time": {
- "buckets": [
- {
- "key_as_string": "2015-01-01T00:00:00.000Z",
- "key": 1420070400000,
- "doc_count": 3
- },
- {
- "key_as_string": "2015-02-01T00:00:00.000Z",
- "key": 1422748800000,
- "doc_count": 2
- },
- {
- "key_as_string": "2015-03-01T00:00:00.000Z",
- "key": 1425168000000,
- "doc_count": 2
- }
- ]
- }
- }
- }
- ----
- // TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
- ////
- [[date-histogram-params]]
- ==== Parameters
You can control the order of the returned buckets using the `order`
setting and filter the returned buckets based on a `min_doc_count` setting
(by default, all buckets between the first bucket that matches documents and
the last one are returned). This histogram also supports the `extended_bounds`
setting, which extends the bounds of the histogram beyond the data itself, and
the `hard_bounds` setting, which limits the histogram to the specified bounds.
- For more information, see
- <<search-aggregations-bucket-histogram-aggregation-extended-bounds,`Extended Bounds`>> and
- <<search-aggregations-bucket-histogram-aggregation-hard-bounds,`Hard Bounds`>>.
- [[date-histogram-missing-value]]
- ===== Missing value
- The `missing` parameter defines how to treat documents that are missing a value.
- By default, they are ignored, but it is also possible to treat them as if they
- have a value.
- [source,console,id=datehistogram-aggregation-missing-example]
- --------------------------------------------------
- POST /sales/_search?size=0
- {
- "aggs": {
- "sale_date": {
- "date_histogram": {
- "field": "date",
- "calendar_interval": "year",
- "missing": "2000/01/01" <1>
- }
- }
- }
- }
- --------------------------------------------------
- // TEST[setup:sales]
<1> Documents without a value in the `date` field will fall into the
same bucket as documents that have the value `2000-01-01`.
- [[date-histogram-order]]
- ===== Order
- By default the returned buckets are sorted by their `key` ascending, but you can
- control the order using
- the `order` setting. This setting supports the same `order` functionality as
- <<search-aggregations-bucket-terms-aggregation-order,`Terms Aggregation`>>.
- [[date-histogram-aggregate-scripts]]
- ===== Using a script to aggregate by day of the week
- When you need to aggregate the results by day of the week, run a `terms`
- aggregation on a <<runtime,runtime field>> that returns the day of the week:
- [source,console,id=datehistogram-aggregation-day-of-week-runtime-field]
- ----
- POST /sales/_search?size=0
- {
- "runtime_mappings": {
- "date.day_of_week": {
- "type": "keyword",
- "script": "emit(doc['date'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ROOT))"
- }
- },
- "aggs": {
- "day_of_week": {
- "terms": { "field": "date.day_of_week" }
- }
- }
- }
- ----
- // TEST[setup:sales]
- Response:
- [source,console-result]
- ----
- {
- ...
- "aggregations": {
- "day_of_week": {
- "doc_count_error_upper_bound": 0,
- "sum_other_doc_count": 0,
- "buckets": [
- {
- "key": "Sunday",
- "doc_count": 4
- },
- {
- "key": "Thursday",
- "doc_count": 3
- }
- ]
- }
- }
- }
- ----
- // TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
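For reference, the expression that the Painless runtime field evaluates per document can be
reproduced in plain Java; the sample timestamp below is hypothetical and only illustrates the
computation:

[source,java]
----
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.TextStyle;
import java.util.Locale;

public class DayOfWeekKey {
    public static void main(String[] args) {
        // Hypothetical sample value; 4 January 2015 was a Sunday
        Instant date = Instant.parse("2015-01-04T00:00:00Z");

        // Mirrors the runtime field's dayOfWeekEnum.getDisplayName(...) expression
        String dayOfWeek = date.atZone(ZoneOffset.UTC)
            .getDayOfWeek()
            .getDisplayName(TextStyle.FULL, Locale.ROOT);

        System.out.println(dayOfWeek); // Sunday
    }
}
----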
The response contains one bucket for each day of the week that appears in the data,
using the day's name (such as `Sunday` or `Thursday`) as the key.