7 years ago · 1ce3c92a2d
--- a/docs/reference/aggregations/bucket/datehistogram-aggregation.asciidoc
+++ b/docs/reference/aggregations/bucket/datehistogram-aggregation.asciidoc
@@ -1,12 +1,129 @@
 
				 [[search-aggregations-bucket-datehistogram-aggregation]]
			
 
				 === Date Histogram Aggregation
			
 
				 
			
 
				-A multi-bucket aggregation similar to the <<search-aggregations-bucket-histogram-aggregation,histogram>> except it can
			
 
				-only be applied on date values. Since dates are represented in Elasticsearch internally as long values, it is possible
			
 
				-to use the normal `histogram` on dates as well, though accuracy will be compromised. The reason for this is in the fact
			
 
				-that time based intervals are not fixed (think of leap years and on the number of days in a month). For this reason,
			
 
				-we need special support for time based data. From a functionality perspective, this histogram supports the same features
			
 
				-as the normal <<search-aggregations-bucket-histogram-aggregation,histogram>>. The main difference is that the interval can be specified by date/time expressions.
			
 
				+This multi-bucket aggregation is similar to the normal
			
 
				+<<search-aggregations-bucket-histogram-aggregation,histogram>>, but it can
			
 
				+only be used with date values. Because dates are represented internally in 
			
 
				+Elasticsearch as long values, it is possible, but not as accurate, to use the
			
 
				+normal `histogram` on dates as well. The main difference between the two APIs is
			
 
				+that here the interval can be specified using date/time expressions. Time-based
			
 
				+data requires special support because time-based intervals are not always a
			
 
				+fixed length.
			
 
				+
			
 
				+==== Setting intervals
			
 
				+
			
 
				+There seems to be no limit to the creativity we humans apply to setting our
			
 
				+clocks and calendars. We've invented leap years and leap seconds, standard and
			
 
				+daylight savings times, and timezone offsets of 30 or 45 minutes rather than a 
			
 
				+full hour. While these creations help keep us in sync with the cosmos and our
			
 
				+environment, they can make specifying time intervals accurately a real challenge.
			
 
				+The only universal truth our researchers have yet to disprove is that a
			
 
				+millisecond is always the same duration, and a second is always 1000 milliseconds.
			
 
				+Beyond that, things get complicated.
			
 
				+
			
 
				+Generally speaking, when you specify a single time unit, such as 1 hour or 1 day, you
			
 
				+are working with a _calendar interval_, but multiples, such as 6 hours or 3 days, are
			
 
				+_fixed-length intervals_.
			
 
				+
			
 
				+For example, a specification of 1 day (1d) from now is a calendar interval that
			
 
				+means "at
			
 
				+this exact time tomorrow" no matter the length of the day. A change to or from 
			
 
				+daylight savings time that results in a 23 or 25 hour day is compensated for and the
			
 
				+specification of "this exact time tomorrow" is maintained. But if you specify 2 or
			
 
				+more days, each day must be of the same fixed duration (24 hours). In this case, if
			
 
				+the specified interval includes the change to or from daylight savings time, the
			
 
				+interval will end an hour sooner or later than you expect.
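
The difference between one calendar day and a fixed 24-hour interval can be sketched in Python. This is only an illustration of the arithmetic described above, not Elasticsearch's implementation. The `America/New_York` zone, the start time, and the helper names are assumptions chosen for the example, and the standard `zoneinfo` module (Python 3.9+) needs tz data available on the system.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")  # assumed zone; DST began 2016-03-13 at 2am

def add_calendar_day(local_dt):
    """Calendar interval: same wall-clock time on the next day."""
    # Arithmetic on an aware datetime keeps the wall-clock reading.
    return local_dt + timedelta(days=1)

def add_fixed_hours(local_dt, hours):
    """Fixed-length interval: exactly `hours` of elapsed time."""
    as_utc = local_dt.astimezone(timezone.utc)
    return (as_utc + timedelta(hours=hours)).astimezone(local_dt.tzinfo)

def elapsed(start_dt, end_dt):
    """Real elapsed time, computed in UTC to avoid wall-clock subtraction."""
    return end_dt.astimezone(timezone.utc) - start_dt.astimezone(timezone.utc)

start = datetime(2016, 3, 12, 12, 0, tzinfo=NEW_YORK)
calendar_end = add_calendar_day(start)   # 2016-03-13T12:00:00-04:00
fixed_end = add_fixed_hours(start, 24)   # 2016-03-13T13:00:00-04:00

print(calendar_end.isoformat(), elapsed(start, calendar_end))  # 23 hours elapsed
print(fixed_end.isoformat(), elapsed(start, fixed_end))        # 24 hours elapsed
```

The calendar day keeps "this exact time tomorrow" but spans only 23 hours, while the fixed interval spans 24 hours and lands an hour later on the local clock.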
			
 
				+
			
 
				+There are similar differences to consider when you specify single versus multiple
			
 
				+minutes or hours. Multiples of units longer than a day (weeks, months, quarters, and years) are not supported.
			
 
				+
			
 
				+Here are the valid time specifications and their meanings:
			
 
				+
			
 
				+milliseconds (ms) ::
			
 
				+Fixed length interval; supports multiples.
			
 
				+
			
 
				+seconds (s) ::
			
 
				+1000 milliseconds; fixed length interval (except for the last second of a
			
 
				+minute that contains a leap-second, which is 2000ms long); supports multiples.
			
 
				+
			
 
				+minutes (m) ::
			
 
				+All minutes begin at 00 seconds.
			
 
				+
			
 
				+* One minute (1m) is the interval between 00 seconds of the first minute and 00
			
 
				+seconds of the following minute in the specified timezone, compensating for any
			
 
				+intervening leap seconds, so that the number of minutes and seconds past the 
			
 
				+hour is the same at the start and end. 
			
 
				+* Multiple minutes (__n__m) are intervals of exactly 60x1000=60,000 milliseconds
			
 
				+each.
			
 
				+
			
 
				+hours (h) ::
			
 
				+All hours begin at 00 minutes and 00 seconds.
			
 
				+
			
 
				+* One hour (1h) is the interval between 00:00 minutes of the first hour and 00:00
			
 
				+minutes of the following hour in the specified timezone, compensating for any
			
 
				+intervening leap seconds, so that the number of minutes and seconds past the hour
			
 
				+is the same at the start and end. 
			
 
				+* Multiple hours (__n__h) are intervals of exactly 60x60x1000=3,600,000 milliseconds
			
 
				+each.
			
 
				+
			
 
				+days (d) ::
			
 
				+All days begin at the earliest possible time, which is usually 00:00:00
			
 
				+(midnight).
			
 
				+
			
 
				+* One day (1d) is the interval between the start of the day and the start of
			
 
				+the following day in the specified timezone, compensating for any intervening
			
 
				+time changes.
			
 
				+* Multiple days (__n__d) are intervals of exactly 24x60x60x1000=86,400,000
			
 
				+milliseconds each.
			
 
				+
			
 
				+weeks (w) ::
			
 
				+
			
 
				+* One week (1w) is the interval between the start day_of_week:hour:minute:second
			
 
				+and the same day of the week and time of the following week in the specified 
			
 
				+timezone.
			
 
				+* Multiple weeks (__n__w) are not supported.
			
 
				+
			
 
				+months (M) ::
			
 
				+
			
 
				+* One month (1M) is the interval between the start day of the month and time of
			
 
				+day and the same day of the month and time of the following month in the specified
			
 
				+timezone, so that the day of the month and time of day are the same at the start
			
 
				+and end.
			
 
				+* Multiple months (__n__M) are not supported.
			
 
				+
			
 
				+quarters (q) ::
			
 
				+
			
 
				+* One quarter (1q) is the interval between the start day of the month and
			
 
				+time of day and the same day of the month and time of day three months later,
			
 
				+so that the day of the month and time of day are the same at the start and end. +
			
 
				+* Multiple quarters (__n__q) are not supported.
			
 
				+
			
 
				+years (y) ::
			
 
				+
			
 
				+* One year (1y) is the interval between the start day of the month and time of 
			
 
				+day and the same day of the month and time of day the following year in the 
			
 
				+specified timezone, so that the date and time are the same at the start and end. +
			
 
				+* Multiple years (__n__y) are not supported.
			
 
				+
			
 
				+NOTE:
			
 
				+In all cases, when the specified end time does not exist, the actual end time is
			
 
				+the closest available time after the specified end.
			
 
				+
			
 
				+Widely distributed applications must also consider vagaries such as countries that
			
 
				+start and stop daylight savings time at 12:01 A.M., and so end up with one minute of 
			
 
				+Sunday followed by an additional 59 minutes of Saturday once a year, and countries
			
 
				+that decide to move across the international date line. Situations like
			
 
				+that can make irregular timezone offsets seem easy. 
			
 
				+
			
 
				+As always, rigorous testing, especially around time-change events, helps ensure
			
 
				+that your time interval specification is
			
 
				+what you intend it to be.
			
 
				+
			
 
				+WARNING:
			
 
				+To avoid unexpected results, all connected servers and clients must sync to a
			
 
				+reliable network time service.
			
 
				+
			
 
				+==== Examples
			
 
				 
			
 
				 Requesting bucket intervals of a month.
			
 
				 
			
@@ -27,13 +144,11 @@ POST /sales/_search?size=0
 
				 // CONSOLE
			
 
				 // TEST[setup:sales]
			
 
				 
			
 
				-Available expressions for interval: `year` (`1y`), `quarter` (`1q`), `month` (`1M`), `week` (`1w`),
			
 
				-`day` (`1d`), `hour` (`1h`), `minute` (`1m`), `second` (`1s`)
			
 
				-
			
 
				-Time values can also be specified via abbreviations supported by <<time-units,time units>> parsing.
			
 
				-Note that fractional time values are not supported, but you can address this by shifting to another
			
 
				-time unit (e.g., `1.5h` could instead be specified as `90m`). Also note that time intervals larger than
			
 
				-days do not support arbitrary values but can only be one unit large (e.g. `1y` is valid, `2y` is not).
			
 
				+You can also specify time values using abbreviations supported by
			
 
				+<<time-units,time units>> parsing.
			
 
				+Note that fractional time values are not supported, but you can address this by
			
 
				+shifting to another
			
 
				+time unit (e.g., `1.5h` could instead be specified as `90m`). 
			
 
				 
			
 
				 [source,js]
			
 
				 --------------------------------------------------
			
@@ -52,15 +167,16 @@ POST /sales/_search?size=0
 
				 // CONSOLE
			
 
				 // TEST[setup:sales]
			
 
				 
			
 
				-==== Keys
			
 
				+===== Keys
			
 
				 
			
 
				 Internally, a date is represented as a 64 bit number representing a timestamp
			
 
				-in milliseconds-since-the-epoch. These timestamps are returned as the bucket
			
 
				-++key++s. The `key_as_string` is the same timestamp converted to a formatted
			
 
				-date string using the format specified with the `format` parameter:
			
 
				+in milliseconds-since-the-epoch (midnight UTC on 1 January 1970). These timestamps are
			
 
				+returned as the ++key++ of each bucket. The `key_as_string` is the same
			
 
				+timestamp converted to a formatted
			
 
				+date string using the `format` parameter specification:
			
 
				 
			
 
				-TIP: If no `format` is specified, then it will use the first date
			
 
				-<<mapping-date-format,format>> specified in the field mapping.
			
 
				+TIP: If you don't specify `format`, the first date
			
 
				+<<mapping-date-format,format>> specified in the field mapping is used.
			
 
				 
			
 
				 [source,js]
			
 
				 --------------------------------------------------
			
@@ -113,15 +229,15 @@ Response:
 
				 --------------------------------------------------
			
 
				 // TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
			
 
				 
			
 
				-==== Time Zone
			
 
				+===== Timezone
			
 
				 
			
 
				 Date-times are stored in Elasticsearch in UTC.  By default, all bucketing and
			
 
				-rounding is also done in UTC. The `time_zone` parameter can be used to indicate
			
 
				-that bucketing should use a different time zone.
			
 
				+rounding is also done in UTC. Use the `time_zone` parameter to indicate
			
 
				+that bucketing should use a different timezone.
			
 
				 
			
 
				-Time zones may either be specified as an ISO 8601 UTC offset (e.g. `+01:00` or
			
 
				-`-08:00`)  or as a timezone id, an identifier used in the TZ database like
			
 
				-`America/Los_Angeles`.
			
 
				+You can specify timezones as either an ISO 8601 UTC offset (e.g. `+01:00` or
			
 
				+`-08:00`)  or as a timezone ID as specified in the IANA timezone database,
			
 
				+such as `America/Los_Angeles`.
			
 
				 
			
 
				 Consider the following example:
			
 
				 
			
@@ -151,7 +267,7 @@ GET my_index/_search?size=0
 
				 ---------------------------------
			
 
				 // CONSOLE
			
 
				 
			
 
				-UTC is used if no time zone is specified, which would result in both of these
			
 
				+If you don't specify a timezone, UTC is used. This would result in both of these
			
 
				 documents being placed into the same day bucket, which starts at midnight UTC
			
 
				 on 1 October 2015:
			
 
				 
			
@@ -174,8 +290,8 @@ on 1 October 2015:
 
				 ---------------------------------
			
 
				 // TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
			
 
				 
			
 
				-If a `time_zone` of `-01:00` is specified, then midnight starts at one hour before
			
 
				-midnight UTC:
			
 
				+If you specify a `time_zone` of `-01:00`, local time is one hour behind UTC, so
			
 
				+midnight in that timezone falls at 01:00 UTC:
			
 
				 
			
 
				 [source,js]
			
 
				 ---------------------------------
			
@@ -223,28 +339,27 @@ second document falls into the bucket for 1 October 2015:
 
				 // TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
			
 
				 
			
 
				 <1> The `key_as_string` value represents midnight on each day
			
 
				-    in the specified time zone.
			
 
				+    in the specified timezone.
			
 
				 
			
 
				 WARNING: When using time zones that follow DST (daylight savings time) changes,
			
 
				 buckets close to the moment when those changes happen can have slightly different
			
 
				-sizes than would be expected from the used `interval`.
			
 
				+sizes than you would expect from the specified `interval`.
			
 
				 For example, consider a DST start in the `CET` time zone: on 27 March 2016 at 2am,
			
 
				-clocks were turned forward 1 hour to 3am local time. When using `day` as `interval`,
			
 
				+clocks were turned forward 1 hour to 3am local time. If you use `day` as `interval`,
			
 
				 the bucket covering that day will only hold data for 23 hours instead of the usual
			
 
				-24 hours for other buckets. The same is true for shorter intervals like e.g. 12h.
			
 
				-Here, we will have only a 11h bucket on the morning of 27 March when the DST shift
			
 
				+24 hours for other buckets. The same is true for shorter intervals, like 12h,
			
 
				+where you'll have only an 11h bucket on the morning of 27 March when the DST shift
			
 
				 happens.
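
The bucket sizes in this warning can be checked with a short Python sketch. It only reproduces the arithmetic and is not how Elasticsearch computes buckets. The `Europe/Berlin` zone is an assumed stand-in for the `CET` example, and `bucket_hours` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

BERLIN = ZoneInfo("Europe/Berlin")  # assumed zone that follows CET/CEST rules

def bucket_hours(local_start, local_end):
    """Real elapsed hours between two local wall-clock bucket boundaries."""
    delta = local_end.astimezone(timezone.utc) - local_start.astimezone(timezone.utc)
    return delta / timedelta(hours=1)

day_start = datetime(2016, 3, 27, 0, 0, tzinfo=BERLIN)
noon = datetime(2016, 3, 27, 12, 0, tzinfo=BERLIN)
next_day = datetime(2016, 3, 28, 0, 0, tzinfo=BERLIN)

day_len = bucket_hours(day_start, next_day)    # daily bucket on the DST start date
morning_len = bucket_hours(day_start, noon)    # first 12h bucket, contains the shift
afternoon_len = bucket_hours(noon, next_day)   # second 12h bucket, unaffected

print(day_len, morning_len, afternoon_len)  # 23.0 11.0 12.0
```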
			
 
				 
			
 
				+===== Offset
			
 
				 
			
 
				-==== Offset
			
 
				-
			
 
				-The `offset` parameter is used to change the start value of each bucket by the
			
 
				+Use the `offset` parameter to change the start value of each bucket by the
			
 
				 specified positive (`+`) or negative offset (`-`) duration, such as `1h` for
			
 
				 an hour, or `1d` for a day. See <<time-units>> for more possible time
			
 
				 duration options.
			
 
				 
			
 
				-For instance, when using an interval of `day`, each bucket runs from midnight
			
 
				-to midnight.  Setting the `offset` parameter to `+6h` would change each bucket
			
 
				+For example, when using an interval of `day`, each bucket runs from midnight
			
 
				+to midnight.  Setting the `offset` parameter to `+6h` changes each bucket
			
 
				 to run from 6am to 6am:
			
 
				 
			
 
				 [source,js]
			
@@ -301,12 +416,13 @@ documents into buckets starting at 6am:
 
				 -----------------------------
			
 
				 // TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
			
 
				 
			
 
				-NOTE: The start `offset` of each bucket is calculated after the `time_zone`
			
 
				+NOTE: The start `offset` of each bucket is calculated after `time_zone`
			
 
				 adjustments have been made.
			
 
				 
			
 
				-==== Keyed Response
			
 
				+===== Keyed Response
			
 
				 
			
 
				-Setting the `keyed` flag to `true` will associate a unique string key with each bucket and return the ranges as a hash rather than an array:
			
 
				+Setting the `keyed` flag to `true` associates a unique string key with each
			
 
				+bucket and returns the ranges as a hash rather than an array:
			
 
				 
			
 
				 [source,js]
			
 
				 --------------------------------------------------
			
@@ -358,20 +474,25 @@ Response:
 
				 --------------------------------------------------
			
 
				 // TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
			
 
				 
			
 
				-==== Scripts
			
 
				+===== Scripts
			
 
				 
			
 
				-Like with the normal <<search-aggregations-bucket-histogram-aggregation,histogram>>, both document level scripts and
			
 
				-value level scripts are supported. It is also possible to control the order of the returned buckets using the `order`
			
 
				-settings and filter the returned buckets based on a `min_doc_count` setting (by default all buckets between the first
			
 
				-bucket that matches documents and the last one are returned). This histogram also supports the `extended_bounds`
			
 
				-setting, which enables extending the bounds of the histogram beyond the data itself (to read more on why you'd want to
			
 
				-do that please refer to the explanation <<search-aggregations-bucket-histogram-aggregation-extended-bounds,here>>).
			
 
				+As with the normal <<search-aggregations-bucket-histogram-aggregation,histogram>>,
			
 
				+both document-level scripts and
			
 
				+value-level scripts are supported. You can control the order of the returned
			
 
				+buckets using the `order`
			
 
				+settings and filter the returned buckets based on a `min_doc_count` setting
			
 
				+(by default all buckets between the first
			
 
				+bucket that matches documents and the last one are returned). This histogram
			
 
				+also supports the `extended_bounds`
			
 
				+setting, which enables extending the bounds of the histogram beyond the data
			
 
				+itself. For more information, see
			
 
				+<<search-aggregations-bucket-histogram-aggregation-extended-bounds,`Extended Bounds`>>.
			
 
				 
			
 
				-==== Missing value
			
 
				+===== Missing value
			
 
				 
			
 
				-The `missing` parameter defines how documents that are missing a value should be treated.
			
 
				-By default they will be ignored but it is also possible to treat them as if they
			
 
				-had a value.
			
 
				+The `missing` parameter defines how to treat documents that are missing a value.
			
 
				+By default, they are ignored, but it is also possible to treat them as if they
			
 
				+have a value.
			
 
				 
			
 
				 [source,js]
			
 
				 --------------------------------------------------
			
@@ -391,20 +512,22 @@ POST /sales/_search?size=0
 
				 // CONSOLE
			
 
				 // TEST[setup:sales]
			
 
				 
			
 
				-<1> Documents without a value in the `publish_date` field will fall into the same bucket as documents that have the value `2000-01-01`.
			
 
				+<1> Documents without a value in the `publish_date` field will fall into the
			
 
				+same bucket as documents that have the value `2000-01-01`.
			
 
				 
			
 
				-==== Order
			
 
				+===== Order
			
 
				 
			
 
				-By default the returned buckets are sorted by their `key` ascending, though the order behaviour can be controlled using
			
 
				-the `order` setting. Supports the same `order` functionality as the <<search-aggregations-bucket-terms-aggregation-order,`Terms Aggregation`>>.
			
 
				+By default the returned buckets are sorted by their `key` ascending, but you can
			
 
				+control the order using
			
 
				+the `order` setting. This setting supports the same `order` functionality as the
			
 
				+<<search-aggregations-bucket-terms-aggregation-order,`Terms Aggregation`>>.
			
 
				 
			
 
				 deprecated[6.0.0, Use `_key` instead of `_time` to order buckets by their dates/keys]
			
 
				 
			
 
				-==== Use of a script to aggregate by day of the week
			
 
				+===== Using a script to aggregate by day of the week
			
 
				 
			
 
				-There are some cases where date histogram can't help us, like for example, when we need
			
 
				-to aggregate the results by day of the week.
			
 
				-In this case to overcome the problem, we can use a script that returns the day of the week:
			
 
				+When you need to aggregate the results by day of the week, use a script that 
			
 
				+returns the day of the week:
			
 
				 
			
 
				 
			
 
				 [source,js]
			
@@ -452,5 +575,5 @@ Response:
 
				 --------------------------------------------------
			
 
				 // TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
			
 
				 
			
 
				-The response will contain all the buckets having as key the relative day of
			
 
				-the week: 1 for Monday, 2 for Tuesday... 7 for Sunday.
			
 
				+The response will contain all the buckets having the relative day of
			
 
				+the week as the key: 1 for Monday, 2 for Tuesday... 7 for Sunday.
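
The same key numbering (1 for Monday through 7 for Sunday) matches Python's `isoweekday()`, so the expected bucket keys can be sanity-checked client-side. The following is a minimal sketch with made-up sample timestamps, interpreted in UTC. `day_of_week_key` is a hypothetical helper, not part of any Elasticsearch client.

```python
from collections import Counter
from datetime import datetime, timezone

def day_of_week_key(epoch_millis):
    """Map a millisecond timestamp to 1 (Monday) .. 7 (Sunday), in UTC."""
    dt = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    return dt.isoweekday()

# Hypothetical sample data, not taken from the documented index.
sample_millis = [
    1443657600000,  # 2015-10-01T00:00:00Z, a Thursday
    1443744000000,  # 2015-10-02T00:00:00Z, a Friday
    1444262400000,  # 2015-10-08T00:00:00Z, a Thursday
]

# Count documents per day-of-week key, like doc_count per bucket.
counts = Counter(day_of_week_key(ms) for ms in sample_millis)
print(dict(counts))  # {4: 2, 5: 1}
```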