6 years ago · 57a679fbbb
--- a/docs/reference/aggregations/bucket.asciidoc
+++ b/docs/reference/aggregations/bucket.asciidoc
@@ -67,3 +67,4 @@ include::bucket/significanttext-aggregation.asciidoc[]
 
 include::bucket/terms-aggregation.asciidoc[]
 
+include::bucket/range-field-note.asciidoc[]
--- a/docs/reference/aggregations/bucket/datehistogram-aggregation.asciidoc
+++ b/docs/reference/aggregations/bucket/datehistogram-aggregation.asciidoc
@@ -3,7 +3,7 @@
 
 This multi-bucket aggregation is similar to the normal
 <<search-aggregations-bucket-histogram-aggregation,histogram>>, but it can
-only be used with date values. Because dates are represented internally in 
+only be used with date or date range values. Because dates are represented internally in 
 Elasticsearch as long values, it is possible, but not as accurate, to use the
 normal `histogram` on dates as well. The main difference in the two APIs is
 that here the interval can be specified using date/time expressions. Time-based
--- a/docs/reference/aggregations/bucket/histogram-aggregation.asciidoc
+++ b/docs/reference/aggregations/bucket/histogram-aggregation.asciidoc
@@ -1,12 +1,13 @@
 [[search-aggregations-bucket-histogram-aggregation]]
 === Histogram Aggregation
 
-A multi-bucket values source based aggregation that can be applied on numeric values extracted from the documents.
-It dynamically builds fixed size (a.k.a. interval) buckets over the values. For example, if the documents have a field
-that holds a price (numeric), we can configure this aggregation to dynamically build buckets with interval `5`
-(in case of price it may represent $5). When the aggregation executes, the price field of every document will be
-evaluated and will be rounded down to its closest bucket - for example, if the price is `32` and the bucket size is `5`
-then the rounding will yield `30` and thus the document will "fall" into the bucket that is associated with the key `30`.
+A multi-bucket values source based aggregation that can be applied on numeric values or numeric range values extracted
+from the documents. It dynamically builds fixed size (a.k.a. interval) buckets over the values. For example, if the
+documents have a field that holds a price (numeric), we can configure this aggregation to dynamically build buckets with
+interval `5` (in case of price it may represent $5). When the aggregation executes, the price field of every document
+will be evaluated and will be rounded down to its closest bucket - for example, if the price is `32` and the bucket size
+is `5` then the rounding will yield `30` and thus the document will "fall" into the bucket that is associated with the
+key `30`.
 To make this more formal, here is the rounding function that is used:
 
 [source,java]
@@ -14,6 +15,10 @@ To make this more formal, here is the rounding function that is used:
 bucket_key = Math.floor((value - offset) / interval) * interval + offset
 --------------------------------------------------
 
+For range values, a document can fall into multiple buckets. The first bucket is computed from the lower
+bound of the range in the same way as a bucket for a single value is computed.  The final bucket is computed in the same
+way from the upper bound of the range, and the range is counted in all buckets in between and including those two.
+
 The `interval` must be a positive decimal, while the `offset` must be a decimal in `[0, interval)`
 (a decimal greater than or equal to `0` and less than `interval`)
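
The range rule added in this hunk can be sketched in a few lines of Python. This is only an illustration of the documented rounding formula, not Elasticsearch's implementation: the function names `bucket_key` and `range_bucket_keys` are hypothetical, and the sketch assumes an inclusive range (as with `gte`/`lte`).

```python
import math


def bucket_key(value, interval, offset=0):
    """Round a single value down to its bucket key (the documented formula)."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    if not 0 <= offset < interval:
        raise ValueError("offset must be in [0, interval)")
    return math.floor((value - offset) / interval) * interval + offset


def range_bucket_keys(lower, upper, interval, offset=0):
    """List every bucket key an inclusive range [lower, upper] is counted in."""
    first = bucket_key(lower, interval, offset)  # bucket of the lower bound
    last = bucket_key(upper, interval, offset)   # bucket of the upper bound
    steps = round((last - first) / interval)     # number of intervals between them
    return [first + i * interval for i in range(steps + 1)]


print(bucket_key(32, 5))             # 30
print(range_bucket_keys(10, 20, 5))  # [10, 15, 20]
```

A range narrower than the interval can still produce a single key, for example `range_bucket_keys(12, 13, 5)` yields only the `10` bucket.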
			
 
				 
			
@@ -175,6 +180,14 @@ POST /sales/_search?size=0
 --------------------------------------------------
 // TEST[setup:sales]
 
+When aggregating ranges, buckets are based on the values of the returned documents.  This means the response may include
+buckets outside of a query's range. For example, if your query looks for values greater than 100, and you have a range
+covering 50 to 150, and an interval of 50, that document will land in 3 buckets - 50, 100, and 150. In general, it's
+best to think of the query and aggregation steps as independent - the query selects a set of documents, and then the
+aggregation buckets those documents without regard to how they were selected.
+See <<search-aggregations-bucket-range-field-note,note on bucketing range
+fields>> for more information and an example.
+
 ==== Order
 
 By default the returned buckets are sorted by their `key` ascending, though the order behaviour can be controlled using
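
The independence of the query and aggregation steps described in the note added by this hunk (a range of 50 to 150, a query for values greater than 100, an interval of 50) can be simulated as two separate functions. This is a toy model under stated assumptions, not how Elasticsearch executes a search: `docs`, `matches_greater_than`, and `histogram` are hypothetical names, and the sketch assumes integer bounds and an integer interval.

```python
import math
from collections import Counter

# Hypothetical documents, each holding one inclusive numeric range.
docs = [
    {"id": 1, "lower": 50, "upper": 150},
    {"id": 2, "lower": 10, "upper": 40},
]


def matches_greater_than(doc, threshold):
    """Query step: a range matches if any part of it lies above the threshold."""
    return doc["upper"] > threshold


def histogram(selected, interval):
    """Aggregation step: count each document in every bucket its range touches."""
    counts = Counter()
    for doc in selected:
        first = math.floor(doc["lower"] / interval) * interval
        last = math.floor(doc["upper"] / interval) * interval
        # The query bounds play no part here; only the document's own range does.
        for key in range(first, last + interval, interval):
            counts[key] += 1
    return dict(sorted(counts.items()))


selected = [d for d in docs if matches_greater_than(d, 100)]
buckets = histogram(selected, 50)
print(buckets)  # {50: 1, 100: 1, 150: 1}
```

Only one document matches the query, yet the bucket counts sum to three, and the `50` bucket lies below the queried bound.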
			
--- a/docs/reference/aggregations/bucket/range-field-note.asciidoc
+++ b/docs/reference/aggregations/bucket/range-field-note.asciidoc
@@ -0,0 +1,181 @@
+[[search-aggregations-bucket-range-field-note]]
+=== Subtleties of bucketing range fields
+
+==== Documents are counted for each bucket they land in
+
+Since a range represents multiple values, running a bucket aggregation over a
+range field can result in the same document landing in multiple buckets. This
+can lead to surprising behavior, such as the sum of bucket counts being higher
+than the number of matched documents.  For example, consider the following
+index: 
+[source, console]
+--------------------------------------------------
+PUT range_index
+{
+  "settings": {
+    "number_of_shards": 2
+  },
+  "mappings": {
+    "properties": {
+      "expected_attendees": {
+        "type": "integer_range"
+      },
+      "time_frame": {
+        "type": "date_range",
+        "format": "yyyy-MM-dd||epoch_millis"
+      }
+    }
+  }
+}
+
+PUT range_index/_doc/1?refresh
+{
+  "expected_attendees" : {
+    "gte" : 10,
+    "lte" : 20
+  },
+  "time_frame" : {
+    "gte" : "2019-10-28",
+    "lte" : "2019-11-04"
+  }
+}
+--------------------------------------------------
+// TESTSETUP
+
+The range is wider than the interval in the following aggregation, and thus the
+document will land in multiple buckets.
+
+[source, console]
+--------------------------------------------------
+POST /range_index/_search?size=0
+{
+    "aggs" : {
+        "range_histo" : {
+            "histogram" : {
+                "field" : "expected_attendees",
+                "interval" : 5
+            }
+        }
+    }
+}
+--------------------------------------------------
+
+Since the interval is `5` (and the offset is `0` by default), we expect buckets `10`,
+`15`, and `20`. Our range document will fall in all three of these buckets.
+
+[source, console-result]
+--------------------------------------------------
+{
+  ...
+  "aggregations" : {
+    "range_histo" : {
+      "buckets" : [
+        {
+          "key" : 10.0,
+          "doc_count" : 1
+        },
+        {
+          "key" : 15.0,
+          "doc_count" : 1
+        },
+        {
+          "key" : 20.0,
+          "doc_count" : 1
+        }
+      ]
+    }
+  }
+}
+--------------------------------------------------
+// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
+
+A document cannot exist partially in a bucket; for example, the above document
+cannot count as one-third in each of the above three buckets. In this example,
+since the document's range landed in multiple buckets, the full value of that
+document would also be counted in any sub-aggregations for each bucket.
+
+==== Query bounds are not aggregation filters
+
+Another unexpected behavior can arise when a query is used to filter on the
+field being aggregated. In this case, a document could match the query but
+still have one or both of the endpoints of the range outside the query.
+Consider the following aggregation on the above document:
+
+[source, console]
+--------------------------------------------------
+POST /range_index/_search?size=0
+{
+    "query": {
+      "range": {
+        "time_frame": {
+          "gte": "2019-11-01",
+          "format": "yyyy-MM-dd"
+        }
+      }
+    },
+    "aggs" : {
+        "november_data" : {
+            "date_histogram" : {
+                "field" : "time_frame",
+                "calendar_interval" : "day"
+            }
+        }
+    }
+}
+--------------------------------------------------
+
+Even though the query only considers days in November, the aggregation
+generates 8 buckets (4 in October, 4 in November) because the aggregation is
+calculated over the ranges of all matching documents.
+
+[source, console-result]
+--------------------------------------------------
+{
+  ...
+  "aggregations" : {
+    "november_data" : {
+      "buckets" : [
+        {
+          "key" : 1572220800000,
+          "doc_count" : 1
+        },
+        {
+          "key" : 1572307200000,
+          "doc_count" : 1
+        },
+        {
+          "key" : 1572393600000,
+          "doc_count" : 1
+        },
+        {
+          "key" : 1572480000000,
+          "doc_count" : 1
+        },
+        {
+          "key" : 1572566400000,
+          "doc_count" : 1
+        },
+        {
+          "key" : 1572652800000,
+          "doc_count" : 1
+        },
+        {
+          "key" : 1572739200000,
+          "doc_count" : 1
+        },
+        {
+          "key" : 1572825600000,
+          "doc_count" : 1
+        }
+      ]
+    }
+  }
+}
+--------------------------------------------------
+// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
+
+Depending on the use case, a range query with the `WITHIN` relation could limit
+the documents to only those that fall entirely in the queried range.  In this
+example, the one document would not be included and the aggregation would be
+empty.  Filtering the buckets after the aggregation is also an option, for use cases
+where the document should be counted but the out of bounds data can be safely ignored.
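
As a rough check of the date histogram example in this new file, the eight daily keys and the post-aggregation filtering option can be reproduced client-side. This is a sketch under assumptions, not part of the commit: `day_bucket_keys` is a hypothetical helper, buckets are assumed to be UTC calendar days, and both range bounds are treated as midnight of the given date.

```python
from datetime import datetime, timedelta, timezone

DAY = timedelta(days=1)


def day_bucket_keys(gte, lte):
    """Epoch-millisecond keys of every UTC day touched by [gte, lte]."""
    start = datetime.strptime(gte, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    end = datetime.strptime(lte, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    keys = []
    while start <= end:
        keys.append(int(start.timestamp()) * 1000)
        start += DAY
    return keys


# All buckets produced by the document's time_frame range.
keys = day_bucket_keys("2019-10-28", "2019-11-04")

# Post-filter: keep only buckets on or after the query's lower bound.
cutoff = int(datetime(2019, 11, 1, tzinfo=timezone.utc).timestamp()) * 1000
november = [k for k in keys if k >= cutoff]

print(len(keys), len(november))  # 8 4
```

The first and last keys match the `1572220800000` and `1572825600000` values in the response above, and dropping keys below the cutoff leaves the four November buckets.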