[[search-aggregations-bucket-range-field-note]]
=== Subtleties of bucketing range fields

==== Documents are counted for each bucket they land in

Since a range represents multiple values, running a bucket aggregation over a
range field can result in the same document landing in multiple buckets. This
can lead to surprising behavior, such as the sum of bucket counts being higher
than the number of matched documents. For example, consider the following
index:

[source, console]
--------------------------------------------------
PUT range_index
{
  "settings": {
    "number_of_shards": 2
  },
  "mappings": {
    "properties": {
      "expected_attendees": {
        "type": "integer_range"
      },
      "time_frame": {
        "type": "date_range",
        "format": "yyyy-MM-dd||epoch_millis"
      }
    }
  }
}

PUT range_index/_doc/1?refresh
{
  "expected_attendees" : {
    "gte" : 10,
    "lte" : 20
  },
  "time_frame" : {
    "gte" : "2019-10-28",
    "lte" : "2019-11-04"
  }
}
--------------------------------------------------
// TESTSETUP

The range is wider than the interval in the following aggregation, and thus the
document will land in multiple buckets.

[source, console,id=range-field-aggregation-example]
--------------------------------------------------
POST /range_index/_search?size=0
{
  "aggs": {
    "range_histo": {
      "histogram": {
        "field": "expected_attendees",
        "interval": 5
      }
    }
  }
}
--------------------------------------------------

Since the interval is `5` (and the offset is `0` by default), we expect buckets
`10`, `15`, and `20`. Our range document will fall in all three of these buckets.

[source, console-result]
--------------------------------------------------
{
  ...
  "aggregations" : {
    "range_histo" : {
      "buckets" : [
        {
          "key" : 10.0,
          "doc_count" : 1
        },
        {
          "key" : 15.0,
          "doc_count" : 1
        },
        {
          "key" : 20.0,
          "doc_count" : 1
        }
      ]
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]

A document cannot exist partially in a bucket; for example, the above document
cannot count as one-third in each of the above three buckets. In this example,
since the document's range landed in multiple buckets, the full value of that
document would also be counted in any sub-aggregations for each bucket.

==== Query bounds are not aggregation filters

Another unexpected behavior can arise when a query is used to filter on the
field being aggregated. In this case, a document could match the query but
still have one or both of the endpoints of the range outside the query.
Consider the following aggregation on the above document:

[source, console,id=range-field-aggregation-query-bounds-example]
--------------------------------------------------
POST /range_index/_search?size=0
{
  "query": {
    "range": {
      "time_frame": {
        "gte": "2019-11-01",
        "format": "yyyy-MM-dd"
      }
    }
  },
  "aggs": {
    "november_data": {
      "date_histogram": {
        "field": "time_frame",
        "calendar_interval": "day",
        "format": "yyyy-MM-dd"
      }
    }
  }
}
--------------------------------------------------

Even though the query only considers days in November, the aggregation
generates 8 buckets (4 in October, 4 in November) because the aggregation is
calculated over the ranges of all matching documents.

[source, console-result]
--------------------------------------------------
{
  ...
  "aggregations" : {
    "november_data" : {
      "buckets" : [
        {
          "key_as_string" : "2019-10-28",
          "key" : 1572220800000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2019-10-29",
          "key" : 1572307200000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2019-10-30",
          "key" : 1572393600000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2019-10-31",
          "key" : 1572480000000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2019-11-01",
          "key" : 1572566400000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2019-11-02",
          "key" : 1572652800000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2019-11-03",
          "key" : 1572739200000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2019-11-04",
          "key" : 1572825600000,
          "doc_count" : 1
        }
      ]
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]

Depending on the use case, a range query with `relation` set to `WITHIN` could
limit the documents to only those that fall entirely in the queried range. In
this example, the one document would not be included and the aggregation would
be empty. Filtering the buckets after the aggregation is also an option, for use
cases where the document should be counted but the out-of-bounds data can be
safely ignored.
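
As a hedged sketch of the stricter option, the earlier request can be rerun
with the range query's `relation` parameter set to `within`, which matches only
documents whose stored range lies entirely inside the query bounds. The request
reuses `range_index` and the `november_data` aggregation from this page; using
`within` here is an illustration under those assumptions, not the only way to
narrow the results.

[source, console]
--------------------------------------------------
POST /range_index/_search?size=0
{
  "query": {
    "range": {
      "time_frame": {
        "gte": "2019-11-01",
        "format": "yyyy-MM-dd",
        "relation": "within"
      }
    }
  },
  "aggs": {
    "november_data": {
      "date_histogram": {
        "field": "time_frame",
        "calendar_interval": "day",
        "format": "yyyy-MM-dd"
      }
    }
  }
}
--------------------------------------------------

Because the sample document's `time_frame` starts on `2019-10-28`, before the
query's lower bound, it should not match, and the aggregation should return no
buckets.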