123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575 |
- [[search-aggregations-bucket-composite-aggregation]]
- === Composite Aggregation
- beta[]
- A multi-bucket aggregation that creates composite buckets from different sources.
- Unlike the other `multi-bucket` aggregation the `composite` aggregation can be used
- to paginate **all** buckets from a multi-level aggregation efficiently. This aggregation
- provides a way to stream **all** buckets of a specific aggregation similarly to what
- <<search-request-scroll, scroll>> does for documents.
- The composite buckets are built from the combinations of the
- values extracted/created for each document and each combination is considered as
- a composite bucket.
- //////////////////////////
- [source,js]
- --------------------------------------------------
- PUT /sales
- {
- "mappings": {
- "docs": {
- "properties": {
- "product": {
- "type": "keyword"
- },
- "timestamp": {
- "type": "date"
- },
- "price": {
- "type": "long"
- },
- "shop": {
- "type": "keyword"
- }
- }
- }
- }
- }
- POST /sales/docs/_bulk?refresh
- {"index":{"_id":0}}
- {"product": "mad max", "price": "20", "timestamp": "2017-05-09T14:35"}
- {"index":{"_id":1}}
- {"product": "mad max", "price": "25", "timestamp": "2017-05-09T12:35"}
- {"index":{"_id":2}}
- {"product": "rocky", "price": "10", "timestamp": "2017-05-08T09:10"}
- {"index":{"_id":3}}
- {"product": "mad max", "price": "27", "timestamp": "2017-05-10T07:07"}
- {"index":{"_id":4}}
- {"product": "apocalypse now", "price": "10", "timestamp": "2017-05-11T08:35"}
- -------------------------------------------------
- // NOTCONSOLE
- // TESTSETUP
- //////////////////////////
- For instance the following document:
- [source,js]
- --------------------------------------------------
- {
- "keyword": ["foo", "bar"],
- "number": [23, 65, 76]
- }
- --------------------------------------------------
- // NOTCONSOLE
- \... creates the following composite buckets when `keyword` and `number` are used as values source
- for the aggregation:
- [source,js]
- --------------------------------------------------
- { "keyword": "foo", "number": 23 }
- { "keyword": "foo", "number": 65 }
- { "keyword": "foo", "number": 76 }
- { "keyword": "bar", "number": 23 }
- { "keyword": "bar", "number": 65 }
- { "keyword": "bar", "number": 76 }
- --------------------------------------------------
- // NOTCONSOLE
- ==== Values source
- The `sources` parameter controls the sources that should be used to build the composite buckets.
- There are three different types of values source:
- ===== Terms
- The `terms` value source is equivalent to a simple `terms` aggregation.
- The values are extracted from a field or a script exactly like the `terms` aggregation.
- Example:
- [source,js]
- --------------------------------------------------
- GET /_search
- {
- "aggs" : {
- "my_buckets": {
- "composite" : {
- "sources" : [
- { "product": { "terms" : { "field": "product" } } }
- ]
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- Like the `terms` aggregation it is also possible to use a script to create the values for the composite buckets:
- [source,js]
- --------------------------------------------------
- GET /_search
- {
- "aggs" : {
- "my_buckets": {
- "composite" : {
- "sources" : [
- {
- "product": {
- "terms" : {
- "script" : {
- "source": "doc['product'].value",
- "lang": "painless"
- }
- }
- }
- }
- ]
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- ===== Histogram
- The `histogram` value source can be applied on numeric values to build fixed size
- interval over the values. The `interval` parameter defines how the numeric values should be
- transformed. For instance an `interval` set to 5 will translate any numeric values to its closest interval,
- a value of `101` would be translated to `100` which is the key for the interval between 100 and 105.
- Example:
- [source,js]
- --------------------------------------------------
- GET /_search
- {
- "aggs" : {
- "my_buckets": {
- "composite" : {
- "sources" : [
- { "histo": { "histogram" : { "field": "price", "interval": 5 } } }
- ]
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- The values are built from a numeric field or a script that return numerical values:
- [source,js]
- --------------------------------------------------
- GET /_search
- {
- "aggs" : {
- "my_buckets": {
- "composite" : {
- "sources" : [
- {
- "histo": {
- "histogram" : {
- "interval": 5,
- "script" : {
- "source": "doc['price'].value",
- "lang": "painless"
- }
- }
- }
- }
- ]
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- ===== Date Histogram
- The `date_histogram` is similar to the `histogram` value source except that the interval
- is specified by date/time expression:
- [source,js]
- --------------------------------------------------
- GET /_search
- {
- "aggs" : {
- "my_buckets": {
- "composite" : {
- "sources" : [
- { "date": { "date_histogram" : { "field": "timestamp", "interval": "1d" } } }
- ]
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- The example above creates an interval per day and translates all `timestamp` values to the start of its closest intervals.
- Available expressions for interval: `year`, `quarter`, `month`, `week`, `day`, `hour`, `minute`, `second`
- Time values can also be specified via abbreviations supported by <<time-units,time units>> parsing.
- Note that fractional time values are not supported, but you can address this by shifting to another
- time unit (e.g., `1.5h` could instead be specified as `90m`).
- *Format*
- Internally, a date is represented as a 64 bit number representing a timestamp in milliseconds-since-the-epoch.
- These timestamps are returned as the bucket keys. It is possible to return a formatted date string instead using
- the format specified with the format parameter:
- [source,js]
- --------------------------------------------------
- GET /_search
- {
- "aggs" : {
- "my_buckets": {
- "composite" : {
- "sources" : [
- {
- "date": {
- "date_histogram" : {
- "field": "timestamp",
- "interval": "1d",
- "format": "yyyy-MM-dd" <1>
- }
- }
- }
- ]
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- <1> Supports expressive date <<date-format-pattern,format pattern>>
- *Time Zone*
- Date-times are stored in Elasticsearch in UTC. By default, all bucketing and
- rounding is also done in UTC. The `time_zone` parameter can be used to indicate
- that bucketing should use a different time zone.
- Time zones may either be specified as an ISO 8601 UTC offset (e.g. `+01:00` or
- `-08:00`) or as a timezone id, an identifier used in the TZ database like
- `America/Los_Angeles`.
- ===== Mixing different values source
- The `sources` parameter accepts an array of values source.
- It is possible to mix different values source to create composite buckets.
- For example:
- [source,js]
- --------------------------------------------------
- GET /_search
- {
- "aggs" : {
- "my_buckets": {
- "composite" : {
- "sources" : [
- { "date": { "date_histogram": { "field": "timestamp", "interval": "1d" } } },
- { "product": { "terms": {"field": "product" } } }
- ]
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- This will create composite buckets from the values created by two values source, a `date_histogram` and a `terms`.
- Each bucket is composed of two values, one for each value source defined in the aggregation.
- Any type of combinations is allowed and the order in the array is preserved
- in the composite buckets.
- [source,js]
- --------------------------------------------------
- GET /_search
- {
- "aggs" : {
- "my_buckets": {
- "composite" : {
- "sources" : [
- { "shop": { "terms": {"field": "shop" } } },
- { "product": { "terms": { "field": "product" } } },
- { "date": { "date_histogram": { "field": "timestamp", "interval": "1d" } } }
- ]
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- ==== Order
- By default the composite buckets are sorted by their natural ordering. Values are sorted
- in ascending order of their values. When multiple value sources are requested, the ordering is done per value
- source, the first value of the composite bucket is compared to the first value of the other composite bucket and if they are equals the
- next values in the composite bucket are used for tie-breaking. This means that the composite bucket
- `[foo, 100]` is considered smaller than `[foobar, 0]` because `foo` is considered smaller than `foobar`.
- It is possible to define the direction of the sort for each value source by setting `order` to `asc` (default value)
- or `desc` (descending order) directly in the value source definition.
- For example:
- [source,js]
- --------------------------------------------------
- GET /_search
- {
- "aggs" : {
- "my_buckets": {
- "composite" : {
- "sources" : [
- { "date": { "date_histogram": { "field": "timestamp", "interval": "1d", "order": "desc" } } },
- { "product": { "terms": {"field": "product", "order": "asc" } } }
- ]
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- \... will sort the composite bucket in descending order when comparing values from the `date_histogram` source
- and in ascending order when comparing values from the `terms` source.
- ====== Missing bucket
- By default documents without a value for a given source are ignored.
- It is possible to include them in the response by setting `missing_bucket` to
- `true` (defaults to `false`):
- [source,js]
- --------------------------------------------------
- GET /_search
- {
- "aggs" : {
- "my_buckets": {
- "composite" : {
- "sources" : [
- { "product_name": { "terms" : { "field": "product", "missing_bucket": true } } }
- ]
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- In the example above the source `product_name` will emit an explicit `null` value
- for documents without a value for the field `product`.
- The `order` specified in the source dictates whether the `null` values should rank
- first (ascending order, `asc`) or last (descending order, `desc`).
- ==== Size
- The `size` parameter can be set to define how many composite buckets should be returned.
- Each composite bucket is considered as a single bucket so setting a size of 10 will return the
- first 10 composite buckets created from the values source.
- The response contains the values for each composite bucket in an array containing the values extracted
- from each value source.
- ==== After
- If the number of composite buckets is too high (or unknown) to be returned in a single response
- it is possible to split the retrieval in multiple requests.
- Since the composite buckets are flat by nature, the requested `size` is exactly the number of composite buckets
- that will be returned in the response (assuming that they are at least `size` composite buckets to return).
- If all composite buckets should be retrieved it is preferable to use a small size (`100` or `1000` for instance)
- and then use the `after` parameter to retrieve the next results.
- For example:
- [source,js]
- --------------------------------------------------
- GET /_search
- {
- "aggs" : {
- "my_buckets": {
- "composite" : {
- "size": 2,
- "sources" : [
- { "date": { "date_histogram": { "field": "timestamp", "interval": "1d" } } },
- { "product": { "terms": {"field": "product" } } }
- ]
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- // TEST[s/_search/_search\?filter_path=aggregations/]
- \... returns:
- [source,js]
- --------------------------------------------------
- {
- ...
- "aggregations": {
- "my_buckets": {
- "after_key": { <1>
- "date": 1494288000000,
- "product": "mad max"
- },
- "buckets": [
- {
- "key": {
- "date": 1494201600000,
- "product": "rocky"
- },
- "doc_count": 1
- },
- {
- "key": {
- "date": 1494288000000,
- "product": "mad max"
- },
- "doc_count": 2
- }
- ]
- }
- }
- }
- --------------------------------------------------
- // TESTRESPONSE[s/\.\.\.//]
- <1> The last composite bucket returned by the query.
- NOTE: The `after_key` is equals to the last bucket returned in the response before
- any filtering that could be done by <<search-aggregations-pipeline, Pipeline aggregations>>.
- If all buckets are filtered/removed by a pipeline aggregation, the `after_key` will contain
- the last bucket before filtering.
- The `after` parameter can be used to retrieve the composite buckets that are **after**
- the last composite buckets returned in a previous round.
- For the example below the last bucket can be found in `after_key` and the next
- round of result can be retrieved with:
- [source,js]
- --------------------------------------------------
- GET /_search
- {
- "aggs" : {
- "my_buckets": {
- "composite" : {
- "size": 2,
- "sources" : [
- { "date": { "date_histogram": { "field": "timestamp", "interval": "1d", "order": "desc" } } },
- { "product": { "terms": {"field": "product", "order": "asc" } } }
- ],
- "after": { "date": 1494288000000, "product": "mad max" } <1>
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- <1> Should restrict the aggregation to buckets that sort **after** the provided values.
- ==== Sub-aggregations
- Like any `multi-bucket` aggregations the `composite` aggregation can hold sub-aggregations.
- These sub-aggregations can be used to compute other buckets or statistics on each composite bucket created by this
- parent aggregation.
- For instance the following example computes the average value of a field
- per composite bucket:
- [source,js]
- --------------------------------------------------
- GET /_search
- {
- "aggs" : {
- "my_buckets": {
- "composite" : {
- "sources" : [
- { "date": { "date_histogram": { "field": "timestamp", "interval": "1d", "order": "desc" } } },
- { "product": { "terms": {"field": "product" } } }
- ]
- },
- "aggregations": {
- "the_avg": {
- "avg": { "field": "price" }
- }
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- // TEST[s/_search/_search\?filter_path=aggregations/]
- \... returns:
- [source,js]
- --------------------------------------------------
- {
- ...
- "aggregations": {
- "my_buckets": {
- "after_key": {
- "date": 1494201600000,
- "product": "rocky"
- },
- "buckets": [
- {
- "key": {
- "date": 1494460800000,
- "product": "apocalypse now"
- },
- "doc_count": 1,
- "the_avg": {
- "value": 10.0
- }
- },
- {
- "key": {
- "date": 1494374400000,
- "product": "mad max"
- },
- "doc_count": 1,
- "the_avg": {
- "value": 27.0
- }
- },
- {
- "key": {
- "date": 1494288000000,
- "product" : "mad max"
- },
- "doc_count": 2,
- "the_avg": {
- "value": 22.5
- }
- },
- {
- "key": {
- "date": 1494201600000,
- "product": "rocky"
- },
- "doc_count": 1,
- "the_avg": {
- "value": 10.0
- }
- }
- ]
- }
- }
- }
- --------------------------------------------------
- // TESTRESPONSE[s/\.\.\.//]
|