- [[search-aggregations]]
- == Aggregations
- Aggregations grew out of the <<search-facets, facets>> module and the long experience of how users use it
- (and would like to use it) for real-time data analytics purposes. As such, they serve as the next-generation
- replacement for the functionality we currently refer to as "faceting".
- <<search-facets, Facets>> provide a great way to aggregate data within a document set context.
- This context is defined by the executed query in combination with the different levels of filters that can be defined
- (filtered queries, top level filters, and facet level filters). While powerful, their implementation is not designed
- from the ground up to support complex aggregations and is thus limited.
- .Are facets deprecated?
- **********************************
- As the functionality facets offer is a subset of the one offered by aggregations, over time, we would like to
- see users move to aggregations for all real-time data analytics. That said, we are well aware that such
- transitions/migrations take time, and for this reason we are keeping the facets around for the time being.
- Nonetheless, facets are and should be considered deprecated and will likely be removed in one of the future major
- releases.
- **********************************
- The aggregations module breaks the barriers the current facet implementation put in place. The new name ("Aggregations")
- also indicates the intention here: a generic yet extremely powerful framework for building aggregations of any
- type.
- An aggregation can be seen as a _unit-of-work_ that builds analytic information over a set of documents. The context of
- the execution defines what this document set is (e.g. a top level aggregation executes within the context of the executed
- query/filters of the search request).
- There are many different types of aggregations, each with its own purpose and output. To better understand these types,
- it is often easier to break them into two main families:
- _Bucketing_::
- A family of aggregations that build buckets, where each bucket is associated with a _key_ and a document
- criterion. When the aggregation is executed, the criteria of all buckets are evaluated on every document in
- the context, and when a criterion matches, the document is considered to "fall in" the relevant bucket. By the
- end of the aggregation process, we'll end up with a list of buckets, each one with a set of documents that
- "belong" to it.
- _Metric_::
- Aggregations that keep track of and compute metrics over a set of documents.
- The interesting part comes next. Since each bucket effectively defines a document set (all documents belonging to
- the bucket), one can associate aggregations on the bucket level, and those will execute within the context
- of that bucket. This is where the real power of aggregations kicks in: *aggregations can be nested!*
- NOTE: Bucketing aggregations can have sub-aggregations (bucketing or metric). The sub-aggregations will be computed for
- each of the buckets their parent aggregation generates. There is no hard limit on the level/depth of nested
- aggregations (one can nest an aggregation under a "parent" aggregation which is itself a sub-aggregation of
- another higher-level aggregation).
- [float]
- === Structuring Aggregations
- The following snippet captures the basic structure of aggregations:
- [source,js]
- --------------------------------------------------
- "aggregations" : {
-     "<aggregation_name>" : {
-         "<aggregation_type>" : {
-             <aggregation_body>
-         }
-         [,"aggregations" : { [<sub_aggregation>]+ } ]?
-     }
-     [,"<aggregation_name_2>" : { ... } ]*
- }
- --------------------------------------------------
- The `aggregations` object (a.k.a. `aggs` for short) in the JSON holds the aggregations to be computed. Each aggregation
- is associated with a logical name that the user defines (e.g. if the aggregation computes the average price, then it
- makes sense to name it `avg_price`). These logical names will also be used to uniquely identify the aggregations in the
- response. Each aggregation has a specific type (`<aggregation_type>` in the above snippet), which is typically the first
- key within the named aggregation body. Each type of aggregation defines its own body, depending on the nature of the
- aggregation (e.g. an `avg` aggregation on a specific field will define the field on which the average will be calculated).
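- As a minimal sketch of the structure described above, the following request body defines a single aggregation. The
- `price` field is a hypothetical field assumed to exist in the indexed documents, and `avg_price` is simply the logical
- name chosen for this illustration:
- [source,js]
- --------------------------------------------------
- {
-     "aggs" : {
-         "avg_price" : { "avg" : { "field" : "price" } }
-     }
- }
- --------------------------------------------------
- The result of this aggregation would be identified in the response by the same `avg_price` name.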
- At the same level as the aggregation type definition, one can optionally define a set of additional aggregations,
- though this only makes sense if the aggregation you defined is of a bucketing nature. In this scenario, the
- sub-aggregations you define on the bucketing aggregation level will be computed for all the buckets built by the
- bucketing aggregation. For example, if you define a set of aggregations under the `range` aggregation, the
- sub-aggregations will be computed for each of the range buckets that are defined.
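- The `range` case can be sketched as follows. The `price` field, the range boundaries, and the names `price_ranges` and
- `avg_price` are all assumptions made for this illustration, and the `ranges` layout shown is how the `range` aggregation
- is assumed to accept its buckets. The nested `aggs` object sits at the same level as the `range` type definition, so
- the average would be computed once per range bucket:
- [source,js]
- --------------------------------------------------
- {
-     "aggs" : {
-         "price_ranges" : {
-             "range" : {
-                 "field" : "price",
-                 "ranges" : [
-                     { "to" : 50 },
-                     { "from" : 50, "to" : 100 },
-                     { "from" : 100 }
-                 ]
-             },
-             "aggs" : {
-                 "avg_price" : { "avg" : { "field" : "price" } }
-             }
-         }
-     }
- }
- --------------------------------------------------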
- [float]
- ==== Values Source
- Some aggregations work on values extracted from the aggregated documents. Typically, the values will be extracted from
- a specific document field, which is set under the `field` setting for the aggregation. It is also possible to define a
- <<modules-scripting,`script`>> that will generate the values (per document).
- When both `field` and `script` settings are configured for the aggregation, the script will be treated as a
- `value script`. While normal scripts are evaluated on a document level (i.e. the script has access to all the data
- associated with the document), value scripts are evaluated on the *value* level. In this mode, the values are extracted
- from the configured `field` and the `script` is used to apply a "transformation" over these values.
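- The three ways of supplying values can be sketched side by side. This is an illustration under assumptions rather than
- a definitive reference: the `price` field and the aggregation names are hypothetical, `doc['price'].value` is assumed
- to be the document-level accessor available to scripts, and `_value` is assumed to be the variable that holds the
- current value inside a value script:
- [source,js]
- --------------------------------------------------
- {
-     "aggs" : {
-         "avg_price_from_field" : {
-             "avg" : { "field" : "price" }
-         },
-         "avg_price_from_script" : {
-             "avg" : { "script" : "doc['price'].value" }
-         },
-         "avg_price_with_tax" : {
-             "avg" : { "field" : "price", "script" : "_value * 1.2" }
-         }
-     }
- }
- --------------------------------------------------
- Only the last aggregation sets both `field` and `script`, so only there would the script be treated as a value script.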
- ["NOTE",id="aggs-script-note"]
- ===============================
- When working with scripts, the `lang` and `params` settings can also be defined. The former defines the scripting
- language that is used (assuming the proper language is available in elasticsearch, either by default or as a plugin).
- The latter enables defining all the "dynamic" expressions in the script as parameters, thereby keeping the script
- itself static between calls (this will ensure the use of the cached compiled scripts in elasticsearch).
- ===============================
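- A hedged sketch of the `params` setting: the multiplier is moved out of the script text so that the script string stays
- identical between calls. The `price` field, the `rate` parameter name, and the `_value` variable are assumptions made
- for this illustration:
- [source,js]
- --------------------------------------------------
- {
-     "aggs" : {
-         "avg_price_with_tax" : {
-             "avg" : {
-                 "field" : "price",
-                 "script" : "_value * rate",
-                 "params" : { "rate" : 1.2 }
-             }
-         }
-     }
- }
- --------------------------------------------------
- A `lang` setting, when needed, would be expected to sit next to `script` and `params` in the same object.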
- Scripts can generate a single value or multiple values per document. When generating multiple values, one can use the
- `script_values_sorted` setting to indicate whether these values are sorted or not. Internally, elasticsearch can
- perform optimizations when dealing with sorted values (for example, with the `min` aggregation, knowing the values are
- sorted, elasticsearch will skip the iteration over all the values and rely on the first value in the list to be the
- minimum value among all other values associated with the same document).
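- As a rough sketch, assuming a hypothetical multi-valued `prices` field and assuming that `doc['prices'].values` returns
- its values in ascending order, the setting could be declared like this. The setting is a promise made by the user; if the
- script does not actually return sorted values, the result of the optimization described above would not be reliable:
- [source,js]
- --------------------------------------------------
- {
-     "aggs" : {
-         "min_price" : {
-             "min" : {
-                 "script" : "doc['prices'].values",
-                 "script_values_sorted" : true
-             }
-         }
-     }
- }
- --------------------------------------------------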
- [float]
- === Metrics Aggregations
- The aggregations in this family compute metrics based on values extracted in one way or another from the documents that
- are being aggregated. The values are typically extracted from the fields of the document (using the field data), but
- can also be generated using scripts. Some aggregations output a single metric (e.g. `avg`) and are called
- `single-value metrics aggregations`; others generate multiple metrics (e.g. `stats`) and are called
- `multi-value metrics aggregations`. The distinction between single-value and multi-value metrics aggregations plays a
- role when these aggregations serve as direct sub-aggregations of some bucket aggregations (some bucket aggregations
- enable you to sort the returned buckets based on the metrics in each bucket).
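- To illustrate why the distinction matters for sorting, here is a sketch that assumes the `terms` bucket aggregation and
- its `order` setting, with hypothetical `gender` and `height` fields. A single-value metric is referenced by its name
- alone, while a multi-value metric is assumed to need a path that also names which of its metrics to sort by:
- [source,js]
- --------------------------------------------------
- {
-     "aggs" : {
-         "genders_by_avg" : {
-             "terms" : {
-                 "field" : "gender",
-                 "order" : { "avg_height" : "desc" }
-             },
-             "aggs" : {
-                 "avg_height" : { "avg" : { "field" : "height" } }
-             }
-         },
-         "genders_by_stats" : {
-             "terms" : {
-                 "field" : "gender",
-                 "order" : { "height_stats.avg" : "desc" }
-             },
-             "aggs" : {
-                 "height_stats" : { "stats" : { "field" : "height" } }
-             }
-         }
-     }
- }
- --------------------------------------------------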
- [float]
- === Bucket Aggregations
- Bucket aggregations don't calculate metrics over fields like the metrics aggregations do, but instead, they create
- buckets of documents. Each bucket is associated with a criterion (which depends on the aggregation type) that determines
- whether or not a document in the current context "falls" in it. In other words, the buckets effectively define document
- sets. In addition to the buckets themselves, the `bucket` aggregations also compute and return the number of documents
- that "fell in" each bucket.
- Bucket aggregations, as opposed to `metrics` aggregations, can hold sub-aggregations. These sub-aggregations will be
- aggregated for each of the buckets created by their "parent" bucket aggregation.
- There are different bucket aggregators, each with a different "bucketing" strategy. Some define a single bucket, some
- define a fixed number of multiple buckets, and others dynamically create the buckets during the aggregation process.
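- The three strategies can be sketched in one request, assuming the `missing`, `range`, and `terms` bucket aggregations
- and hypothetical `price` and `category` fields. Under these assumptions, `missing` would yield a single bucket, `range`
- a fixed number of buckets (one per declared range), and `terms` a set of buckets created dynamically from the values
- found in the documents:
- [source,js]
- --------------------------------------------------
- {
-     "aggs" : {
-         "without_price" : {
-             "missing" : { "field" : "price" }
-         },
-         "price_ranges" : {
-             "range" : {
-                 "field" : "price",
-                 "ranges" : [
-                     { "to" : 100 },
-                     { "from" : 100 }
-                 ]
-             }
-         },
-         "categories" : {
-             "terms" : { "field" : "category" }
-         }
-     }
- }
- --------------------------------------------------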
- include::aggregations/metrics.asciidoc[]
- include::aggregations/bucket.asciidoc[]