[[search-aggregations]]
= Aggregations

[partintro]
--
The aggregations framework helps provide aggregated data based on a search query. It is based on simple building blocks
called aggregations that can be composed in order to build complex summaries of the data.

An aggregation can be seen as a _unit-of-work_ that builds analytic information over a set of documents. The context of
the execution defines what this document set is (e.g. a top-level aggregation executes within the context of the executed
query/filters of the search request).

There are many different types of aggregations, each with its own purpose and output. To better understand these types,
it is often easier to break them into four main families:

<<search-aggregations-bucket, _Bucketing_>>::
    A family of aggregations that build buckets, where each bucket is associated with a _key_ and a document
    criterion. When the aggregation is executed, all the bucket criteria are evaluated on every document in
    the context and, when a criterion matches, the document is considered to "fall in" the relevant bucket.
    By the end of the aggregation process, we'll end up with a list of buckets, each one with a set of
    documents that "belong" to it.

<<search-aggregations-metrics, _Metric_>>::
    Aggregations that keep track of and compute metrics over a set of documents.

<<search-aggregations-matrix, _Matrix_>>::
    A family of aggregations that operate on multiple fields and produce a matrix result based on the
    values extracted from the requested document fields.
    Unlike metric and bucket aggregations, this
    aggregation family does not yet support scripting.

<<search-aggregations-pipeline, _Pipeline_>>::
    Aggregations that aggregate the output of other aggregations and their associated metrics.

The interesting part comes next. Since each bucket effectively defines a document set (all documents belonging to
the bucket), one can potentially associate aggregations on the bucket level, and those will execute within the context
of that bucket. This is where the real power of aggregations kicks in: *aggregations can be nested!*

NOTE:   Bucketing aggregations can have sub-aggregations (bucketing or metric). The sub-aggregations will be computed for
        the buckets that their parent aggregation generates. There is no hard limit on the level/depth of nested
        aggregations (one can nest an aggregation under a "parent" aggregation, which is itself a sub-aggregation of
        another higher-level aggregation).

NOTE:   Aggregations operate on the `double` representation of
        the data. As a consequence, the result may be approximate when running on longs
        whose absolute value is greater than `2^53`.

[float]
== Structuring Aggregations

The following snippet captures the basic structure of aggregations:

[source,js]
--------------------------------------------------
"aggregations" : {
    "<aggregation_name>" : {
        "<aggregation_type>" : {
            <aggregation_body>
        }
        [,"meta" : {  [<meta_data_body>] } ]?
        [,"aggregations" : { [<sub_aggregation>]+ } ]?
    }
    [,"<aggregation_name_2>" : { ... } ]*
}
--------------------------------------------------
// NOTCONSOLE

The `aggregations` object (the key `aggs` can also be used) in the JSON holds the aggregations to be computed. Each aggregation
is associated with a logical name that the user defines (e.g. if the aggregation computes the average price, then it would
make sense to name it `avg_price`).
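
To make the naming concrete, here is a minimal sketch of a request body that defines a single `avg` aggregation under
the logical name `avg_price`. The field name `price` is a hypothetical example and is not taken from any particular mapping:

[source,js]
--------------------------------------------------
{
    "aggregations" : {
        "avg_price" : {
            "avg" : { "field" : "price" } <1>
        }
    }
}
--------------------------------------------------
// NOTCONSOLE
<1> `price` is an assumed numeric field used only for illustration.
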
These logical names will also be used to uniquely identify the aggregations in the
response. Each aggregation has a specific type (`<aggregation_type>` in the structure snippet above), which is typically the first
key within the named aggregation body. Each type of aggregation defines its own body, depending on the nature of the
aggregation (e.g. an `avg` aggregation on a specific field will define the field on which the average will be calculated).
At the same level as the aggregation type definition, one can optionally define a set of additional aggregations,
though this only makes sense if the aggregation you defined is of a bucketing nature. In this scenario, the
sub-aggregations you define on the bucketing aggregation level will be computed for all the buckets built by the
bucketing aggregation. For example, if you define a set of aggregations under the `range` aggregation, the
sub-aggregations will be computed for the range buckets that are defined.

[float]
=== Values Source

Some aggregations work on values extracted from the aggregated documents. Typically, the values will be extracted from
a specific document field which is set using the `field` key for the aggregations. It is also possible to define a
<<modules-scripting,`script`>> which will generate the values (per document).

When both `field` and `script` settings are configured for the aggregation, the script will be treated as a
`value script`. While normal scripts are evaluated on a document level (i.e. the script has access to all the data
associated with the document), value scripts are evaluated on the *value* level. In this mode, the values are extracted
from the configured `field` and the `script` is used to apply a "transformation" over these values.

["NOTE",id="aggs-script-note"]
===============================
When working with scripts, the `lang` and `params` settings can also be defined.
The former defines the scripting
language which is used (assuming the proper language is available in Elasticsearch, either by default or as a plugin). The latter
lets you pass all the "dynamic" expressions in the script as parameters, so that the script itself stays static
between calls (this will ensure the use of the cached compiled scripts in Elasticsearch).
===============================

Elasticsearch uses the type of the field in the mapping in order to figure out
how to run the aggregation and format the response. However, there are two cases
in which Elasticsearch cannot figure out this information: unmapped fields (for
instance, when a search request spans multiple indices and only some
of them have a mapping for the field) and pure scripts. For those cases, it is
possible to give Elasticsearch a hint using the `value_type` option, which
accepts the following values: `string`, `long` (works for all integer types),
`double` (works for all decimal types like `float` or `scaled_float`), `date`,
`ip` and `boolean`.

--

include::aggregations/metrics.asciidoc[]

include::aggregations/bucket.asciidoc[]

include::aggregations/pipeline.asciidoc[]

include::aggregations/matrix.asciidoc[]

include::aggregations/misc.asciidoc[]