@@ -1,6 +1,8 @@
[[search-aggregations]]
-== Aggregations
+= Aggregations

+[partintro]
+--
The aggregations framework helps provide aggregated data based on a search query. It is based on simple building blocks
called aggregations, that can be composed in order to build complex summaries of the data.

@@ -11,16 +13,19 @@ query/filters of the search request).
There are many different types of aggregations, each with its own purpose and output. To better understand these types,
it is often easier to break them into two main families:

-_Bucketing_::
+<<search-aggregations-bucket, _Bucketing_>>::
A family of aggregations that build buckets, where each bucket is associated with a _key_ and a document
criterion. When the aggregation is executed, all the bucket criteria are evaluated on every document in
the context and when a criterion matches, the document is considered to "fall in" the relevant bucket.
By the end of the aggregation process, we'll end up with a list of buckets - each one with a set of
documents that "belong" to it.

-_Metric_::
+<<search-aggregations-metrics, _Metric_>>::
Aggregations that keep track of and compute metrics over a set of documents.

+<<search-aggregations-reducer, _Reducer_>>::
+ Aggregations that aggregate the output of other aggregations and their associated metrics.
+
The interesting part comes next. Since each bucket effectively defines a document set (all documents belonging to
the bucket), one can potentially associate aggregations on the bucket level, and those will execute within the context
of that bucket. This is where the real power of aggregations kicks in: *aggregations can be nested!*
@@ -31,7 +36,7 @@ NOTE: Bucketing aggregations can have sub-aggregations (bucketing or metric). Th
another higher-level aggregation).

[float]
-=== Structuring Aggregations
+== Structuring Aggregations

The following snippet captures the basic structure of aggregations:

@@ -62,7 +67,7 @@ bucketing aggregation. For example, if you define a set of aggregations under th
sub-aggregations will be computed for the range buckets that are defined.

[float]
-==== Values Source
+=== Values Source

Some aggregations work on values extracted from the aggregated documents. Typically, the values will be extracted from
a specific document field which is set using the `field` key for the aggregations. It is also possible to define a
@@ -89,142 +94,7 @@ perform optimizations when dealing with sorted values (for example, with the `mi
sorted, Elasticsearch will skip the iterations over all the values and rely on the first value in the list to be the
minimum value among all other values associated with the same document).

-[float]
-=== Metrics Aggregations
-
-The aggregations in this family compute metrics based on values extracted in one way or another from the documents that
-are being aggregated. The values are typically extracted from the fields of the document (using the field data), but
-can also be generated using scripts.
-
-Numeric metrics aggregations are a special type of metrics aggregation which output numeric values. Some aggregations output
-a single numeric metric (e.g. `avg`) and are called `single-value numeric metrics aggregation`, others generate multiple
-metrics (e.g. `stats`) and are called `multi-value numeric metrics aggregation`. The distinction between single-value and
-multi-value numeric metrics aggregations plays a role when these aggregations serve as direct sub-aggregations of some
-bucket aggregations (some bucket aggregations enable you to sort the returned buckets based on the numeric metrics in each bucket).
-
-
-[float]
-=== Bucket Aggregations
-
-Bucket aggregations don't calculate metrics over fields like the metrics aggregations do, but instead, they create
-buckets of documents. Each bucket is associated with a criterion (depending on the aggregation type) which determines
-whether or not a document in the current context "falls" into it. In other words, the buckets effectively define document
-sets. In addition to the buckets themselves, the `bucket` aggregations also compute and return the number of documents
-that "fell in" to each bucket.
-
-Bucket aggregations, as opposed to `metrics` aggregations, can hold sub-aggregations. These sub-aggregations will be
-aggregated for the buckets created by their "parent" bucket aggregation.
-
-There are different bucket aggregators, each with a different "bucketing" strategy. Some define a single bucket, some
-define fixed number of multiple buckets, and others dynamically create the buckets during the aggregation process.
-
-[float]
-=== Reducer Aggregations
-
-coming[2.0.0]
-
-experimental[]
-
-Reducer aggregations work on the outputs produced from other aggregations rather than from document sets, adding
-information to the output tree. There are many different types of reducer, each computing different information from
-other aggregations, but these types can be broken down into two families:
-
-_Parent_::
- A family of reducer aggregations that is provided with the output of its parent aggregation and is able
- to compute new buckets or new aggregations to add to existing buckets.
-
-_Sibling_::
- Reducer aggregations that are provided with the output of a sibling aggregation and are able to compute a
- new aggregation which will be at the same level as the sibling aggregation.
-
-Reducer aggregations can reference the aggregations they need to perform their computation by using the `buckets_paths`
-parameter to indicate the paths to the required metrics. The syntax for defining these paths can be found in the
-<<search-aggregations-bucket-terms-aggregation-order, terms aggregation order>> section.
-
-?????? SHOULD THE SECTION ABOUT DEFINING AGGREGATION PATHS
-BE IN THIS PAGE AND REFERENCED FROM THE TERMS AGGREGATION DOCUMENTATION ???????
-
-Reducer aggregations cannot have sub-aggregations but depending on the type it can reference another reducer in the `buckets_path`
-allowing reducers to be chained.
-
-NOTE: Because reducer aggregations only add to the output, when chaining reducer aggregations the output of each reducer will be
-included in the final output.
-
-[float]
-=== Caching heavy aggregations
-
-Frequently used aggregations (e.g. for display on the home page of a website)
-can be cached for faster responses. These cached results are the same results
-that would be returned by an uncached aggregation -- you will never get stale
-results.
-
-See <<index-modules-shard-query-cache>> for more details.
-
-[float]
-=== Returning only aggregation results
-
-There are many occasions when aggregations are required but search hits are not. For these cases the hits can be ignored by
-setting `size=0`. For example:
-
-[source,js]
---------------------------------------------------
-$ curl -XGET 'http://localhost:9200/twitter/tweet/_search' -d '{
-    "size": 0,
-    "aggregations": {
-        "my_agg": {
-            "terms": {
-                "field": "text"
-            }
-        }
-    }
-}
-'
---------------------------------------------------
-
-Setting `size` to `0` avoids executing the fetch phase of the search making the request more efficient.
-
-[float]
-=== Metadata
-
-You can associate a piece of metadata with individual aggregations at request time that will be returned in place
-at response time.
-
-Consider this example where we want to associate the color blue with our `terms` aggregation.
-
-[source,js]
---------------------------------------------------
-{
-    ...
-    "aggs": {
-        "titles": {
-            "terms": {
-                "field": "title"
-            },
-            "meta": {
-                "color": "blue"
-            }
-        }
-    }
-}
---------------------------------------------------
-
-Then that piece of metadata will be returned in place for our `titles` terms aggregation
-
-[source,js]
---------------------------------------------------
-{
-    ...
-    "aggregations": {
-        "titles": {
-            "meta": {
-                "color" : "blue"
-            },
-            "buckets": [
-            ]
-        }
-    }
-}
---------------------------------------------------
+--

include::aggregations/metrics.asciidoc[]

@@ -232,3 +102,4 @@ include::aggregations/bucket.asciidoc[]

include::aggregations/reducer.asciidoc[]

+include::aggregations/misc.asciidoc[]