[[search-aggregations]]
== Aggregations

The aggregations framework helps provide aggregated data based on a search query. It is based on simple building blocks
called aggregations that can be composed in order to build complex summaries of the data.

An aggregation can be seen as a _unit-of-work_ that builds analytic information over a set of documents. The context of
the execution defines what this document set is (e.g. a top-level aggregation executes within the context of the executed
query/filters of the search request).

There are many different types of aggregations, each with its own purpose and output. To better understand these types,
it is often easier to break them into two main families:

_Bucketing_::
                A family of aggregations that build buckets, where each bucket is associated with a _key_ and a document
                criterion. When the aggregation is executed, all the bucket criteria are evaluated on every document in
                the context and when a criterion matches, the document is considered to "fall in" the relevant bucket.
                By the end of the aggregation process, we'll end up with a list of buckets - each one with a set of
                documents that "belong" to it.

_Metric_::
                Aggregations that keep track of and compute metrics over a set of documents.

The interesting part comes next. Since each bucket effectively defines a document set (all documents belonging to
the bucket), one can potentially associate aggregations on the bucket level, and those will execute within the context
of that bucket. This is where the real power of aggregations kicks in: *aggregations can be nested!*

NOTE:   Bucketing aggregations can have sub-aggregations (bucketing or metric). The sub-aggregations will be computed for
        the buckets that their parent aggregation generates. There is no hard limit on the level/depth of nested
        aggregations (one can nest an aggregation under a "parent" aggregation, which is itself a sub-aggregation of
        another higher-level aggregation).

[float]
=== Structuring Aggregations

The following snippet captures the basic structure of aggregations:

[source,js]
--------------------------------------------------
"aggregations" : {
    "<aggregation_name>" : {
        "<aggregation_type>" : {
            <aggregation_body>
        }
        [,"meta" : {  [<meta_data_body>] } ]?
        [,"aggregations" : { [<sub_aggregation>]+ } ]?
    }
    [,"<aggregation_name_2>" : { ... } ]*
}
--------------------------------------------------

The `aggregations` object (the key `aggs` can also be used) in the JSON holds the aggregations to be computed. Each aggregation
is associated with a logical name that the user defines (e.g. if the aggregation computes the average price, then it would
make sense to name it `avg_price`). These logical names will also be used to uniquely identify the aggregations in the
response. Each aggregation has a specific type (`<aggregation_type>` in the above snippet), which is typically the first
key within the named aggregation body. Each type of aggregation defines its own body, depending on the nature of the
aggregation (e.g. an `avg` aggregation on a specific field will define the field on which the average will be calculated).

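
As a minimal sketch of the naming convention described above, the following request body defines a single `avg`
aggregation under the logical name `avg_price`. The field name `price` is an assumption made purely for illustration.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "avg_price" : {
            "avg" : { "field" : "price" }
        }
    }
}
--------------------------------------------------

The result of this aggregation would be identified in the response by the same `avg_price` name.
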
At the same level as the aggregation type definition, one can optionally define a set of additional aggregations,
though this only makes sense if the aggregation you defined is of a bucketing nature. In this scenario, the
sub-aggregations you define on the bucketing aggregation level will be computed for all the buckets built by the
bucketing aggregation. For example, if you define a set of aggregations under the `range` aggregation, the
sub-aggregations will be computed for the range buckets that are defined.

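
The `range` case mentioned above could look roughly like the following sketch, in which an `avg` sub-aggregation is
computed once per range bucket. The aggregation names (`price_ranges`, `avg_price`), the `price` field and the range
boundaries are all hypothetical and only serve the example.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "price_ranges" : {
            "range" : {
                "field" : "price",
                "ranges" : [
                    { "to" : 50 },
                    { "from" : 50, "to" : 100 },
                    { "from" : 100 }
                ]
            },
            "aggs" : {
                "avg_price" : {
                    "avg" : { "field" : "price" }
                }
            }
        }
    }
}
--------------------------------------------------

Note that the nested `aggs` object sits next to the `range` type definition, not inside it.
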
[float]
==== Values Source

Some aggregations work on values extracted from the aggregated documents. Typically, the values will be extracted from
a specific document field which is set using the `field` key for the aggregations. It is also possible to define a
<<modules-scripting,`script`>> which will generate the values (per document).

TIP: The `script` parameter expects an inline script. Use `script_id` for indexed scripts and `script_file` for scripts in the `config/scripts/` directory.

When both `field` and `script` settings are configured for the aggregation, the script will be treated as a
`value script`. While normal scripts are evaluated on a document level (i.e. the script has access to all the data
associated with the document), value scripts are evaluated on the *value* level. In this mode, the values are extracted
from the configured `field` and the `script` is used to apply a "transformation" over these values.

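
To make the difference between the two script modes concrete, here is a hedged sketch with two `avg` aggregations.
The first uses a document-level script only; the second combines `field` and `script`, so the script acts as a value
script and sees each extracted value as `_value`. The `grade` field, the aggregation names and the `1.2` factor are
assumptions, and the exact script syntax depends on the scripting language available in your installation.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "avg_grade_from_script" : {
            "avg" : { "script" : "doc['grade'].value" }
        },
        "avg_scaled_grade" : {
            "avg" : {
                "field" : "grade",
                "script" : "_value * 1.2"
            }
        }
    }
}
--------------------------------------------------
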
["NOTE",id="aggs-script-note"]
===============================
When working with scripts, the `lang` and `params` settings can also be defined. The former defines the scripting
language which is used (assuming the proper language is available in Elasticsearch, either by default or as a plugin). The latter
enables defining all the "dynamic" expressions in the script as parameters, which enables the script to keep itself static
between calls (this will ensure the use of the cached compiled scripts in Elasticsearch).
===============================

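
The following sketch shows how `lang` and `params` might be combined with a value script. The changing number is
passed in as the `correction` parameter, so the script text itself stays identical between calls. The `grade` field,
the `correction` parameter name and the choice of `groovy` as the language are assumptions for illustration; use
whichever scripting language is actually enabled on your cluster.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "avg_corrected_grade" : {
            "avg" : {
                "field" : "grade",
                "script" : "_value * correction",
                "lang" : "groovy",
                "params" : {
                    "correction" : 1.2
                }
            }
        }
    }
}
--------------------------------------------------
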
Scripts can generate a single value or multiple values per document. When generating multiple values, one can use the
`script_values_sorted` setting to indicate whether these values are sorted or not. Internally, Elasticsearch can
perform optimizations when dealing with sorted values (for example, with the `min` aggregation, knowing the values are
sorted, Elasticsearch will skip the iterations over all the values and rely on the first value in the list to be the
minimum value among all other values associated with the same document).

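
As a rough sketch of the setting described above, a `min` aggregation over a multi-valued script might declare its
values as sorted like this. The `prices` field and the aggregation name are hypothetical, and the example assumes that
the script really does return its values in sorted order; declaring unsorted values as sorted would presumably lead
to wrong results.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "min_price" : {
            "min" : {
                "script" : "doc['prices'].values",
                "script_values_sorted" : true
            }
        }
    }
}
--------------------------------------------------
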
[float]
=== Metrics Aggregations

The aggregations in this family compute metrics based on values extracted in one way or another from the documents that
are being aggregated. The values are typically extracted from the fields of the document (using the field data), but
can also be generated using scripts.

Numeric metrics aggregations are a special type of metrics aggregation which output numeric values. Some aggregations output
a single numeric metric (e.g. `avg`) and are called `single-value numeric metrics aggregation`; others generate multiple
metrics (e.g. `stats`) and are called `multi-value numeric metrics aggregation`. The distinction between single-value and
multi-value numeric metrics aggregations plays a role when these aggregations serve as direct sub-aggregations of some
bucket aggregations (some bucket aggregations enable you to sort the returned buckets based on the numeric metrics in each bucket).

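
The sorting case mentioned above can be sketched with two `terms` aggregations. The first orders its buckets by a
single-value sub-aggregation, referenced by name alone; the second orders by one metric of a multi-value `stats`
sub-aggregation, which has to be addressed as `<name>.<metric>`. The `gender` and `height` fields and all aggregation
names are assumptions for illustration.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "genders_by_avg_height" : {
            "terms" : {
                "field" : "gender",
                "order" : { "avg_height" : "desc" }
            },
            "aggs" : {
                "avg_height" : { "avg" : { "field" : "height" } }
            }
        },
        "genders_by_height_stats" : {
            "terms" : {
                "field" : "gender",
                "order" : { "height_stats.avg" : "desc" }
            },
            "aggs" : {
                "height_stats" : { "stats" : { "field" : "height" } }
            }
        }
    }
}
--------------------------------------------------
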
[float]
=== Bucket Aggregations

Bucket aggregations don't calculate metrics over fields like the metrics aggregations do, but instead, they create
buckets of documents. Each bucket is associated with a criterion (depending on the aggregation type) which determines
whether or not a document in the current context "falls" into it. In other words, the buckets effectively define document
sets. In addition to the buckets themselves, the `bucket` aggregations also compute and return the number of documents
that "fell in" to each bucket.

Bucket aggregations, as opposed to `metrics` aggregations, can hold sub-aggregations. These sub-aggregations will be
aggregated for the buckets created by their "parent" bucket aggregation.

There are different bucket aggregators, each with a different "bucketing" strategy. Some define a single bucket, some
define a fixed number of multiple buckets, and others dynamically create the buckets during the aggregation process.

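
The three strategies can be illustrated side by side. In the sketch below, `filter` defines a single bucket, `range`
defines a fixed number of buckets (one per listed range), and `terms` creates its buckets dynamically from the values
it encounters. The field names (`in_stock`, `price`, `tag`) and aggregation names are hypothetical.

[source,js]
--------------------------------------------------
{
    "aggs" : {
        "in_stock_products" : {
            "filter" : { "term" : { "in_stock" : true } }
        },
        "price_ranges" : {
            "range" : {
                "field" : "price",
                "ranges" : [
                    { "to" : 50 },
                    { "from" : 50 }
                ]
            }
        },
        "tags" : {
            "terms" : { "field" : "tag" }
        }
    }
}
--------------------------------------------------
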
[float]
=== Caching heavy aggregations

Frequently used aggregations (e.g. for display on the home page of a website)
can be cached for faster responses. These cached results are the same results
that would be returned by an uncached aggregation, so you will never get stale
results.

See <<index-modules-shard-query-cache>> for more details.

[float]
=== Returning only aggregation results

There are many occasions when aggregations are required but search hits are not. For these cases the hits can be ignored by
setting `size=0`. For example:

[source,js]
--------------------------------------------------
$ curl -XGET 'http://localhost:9200/twitter/tweet/_search' -d '{
  "size": 0,
  "aggregations": {
    "my_agg": {
      "terms": {
        "field": "text"
      }
    }
  }
}
'
--------------------------------------------------

Setting `size` to `0` avoids executing the fetch phase of the search, making the request more efficient.

[float]
=== Metadata

You can associate a piece of metadata with individual aggregations at request time that will be returned in place
at response time.

Consider this example where we want to associate the color blue with our `terms` aggregation.

[source,js]
--------------------------------------------------
{
    ...
    "aggs": {
        "titles": {
            "terms": {
                "field": "title"
            },
            "meta": {
                "color": "blue"
            }
        }
    }
}
--------------------------------------------------

Then that piece of metadata will be returned in place for our `titles` terms aggregation:

[source,js]
--------------------------------------------------
{
    ...
    "aggregations": {
        "titles": {
            "meta": {
                "color" : "blue"
            },
            "buckets": [
            ]
        }
    }
}
--------------------------------------------------

include::aggregations/metrics.asciidoc[]

include::aggregations/bucket.asciidoc[]