|
@@ -3,10 +3,61 @@
|
|
|
|
|
|
A multi-bucket value source based aggregation where buckets are dynamically built - one per unique value.
|
|
|
|
|
|
+//////////////////////////
|
|
|
+
|
|
|
+[source,js]
|
|
|
+--------------------------------------------------
|
|
|
+PUT /products
|
|
|
+{
|
|
|
+ "mappings": {
|
|
|
+ "product": {
|
|
|
+ "properties": {
|
|
|
+ "genre": {
|
|
|
+ "type": "keyword"
|
|
|
+ },
|
|
|
+ "product": {
|
|
|
+ "type": "keyword"
|
|
|
+ }
|
|
|
+ }
|
|
|
+ }
|
|
|
+ }
|
|
|
+}
|
|
|
+
|
|
|
+POST /products/product/_bulk?refresh
|
|
|
+{"index":{"_id":0}}
|
|
|
+{"genre": "rock", "product": "Product A"}
|
|
|
+{"index":{"_id":1}}
|
|
|
+{"genre": "rock"}
|
|
|
+{"index":{"_id":2}}
|
|
|
+{"genre": "rock"}
|
|
|
+{"index":{"_id":3}}
|
|
|
+{"genre": "jazz", "product": "Product Z"}
|
|
|
+{"index":{"_id":4}}
|
|
|
+{"genre": "jazz"}
|
|
|
+{"index":{"_id":5}}
|
|
|
+{"genre": "electronic"}
|
|
|
+{"index":{"_id":6}}
|
|
|
+{"genre": "electronic"}
|
|
|
+{"index":{"_id":7}}
|
|
|
+{"genre": "electronic"}
|
|
|
+{"index":{"_id":8}}
|
|
|
+{"genre": "electronic"}
|
|
|
+{"index":{"_id":9}}
|
|
|
+{"genre": "electronic"}
|
|
|
+{"index":{"_id":10}}
|
|
|
+{"genre": "electronic"}
|
|
|
+
|
|
|
+--------------------------------------------------
|
|
|
+// NOTCONSOLE
|
|
|
+// TESTSETUP
|
|
|
+
|
|
|
+//////////////////////////
|
|
|
+
|
|
|
Example:
|
|
|
|
|
|
[source,js]
|
|
|
--------------------------------------------------
|
|
|
+GET /_search
|
|
|
{
|
|
|
"aggs" : {
|
|
|
"genres" : {
|
|
@@ -15,6 +66,8 @@ Example:
|
|
|
}
|
|
|
}
|
|
|
--------------------------------------------------
|
|
|
+// CONSOLE
|
|
|
+// TEST[s/_search/_search\?filter_path=aggregations/]
|
|
|
|
|
|
Response:
|
|
|
|
|
@@ -22,30 +75,29 @@ Response:
|
|
|
--------------------------------------------------
|
|
|
{
|
|
|
...
|
|
|
-
|
|
|
"aggregations" : {
|
|
|
"genres" : {
|
|
|
"doc_count_error_upper_bound": 0, <1>
|
|
|
"sum_other_doc_count": 0, <2>
|
|
|
"buckets" : [ <3>
|
|
|
{
|
|
|
- "key" : "jazz",
|
|
|
- "doc_count" : 10
|
|
|
+ "key" : "electronic",
|
|
|
+ "doc_count" : 6
|
|
|
},
|
|
|
{
|
|
|
"key" : "rock",
|
|
|
- "doc_count" : 10
|
|
|
+ "doc_count" : 3
|
|
|
},
|
|
|
{
|
|
|
- "key" : "electronic",
|
|
|
- "doc_count" : 10
|
|
|
- },
|
|
|
+ "key" : "jazz",
|
|
|
+ "doc_count" : 2
|
|
|
+ }
|
|
|
]
|
|
|
}
|
|
|
}
|
|
|
}
|
|
|
--------------------------------------------------
|
|
|
-
|
|
|
+// TESTRESPONSE[s/\.\.\.//]
|
|
|
<1> an upper bound of the error on the document counts for each term, see <<search-aggregations-bucket-terms-aggregation-approximate-counts,below>>
|
|
|
<2> when there are lots of unique terms, Elasticsearch only returns the top terms; this number is the sum of the document counts for all buckets that are not part of the response
|
|
|
<3> the list of the top buckets, the meaning of `top` being defined by the <<search-aggregations-bucket-terms-aggregation-order,order>>
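
As a rough cross-check of the response above, the bucket counts can be reproduced outside Elasticsearch by counting the `genre` values of the sample documents. This is a plain Python sketch, not an Elasticsearch API: `sample_genres` simply restates the eleven documents from the test setup, and `Counter.most_common` is only assumed to be a reasonable stand-in for the default descending `doc_count` order.

```python
from collections import Counter

# Genre values of the eleven sample documents indexed in the test setup.
sample_genres = ["rock"] * 3 + ["jazz"] * 2 + ["electronic"] * 6

# One bucket per unique value, most frequent first
# (the default order of the terms aggregation is descending doc count).
buckets = [
    {"key": key, "doc_count": count}
    for key, count in Counter(sample_genres).most_common()
]

# Every term fits into the response here, so no documents are left over.
sum_other_doc_count = len(sample_genres) - sum(b["doc_count"] for b in buckets)

print(buckets)
print(sum_other_doc_count)
```

With only three unique terms, all of them are returned, which is why `sum_other_doc_count` is 0 in the response.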
|
|
@@ -74,6 +126,7 @@ A request is made to obtain the top 5 terms in the field product, ordered by des
|
|
|
|
|
|
[source,js]
|
|
|
--------------------------------------------------
|
|
|
+GET /_search
|
|
|
{
|
|
|
"aggs" : {
|
|
|
"products" : {
|
|
@@ -85,6 +138,8 @@ A request is made to obtain the top 5 terms in the field product, ordered by des
|
|
|
}
|
|
|
}
|
|
|
--------------------------------------------------
|
|
|
+// CONSOLE
|
|
|
+// TEST[s/_search/_search\?filter_path=aggregations/]
|
|
|
|
|
|
The terms for each of the three shards are shown below with their
|
|
|
respective document counts in brackets:
|
|
@@ -108,7 +163,6 @@ respective document counts in brackets:
|
|
|
|
|
|
The shards will return their top 5 terms so the results from the shards will be:
|
|
|
|
|
|
-
|
|
|
[width="100%",cols="^2,^2,^2,^2",options="header"]
|
|
|
|=========================================================
|
|
|
| | Shard A | Shard B | Shard C
|
|
@@ -165,9 +219,9 @@ otherwise.
|
|
|
|
|
|
==== Calculating Document Count Error
|
|
|
|
|
|
-There are two error values which can be shown on the terms aggregation. The first gives a value for the aggregation as
|
|
|
+There are two error values which can be shown on the terms aggregation. The first gives a value for the aggregation as
|
|
|
a whole which represents the maximum potential document count for a term which did not make it into the final list of
|
|
|
-terms. This is calculated as the sum of the document count from the last term returned from each shard .For the example
|
|
|
+terms. This is calculated as the sum of the document count from the last term returned from each shard. For the example
|
|
|
given above the value would be 46 (2 + 15 + 29). This means that in the worst case scenario a term which was not returned
|
|
|
could have the 4th highest document count.
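
The arithmetic behind this value can be written out as a short Python sketch. The three numbers are the ones quoted in the paragraph above; the variable names are illustrative only and do not correspond to anything in Elasticsearch.

```python
# Document count of the last term returned by each of the three shards,
# as quoted in the text above (2 + 15 + 29).
last_returned_doc_counts = [2, 15, 29]

# A term that was not returned by any shard can have at most this many
# documents in total, so this is the aggregation-wide upper bound.
doc_count_error_upper_bound = sum(last_returned_doc_counts)

print(doc_count_error_upper_bound)
```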
|
|
|
|
|
@@ -175,10 +229,10 @@ could have the 4th highest document count.
|
|
|
--------------------------------------------------
|
|
|
{
|
|
|
...
|
|
|
-
|
|
|
"aggregations" : {
|
|
|
"products" : {
|
|
|
"doc_count_error_upper_bound" : 46,
|
|
|
+ "sum_other_doc_count" : 79,
|
|
|
"buckets" : [
|
|
|
{
|
|
|
"key" : "Product A",
|
|
@@ -187,33 +241,55 @@ could have the 4th highest document count.
|
|
|
{
|
|
|
"key" : "Product Z",
|
|
|
"doc_count" : 52
|
|
|
- },
|
|
|
+ }
|
|
|
...
|
|
|
]
|
|
|
}
|
|
|
}
|
|
|
}
|
|
|
--------------------------------------------------
|
|
|
+// TESTRESPONSE[s/\.\.\.//]
|
|
|
+// TESTRESPONSE[s/: (\-)?[0-9]+/: $body.$_path/]
|
|
|
|
|
|
==== Per bucket document count error
|
|
|
|
|
|
-The second error value can be enabled by setting the `show_term_doc_count_error` parameter to true. This shows an error value
|
|
|
-for each term returned by the aggregation which represents the 'worst case' error in the document count and can be useful when
|
|
|
-deciding on a value for the `shard_size` parameter. This is calculated by summing the document counts for the last term returned
|
|
|
-by all shards which did not return the term. In the example above the error in the document count for Product C would be 15 as
|
|
|
-Shard B was the only shard not to return the term and the document count of the last term it did return was 15. The actual document
|
|
|
-count of Product C was 54 so the document count was only actually off by 4 even though the worst case was that it would be off by
|
|
|
-15. Product A, however has an error of 0 for its document count, since every shard returned it we can be confident that the count
|
|
|
-returned is accurate.
|
|
|
+The second error value can be enabled by setting the `show_term_doc_count_error` parameter to true:
|
|
|
|
|
|
[source,js]
|
|
|
--------------------------------------------------
|
|
|
+GET /_search
|
|
|
{
|
|
|
- ...
|
|
|
+ "aggs" : {
|
|
|
+ "products" : {
|
|
|
+ "terms" : {
|
|
|
+ "field" : "product",
|
|
|
+ "size" : 5,
|
|
|
+ "show_term_doc_count_error": true
|
|
|
+ }
|
|
|
+ }
|
|
|
+ }
|
|
|
+}
|
|
|
+--------------------------------------------------
|
|
|
+// CONSOLE
|
|
|
+// TEST[s/_search/_search\?filter_path=aggregations/]
|
|
|
+
|
|
|
+
|
|
|
+This shows an error value for each term returned by the aggregation which represents the 'worst case' error in the document count
|
|
|
+and can be useful when deciding on a value for the `shard_size` parameter. This is calculated by summing the document counts for
|
|
|
+the last term returned by all shards which did not return the term. In the example above the error in the document count for Product C
|
|
|
+would be 15 as Shard B was the only shard not to return the term and the document count of the last term it did return was 15.
|
|
|
+The actual document count of Product C was 54 so the document count was only actually off by 4 even though the worst case was that
|
|
|
+it would be off by 15. Product A, however, has an error of 0 for its document count: since every shard returned it, we can be confident
|
|
|
+that the count returned is accurate.
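
The per-term calculation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not how Elasticsearch implements it: the text only states that Shard B's last returned term had a count of 15, so assigning 2 to Shard A and 29 to Shard C follows the order of the sum `2 + 15 + 29`, and the set of shards that returned Product Z is inferred from its error of 2 in the example response.

```python
# Doc count of the last term each shard returned.
# Shard B = 15 is stated in the text; A = 2 and C = 29 are assumed from
# the order of the sum "2 + 15 + 29".
last_returned = {"A": 2, "B": 15, "C": 29}

# Shards whose top list contained each term.
# Product A and Product C follow the text; Product Z is an assumption
# inferred from its reported error of 2.
returned_by = {
    "Product A": {"A", "B", "C"},
    "Product C": {"A", "C"},
    "Product Z": {"B", "C"},
}


def term_error(term):
    """Sum the last returned doc count of every shard that missed the term."""
    missing_shards = last_returned.keys() - returned_by[term]
    return sum(last_returned[shard] for shard in missing_shards)


errors = {term: term_error(term) for term in returned_by}
print(errors)
```

A term returned by every shard has no missing shards, so its error is 0, which matches the reasoning given for Product A.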
|
|
|
|
|
|
+[source,js]
|
|
|
+--------------------------------------------------
|
|
|
+{
|
|
|
+ ...
|
|
|
"aggregations" : {
|
|
|
"products" : {
|
|
|
"doc_count_error_upper_bound" : 46,
|
|
|
+ "sum_other_doc_count" : 79,
|
|
|
"buckets" : [
|
|
|
{
|
|
|
"key" : "Product A",
|
|
@@ -224,13 +300,15 @@ returned is accurate.
|
|
|
"key" : "Product Z",
|
|
|
"doc_count" : 52,
|
|
|
"doc_count_error_upper_bound" : 2
|
|
|
- },
|
|
|
+ }
|
|
|
...
|
|
|
]
|
|
|
}
|
|
|
}
|
|
|
}
|
|
|
--------------------------------------------------
|
|
|
+// TESTRESPONSE[s/\.\.\.//]
|
|
|
+// TESTRESPONSE[s/: (\-)?[0-9]+/: $body.$_path/]
|
|
|
|
|
|
These errors can only be calculated in this way when the terms are ordered by descending document count. When the aggregation is
|
|
|
ordered by the terms values themselves (either ascending or descending) there is no error in the document count since if a shard
|
|
@@ -257,6 +335,7 @@ Ordering the buckets by their doc `_count` in an ascending manner:
|
|
|
|
|
|
[source,js]
|
|
|
--------------------------------------------------
|
|
|
+GET /_search
|
|
|
{
|
|
|
"aggs" : {
|
|
|
"genres" : {
|
|
@@ -268,11 +347,13 @@ Ordering the buckets by their doc `_count` in an ascending manner:
|
|
|
}
|
|
|
}
|
|
|
--------------------------------------------------
|
|
|
+// CONSOLE
|
|
|
|
|
|
Ordering the buckets alphabetically by their terms in an ascending manner:
|
|
|
|
|
|
[source,js]
|
|
|
--------------------------------------------------
|
|
|
+GET /_search
|
|
|
{
|
|
|
"aggs" : {
|
|
|
"genres" : {
|
|
@@ -284,6 +365,7 @@ Ordering the buckets alphabetically by their terms in an ascending manner:
|
|
|
}
|
|
|
}
|
|
|
--------------------------------------------------
|
|
|
+// CONSOLE
|
|
|
|
|
|
deprecated[6.0.0, Use `_key` instead of `_term` to order buckets by their term]
|
|
|
|
|
@@ -291,6 +373,7 @@ Ordering the buckets by single value metrics sub-aggregation (identified by the
|
|
|
|
|
|
[source,js]
|
|
|
--------------------------------------------------
|
|
|
+GET /_search
|
|
|
{
|
|
|
"aggs" : {
|
|
|
"genres" : {
|
|
@@ -305,11 +388,13 @@ Ordering the buckets by single value metrics sub-aggregation (identified by the
|
|
|
}
|
|
|
}
|
|
|
--------------------------------------------------
|
|
|
+// CONSOLE
|
|
|
|
|
|
Ordering the buckets by multi value metrics sub-aggregation (identified by the aggregation name):
|
|
|
|
|
|
[source,js]
|
|
|
--------------------------------------------------
|
|
|
+GET /_search
|
|
|
{
|
|
|
"aggs" : {
|
|
|
"genres" : {
|
|
@@ -324,6 +409,7 @@ Ordering the buckets by multi value metrics sub-aggregation (identified by the a
|
|
|
}
|
|
|
}
|
|
|
--------------------------------------------------
|
|
|
+// CONSOLE
|
|
|
|
|
|
[NOTE]
|
|
|
.Pipeline aggs cannot be used for sorting
|
|
@@ -355,6 +441,7 @@ PATH = <AGG_NAME> [ <AGG_SEPARATOR>, <AGG_NAME> ]* [ <METRIC_SEP
|
|
|
|
|
|
[source,js]
|
|
|
--------------------------------------------------
|
|
|
+GET /_search
|
|
|
{
|
|
|
"aggs" : {
|
|
|
"countries" : {
|
|
@@ -374,6 +461,7 @@ PATH = <AGG_NAME> [ <AGG_SEPARATOR>, <AGG_NAME> ]* [ <METRIC_SEP
|
|
|
}
|
|
|
}
|
|
|
--------------------------------------------------
|
|
|
+// CONSOLE
|
|
|
|
|
|
The above will sort the artist's countries buckets based on the average play count among the rock songs.
|
|
|
|
|
@@ -381,6 +469,7 @@ Multiple criteria can be used to order the buckets by providing an array of orde
|
|
|
|
|
|
[source,js]
|
|
|
--------------------------------------------------
|
|
|
+GET /_search
|
|
|
{
|
|
|
"aggs" : {
|
|
|
"countries" : {
|
|
@@ -390,7 +479,7 @@ Multiple criteria can be used to order the buckets by providing an array of orde
|
|
|
},
|
|
|
"aggs" : {
|
|
|
"rock" : {
|
|
|
- "filter" : { "term" : { "genre" : { "rock" }}},
|
|
|
+ "filter" : { "term" : { "genre" : "rock" }},
|
|
|
"aggs" : {
|
|
|
"playback_stats" : { "stats" : { "field" : "play_count" }}
|
|
|
}
|
|
@@ -400,6 +489,7 @@ Multiple criteria can be used to order the buckets by providing an array of orde
|
|
|
}
|
|
|
}
|
|
|
--------------------------------------------------
|
|
|
+// CONSOLE
|
|
|
|
|
|
The above will sort the artist's countries buckets based on the average play count among the rock songs and then by
|
|
|
their `doc_count` in descending order.
|
|
@@ -413,6 +503,7 @@ It is possible to only return terms that match more than a configured number of
|
|
|
|
|
|
[source,js]
|
|
|
--------------------------------------------------
|
|
|
+GET /_search
|
|
|
{
|
|
|
"aggs" : {
|
|
|
"tags" : {
|
|
@@ -424,6 +515,7 @@ It is possible to only return terms that match more than a configured number of
|
|
|
}
|
|
|
}
|
|
|
--------------------------------------------------
|
|
|
+// CONSOLE
|
|
|
|
|
|
The above aggregation would only return tags which have been found in 10 hits or more. Default value is `1`.
|
|
|
|
|
@@ -453,6 +545,7 @@ Generating the terms using a script:
|
|
|
|
|
|
[source,js]
|
|
|
--------------------------------------------------
|
|
|
+GET /_search
|
|
|
{
|
|
|
"aggs" : {
|
|
|
"genres" : {
|
|
@@ -466,11 +559,29 @@ Generating the terms using a script:
|
|
|
}
|
|
|
}
|
|
|
--------------------------------------------------
|
|
|
+// CONSOLE
|
|
|
|
|
|
This will interpret the `script` parameter as an `inline` script with the default script language and no script parameters. To use a stored script use the following syntax:
|
|
|
|
|
|
+//////////////////////////
|
|
|
+
|
|
|
+[source,js]
|
|
|
+--------------------------------------------------
|
|
|
+POST /_scripts/my_script
|
|
|
+{
|
|
|
+ "script": {
|
|
|
+ "lang": "painless",
|
|
|
+ "source": "doc[params.field].value"
|
|
|
+ }
|
|
|
+}
|
|
|
+--------------------------------------------------
|
|
|
+// CONSOLE
|
|
|
+
|
|
|
+//////////////////////////
|
|
|
+
|
|
|
[source,js]
|
|
|
--------------------------------------------------
|
|
|
+GET /_search
|
|
|
{
|
|
|
"aggs" : {
|
|
|
"genres" : {
|
|
@@ -486,18 +597,21 @@ This will interpret the `script` parameter as an `inline` script with the defaul
|
|
|
}
|
|
|
}
|
|
|
--------------------------------------------------
|
|
|
+// CONSOLE
|
|
|
+// TEST[continued]
|
|
|
|
|
|
==== Value Script
|
|
|
|
|
|
[source,js]
|
|
|
--------------------------------------------------
|
|
|
+GET /_search
|
|
|
{
|
|
|
"aggs" : {
|
|
|
"genres" : {
|
|
|
"terms" : {
|
|
|
"field" : "genre",
|
|
|
"script" : {
|
|
|
- "source" : "'Genre: ' +_value"
|
|
|
+ "source" : "'Genre: ' +_value",
|
|
|
"lang" : "painless"
|
|
|
}
|
|
|
}
|
|
@@ -505,7 +619,7 @@ This will interpret the `script` parameter as an `inline` script with the defaul
|
|
|
}
|
|
|
}
|
|
|
--------------------------------------------------
|
|
|
-
|
|
|
+// CONSOLE
|
|
|
|
|
|
==== Filtering Values
|
|
|
|
|
@@ -517,6 +631,7 @@ It is possible to filter the values for which buckets will be created. This can
|
|
|
|
|
|
[source,js]
|
|
|
--------------------------------------------------
|
|
|
+GET /_search
|
|
|
{
|
|
|
"aggs" : {
|
|
|
"tags" : {
|
|
@@ -529,6 +644,7 @@ It is possible to filter the values for which buckets will be created. This can
|
|
|
}
|
|
|
}
|
|
|
--------------------------------------------------
|
|
|
+// CONSOLE
|
|
|
|
|
|
In the above example, buckets will be created for all the tags that have the word `sport` in them, except those starting
|
|
|
with `water_` (so the tag `water_sports` will not be aggregated). The `include` regular expression will determine what
|
|
@@ -544,6 +660,7 @@ strings that represent the terms as they are found in the index:
|
|
|
|
|
|
[source,js]
|
|
|
--------------------------------------------------
|
|
|
+GET /_search
|
|
|
{
|
|
|
"aggs" : {
|
|
|
"JapaneseCars" : {
|
|
@@ -561,6 +678,7 @@ strings that represent the terms as they are found in the index:
|
|
|
}
|
|
|
}
|
|
|
--------------------------------------------------
|
|
|
+// CONSOLE
|
|
|
|
|
|
===== Filtering Values with partitions
|
|
|
|
|
@@ -572,6 +690,7 @@ Consider this request which is looking for accounts that have not logged any acc
|
|
|
|
|
|
[source,js]
|
|
|
--------------------------------------------------
|
|
|
+GET /_search
|
|
|
{
|
|
|
"size": 0,
|
|
|
"aggs": {
|
|
@@ -598,6 +717,7 @@ Consider this request which is looking for accounts that have not logged any acc
|
|
|
}
|
|
|
}
|
|
|
--------------------------------------------------
|
|
|
+// CONSOLE
|
|
|
|
|
|
This request finds the last logged access date for a subset of customer accounts because we
|
|
|
might want to expire some customer accounts that haven't been seen for a long while.
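
To cover every account, a client would typically issue one such request per partition. The Python sketch below only builds the request bodies; it assumes the `partition` and `num_partitions` keys under `include`, and the field name `account_id`, the aggregation name `expired_sessions`, and the chosen sizes are hypothetical placeholders rather than values taken from the request above.

```python
def partitioned_terms_requests(field, num_partitions, size):
    """Yield one search body per partition of the field's unique terms."""
    for partition in range(num_partitions):
        yield {
            "size": 0,  # only aggregation results are needed, no hits
            "aggs": {
                # Hypothetical aggregation name, for illustration only.
                "expired_sessions": {
                    "terms": {
                        "field": field,
                        "include": {
                            "partition": partition,
                            "num_partitions": num_partitions,
                        },
                        "size": size,
                    }
                }
            },
        }


# Hypothetical values: 20 partitions of up to 10000 terms each.
bodies = list(partitioned_terms_requests("account_id", num_partitions=20, size=10000))
print(len(bodies))
```

Each body could then be sent to `_search` in turn, so that no single request has to hold buckets for every unique term at once.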
|
|
@@ -662,6 +782,7 @@ An example problem scenario is querying a movie database for the 10 most popular
|
|
|
|
|
|
[source,js]
|
|
|
--------------------------------------------------
|
|
|
+GET /_search
|
|
|
{
|
|
|
"aggs" : {
|
|
|
"actors" : {
|
|
@@ -681,6 +802,7 @@ An example problem scenario is querying a movie database for the 10 most popular
|
|
|
}
|
|
|
}
|
|
|
--------------------------------------------------
|
|
|
+// CONSOLE
|
|
|
|
|
|
Even though the number of actors may be comparatively small and we want only 50 result buckets there is a combinatorial explosion of buckets
|
|
|
during calculation - a single actor can produce n² buckets where n is the number of actors. The sane option would be to first determine
|
|
@@ -692,6 +814,7 @@ It is possible to override the default heuristic and to provide a collect mode d
|
|
|
|
|
|
[source,js]
|
|
|
--------------------------------------------------
|
|
|
+GET /_search
|
|
|
{
|
|
|
"aggs" : {
|
|
|
"actors" : {
|
|
@@ -712,6 +835,7 @@ It is possible to override the default heuristic and to provide a collect mode d
|
|
|
}
|
|
|
}
|
|
|
--------------------------------------------------
|
|
|
+// CONSOLE
|
|
|
|
|
|
<1> the possible values are `breadth_first` and `depth_first`
|
|
|
|
|
@@ -742,6 +866,7 @@ ordinals.
|
|
|
|
|
|
[source,js]
|
|
|
--------------------------------------------------
|
|
|
+GET /_search
|
|
|
{
|
|
|
"aggs" : {
|
|
|
"tags" : {
|
|
@@ -753,6 +878,7 @@ ordinals.
|
|
|
}
|
|
|
}
|
|
|
--------------------------------------------------
|
|
|
+// CONSOLE
|
|
|
|
|
|
<1> The possible values are `map`, `global_ordinals`
|
|
|
|
|
@@ -766,6 +892,7 @@ had a value.
|
|
|
|
|
|
[source,js]
|
|
|
--------------------------------------------------
|
|
|
+GET /_search
|
|
|
{
|
|
|
"aggs" : {
|
|
|
"tags" : {
|
|
@@ -777,6 +904,7 @@ had a value.
|
|
|
}
|
|
|
}
|
|
|
--------------------------------------------------
|
|
|
+// CONSOLE
|
|
|
|
|
|
<1> Documents without a value in the `tags` field will fall into the same bucket as documents that have the value `N/A`.
|
|
|
|