123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318 |
- [role="xpack"]
- [[search-aggregations-bucket-correlation-aggregation]]
- === Bucket correlation aggregation
- ++++
- <titleabbrev>Bucket correlation</titleabbrev>
- ++++
- experimental::[]
- A sibling pipeline aggregation which executes a correlation function on the
- configured sibling multi-bucket aggregation.
- [[bucket-correlation-agg-syntax]]
- ==== Parameters
- `buckets_path`::
- (Required, string)
- Path to the buckets that contain one set of values to correlate.
- For syntax, see <<buckets-path-syntax>>.
- `function`::
- (Required, object)
- The correlation function to execute.
- +
- .Properties of `function`
- [%collapsible%open]
- ====
- `count_correlation`:::
- (Required^*^, object)
- The configuration to calculate a count correlation. This function is designed for
- determining the correlation of a term value and a given metric. Consequently, it
- needs to meet the following requirements.
- * The `buckets_path` must point to a `_count` metric.
- * The total count of all the `bucket_path` count values must be less than or equal to `indicator.doc_count`.
- * When utilizing this function, an initial calculation to gather the required `indicator` values is required.
- .Properties of `count_correlation`
- [%collapsible%open]
- =====
- `indicator`:::
- (Required, object)
- The indicator with which to correlate the configured `bucket_path` values.
- .Properties of `indicator`
- [%collapsible%open]
- =====
- `expectations`:::
- (Required, array)
- An array of numbers with which to correlate the configured `bucket_path` values. The length of this value must always equal
- the number of buckets returned by the `bucket_path`.
- `fractions`:::
- (Optional, array)
- An array of fractions to use when averaging and calculating variance. This should be used if the pre-calculated data and the
- `buckets_path` have known gaps. The length of `fractions`, if provided, must equal `expectations`.
- `doc_count`:::
- (Required, integer)
- The total number of documents that initially created the `expectations`. It's required to be greater than or equal to the sum
- of all values in the `buckets_path` as this is the originating superset of data to which the term values are correlated.
- =====
- =====
- ====
- ==== Syntax
- A `bucket_correlation` aggregation looks like this in isolation:
- [source,js]
- --------------------------------------------------
- {
- "bucket_correlation": {
- "buckets_path": "range_values>_count", <1>
- "function": {
- "count_correlation": { <2>
- "expectations": [...],
- "doc_count": 10000
- }
- }
- }
- }
- --------------------------------------------------
- // NOTCONSOLE
- <1> The buckets containing the values to correlate against.
- <2> The correlation function definition.
- [[bucket-correlation-agg-example]]
- ==== Example
- The following snippet correlates the individual terms in the field `version` with the `latency` metric. Not shown
- is the pre-calculation of the `latency` indicator values, which was done utilizing the
- <<search-aggregations-metrics-percentile-aggregation,percentiles>> aggregation.
- This example is only using the 10s percentiles.
- [source,console]
- -------------------------------------------------
- POST correlate_latency/_search?size=0&filter_path=aggregations
- {
- "aggs": {
- "buckets": {
- "terms": { <1>
- "field": "version",
- "size": 2
- },
- "aggs": {
- "latency_ranges": {
- "range": { <2>
- "field": "latency",
- "ranges": [
- { "to": 0.0 },
- { "from": 0, "to": 105 },
- { "from": 105, "to": 225 },
- { "from": 225, "to": 445 },
- { "from": 445, "to": 665 },
- { "from": 665, "to": 885 },
- { "from": 885, "to": 1115 },
- { "from": 1115, "to": 1335 },
- { "from": 1335, "to": 1555 },
- { "from": 1555, "to": 1775 },
- { "from": 1775 }
- ]
- }
- },
- "bucket_correlation": { <3>
- "bucket_correlation": {
- "buckets_path": "latency_ranges>_count",
- "function": {
- "count_correlation": {
- "indicator": {
- "expectations": [0, 52.5, 165, 335, 555, 775, 1000, 1225, 1445, 1665, 1775],
- "doc_count": 200
- }
- }
- }
- }
- }
- }
- }
- }
- }
- -------------------------------------------------
- // TEST[setup:correlate_latency]
- <1> The term buckets containing a range aggregation and the bucket correlation aggregation. Both are utilized to calculate
- the correlation of the term values with the latency.
- <2> The range aggregation on the latency field. The ranges were created referencing the percentiles of the latency field.
- <3> The bucket correlation aggregation that calculates the correlation of the number of term values within each range
- and the previously calculated indicator values.
- And the following may be the response:
- [source,console-result]
- ----
- {
- "aggregations" : {
- "buckets" : {
- "doc_count_error_upper_bound" : 0,
- "sum_other_doc_count" : 0,
- "buckets" : [
- {
- "key" : "1.0",
- "doc_count" : 100,
- "latency_ranges" : {
- "buckets" : [
- {
- "key" : "*-0.0",
- "to" : 0.0,
- "doc_count" : 0
- },
- {
- "key" : "0.0-105.0",
- "from" : 0.0,
- "to" : 105.0,
- "doc_count" : 1
- },
- {
- "key" : "105.0-225.0",
- "from" : 105.0,
- "to" : 225.0,
- "doc_count" : 9
- },
- {
- "key" : "225.0-445.0",
- "from" : 225.0,
- "to" : 445.0,
- "doc_count" : 0
- },
- {
- "key" : "445.0-665.0",
- "from" : 445.0,
- "to" : 665.0,
- "doc_count" : 0
- },
- {
- "key" : "665.0-885.0",
- "from" : 665.0,
- "to" : 885.0,
- "doc_count" : 0
- },
- {
- "key" : "885.0-1115.0",
- "from" : 885.0,
- "to" : 1115.0,
- "doc_count" : 10
- },
- {
- "key" : "1115.0-1335.0",
- "from" : 1115.0,
- "to" : 1335.0,
- "doc_count" : 20
- },
- {
- "key" : "1335.0-1555.0",
- "from" : 1335.0,
- "to" : 1555.0,
- "doc_count" : 20
- },
- {
- "key" : "1555.0-1775.0",
- "from" : 1555.0,
- "to" : 1775.0,
- "doc_count" : 20
- },
- {
- "key" : "1775.0-*",
- "from" : 1775.0,
- "doc_count" : 20
- }
- ]
- },
- "bucket_correlation" : {
- "value" : 0.8402398981360937
- }
- },
- {
- "key" : "2.0",
- "doc_count" : 100,
- "latency_ranges" : {
- "buckets" : [
- {
- "key" : "*-0.0",
- "to" : 0.0,
- "doc_count" : 0
- },
- {
- "key" : "0.0-105.0",
- "from" : 0.0,
- "to" : 105.0,
- "doc_count" : 19
- },
- {
- "key" : "105.0-225.0",
- "from" : 105.0,
- "to" : 225.0,
- "doc_count" : 11
- },
- {
- "key" : "225.0-445.0",
- "from" : 225.0,
- "to" : 445.0,
- "doc_count" : 20
- },
- {
- "key" : "445.0-665.0",
- "from" : 445.0,
- "to" : 665.0,
- "doc_count" : 20
- },
- {
- "key" : "665.0-885.0",
- "from" : 665.0,
- "to" : 885.0,
- "doc_count" : 20
- },
- {
- "key" : "885.0-1115.0",
- "from" : 885.0,
- "to" : 1115.0,
- "doc_count" : 10
- },
- {
- "key" : "1115.0-1335.0",
- "from" : 1115.0,
- "to" : 1335.0,
- "doc_count" : 0
- },
- {
- "key" : "1335.0-1555.0",
- "from" : 1335.0,
- "to" : 1555.0,
- "doc_count" : 0
- },
- {
- "key" : "1555.0-1775.0",
- "from" : 1555.0,
- "to" : 1775.0,
- "doc_count" : 0
- },
- {
- "key" : "1775.0-*",
- "from" : 1775.0,
- "doc_count" : 0
- }
- ]
- },
- "bucket_correlation" : {
- "value" : -0.5759855613334943
- }
- }
- ]
- }
- }
- }
- ----
|