[role="xpack"]
[testenv="basic"]
[[search-aggregations-bucket-correlation-aggregation]]
=== Bucket correlation aggregation
++++
<titleabbrev>Bucket correlation aggregation</titleabbrev>
++++

experimental::[]

A sibling pipeline aggregation that executes a correlation function on the
configured sibling multi-bucket aggregation.

[[bucket-correlation-agg-syntax]]
==== Parameters

`buckets_path`::
(Required, string)
Path to the buckets that contain one set of values to correlate.
For syntax, see <<buckets-path-syntax>>.

`function`::
(Required, object)
The correlation function to execute.
+
.Properties of `function`
[%collapsible%open]
====
`count_correlation`:::
(Required^*^, object)
The configuration to calculate a count correlation. This function is designed to
determine the correlation between a term value and a given metric. Consequently, it
must meet the following requirements:

* The `buckets_path` must point to a `_count` metric.
* The sum of all the `buckets_path` count values must be less than or equal to `indicator.doc_count`.
* Before using this function, you must run an initial calculation to gather the required `indicator` values.

.Properties of `count_correlation`
[%collapsible%open]
=====
`indicator`:::
(Required, object)
The indicator with which to correlate the configured `buckets_path` values.

.Properties of `indicator`
[%collapsible%open]
======
`expectations`:::
(Required, array)
An array of numbers with which to correlate the configured `buckets_path` values. The length of this array must always equal
the number of buckets returned by the `buckets_path`.

`fractions`:::
(Optional, array)
An array of fractions to use when averaging and calculating variance. Use this if the pre-calculated data and the
`buckets_path` have known gaps. If provided, `fractions` must have the same length as `expectations`.

`doc_count`:::
(Required, integer)
The total number of documents that initially created the `expectations`. It must be greater than or equal to the sum
of all values in the `buckets_path`, because this is the originating superset of data to which the term values are correlated.
======
=====
====
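
The requirements above can be checked on the client before a request is sent. The following Python sketch is a minimal validator that restates those rules. `validate_count_correlation` is a hypothetical helper, not part of any Elasticsearch client, and it is not the server's own validation logic, which may check more or differently.

```python
from typing import Optional, Sequence


def validate_count_correlation(
    buckets_path: str,
    bucket_counts: Sequence[int],
    expectations: Sequence[float],
    doc_count: int,
    fractions: Optional[Sequence[float]] = None,
) -> None:
    """Raise ValueError if a count_correlation setup breaks one of the documented rules.

    Hypothetical client-side helper; it only mirrors the rules listed on this page.
    """
    # The buckets_path must end in the special `_count` metric.
    if buckets_path.split(">")[-1] != "_count":
        raise ValueError("buckets_path must point to a _count metric")
    # expectations needs exactly one entry per bucket returned by the buckets_path.
    if len(expectations) != len(bucket_counts):
        raise ValueError("expectations must have one entry per bucket")
    # fractions is optional, but when present it must line up with expectations.
    if fractions is not None and len(fractions) != len(expectations):
        raise ValueError("fractions must have the same length as expectations")
    # doc_count is the superset that the term counts are drawn from.
    if sum(bucket_counts) > doc_count:
        raise ValueError("sum of bucket counts must be <= indicator.doc_count")


# Bucket counts for the term "1.0" from the example on this page (they sum to 100).
validate_count_correlation(
    "latency_ranges>_count",
    [0, 1, 9, 0, 0, 0, 10, 20, 20, 20, 20],
    [0, 52.5, 165, 335, 555, 775, 1000, 1225, 1445, 1665, 1775],
    200,
)
```

The call passes silently. Lowering `doc_count` below 100 would raise a `ValueError`, because it would break the superset rule.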

==== Syntax

A `bucket_correlation` aggregation looks like this in isolation:

[source,js]
--------------------------------------------------
{
  "bucket_correlation": {
    "buckets_path": "range_values>_count", <1>
    "function": {
      "count_correlation": { <2>
        "indicator": {
          "expectations": [...],
          "doc_count": 10000
        }
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE
<1> The buckets containing the values to correlate against.
<2> The correlation function definition.
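
To make the nesting explicit (`expectations` and `doc_count` sit inside `indicator`), the following Python sketch builds the same object as a plain dictionary. `bucket_correlation_agg` is a hypothetical helper, not part of an Elasticsearch client. The expectation values passed to it are placeholders chosen for illustration.

```python
import json
from typing import Dict, List, Optional


def bucket_correlation_agg(
    buckets_path: str,
    expectations: List[float],
    doc_count: int,
    fractions: Optional[List[float]] = None,
) -> Dict:
    """Build a bucket_correlation body that uses count_correlation (hypothetical helper)."""
    indicator: Dict = {"expectations": expectations, "doc_count": doc_count}
    if fractions is not None:
        # Optional: include fractions only when the pre-calculated data has known gaps.
        indicator["fractions"] = fractions
    return {
        "bucket_correlation": {
            "buckets_path": buckets_path,
            "function": {"count_correlation": {"indicator": indicator}},
        }
    }


# Placeholder expectations, for illustration only.
body = bucket_correlation_agg("range_values>_count", [0, 52.5, 165], 10000)
print(json.dumps(body, indent=2))
```

Building the body as data keeps the three levels (`function`, `count_correlation`, `indicator`) in one place, so `expectations` cannot accidentally end up at the wrong level.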

[[bucket-correlation-agg-example]]
==== Example

The following snippet correlates the individual terms in the field `version` with the `latency` metric. Not shown
is the pre-calculation of the `latency` indicator values, which was done using the
<<search-aggregations-metrics-percentile-aggregation,percentiles>> aggregation.
This example uses only the percentiles at multiples of ten.

[source,console]
-------------------------------------------------
POST correlate_latency/_search?size=0&filter_path=aggregations
{
  "aggs": {
    "buckets": {
      "terms": { <1>
        "field": "version",
        "size": 2
      },
      "aggs": {
        "latency_ranges": {
          "range": { <2>
            "field": "latency",
            "ranges": [
              { "to": 0.0 },
              { "from": 0, "to": 105 },
              { "from": 105, "to": 225 },
              { "from": 225, "to": 445 },
              { "from": 445, "to": 665 },
              { "from": 665, "to": 885 },
              { "from": 885, "to": 1115 },
              { "from": 1115, "to": 1335 },
              { "from": 1335, "to": 1555 },
              { "from": 1555, "to": 1775 },
              { "from": 1775 }
            ]
          }
        },
        "bucket_correlation": { <3>
          "bucket_correlation": {
            "buckets_path": "latency_ranges>_count",
            "function": {
              "count_correlation": {
                "indicator": {
                  "expectations": [0, 52.5, 165, 335, 555, 775, 1000, 1225, 1445, 1665, 1775],
                  "doc_count": 200
                }
              }
            }
          }
        }
      }
    }
  }
}
-------------------------------------------------
// TEST[setup:correlate_latency]
<1> The term buckets containing a range aggregation and the bucket correlation aggregation. Both are used to calculate
the correlation of the term values with the latency.
<2> The range aggregation on the latency field. The ranges were created by referencing the percentiles of the latency field.
<3> The bucket correlation aggregation that calculates the correlation between the number of term values within each range
and the previously calculated indicator values.
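
The `ranges` and `expectations` arrays in the request above share the same boundaries. Each expectation is the midpoint of its range, and the two open-ended ranges use their single bound (0 and 1775). The following Python sketch derives both arrays from that boundary list. The boundaries are copied from the request. This page does not show how they were obtained from the percentiles aggregation. `ranges_and_expectations` is a hypothetical helper, and using midpoints is simply what the example values are consistent with, not a documented rule.

```python
from typing import Dict, List, Tuple


def ranges_and_expectations(
    bounds: List[float],
) -> Tuple[List[Dict[str, float]], List[float]]:
    """Turn sorted boundaries into range-aggregation ranges plus one expectation per range.

    Hypothetical helper: each closed range is represented by its midpoint.
    """
    ranges: List[Dict[str, float]] = [{"to": bounds[0]}]  # open-ended lower range
    expectations: List[float] = [bounds[0]]
    for lo, hi in zip(bounds, bounds[1:]):
        ranges.append({"from": lo, "to": hi})
        expectations.append((lo + hi) / 2)  # midpoint stands in for the whole range
    ranges.append({"from": bounds[-1]})  # open-ended upper range
    expectations.append(bounds[-1])
    return ranges, expectations


# Boundaries taken from the request above.
bounds = [0, 105, 225, 445, 665, 885, 1115, 1335, 1555, 1775]
ranges, expectations = ranges_and_expectations(bounds)
print(len(ranges), expectations)
```

Generating both arrays from one list guarantees that `expectations` has exactly one entry per bucket returned by `latency_ranges>_count`.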

The response looks similar to the following:

[source,console-result]
----
{
  "aggregations" : {
    "buckets" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "1.0",
          "doc_count" : 100,
          "latency_ranges" : {
            "buckets" : [
              {
                "key" : "*-0.0",
                "to" : 0.0,
                "doc_count" : 0
              },
              {
                "key" : "0.0-105.0",
                "from" : 0.0,
                "to" : 105.0,
                "doc_count" : 1
              },
              {
                "key" : "105.0-225.0",
                "from" : 105.0,
                "to" : 225.0,
                "doc_count" : 9
              },
              {
                "key" : "225.0-445.0",
                "from" : 225.0,
                "to" : 445.0,
                "doc_count" : 0
              },
              {
                "key" : "445.0-665.0",
                "from" : 445.0,
                "to" : 665.0,
                "doc_count" : 0
              },
              {
                "key" : "665.0-885.0",
                "from" : 665.0,
                "to" : 885.0,
                "doc_count" : 0
              },
              {
                "key" : "885.0-1115.0",
                "from" : 885.0,
                "to" : 1115.0,
                "doc_count" : 10
              },
              {
                "key" : "1115.0-1335.0",
                "from" : 1115.0,
                "to" : 1335.0,
                "doc_count" : 20
              },
              {
                "key" : "1335.0-1555.0",
                "from" : 1335.0,
                "to" : 1555.0,
                "doc_count" : 20
              },
              {
                "key" : "1555.0-1775.0",
                "from" : 1555.0,
                "to" : 1775.0,
                "doc_count" : 20
              },
              {
                "key" : "1775.0-*",
                "from" : 1775.0,
                "doc_count" : 20
              }
            ]
          },
          "bucket_correlation" : {
            "value" : 0.8402398981360937
          }
        },
        {
          "key" : "2.0",
          "doc_count" : 100,
          "latency_ranges" : {
            "buckets" : [
              {
                "key" : "*-0.0",
                "to" : 0.0,
                "doc_count" : 0
              },
              {
                "key" : "0.0-105.0",
                "from" : 0.0,
                "to" : 105.0,
                "doc_count" : 19
              },
              {
                "key" : "105.0-225.0",
                "from" : 105.0,
                "to" : 225.0,
                "doc_count" : 11
              },
              {
                "key" : "225.0-445.0",
                "from" : 225.0,
                "to" : 445.0,
                "doc_count" : 20
              },
              {
                "key" : "445.0-665.0",
                "from" : 445.0,
                "to" : 665.0,
                "doc_count" : 20
              },
              {
                "key" : "665.0-885.0",
                "from" : 665.0,
                "to" : 885.0,
                "doc_count" : 20
              },
              {
                "key" : "885.0-1115.0",
                "from" : 885.0,
                "to" : 1115.0,
                "doc_count" : 10
              },
              {
                "key" : "1115.0-1335.0",
                "from" : 1115.0,
                "to" : 1335.0,
                "doc_count" : 0
              },
              {
                "key" : "1335.0-1555.0",
                "from" : 1335.0,
                "to" : 1555.0,
                "doc_count" : 0
              },
              {
                "key" : "1555.0-1775.0",
                "from" : 1555.0,
                "to" : 1775.0,
                "doc_count" : 0
              },
              {
                "key" : "1775.0-*",
                "from" : 1775.0,
                "doc_count" : 0
              }
            ]
          },
          "bucket_correlation" : {
            "value" : -0.5759855613334943
          }
        }
      ]
    }
  }
}
----
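
The two `bucket_correlation` values above can be reproduced by hand. The following Python sketch treats each of the `doc_count` documents as a pair (x, y). Here x is 1 if the document has the term and 0 otherwise, and y is the expectation of the latency range the document falls into. The sketch then computes the Pearson correlation of x and y. It assumes that, when `fractions` is omitted, each of the 11 ranges holds an equal share (1/11) of the documents. This assumption reproduces both values in the response. Treat the sketch as an illustration of the arithmetic, not as the Elasticsearch implementation. `count_correlation` here is a hypothetical function.

```python
import math
from typing import Optional, Sequence


def count_correlation(
    counts: Sequence[int],
    expectations: Sequence[float],
    doc_count: int,
    fractions: Optional[Sequence[float]] = None,
) -> float:
    """Pearson correlation between term membership (0/1) and the per-range expectation.

    Sketch only: assumes equal fractions when none are given.
    """
    n = len(expectations)
    if fractions is None:
        fractions = [1.0 / n] * n  # assumption: every range holds the same share of docs
    # Mean and variance of the metric, weighted by each range's share of documents.
    y_mean = sum(f * e for f, e in zip(fractions, expectations))
    y_var = sum(f * (e - y_mean) ** 2 for f, e in zip(fractions, expectations))
    # x is 1 for documents that contain the term, so its mean p is a proportion.
    p = sum(counts) / doc_count
    x_var = p * (1.0 - p)
    # Cov(x, y) = E[x*y] - E[x]E[y]; only documents with the term contribute to E[x*y].
    cov = sum(c * e for c, e in zip(counts, expectations)) / doc_count - p * y_mean
    if x_var * y_var == 0:
        return 0.0  # degenerate case; returning 0.0 is a choice made for this sketch
    return cov / math.sqrt(x_var * y_var)


expectations = [0, 52.5, 165, 335, 555, 775, 1000, 1225, 1445, 1665, 1775]
v1 = [0, 1, 9, 0, 0, 0, 10, 20, 20, 20, 20]   # bucket counts for term "1.0"
v2 = [0, 19, 11, 20, 20, 20, 10, 0, 0, 0, 0]  # bucket counts for term "2.0"
print(round(count_correlation(v1, expectations, 200), 4))  # 0.8402
print(round(count_correlation(v2, expectations, 200), 4))  # -0.576
```

Both terms contain 100 of the 200 documents, so p = 0.5 in each case. The sign of the result comes from where those documents fall. Term "1.0" is concentrated in the high-latency ranges and correlates positively. Term "2.0" is concentrated in the low-latency ranges and correlates negatively.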