[role="xpack"]
[[search-aggregations-bucket-correlation-aggregation]]
=== Bucket correlation aggregation
++++
<titleabbrev>Bucket correlation</titleabbrev>
++++

A sibling pipeline aggregation which executes a correlation function on the
configured sibling multi-bucket aggregation.

[[bucket-correlation-agg-syntax]]
==== Parameters

`buckets_path`::
(Required, string)
Path to the buckets that contain one set of values to correlate.
For syntax, see <<buckets-path-syntax>>.

`function`::
(Required, object)
The correlation function to execute.
+
.Properties of `function`
[%collapsible%open]
====
`count_correlation`:::
(Required^*^, object)
The configuration to calculate a count correlation. This function is designed for
determining the correlation of a term value and a given metric. Consequently, it
must meet the following requirements:
+
--
* The `buckets_path` must point to a `_count` metric.
* The sum of all count values in the `buckets_path` must be less than or equal to `indicator.doc_count`.
* Before using this function, run an initial calculation to gather the required `indicator` values.
--
+
.Properties of `count_correlation`
[%collapsible%open]
=====
`indicator`:::
(Required, object)
The indicator with which to correlate the configured `buckets_path` values.
+
.Properties of `indicator`
[%collapsible%open]
======
`doc_count`:::
(Required, integer)
The total number of documents that initially created the `expectations`. It must be greater than or equal to the sum
of all values in the `buckets_path`, because this is the originating superset of data to which the term values are correlated.

`expectations`:::
(Required, array)
An array of numbers with which to correlate the configured `buckets_path` values. The length of this array must always equal
the number of buckets returned by the `buckets_path`.

`fractions`:::
(Optional, array)
An array of fractions to use when averaging and calculating variance. Use this if the pre-calculated data and the
`buckets_path` have known gaps. The length of `fractions`, if provided, must equal the length of `expectations`.
======
=====
====
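
The requirements above can be checked on the client side before sending a request. The following Python sketch is illustrative only: the function name `validate_count_correlation` and its arguments are hypothetical and not part of any Elasticsearch client, and it checks only the length and sum constraints stated in this section.

```python
from typing import List, Optional, Sequence


def validate_count_correlation(
    bucket_counts: Sequence[int],
    doc_count: int,
    expectations: Sequence[float],
    fractions: Optional[Sequence[float]] = None,
) -> List[str]:
    """Return the list of violated requirements (empty if none are violated).

    Hypothetical helper: it mirrors the documented constraints only and
    does not reproduce any server-side validation logic.
    """
    problems: List[str] = []
    # `expectations` needs one entry per bucket returned by `buckets_path`.
    if len(expectations) != len(bucket_counts):
        problems.append("expectations length must equal the number of buckets")
    # `fractions`, when provided, must be as long as `expectations`.
    if fractions is not None and len(fractions) != len(expectations):
        problems.append("fractions length must equal expectations length")
    # The bucket counts are a subset of the documents behind the indicator.
    if sum(bucket_counts) > doc_count:
        problems.append("sum of bucket counts must not exceed indicator.doc_count")
    return problems
```

An empty list means the input satisfies the documented constraints; it does not guarantee that the request is otherwise valid.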

==== Syntax

A `bucket_correlation` aggregation looks like this in isolation:

[source,js]
--------------------------------------------------
{
  "bucket_correlation": {
    "buckets_path": "range_values>_count", <1>
    "function": {
      "count_correlation": { <2>
        "indicator": {
          "expectations": [...],
          "doc_count": 10000
        }
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE
<1> The buckets containing the values to correlate against.
<2> The correlation function definition.

[[bucket-correlation-agg-example]]
==== Example

The following snippet correlates the individual terms in the field `version` with the `latency` metric. Not shown
is the pre-calculation of the `latency` indicator values, which was done using the
<<search-aggregations-metrics-percentile-aggregation,percentiles>> aggregation.

This example uses only the percentiles at multiples of 10.

[source,console]
-------------------------------------------------
POST correlate_latency/_search?size=0&filter_path=aggregations
{
  "aggs": {
    "buckets": {
      "terms": { <1>
        "field": "version",
        "size": 2
      },
      "aggs": {
        "latency_ranges": {
          "range": { <2>
            "field": "latency",
            "ranges": [
              { "to": 0.0 },
              { "from": 0, "to": 105 },
              { "from": 105, "to": 225 },
              { "from": 225, "to": 445 },
              { "from": 445, "to": 665 },
              { "from": 665, "to": 885 },
              { "from": 885, "to": 1115 },
              { "from": 1115, "to": 1335 },
              { "from": 1335, "to": 1555 },
              { "from": 1555, "to": 1775 },
              { "from": 1775 }
            ]
          }
        },
        "bucket_correlation": { <3>
          "bucket_correlation": {
            "buckets_path": "latency_ranges>_count",
            "function": {
              "count_correlation": {
                "indicator": {
                  "expectations": [0, 52.5, 165, 335, 555, 775, 1000, 1225, 1445, 1665, 1775],
                  "doc_count": 200
                }
              }
            }
          }
        }
      }
    }
  }
}
-------------------------------------------------
// TEST[setup:correlate_latency]
<1> The term buckets containing a range aggregation and the bucket correlation aggregation. Both are used to calculate
the correlation of the term values with the latency.
<2> The range aggregation on the latency field. The ranges were created by referencing the percentiles of the latency field.
<3> The bucket correlation aggregation that calculates the correlation between the number of term values within each range
and the previously calculated indicator values.
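
In the request above, each value in `expectations` equals the midpoint of the corresponding bounded range, and the two open-ended ranges use their single boundary value. The following Python sketch reproduces that pattern from a list of boundaries. It is an observation about this example only: the helper name `ranges_and_expectations` is hypothetical, and whether the original indicator values were derived this way is an assumption.

```python
from typing import Dict, List, Sequence, Tuple


def ranges_and_expectations(
    boundaries: Sequence[float],
) -> Tuple[List[Dict[str, float]], List[float]]:
    """Build `ranges` and matching `expectations` from sorted boundaries.

    Assumption: bounded ranges use their midpoint as the expectation and
    open-ended ranges use their only boundary, as in the example request.
    """
    # Open-ended range below the first boundary.
    ranges: List[Dict[str, float]] = [{"to": boundaries[0]}]
    expectations: List[float] = [boundaries[0]]
    # One bounded range per pair of adjacent boundaries.
    for lower, upper in zip(boundaries, boundaries[1:]):
        ranges.append({"from": lower, "to": upper})
        expectations.append((lower + upper) / 2)
    # Open-ended range above the last boundary.
    ranges.append({"from": boundaries[-1]})
    expectations.append(boundaries[-1])
    return ranges, expectations


ranges, expectations = ranges_and_expectations(
    [0, 105, 225, 445, 665, 885, 1115, 1335, 1555, 1775]
)
```

Ten boundaries produce eleven ranges and eleven expectations, which satisfies the requirement that `expectations` has one entry per bucket.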

The response might look like this:

[source,console-result]
----
{
  "aggregations" : {
    "buckets" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "1.0",
          "doc_count" : 100,
          "latency_ranges" : {
            "buckets" : [
              {
                "key" : "*-0.0",
                "to" : 0.0,
                "doc_count" : 0
              },
              {
                "key" : "0.0-105.0",
                "from" : 0.0,
                "to" : 105.0,
                "doc_count" : 1
              },
              {
                "key" : "105.0-225.0",
                "from" : 105.0,
                "to" : 225.0,
                "doc_count" : 9
              },
              {
                "key" : "225.0-445.0",
                "from" : 225.0,
                "to" : 445.0,
                "doc_count" : 0
              },
              {
                "key" : "445.0-665.0",
                "from" : 445.0,
                "to" : 665.0,
                "doc_count" : 0
              },
              {
                "key" : "665.0-885.0",
                "from" : 665.0,
                "to" : 885.0,
                "doc_count" : 0
              },
              {
                "key" : "885.0-1115.0",
                "from" : 885.0,
                "to" : 1115.0,
                "doc_count" : 10
              },
              {
                "key" : "1115.0-1335.0",
                "from" : 1115.0,
                "to" : 1335.0,
                "doc_count" : 20
              },
              {
                "key" : "1335.0-1555.0",
                "from" : 1335.0,
                "to" : 1555.0,
                "doc_count" : 20
              },
              {
                "key" : "1555.0-1775.0",
                "from" : 1555.0,
                "to" : 1775.0,
                "doc_count" : 20
              },
              {
                "key" : "1775.0-*",
                "from" : 1775.0,
                "doc_count" : 20
              }
            ]
          },
          "bucket_correlation" : {
            "value" : 0.8402398981360937
          }
        },
        {
          "key" : "2.0",
          "doc_count" : 100,
          "latency_ranges" : {
            "buckets" : [
              {
                "key" : "*-0.0",
                "to" : 0.0,
                "doc_count" : 0
              },
              {
                "key" : "0.0-105.0",
                "from" : 0.0,
                "to" : 105.0,
                "doc_count" : 19
              },
              {
                "key" : "105.0-225.0",
                "from" : 105.0,
                "to" : 225.0,
                "doc_count" : 11
              },
              {
                "key" : "225.0-445.0",
                "from" : 225.0,
                "to" : 445.0,
                "doc_count" : 20
              },
              {
                "key" : "445.0-665.0",
                "from" : 445.0,
                "to" : 665.0,
                "doc_count" : 20
              },
              {
                "key" : "665.0-885.0",
                "from" : 665.0,
                "to" : 885.0,
                "doc_count" : 20
              },
              {
                "key" : "885.0-1115.0",
                "from" : 885.0,
                "to" : 1115.0,
                "doc_count" : 10
              },
              {
                "key" : "1115.0-1335.0",
                "from" : 1115.0,
                "to" : 1335.0,
                "doc_count" : 0
              },
              {
                "key" : "1335.0-1555.0",
                "from" : 1335.0,
                "to" : 1555.0,
                "doc_count" : 0
              },
              {
                "key" : "1555.0-1775.0",
                "from" : 1555.0,
                "to" : 1775.0,
                "doc_count" : 0
              },
              {
                "key" : "1775.0-*",
                "from" : 1775.0,
                "doc_count" : 0
              }
            ]
          },
          "bucket_correlation" : {
            "value" : -0.5759855613334943
          }
        }
      ]
    }
  }
}
----
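
To work with the result, a client usually needs only the correlation value per term. The following Python sketch extracts those values from a parsed response body. The helper name `correlations_by_term` is hypothetical, and the default aggregation names are the ones used in the request above; adjust them if your request uses other names.

```python
from typing import Any, Dict


def correlations_by_term(
    response: Dict[str, Any],
    terms_agg: str = "buckets",
    corr_agg: str = "bucket_correlation",
) -> Dict[str, float]:
    """Map each term key to its bucket correlation value.

    Assumes the response has the shape shown above: a terms aggregation
    whose buckets each contain the sibling pipeline result under `corr_agg`.
    """
    term_buckets = response["aggregations"][terms_agg]["buckets"]
    return {str(bucket["key"]): bucket[corr_agg]["value"] for bucket in term_buckets}


# Trimmed copy of the sample response, keeping only the fields that are read.
sample_response = {
    "aggregations": {
        "buckets": {
            "buckets": [
                {"key": "1.0", "bucket_correlation": {"value": 0.8402398981360937}},
                {"key": "2.0", "bucket_correlation": {"value": -0.5759855613334943}},
            ]
        }
    }
}

values = correlations_by_term(sample_response)
```

In the sample response, version `1.0`, whose documents fall mostly in the higher latency ranges, has a positive value, while version `2.0`, concentrated in the lower ranges, has a negative one.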