[role="xpack"]
[[search-aggregations-bucket-correlation-aggregation]]
=== Bucket correlation aggregation
++++
<titleabbrev>Bucket correlation</titleabbrev>
++++

experimental::[]

A sibling pipeline aggregation which executes a correlation function on the
configured sibling multi-bucket aggregation.

[[bucket-correlation-agg-syntax]]
==== Parameters

`buckets_path`::
(Required, string)
Path to the buckets that contain one set of values to correlate.
For syntax, see <<buckets-path-syntax>>.

`function`::
(Required, object)
The correlation function to execute.
+
.Properties of `function`
[%collapsible%open]
====
`count_correlation`:::
(Required^*^, object)
The configuration to calculate a count correlation. This function is designed for
determining the correlation of a term value and a given metric. Consequently, it
must meet the following requirements:

* The `buckets_path` must point to a `_count` metric.
* The sum of all the `buckets_path` count values must be less than or equal to `indicator.doc_count`.
* Before using this function, you must run an initial calculation to gather the required `indicator` values.

.Properties of `count_correlation`
[%collapsible%open]
=====
`indicator`:::
(Required, object)
The indicator with which to correlate the configured `buckets_path` values.

.Properties of `indicator`
[%collapsible%open]
======
`expectations`:::
(Required, array)
An array of numbers with which to correlate the configured `buckets_path` values. The length of this value must always equal
the number of buckets returned by the `buckets_path`.

`fractions`:::
(Optional, array)
An array of fractions to use when averaging and calculating variance. Use this if the pre-calculated data and the
`buckets_path` have known gaps. If provided, `fractions` must have the same length as `expectations`.

`doc_count`:::
(Required, integer)
The total number of documents that initially created the `expectations`. It must be greater than or equal to the sum
of all values in the `buckets_path`, as this is the originating superset of data to which the term values are correlated.
======
=====
====
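
The length and `doc_count` requirements above can be checked on the client before a request is sent. The following Python sketch is a hypothetical helper, not part of Elasticsearch: the name `validate_count_correlation` is an assumption, it only covers the rules that can be checked from plain values (it cannot confirm that `buckets_path` points to a `_count` metric), and the sample counts are the `1.0` term bucket from the example response on this page.

```python
def validate_count_correlation(bucket_counts, indicator):
    """Return a list of problems; an empty list means all checks passed.

    Hypothetical client-side helper, not an Elasticsearch API.
    bucket_counts: the `_count` values at `buckets_path`, in bucket order.
    indicator: a dict shaped like the `indicator` object.
    """
    problems = []
    expectations = indicator["expectations"]
    fractions = indicator.get("fractions")  # optional parameter
    doc_count = indicator["doc_count"]

    # One expectation per bucket returned by `buckets_path`.
    if len(expectations) != len(bucket_counts):
        problems.append(
            f"{len(expectations)} expectations for {len(bucket_counts)} buckets"
        )
    # If present, `fractions` must match `expectations` in length.
    if fractions is not None and len(fractions) != len(expectations):
        problems.append(
            f"{len(fractions)} fractions for {len(expectations)} expectations"
        )
    # `doc_count` is the superset, so it must cover every counted document.
    total = sum(bucket_counts)
    if total > doc_count:
        problems.append(f"bucket counts sum to {total}, above doc_count {doc_count}")
    return problems


# Values from the example on this page (the `1.0` term bucket of the response).
indicator = {
    "expectations": [0, 52.5, 165, 335, 555, 775, 1000, 1225, 1445, 1665, 1775],
    "doc_count": 200,
}
counts = [0, 1, 9, 0, 0, 0, 10, 20, 20, 20, 20]
print(validate_count_correlation(counts, indicator))  # prints []
```

Dropping a bucket, lowering `doc_count` below 100, or passing a `fractions` array of the wrong length each produces one entry in the returned list.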

==== Syntax

A `bucket_correlation` aggregation looks like this in isolation:

[source,js]
--------------------------------------------------
{
  "bucket_correlation": {
    "buckets_path": "range_values>_count", <1>
    "function": {
      "count_correlation": { <2>
        "indicator": {
          "expectations": [...],
          "doc_count": 10000
        }
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE
<1> The buckets containing the values to correlate against.
<2> The correlation function definition.

[[bucket-correlation-agg-example]]
==== Example

The following snippet correlates the individual terms in the field `version` with the `latency` metric. The
pre-calculation of the `latency` indicator values is not shown; it was done using the
<<search-aggregations-metrics-percentile-aggregation,percentiles>> aggregation.
This example uses only every tenth percentile (the 10th, 20th, and so on).

[source,console]
-------------------------------------------------
POST correlate_latency/_search?size=0&filter_path=aggregations
{
  "aggs": {
    "buckets": {
      "terms": { <1>
        "field": "version",
        "size": 2
      },
      "aggs": {
        "latency_ranges": {
          "range": { <2>
            "field": "latency",
            "ranges": [
              { "to": 0.0 },
              { "from": 0, "to": 105 },
              { "from": 105, "to": 225 },
              { "from": 225, "to": 445 },
              { "from": 445, "to": 665 },
              { "from": 665, "to": 885 },
              { "from": 885, "to": 1115 },
              { "from": 1115, "to": 1335 },
              { "from": 1335, "to": 1555 },
              { "from": 1555, "to": 1775 },
              { "from": 1775 }
            ]
          }
        },
        "bucket_correlation": { <3>
          "bucket_correlation": {
            "buckets_path": "latency_ranges>_count",
            "function": {
              "count_correlation": {
                "indicator": {
                  "expectations": [0, 52.5, 165, 335, 555, 775, 1000, 1225, 1445, 1665, 1775],
                  "doc_count": 200
                }
              }
            }
          }
        }
      }
    }
  }
}
-------------------------------------------------
// TEST[setup:correlate_latency]
<1> The `terms` aggregation. Each term bucket contains a range aggregation and the bucket correlation aggregation,
which together calculate the correlation of the term values with the latency.
<2> The range aggregation on the `latency` field. The ranges were created from the percentiles of the `latency` field.
<3> The bucket correlation aggregation that calculates the correlation between the number of documents with the term value
in each range and the previously calculated indicator values.
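
The request does not show how its ranges and `expectations` were derived, but the values follow a simple pattern. Each range spans two consecutive boundary values (0, followed by 105 through 1775). Each expectation is the midpoint of its range, and the two open-ended ranges use their single finite bound. The following Python sketch rebuilds both arrays from those boundaries. The helper name is hypothetical. Treating 105 through 1775 as the percentile results and using midpoints are assumptions inferred from the request, not a documented rule.

```python
def ranges_and_expectations(boundaries):
    """Hypothetical helper: build `range` ranges and matching `expectations`.

    Assumption inferred from the example request: each expectation is the
    midpoint of its range, and the two open-ended ranges use their one
    finite bound.
    """
    ranges = [{"to": boundaries[0]}]  # open-ended lower range
    expectations = [boundaries[0]]
    for low, high in zip(boundaries, boundaries[1:]):
        ranges.append({"from": low, "to": high})
        expectations.append((low + high) / 2)  # midpoint of the range
    ranges.append({"from": boundaries[-1]})  # open-ended upper range
    expectations.append(boundaries[-1])
    return ranges, expectations


# 0, followed by the boundary values used in the example request
# (assumed to be the percentile results).
boundaries = [0, 105, 225, 445, 665, 885, 1115, 1335, 1555, 1775]
ranges, expectations = ranges_and_expectations(boundaries)
print(expectations)
# [0, 52.5, 165.0, 335.0, 555.0, 775.0, 1000.0, 1225.0, 1445.0, 1665.0, 1775]
```

The result has 11 ranges and 11 expectations. This satisfies the requirement that the length of `expectations` equals the number of buckets returned by the `buckets_path`.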

The response might look like this:

[source,console-result]
----
{
  "aggregations" : {
    "buckets" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "1.0",
          "doc_count" : 100,
          "latency_ranges" : {
            "buckets" : [
              {
                "key" : "*-0.0",
                "to" : 0.0,
                "doc_count" : 0
              },
              {
                "key" : "0.0-105.0",
                "from" : 0.0,
                "to" : 105.0,
                "doc_count" : 1
              },
              {
                "key" : "105.0-225.0",
                "from" : 105.0,
                "to" : 225.0,
                "doc_count" : 9
              },
              {
                "key" : "225.0-445.0",
                "from" : 225.0,
                "to" : 445.0,
                "doc_count" : 0
              },
              {
                "key" : "445.0-665.0",
                "from" : 445.0,
                "to" : 665.0,
                "doc_count" : 0
              },
              {
                "key" : "665.0-885.0",
                "from" : 665.0,
                "to" : 885.0,
                "doc_count" : 0
              },
              {
                "key" : "885.0-1115.0",
                "from" : 885.0,
                "to" : 1115.0,
                "doc_count" : 10
              },
              {
                "key" : "1115.0-1335.0",
                "from" : 1115.0,
                "to" : 1335.0,
                "doc_count" : 20
              },
              {
                "key" : "1335.0-1555.0",
                "from" : 1335.0,
                "to" : 1555.0,
                "doc_count" : 20
              },
              {
                "key" : "1555.0-1775.0",
                "from" : 1555.0,
                "to" : 1775.0,
                "doc_count" : 20
              },
              {
                "key" : "1775.0-*",
                "from" : 1775.0,
                "doc_count" : 20
              }
            ]
          },
          "bucket_correlation" : {
            "value" : 0.8402398981360937
          }
        },
        {
          "key" : "2.0",
          "doc_count" : 100,
          "latency_ranges" : {
            "buckets" : [
              {
                "key" : "*-0.0",
                "to" : 0.0,
                "doc_count" : 0
              },
              {
                "key" : "0.0-105.0",
                "from" : 0.0,
                "to" : 105.0,
                "doc_count" : 19
              },
              {
                "key" : "105.0-225.0",
                "from" : 105.0,
                "to" : 225.0,
                "doc_count" : 11
              },
              {
                "key" : "225.0-445.0",
                "from" : 225.0,
                "to" : 445.0,
                "doc_count" : 20
              },
              {
                "key" : "445.0-665.0",
                "from" : 445.0,
                "to" : 665.0,
                "doc_count" : 20
              },
              {
                "key" : "665.0-885.0",
                "from" : 665.0,
                "to" : 885.0,
                "doc_count" : 20
              },
              {
                "key" : "885.0-1115.0",
                "from" : 885.0,
                "to" : 1115.0,
                "doc_count" : 10
              },
              {
                "key" : "1115.0-1335.0",
                "from" : 1115.0,
                "to" : 1335.0,
                "doc_count" : 0
              },
              {
                "key" : "1335.0-1555.0",
                "from" : 1335.0,
                "to" : 1555.0,
                "doc_count" : 0
              },
              {
                "key" : "1555.0-1775.0",
                "from" : 1555.0,
                "to" : 1775.0,
                "doc_count" : 0
              },
              {
                "key" : "1775.0-*",
                "from" : 1775.0,
                "doc_count" : 0
              }
            ]
          },
          "bucket_correlation" : {
            "value" : -0.5759855613334943
          }
        }
      ]
    }
  }
}
----
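
This page does not describe the exact calculation behind `count_correlation`. To build intuition for what the two `value` fields express, the following Python sketch swaps in a textbook technique: a weighted point-biserial (Pearson) correlation. Each document is treated as a point whose x is the expectation of its range and whose y is 1 if it has the term and 0 otherwise. This is not Elasticsearch's implementation, and the function name is hypothetical. The `fractions` values are an assumption, since the example request supplies none. They reflect the combined counts in the response above, where the first range is empty and every other range holds 20 of the 200 documents.

```python
import math


def point_biserial_sketch(expectations, fractions, doc_count, counts):
    """Weighted Pearson (point-biserial) correlation; illustration only.

    Each document is a point: x is the expectation of its range, y is 1 if
    it has the term and 0 otherwise. This is NOT Elasticsearch's
    count_correlation implementation.
    """
    # Assumed number of documents per range: fractions[i] * doc_count.
    sizes = [f * doc_count for f in fractions]
    n = sum(sizes)
    mean_x = sum(s * x for s, x in zip(sizes, expectations)) / n
    mean_y = sum(counts) / n  # share of documents that have the term
    # Cov(x, y) = E[(x - mean_x) * y] because E[x - mean_x] = 0,
    # so only documents with y = 1 contribute.
    cov = sum(c * (x - mean_x) for c, x in zip(counts, expectations)) / n
    var_x = sum(s * (x - mean_x) ** 2 for s, x in zip(sizes, expectations)) / n
    var_y = mean_y * (1 - mean_y)  # variance of a 0/1 variable
    return cov / math.sqrt(var_x * var_y)


expectations = [0, 52.5, 165, 335, 555, 775, 1000, 1225, 1445, 1665, 1775]
# Hypothetical fractions: the first range is empty and each other range
# holds 20 of the 200 documents, as the combined counts in the response show.
fractions = [0.0] + [0.1] * 10
v1 = [0, 1, 9, 0, 0, 0, 10, 20, 20, 20, 20]   # term "1.0"
v2 = [0, 19, 11, 20, 20, 20, 10, 0, 0, 0, 0]  # term "2.0"
r1 = point_biserial_sketch(expectations, fractions, 200, v1)  # about 0.743
r2 = point_biserial_sketch(expectations, fractions, 200, v2)  # about -0.743
```

The sketch does not reproduce the 0.84 and -0.576 in the response. Because the two versions split every range between them, it gives the same magnitude with opposite signs. Use it only to read the sign and rough strength of a result. Both signs agree with the response: the `1.0` documents fall mostly in the high-latency ranges, and the `2.0` documents fall mostly in the low ones.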