[[search-aggregations-pipeline-percentiles-bucket-aggregation]]
=== Percentiles Bucket Aggregation

experimental[]

A sibling pipeline aggregation which calculates percentiles across all buckets of a specified metric in a sibling aggregation.
The specified metric must be numeric and the sibling aggregation must be a multi-bucket aggregation.

==== Syntax

A `percentiles_bucket` aggregation looks like this in isolation:

[source,js]
--------------------------------------------------
{
    "percentiles_bucket": {
        "buckets_path": "the_sum"
    }
}
--------------------------------------------------
// NOTCONSOLE

.`percentiles_bucket` Parameters
|===
|Parameter Name |Description |Required |Default Value
|`buckets_path` |The path to the buckets we wish to calculate percentiles for (see <<buckets-path-syntax>> for more
 details) |Required |
|`gap_policy` |The policy to apply when gaps are found in the data (see <<gap-policy>> for more
 details) |Optional | `skip`
|`format` |Format to apply to the output value of this aggregation |Optional | `null`
|`percents` |The list of percentiles to calculate |Optional | `[ 1, 5, 25, 50, 75, 95, 99 ]`
|===

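As a rough illustration of how the parameters in the table combine, the following Python sketch builds a `percentiles_bucket` fragment with the defaults from the table applied. The helper name `percentiles_bucket_agg` and the `"0.00"` format pattern are illustrative assumptions, not part of Elasticsearch.

```python
import json

# Default percentiles, as listed in the parameter table.
DEFAULT_PERCENTS = [1, 5, 25, 50, 75, 95, 99]


def percentiles_bucket_agg(buckets_path, percents=None, gap_policy="skip", fmt=None):
    """Build a percentiles_bucket aggregation body (hypothetical helper)."""
    if not buckets_path:
        # buckets_path is the only required parameter.
        raise ValueError("buckets_path is required")
    body = {
        "buckets_path": buckets_path,
        "gap_policy": gap_policy,
        "percents": list(percents) if percents is not None else list(DEFAULT_PERCENTS),
    }
    if fmt is not None:
        # format defaults to null, so it is only sent when set.
        body["format"] = fmt
    return {"percentiles_bucket": body}


agg = percentiles_bucket_agg("the_sum", percents=[25.0, 50.0, 75.0], fmt="0.00")
print(json.dumps(agg, indent=2))
```

Spelling out `gap_policy` and `percents` is optional; omitting them from the request should leave the defaults listed in the table in effect.
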
The following snippet calculates the percentiles for the total monthly `sales` buckets:

[source,js]
--------------------------------------------------
POST /sales/_search
{
    "size": 0,
    "aggs" : {
        "sales_per_month" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "month"
            },
            "aggs": {
                "sales": {
                    "sum": {
                        "field": "price"
                    }
                }
            }
        },
        "percentiles_monthly_sales": {
            "percentiles_bucket": {
                "buckets_path": "sales_per_month>sales", <1>
                "percents": [ 25.0, 50.0, 75.0 ] <2>
            }
        }
    }
}
--------------------------------------------------
// CONSOLE
// TEST[setup:sales]

<1> `buckets_path` instructs this `percentiles_bucket` aggregation that we want to calculate percentiles for
the `sales` aggregation in the `sales_per_month` date histogram.
<2> `percents` specifies which percentiles we wish to calculate, in this case, the 25th, 50th and 75th percentiles.

The response may look like this:

[source,js]
--------------------------------------------------
{
   "took": 11,
   "timed_out": false,
   "_shards": ...,
   "hits": ...,
   "aggregations": {
      "sales_per_month": {
         "buckets": [
            {
               "key_as_string": "2015/01/01 00:00:00",
               "key": 1420070400000,
               "doc_count": 3,
               "sales": {
                  "value": 550.0
               }
            },
            {
               "key_as_string": "2015/02/01 00:00:00",
               "key": 1422748800000,
               "doc_count": 2,
               "sales": {
                  "value": 60.0
               }
            },
            {
               "key_as_string": "2015/03/01 00:00:00",
               "key": 1425168000000,
               "doc_count": 2,
               "sales": {
                  "value": 375.0
               }
            }
         ]
      },
      "percentiles_monthly_sales": {
         "values" : {
            "25.0": 375.0,
            "50.0": 375.0,
            "75.0": 550.0
         }
      }
   }
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 11/"took": $body.took/]
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": $body._shards/]
// TESTRESPONSE[s/"hits": \.\.\./"hits": $body.hits/]

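A client reads the result from the `values` object of the pipeline aggregation, where each key is the requested percentile rendered as a string. The following is a minimal Python sketch, assuming the response body has already been parsed into a dict shaped like the example response. The `response` literal is trimmed to the relevant part, and `read_percentiles` is a hypothetical helper.

```python
def read_percentiles(response, agg_name):
    """Return {percent (float): value} for a percentiles_bucket aggregation."""
    values = response["aggregations"][agg_name]["values"]
    # Keys arrive as strings such as "25.0"; convert them for numeric lookups.
    return {float(percent): value for percent, value in values.items()}


# Trimmed copy of the example response, keeping only the pipeline aggregation.
response = {
    "aggregations": {
        "percentiles_monthly_sales": {
            "values": {"25.0": 375.0, "50.0": 375.0, "75.0": 550.0}
        }
    }
}

percentiles = read_percentiles(response, "percentiles_monthly_sales")
print(percentiles[50.0])
```
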
==== Percentiles_bucket implementation

The `percentiles_bucket` aggregation returns the nearest input data point that is not greater than the requested percentile; it does not
interpolate between data points.

The percentiles are calculated exactly and are not an approximation (unlike the Percentiles Metric aggregation). This means
the implementation maintains an in-memory, sorted list of your data to compute the percentiles, before discarding the
data. You may run into memory pressure issues if you attempt to calculate percentiles over many millions of
data points in a single `percentiles_bucket`.
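
The behaviour described here (sort the bucket values in memory, then pick an existing data point without interpolating) can be sketched in a few lines of Python. This section does not state the exact index rule, so the rounding rule below is an assumption, chosen only because it reproduces the example response for the monthly `sales` values. The function name and the `NaN` result for empty input are likewise illustrative.

```python
import math


def percentiles_bucket(bucket_values, percents):
    """Pick exact, non-interpolated percentiles from per-bucket metric values."""
    data = sorted(bucket_values)  # the whole series is held in memory
    result = {}
    for p in percents:
        if not data:
            result[p] = float("nan")  # assumption: no data yields NaN
            continue
        # Assumed rule: scale the percent onto the index range and round
        # half up, so the result is always one of the input values.
        index = math.floor(p / 100.0 * (len(data) - 1) + 0.5)
        result[p] = data[index]
    return result


# Monthly sales sums from the example response, in bucket order.
monthly_sales = [550.0, 60.0, 375.0]
result = percentiles_bucket(monthly_sales, [25.0, 50.0, 75.0])
print(result)
```

Because the sorted list holds one value per bucket, memory use in this sketch grows linearly with the number of buckets, which is the source of the memory pressure noted for very large inputs.
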