- [[search-aggregations-metrics-percentile-rank-aggregation]]
- === Percentile ranks aggregation
- ++++
- <titleabbrev>Percentile ranks</titleabbrev>
- ++++
- A `multi-value` metrics aggregation that calculates one or more percentile ranks
- over numeric values extracted from the aggregated documents. These values can be
- extracted from specific numeric or <<histogram,histogram fields>> in the documents.
- [NOTE]
- ==================================================
- Please see <<search-aggregations-metrics-percentile-aggregation-approximation>>
- and <<search-aggregations-metrics-percentile-aggregation-compression>> for advice
- regarding approximation and memory use of the percentile ranks aggregation.
- ==================================================
- Percentile ranks show the percentage of observed values that are at or below a certain
- value. For example, if a value is greater than or equal to 95% of the observed values,
- it is said to be at the 95th percentile rank.
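To make the definition concrete, here is a minimal Python sketch that computes an exact percentile rank over a small in-memory list. It is an illustration only: the function name `percentile_rank` and the sample load times are hypothetical, and the aggregation itself relies on approximate algorithms rather than this exact count.

```python
def percentile_rank(observed, value):
    """Return the percentage of observed values that are <= value."""
    if not observed:
        raise ValueError("observed must not be empty")
    at_or_below = sum(1 for v in observed if v <= value)
    return 100.0 * at_or_below / len(observed)


# Hypothetical page load times in milliseconds.
load_times = [120, 250, 310, 480, 495, 510, 560, 590, 610, 900]

rank_500 = percentile_rank(load_times, 500)  # 5 of 10 values are <= 500
rank_600 = percentile_rank(load_times, 600)  # 8 of 10 values are <= 600
print(rank_500, rank_600)
```

The sketch uses `<=` so that it matches the "greater than or equal to" wording of the definition.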
- Assume your data consists of website load times. You may have a service agreement that
- 95% of page loads complete within 500ms and 99% of page loads complete within 600ms.
- Let's look at the percentile ranks for these two load time thresholds:
- [source,console]
- --------------------------------------------------
- GET latency/_search
- {
- "size": 0,
- "aggs": {
- "load_time_ranks": {
- "percentile_ranks": {
- "field": "load_time", <1>
- "values": [ 500, 600 ]
- }
- }
- }
- }
- --------------------------------------------------
- // TEST[setup:latency]
- <1> The field `load_time` must be a numeric field
- The response will look like this:
- [source,console-result]
- --------------------------------------------------
- {
- ...
- "aggregations": {
- "load_time_ranks": {
- "values": {
- "500.0": 90.01,
- "600.0": 100.0
- }
- }
- }
- }
- --------------------------------------------------
- // TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
- // TESTRESPONSE[s/"500.0": 90.01/"500.0": 55.00000000000001/]
- // TESTRESPONSE[s/"600.0": 100.0/"600.0": 64.0/]
- From this information you can determine that you are hitting the 99% load time target but not quite
- hitting the 95% load time target.
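The same comparison can be automated on the client side. The following Python sketch checks the `values` object of the example response against the two service agreement targets. The helper name `unmet_targets` and the shape of the `targets` dictionary are assumptions made for illustration and are not part of any Elasticsearch client API.

```python
def unmet_targets(ranks, targets):
    """Return the thresholds whose observed rank is below the required percentage."""
    return {
        threshold: (ranks[threshold], required)
        for threshold, required in targets.items()
        if ranks[threshold] < required
    }


# The "values" object from the example response.
ranks = {"500.0": 90.01, "600.0": 100.0}
# Service agreement: 95% of loads within 500ms, 99% within 600ms.
targets = {"500.0": 95.0, "600.0": 99.0}

missed = unmet_targets(ranks, targets)
print(missed)
```

Only the 500ms threshold is reported, because 90.01% of page loads completed within 500ms against a target of 95%.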
- ==== Keyed Response
- By default, the `keyed` flag is set to `true`, which associates a unique string key with each bucket and returns the ranges as a hash rather than an array. Setting the `keyed` flag to `false` disables this behavior:
- [source,console]
- --------------------------------------------------
- GET latency/_search
- {
- "size": 0,
- "aggs": {
- "load_time_ranks": {
- "percentile_ranks": {
- "field": "load_time",
- "values": [ 500, 600 ],
- "keyed": false
- }
- }
- }
- }
- --------------------------------------------------
- // TEST[setup:latency]
- Response:
- [source,console-result]
- --------------------------------------------------
- {
- ...
- "aggregations": {
- "load_time_ranks": {
- "values": [
- {
- "key": 500.0,
- "value": 90.01
- },
- {
- "key": 600.0,
- "value": 100.0
- }
- ]
- }
- }
- }
- --------------------------------------------------
- // TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
- // TESTRESPONSE[s/"value": 90.01/"value": 55.00000000000001/]
- // TESTRESPONSE[s/"value": 100.0/"value": 64.0/]
- ==== Script
- If you need to run the aggregation against values that aren't indexed, use
- a <<runtime,runtime field>>. For example, if our load times
- are in milliseconds but we want percentile ranks calculated in seconds:
- [source,console]
- ----
- GET latency/_search
- {
- "size": 0,
- "runtime_mappings": {
- "load_time.seconds": {
- "type": "long",
- "script": {
- "source": "emit(doc['load_time'].value / params.timeUnit)",
- "params": {
- "timeUnit": 1000
- }
- }
- }
- },
- "aggs": {
- "load_time_ranks": {
- "percentile_ranks": {
- "values": [ 500, 600 ],
- "field": "load_time.seconds"
- }
- }
- }
- }
- ----
- // TEST[setup:latency]
- // TEST[s/_search/_search?filter_path=aggregations/]
- ////
- [source,console-result]
- --------------------------------------------------
- {
- "aggregations": {
- "load_time_ranks": {
- "values": {
- "500.0": 100.0,
- "600.0": 100.0
- }
- }
- }
- }
- --------------------------------------------------
- ////
- ==== HDR Histogram
- NOTE: This setting exposes the internal implementation of HDR Histogram and the syntax may change in the future.
- https://github.com/HdrHistogram/HdrHistogram[HDR Histogram] (High Dynamic Range Histogram) is an alternative implementation
- that can be useful when calculating percentile ranks for latency measurements, as it can be faster than the t-digest implementation
- with the trade-off of a larger memory footprint. This implementation maintains a fixed worst-case percentage error (specified as a
- number of significant digits). This means that if data is recorded with values from 1 microsecond up to 1 hour (3,600,000,000
- microseconds) in a histogram set to 3 significant digits, it will maintain a value resolution of 1 microsecond for values up to
- 1 millisecond and 3.6 seconds (or better) for the maximum tracked value (1 hour).
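The figures in this example follow from simple arithmetic, which the Python sketch below reproduces. It is a simplified model that assumes the worst-case resolution at a value is that value divided by 10 to the power of the number of significant digits, with a floor of one recording unit. The function `approx_resolution` is hypothetical and does not reflect the library's actual bucket layout, which may resolve values more finely.

```python
def approx_resolution(value, significant_digits):
    """Approximate worst-case resolution at `value`, in the same integer unit.

    Simplified model: value / 10**digits, never finer than one unit.
    """
    return max(1, value // 10 ** significant_digits)


one_millisecond_us = 1_000      # 1 millisecond in microseconds
one_hour_us = 3_600_000_000     # 1 hour in microseconds

res_ms = approx_resolution(one_millisecond_us, 3)
res_hour = approx_resolution(one_hour_us, 3)
print(res_ms, "microsecond(s) at 1 millisecond")
print(res_hour / 1_000_000, "second(s) at 1 hour")
```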
- The HDR Histogram can be used by specifying the `hdr` object in the request:
- [source,console]
- --------------------------------------------------
- GET latency/_search
- {
- "size": 0,
- "aggs": {
- "load_time_ranks": {
- "percentile_ranks": {
- "field": "load_time",
- "values": [ 500, 600 ],
- "hdr": { <1>
- "number_of_significant_value_digits": 3 <2>
- }
- }
- }
- }
- }
- --------------------------------------------------
- // TEST[setup:latency]
- <1> The `hdr` object indicates that HDR Histogram should be used to calculate the percentile ranks. Settings specific to this algorithm can be specified inside the object.
- <2> `number_of_significant_value_digits` specifies the resolution of values for the histogram in number of significant digits
- The HDR Histogram only supports positive values and will error if it is passed a negative value. It is also not a good idea to use
- the HDR Histogram if the range of values is unknown, as this could lead to high memory usage.
- ==== Missing value
- The `missing` parameter defines how documents that are missing a value should be treated.
- By default, they are ignored, but it is also possible to treat them as if they
- had a value:
- [source,console]
- --------------------------------------------------
- GET latency/_search
- {
- "size": 0,
- "aggs": {
- "load_time_ranks": {
- "percentile_ranks": {
- "field": "load_time",
- "values": [ 500, 600 ],
- "missing": 10 <1>
- }
- }
- }
- }
- --------------------------------------------------
- // TEST[setup:latency]
- <1> Documents without a value in the `load_time` field will fall into the same bucket as documents that have the value `10`.
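The effect of `missing` on the result can be illustrated with a small Python sketch. It computes an exact rank over hypothetical in-memory documents, once ignoring documents without a value and once substituting a default. The function name and sample data are assumptions for illustration, and the real aggregation computes ranks approximately.

```python
def percentile_rank_with_missing(docs, field, value, missing=None):
    """Percentage of documents whose field value is <= value.

    Documents without the field are skipped unless `missing` is given,
    in which case they are counted as if they had that value.
    """
    observed = []
    for doc in docs:
        v = doc.get(field, missing)
        if v is None:
            continue  # no value and no default: ignore the document
        observed.append(v)
    if not observed:
        raise ValueError("no values to rank")
    at_or_below = sum(1 for v in observed if v <= value)
    return 100.0 * at_or_below / len(observed)


# Hypothetical documents; the last one has no load_time.
docs = [{"load_time": 400}, {"load_time": 550}, {"load_time": 700}, {}]

ignored = percentile_rank_with_missing(docs, "load_time", 500)
defaulted = percentile_rank_with_missing(docs, "load_time", 500, missing=10)
print(ignored, defaulted)
```

With the default of `10`, the document without a value counts as a fast page load, so the rank for 500 rises from one in three documents to two in four.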