- [role="xpack"]
- [testenv="basic"]
- [[search-aggregations-metrics-boxplot-aggregation]]
- === Boxplot Aggregation
- A `boxplot` metrics aggregation computes a box plot of numeric values extracted from the aggregated documents.
- These values can be generated by a provided script or extracted from specific numeric or
- <<histogram,histogram fields>> in the documents.
- The `boxplot` aggregation returns essential information for making a https://en.wikipedia.org/wiki/Box_plot[box plot]: minimum, maximum,
- median, first quartile (25th percentile) and third quartile (75th percentile) values.
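- To make the five returned values concrete, the following Python sketch computes them exactly for a small in-memory list. It is an illustration only: the function names `percentile` and `five_number_summary`, the sample data, and the linear-interpolation rule are assumptions made for this example. The aggregation itself uses an approximate algorithm, so its results can differ slightly from an exact computation.

```python
import math


def percentile(sorted_values, p):
    """Linearly interpolate the p-th quantile (0 <= p <= 1) of a sorted list."""
    rank = p * (len(sorted_values) - 1)
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def five_number_summary(values):
    """Return the same five keys that a boxplot response contains."""
    if not values:
        raise ValueError("at least one value is required")
    ordered = sorted(values)
    return {
        "min": float(ordered[0]),
        "max": float(ordered[-1]),
        "q1": percentile(ordered, 0.25),  # first quartile (25th percentile)
        "q2": percentile(ordered, 0.50),  # median
        "q3": percentile(ordered, 0.75),  # third quartile (75th percentile)
    }


# Hypothetical load times in milliseconds, not taken from the `latency` index.
print(five_number_summary([10, 20, 30, 40, 50]))
```

- For the five sample values, the quartiles fall exactly on the second, third, and fourth elements, so no interpolation is needed.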
- ==== Syntax
- A `boxplot` aggregation looks like this in isolation:
- [source,js]
- --------------------------------------------------
- {
- "boxplot": {
- "field": "load_time"
- }
- }
- --------------------------------------------------
- // NOTCONSOLE
- Let's look at a boxplot representing load time:
- [source,console]
- --------------------------------------------------
- GET latency/_search
- {
- "size": 0,
- "aggs" : {
- "load_time_boxplot" : {
- "boxplot" : {
- "field" : "load_time" <1>
- }
- }
- }
- }
- --------------------------------------------------
- // TEST[setup:latency]
- <1> The field `load_time` must be a numeric field
- The response will look like this:
- [source,console-result]
- --------------------------------------------------
- {
- ...
- "aggregations": {
- "load_time_boxplot": {
- "min": 0.0,
- "max": 990.0,
- "q1": 165.0,
- "q2": 445.0,
- "q3": 725.0
- }
- }
- }
- --------------------------------------------------
- // TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
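- A client that draws the box plot usually derives a few more numbers from this response. The Python sketch below computes the interquartile range and the whisker fences from the values shown above. The helper name `box_geometry` and the 1.5 x IQR rule are assumptions: that rule is a common plotting convention, not something this response contains.

```python
def box_geometry(boxplot, whisker_factor=1.5):
    """Derive plotting values from the five fields of a boxplot result."""
    iqr = boxplot["q3"] - boxplot["q1"]  # height of the box
    return {
        "iqr": iqr,
        # Points outside the fences are conventionally drawn as outliers.
        "lower_fence": boxplot["q1"] - whisker_factor * iqr,
        "upper_fence": boxplot["q3"] + whisker_factor * iqr,
    }


# The aggregation part of the response shown above, as a Python dict.
response = {
    "aggregations": {
        "load_time_boxplot": {
            "min": 0.0,
            "max": 990.0,
            "q1": 165.0,
            "q2": 445.0,
            "q3": 725.0,
        }
    }
}

geometry = box_geometry(response["aggregations"]["load_time_boxplot"])
print(geometry)
```

- With these values both fences lie beyond `min` and `max`, so under this convention no observation would be drawn as an outlier.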
- ==== Script
- The boxplot metric supports scripting. For example, if our load times
- are in milliseconds but we want values calculated in seconds, we could use
- a script to convert them on-the-fly:
- [source,console]
- --------------------------------------------------
- GET latency/_search
- {
- "size": 0,
- "aggs" : {
- "load_time_boxplot" : {
- "boxplot" : {
- "script" : {
- "lang": "painless",
- "source": "doc['load_time'].value / params.timeUnit", <1>
- "params" : {
- "timeUnit" : 1000 <2>
- }
- }
- }
- }
- }
- }
- --------------------------------------------------
- // TEST[setup:latency]
- <1> The `field` parameter is replaced with a `script` parameter, which uses the
- script to generate the values on which the quartiles are calculated
- <2> Scripting supports parameterized input just like any other script
- This interprets the `script` parameter as an `inline` script written in the `painless` script language. To use a
- stored script, use the following syntax:
- [source,console]
- --------------------------------------------------
- GET latency/_search
- {
- "size": 0,
- "aggs" : {
- "load_time_boxplot" : {
- "boxplot" : {
- "script" : {
- "id": "my_script",
- "params": {
- "field": "load_time"
- }
- }
- }
- }
- }
- }
- --------------------------------------------------
- // TEST[setup:latency,stored_example_script]
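- The per-document conversion performed by the inline script earlier in this section can be mimicked in Python. This is a sketch under stated assumptions: `apply_script` is a hypothetical helper and the sample values are invented. Python's `/` always performs floating-point division, whereas the result type of the Painless expression may depend on how the field is mapped.

```python
def apply_script(load_times_ms, time_unit=1000):
    """Divide every document's value by the `timeUnit` script parameter."""
    return [value / time_unit for value in load_times_ms]


# Hypothetical load times in milliseconds, converted to seconds
# before any quartiles are calculated.
load_times_s = apply_script([0, 500, 990])
print(load_times_s)
```

- Because the conversion happens before the values reach the aggregation, the reported minimum, maximum, and quartiles are all expressed in seconds.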
- [[search-aggregations-metrics-boxplot-aggregation-approximation]]
- ==== Boxplot values are (usually) approximate
- The algorithm used by the `boxplot` metric is called TDigest (introduced by
- Ted Dunning in
- https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf[Computing Accurate Quantiles using T-Digests]).
- [WARNING]
- ====
- Like other percentile aggregations, the `boxplot` aggregation is
- https://en.wikipedia.org/wiki/Nondeterministic_algorithm[non-deterministic].
- This means you can get slightly different results using the same data.
- ====
- [[search-aggregations-metrics-boxplot-aggregation-compression]]
- ==== Compression
- Approximate algorithms must balance memory utilization with estimation accuracy.
- This balance can be controlled using a `compression` parameter:
- [source,console]
- --------------------------------------------------
- GET latency/_search
- {
- "size": 0,
- "aggs" : {
- "load_time_boxplot" : {
- "boxplot" : {
- "field" : "load_time",
- "compression" : 200 <1>
- }
- }
- }
- }
- --------------------------------------------------
- // TEST[setup:latency]
- <1> Compression controls memory usage and approximation error
- include::percentile-aggregation.asciidoc[tags=t-digest]
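- The memory-versus-accuracy trade-off can be illustrated with a much simpler estimator than TDigest. The Python sketch below is not the TDigest algorithm: it keeps a fixed number of equal-width counters, and the hypothetical `bins` argument plays a role loosely analogous to `compression`. All names and the sample data are assumptions made for this example.

```python
import math


def approx_quantile(values, q, bins):
    """Estimate the q-th quantile (0 < q <= 1) using only `bins` counters."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return float(lo)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so that the maximum value lands in the last bin.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    target = q * len(values)
    seen = 0
    for index, count in enumerate(counts):
        if count and seen + count >= target:
            # Interpolate inside the bin that contains the target rank.
            fraction = (target - seen) / count
            return lo + (index + fraction) * width
        seen += count
    return float(hi)


def exact_quantile(values, q):
    """Nearest-rank quantile computed from the fully sorted data."""
    ordered = sorted(values)
    return ordered[math.ceil(q * len(ordered)) - 1]


# Hypothetical skewed data set.
data = [i * i for i in range(1000)]
for bins in (4, 64):
    error = abs(approx_quantile(data, 0.75, bins) - exact_quantile(data, 0.75))
    print(bins, "bins -> third-quartile error", round(error, 2))
```

- In this toy estimator the error is bounded by the width of one bin, so spending more memory on counters tightens the bound. TDigest reaches much better accuracy for the same memory, especially at extreme percentiles, but the direction of the trade-off is the same.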
- ==== Missing value
- The `missing` parameter defines how documents that are missing a value should be treated.
- By default they are ignored, but it is also possible to treat them as if they
- had a value.
- [source,console]
- --------------------------------------------------
- GET latency/_search
- {
- "size": 0,
- "aggs" : {
- "grade_boxplot" : {
- "boxplot" : {
- "field" : "grade",
- "missing": 10 <1>
- }
- }
- }
- }
- --------------------------------------------------
- // TEST[setup:latency]
- <1> Documents without a value in the `grade` field are treated as if they had the value `10`.
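- The effect of `missing` on the set of values that reach the aggregation can be sketched in Python. The helper `collect_values` and the sample documents are hypothetical and only model the behavior described above.

```python
def collect_values(docs, field, missing=None):
    """Gather the values to aggregate, optionally substituting a default."""
    values = []
    for doc in docs:
        if doc.get(field) is not None:
            values.append(doc[field])
        elif missing is not None:
            # The document has no value: count it as if it held `missing`.
            values.append(missing)
        # Otherwise the document is ignored, which is the default behavior.
    return values


# Hypothetical documents; the second one has no `grade` value.
docs = [{"grade": 7}, {"name": "no grade"}, {"grade": 9}]

print(collect_values(docs, "grade"))              # document without a grade is skipped
print(collect_values(docs, "grade", missing=10))  # document without a grade counts as 10
```

- Substituting a default changes how many values contribute to the quartiles, so it can shift the reported median and quartiles as well as the minimum or maximum.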