[[search-aggregations-metrics-median-absolute-deviation-aggregation]]
=== Median absolute deviation aggregation
++++
<titleabbrev>Median absolute deviation</titleabbrev>
++++

This `single-value` aggregation approximates the {wikipedia}/Median_absolute_deviation[median absolute deviation]
of its search results.

Median absolute deviation is a measure of variability. It is a robust
statistic, meaning that it is useful for describing data that may have
outliers, or may not be normally distributed. For such data it can be more
descriptive than standard deviation.

It is calculated as the median of each data point's deviation from the median
of the entire sample. That is, for a random variable X, the median absolute
deviation is median(|median(X) - X~i~|).

==== Example

Assume our data represents product reviews on a one to five star scale.
Such reviews are usually summarized as a mean, which is easily understandable
but doesn't describe the reviews' variability. Estimating the median absolute
deviation can provide insight into how much reviews vary from one another.

In this example we have a product which has an average rating of
3 stars. Let's look at its ratings' median absolute deviation to determine
how much they vary.

[source,console]
---------------------------------------------------------
GET reviews/_search
{
  "size": 0,
  "aggs": {
    "review_average": {
      "avg": {
        "field": "rating"
      }
    },
    "review_variability": {
      "median_absolute_deviation": {
        "field": "rating" <1>
      }
    }
  }
}
---------------------------------------------------------
// TEST[setup:reviews]
<1> `rating` must be a numeric field

The resulting median absolute deviation of `2` tells us that there is a fair
amount of variability in the ratings. Reviewers must have diverse opinions about
this product.

[source,console-result]
---------------------------------------------------------
{
  ...
  "aggregations": {
    "review_average": {
      "value": 3.0
    },
    "review_variability": {
      "value": 2.0
    }
  }
}
---------------------------------------------------------
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]

==== Approximation

The naive implementation of calculating median absolute deviation stores the
entire sample in memory, so this aggregation instead calculates an
approximation. It uses the https://github.com/tdunning/t-digest[TDigest data structure]
to approximate the sample median and the median of deviations from the sample
median. For more about the approximation characteristics of TDigests, see
<<search-aggregations-metrics-percentile-aggregation-approximation>>.

The tradeoff between resource usage and accuracy of a TDigest's quantile
approximation, and therefore the accuracy of this aggregation's approximation
of median absolute deviation, is controlled by the `compression` parameter. A
higher `compression` setting provides a more accurate approximation at the
cost of higher memory usage. For more about the characteristics of the TDigest
`compression` parameter see
<<search-aggregations-metrics-percentile-aggregation-compression>>.

[source,console]
---------------------------------------------------------
GET reviews/_search
{
  "size": 0,
  "aggs": {
    "review_variability": {
      "median_absolute_deviation": {
        "field": "rating",
        "compression": 100
      }
    }
  }
}
---------------------------------------------------------
// TEST[setup:reviews]

The default `compression` value for this aggregation is `1000`. At this
compression level this aggregation is usually within 5% of the exact result,
but observed performance will depend on the sample data.

==== Script

This metric aggregation supports scripting. In our example above, product
reviews are on a scale of one to five. If we wanted to modify them to a scale
of one to ten, we can use scripting.

To provide an inline script:

[source,console]
---------------------------------------------------------
GET reviews/_search
{
  "size": 0,
  "aggs": {
    "review_variability": {
      "median_absolute_deviation": {
        "script": {
          "lang": "painless",
          "source": "doc['rating'].value * params.scaleFactor",
          "params": {
            "scaleFactor": 2
          }
        }
      }
    }
  }
}
---------------------------------------------------------
// TEST[setup:reviews]

To provide a stored script:

[source,console]
---------------------------------------------------------
GET reviews/_search
{
  "size": 0,
  "aggs": {
    "review_variability": {
      "median_absolute_deviation": {
        "script": {
          "id": "my_script",
          "params": {
            "field": "rating"
          }
        }
      }
    }
  }
}
---------------------------------------------------------
// TEST[setup:reviews,stored_example_script]

==== Missing value

The `missing` parameter defines how documents that are missing a value should be
treated. By default they will be ignored, but it is also possible to treat them
as if they had a value.

Let's be optimistic and assume some reviewers loved the product so much that
they forgot to give it a rating. We'll assign them five stars.

[source,console]
---------------------------------------------------------
GET reviews/_search
{
  "size": 0,
  "aggs": {
    "review_variability": {
      "median_absolute_deviation": {
        "field": "rating",
        "missing": 5
      }
    }
  }
}
---------------------------------------------------------
// TEST[setup:reviews]
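
Substituting a value for missing ratings can change the result considerably.
The following Python sketch illustrates this by computing the exact statistic
median(|median(X) - X~i~|) outside of Elasticsearch. It is only an
illustration: it does not use the TDigest approximation described above, and
the function name `median_absolute_deviation`, its `missing` argument, and the
`ratings` list are hypothetical, chosen for this example rather than taken
from the `reviews` test data.

[source,python]
---------------------------------------------------------
from statistics import median


def median_absolute_deviation(values, missing=None):
    """Exact median(|median(X) - X_i|); not the TDigest approximation."""
    if missing is None:
        # Mirror the default behaviour: entries without a value are ignored.
        sample = [v for v in values if v is not None]
    else:
        # Mirror the `missing` parameter: treat absent values as `missing`.
        sample = [missing if v is None else v for v in values]
    if not sample:
        return None  # assumption for this sketch: no data means no result
    center = median(sample)
    return median([abs(center - v) for v in sample])


# Hypothetical ratings; None marks a review that has no rating.
ratings = [1, 1, 3, 5, 5, None, None]

ignored = median_absolute_deviation(ratings)             # missing ignored
as_five = median_absolute_deviation(ratings, missing=5)  # missing counted as 5

print(ignored)  # 2
print(as_five)  # 0
---------------------------------------------------------

With the two unrated reviews ignored, the sample median is 3 and the median
absolute deviation is 2. Counting them as five-star reviews moves the sample
median to 5, and because more than half of the ratings then equal the median,
the median absolute deviation drops to 0.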