[[search-aggregations-metrics-median-absolute-deviation-aggregation]]
=== Median Absolute Deviation Aggregation

This `single-value` aggregation approximates the https://en.wikipedia.org/wiki/Median_absolute_deviation[median absolute deviation]
of the numeric values extracted from its search results.

Median absolute deviation is a measure of variability. It is a robust
statistic, meaning that it is useful for describing data that may have
outliers or may not be normally distributed. For such data it can be more
descriptive than standard deviation.

It is calculated as the median of each data point's absolute deviation from
the median of the entire sample. That is, for a random variable X, the median
absolute deviation is median(|median(X) - X~i~|).
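
To make the definition concrete, here is a minimal exact computation in Python using only the standard library. It is an illustrative sketch, not how Elasticsearch computes the value: it keeps the whole sample in memory, which is the cost the aggregation avoids. The function name `median_absolute_deviation` and the sample values are hypothetical. The usage lines also show the robustness described above: one extreme value barely moves the median absolute deviation but inflates the standard deviation.

[source,python]
---------------------------------------------------------
from statistics import median, pstdev


def median_absolute_deviation(values):
    """Exact MAD: the median of |x_i - median(X)| over the whole sample."""
    if not values:
        raise ValueError("median absolute deviation of an empty sample is undefined")
    center = median(values)  # median of the entire sample
    return median(abs(x - center) for x in values)  # median of the absolute deviations


# Hypothetical sample, plus the same sample with one extreme outlier.
sample = [10, 11, 12, 13, 14]
with_outlier = sample + [1000]

print(median_absolute_deviation(sample), median_absolute_deviation(with_outlier))  # 1 1.5
# The population standard deviation jumps from about 1.4 to over 300.
print(pstdev(sample), pstdev(with_outlier))
---------------------------------------------------------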

==== Example

Assume our data represents product reviews on a one-to-five star scale.
Such reviews are usually summarized as a mean, which is easy to understand
but doesn't describe the reviews' variability. Estimating the median absolute
deviation can provide insight into how much reviews vary from one another.

In this example we have a product with an average rating of
3 stars. Let's look at the median absolute deviation of its ratings to
determine how much they vary:

[source,js]
---------------------------------------------------------
GET reviews/_search
{
  "size": 0,
  "aggs": {
    "review_average": {
      "avg": {
        "field": "rating"
      }
    },
    "review_variability": {
      "median_absolute_deviation": {
        "field": "rating" <1>
      }
    }
  }
}
---------------------------------------------------------
// CONSOLE
// TEST[setup:reviews]
<1> `rating` must be a numeric field

The resulting median absolute deviation of `2` tells us that there is a fair
amount of variability in the ratings: on a one-to-five scale, a typical rating
differs from the median rating by about two stars. Reviewers appear to have
diverse opinions about this product.

[source,js]
---------------------------------------------------------
{
  ...
  "aggregations": {
    "review_average": {
      "value": 3.0
    },
    "review_variability": {
      "value": 2.0
    }
  }
}
---------------------------------------------------------
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
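
The `reviews` test data behind this response is not shown here. As a purely hypothetical illustration of why the mean alone hides variability, the Python sketch below compares two invented sets of ratings that share a mean of 3 stars: one split between one and five stars, and one where every reviewer gives three stars. Only the median absolute deviation, computed exactly here rather than approximated, tells them apart. The helper name `mad` is hypothetical.

[source,python]
---------------------------------------------------------
from statistics import mean, median


def mad(values):
    """Exact median absolute deviation of a non-empty list of numbers."""
    center = median(values)
    return median(abs(v - center) for v in values)


# Hypothetical rating sets, invented for illustration; NOT the `reviews` test data.
polarized = [1, 1, 1, 5, 5, 5]   # reviewers split between 1 and 5 stars
consistent = [3, 3, 3, 3, 3, 3]  # every reviewer gives 3 stars

for name, ratings in [("polarized", polarized), ("consistent", consistent)]:
    # Both sets have a mean of 3; only the MAD (2 versus 0) tells them apart.
    print(name, mean(ratings), mad(ratings))
---------------------------------------------------------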

==== Approximation

A naive exact implementation of median absolute deviation stores the entire
sample in memory, since the deviations can only be computed once the median of
the whole sample is known. This aggregation therefore calculates an
approximation instead. It uses the https://github.com/tdunning/t-digest[TDigest data structure]
to approximate both the sample median and the median of deviations from the
sample median. For more about the approximation characteristics of TDigests, see
<<search-aggregations-metrics-percentile-aggregation-approximation>>.

The tradeoff between resource usage and accuracy of a TDigest's quantile
approximation, and therefore the accuracy of this aggregation's approximation
of median absolute deviation, is controlled by the `compression` parameter. A
higher `compression` setting provides a more accurate approximation at the
cost of higher memory usage. For more about the characteristics of the TDigest
`compression` parameter, see
<<search-aggregations-metrics-percentile-aggregation-compression>>.

[source,js]
---------------------------------------------------------
GET reviews/_search
{
  "size": 0,
  "aggs": {
    "review_variability": {
      "median_absolute_deviation": {
        "field": "rating",
        "compression": 100
      }
    }
  }
}
---------------------------------------------------------
// CONSOLE
// TEST[setup:reviews]

The default `compression` value for this aggregation is `1000`. At this
compression level the aggregation is usually within 5% of the exact result,
but the observed accuracy depends on the sample data.
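
Because the result is approximate, you may want to measure the error on your own data. One way, sketched below in Python under stated assumptions, is to compute the exact value client-side for a sample small enough to hold in memory and compare it with the aggregation's `value`. The helper names, the sample, and the reported value are all hypothetical, and the 5% threshold only mirrors the rule of thumb above; it is not a guarantee.

[source,python]
---------------------------------------------------------
from statistics import median


def exact_mad(values):
    """Exact median absolute deviation of a non-empty list of numbers."""
    center = median(values)
    return median(abs(v - center) for v in values)


def relative_error(approx, exact):
    """Relative error of an approximate MAD compared with the exact MAD."""
    if exact == 0:
        # Relative error is undefined here; treat any nonzero approximation as unbounded.
        return 0.0 if approx == 0 else float("inf")
    return abs(approx - exact) / exact


values = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # hypothetical field values
reported = 1.02                       # hypothetical `value` from the aggregation response
err = relative_error(reported, exact_mad(values))
print(f"{err:.1%}", err <= 0.05)  # 2.0% True
---------------------------------------------------------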

==== Script

This metric aggregation supports scripting. In our example above, product
reviews are on a scale of one to five. If we wanted to rescale them, for
instance to a ten-point scale, we could use a script. The inline example below
doubles each rating, which maps the one-to-five scale onto two to ten.

To provide an inline script:

[source,js]
---------------------------------------------------------
GET reviews/_search
{
  "size": 0,
  "aggs": {
    "review_variability": {
      "median_absolute_deviation": {
        "script": {
          "lang": "painless",
          "source": "doc['rating'].value * params.scaleFactor",
          "params": {
            "scaleFactor": 2
          }
        }
      }
    }
  }
}
---------------------------------------------------------
// CONSOLE
// TEST[setup:reviews]
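
Scaling every value by a constant factor c scales every deviation from the median by the same factor, so the median absolute deviation scales linearly: the MAD of cX is |c| times the MAD of X. With `scaleFactor` set to `2`, you would therefore expect roughly twice the unscaled result, subject to the approximation error described earlier. The Python sketch below checks this identity with exact arithmetic; the helper names and ratings are hypothetical, and `scaled` only mimics the Painless expression in the request.

[source,python]
---------------------------------------------------------
from statistics import median


def mad(values):
    """Exact median absolute deviation of a non-empty list of numbers."""
    center = median(values)
    return median(abs(v - center) for v in values)


def scaled(values, scale_factor):
    # Mimics `doc['rating'].value * params.scaleFactor` for each document.
    return [v * scale_factor for v in values]


ratings = [1, 2, 2, 3, 4, 5, 5]  # hypothetical ratings on a one-to-five scale
print(mad(ratings), mad(scaled(ratings, 2)))  # 1 2
---------------------------------------------------------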

To provide a stored script:

[source,js]
---------------------------------------------------------
GET reviews/_search
{
  "size": 0,
  "aggs": {
    "review_variability": {
      "median_absolute_deviation": {
        "script": {
          "id": "my_script",
          "params": {
            "field": "rating"
          }
        }
      }
    }
  }
}
---------------------------------------------------------
// CONSOLE
// TEST[setup:reviews,stored_example_script]

==== Missing value

The `missing` parameter defines how documents that are missing a value should be
treated. By default they are ignored, but it is also possible to treat them
as if they had a value.

Let's be optimistic and assume some reviewers loved the product so much that
they forgot to give it a rating. We'll assign them five stars:
  150. ---------------------------------------------------------
  151. GET reviews/_search
  152. {
  153. "size": 0,
  154. "aggs": {
  155. "review_variability": {
  156. "median_absolute_deviation": {
  157. "field": "rating",
  158. "missing": 5
  159. }
  160. }
  161. }
  162. }
  163. ---------------------------------------------------------
  164. // CONSOLE
  165. // TEST[setup:reviews]
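
Conceptually, `missing` substitutes the given value for documents that lack the field instead of skipping them, and the median absolute deviation is then computed over the combined values. The Python sketch below models only that semantics with exact arithmetic, using `None` for a document without a `rating`; it is not how Elasticsearch evaluates the aggregation, and the helper names and ratings are hypothetical. With many unrated documents, filling them with five stars can change the result substantially.

[source,python]
---------------------------------------------------------
from statistics import median


def mad(values):
    """Exact median absolute deviation of a non-empty list of numbers."""
    center = median(values)
    return median(abs(v - center) for v in values)


def mad_with_missing(ratings, missing=None):
    """Ignore None entries by default, or substitute `missing` for them."""
    if missing is None:
        present = [r for r in ratings if r is not None]
    else:
        present = [missing if r is None else r for r in ratings]
    return mad(present)


ratings = [1, 1, 2, 3, None, None, None]     # hypothetical; None = no rating
print(mad_with_missing(ratings))             # 0.5
print(mad_with_missing(ratings, missing=5))  # 2
---------------------------------------------------------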