
[[search-aggregations-metrics-extendedstats-aggregation]]
=== Extended stats aggregation
++++
<titleabbrev>Extended stats</titleabbrev>
++++

A `multi-value` metrics aggregation that computes stats over numeric values extracted from the aggregated documents.

The `extended_stats` aggregation is an extended version of the <<search-aggregations-metrics-stats-aggregation,`stats`>> aggregation that adds metrics such as `sum_of_squares`, `variance`, `std_deviation` and `std_deviation_bounds`.

Assuming the data consists of documents representing students' exam grades (between 0 and 100):

[source,console]
--------------------------------------------------
GET /exams/_search
{
  "size": 0,
  "aggs": {
    "grades_stats": { "extended_stats": { "field": "grade" } }
  }
}
--------------------------------------------------
// TEST[setup:exams]

The above aggregation computes grade statistics over all documents. The aggregation type is `extended_stats`, and the `field` setting defines the numeric field on which the stats are computed.

The `std_deviation` and `variance` are calculated as population metrics, so they are always the same as `std_deviation_population` and `variance_population` respectively.

The request returns the following:

[source,console-result]
--------------------------------------------------
{
  ...

  "aggregations": {
    "grades_stats": {
      "count": 2,
      "min": 50.0,
      "max": 100.0,
      "avg": 75.0,
      "sum": 150.0,
      "sum_of_squares": 12500.0,
      "variance": 625.0,
      "variance_population": 625.0,
      "variance_sampling": 1250.0,
      "std_deviation": 25.0,
      "std_deviation_population": 25.0,
      "std_deviation_sampling": 35.35533905932738,
      "std_deviation_bounds": {
        "upper": 125.0,
        "lower": 25.0,
        "upper_population": 125.0,
        "lower_population": 25.0,
        "upper_sampling": 145.71067811865476,
        "lower_sampling": 4.289321881345245
      }
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]

The name of the aggregation (`grades_stats` above) also serves as the key by which the aggregation result can be retrieved from the returned response.
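
The figures in the response above can be reproduced by hand for the two grades it reports (`50` and `100`). The following is a minimal Python sketch using the textbook population and sampling formulas. It is an illustration only, not a description of how Elasticsearch computes the metrics internally, and the helper name `extended_stats` is hypothetical.

```python
import math


def extended_stats(values):
    """Compute the metrics shown in the response for a list of numbers.

    Hypothetical helper for illustration; requires at least two values so
    that the sampling variance (which divides by n - 1) is defined.
    """
    if len(values) < 2:
        raise ValueError("this sketch needs at least two values")

    count = len(values)
    total = sum(values)
    avg = total / count
    squared_dev = sum((v - avg) ** 2 for v in values)

    # Population metrics divide by n; sampling metrics divide by n - 1.
    variance_population = squared_dev / count
    variance_sampling = squared_dev / (count - 1)

    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum(v * v for v in values),
        "variance_population": variance_population,
        "variance_sampling": variance_sampling,
        "std_deviation_population": math.sqrt(variance_population),
        "std_deviation_sampling": math.sqrt(variance_sampling),
    }


stats = extended_stats([50.0, 100.0])
print(stats)
```

With only two values the sampling variance is twice the population variance, which is why `variance_sampling` is `1250.0` while `variance` is `625.0`.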

==== Standard Deviation Bounds

By default, the `extended_stats` metric returns an object called `std_deviation_bounds`, which provides an interval of plus/minus two standard deviations from the mean. This can be a useful way to visualize the variance of your data. If you want a different boundary, for example three standard deviations, set `sigma` in the request:

[source,console]
--------------------------------------------------
GET /exams/_search
{
  "size": 0,
  "aggs": {
    "grades_stats": {
      "extended_stats": {
        "field": "grade",
        "sigma": 3 <1>
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:exams]
<1> `sigma` controls how many standard deviations +/- from the mean should be displayed.

`sigma` can be any non-negative double, so you can request non-integer values such as `1.5`. A value of `0` is valid, but simply returns the average for both the `upper` and `lower` bounds.

The `upper` and `lower` bounds are calculated as population metrics, so they are always the same as `upper_population` and `lower_population` respectively.
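
The effect of `sigma` can be worked through with the `avg` (`75.0`) and population `std_deviation` (`25.0`) from the earlier response. The sketch below assumes the bounds are simply the mean plus or minus `sigma` standard deviations, as described above; the function name `std_deviation_bounds` is hypothetical.

```python
def std_deviation_bounds(avg, std_deviation, sigma=2.0):
    """Return the interval avg +/- sigma * std_deviation.

    Hypothetical helper for illustration; mirrors the rule that sigma
    must be non-negative.
    """
    if sigma < 0:
        raise ValueError("sigma must be non-negative")
    return {
        "upper": avg + sigma * std_deviation,
        "lower": avg - sigma * std_deviation,
    }


default_bounds = std_deviation_bounds(75.0, 25.0)           # sigma = 2
wider_bounds = std_deviation_bounds(75.0, 25.0, sigma=3)    # sigma = 3
collapsed = std_deviation_bounds(75.0, 25.0, sigma=0)       # both equal avg

print(default_bounds, wider_bounds, collapsed)
```

With the default `sigma` this gives the `125.0` and `25.0` bounds shown in the response. Passing the sampling standard deviation instead of the population one would give the `upper_sampling` and `lower_sampling` values.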

.Standard Deviation and Bounds require normality
[NOTE]
=====
The standard deviation and its bounds are displayed by default, but they are not applicable to all data sets. Your data must be normally distributed for the metrics to make sense. The statistics behind standard deviations assume normally distributed data, so if your data is heavily skewed left or right, the values returned will be misleading.
=====
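
To see why skewed data makes the bounds misleading, consider a made-up data set in which four values are `1` and one value is `100`. The sketch below uses Python's standard `statistics` module and the default two-sigma interval; the data is hypothetical and chosen only to exaggerate the skew.

```python
from statistics import fmean, pstdev

# Hypothetical, heavily right-skewed data: no value is below 1.
values = [1.0, 1.0, 1.0, 1.0, 100.0]

avg = fmean(values)
std_deviation = pstdev(values)  # population standard deviation

lower = avg - 2 * std_deviation
upper = avg + 2 * std_deviation

# The lower bound is far below the smallest observed value, so the
# interval says little about where the data actually lies.
print(f"avg={avg:.1f} std_deviation={std_deviation:.1f}")
print(f"bounds=({lower:.1f}, {upper:.1f}) min={min(values)}")
```

The lower bound comes out negative even though every value is positive, which is the kind of misleading result the note warns about.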

==== Script

If you need to aggregate on a value that isn't indexed, use a <<runtime,runtime field>>. Say we found out that the grades we've been working on were for an exam that was above the level of the students and we want to "correct" them:

[source,console]
----
GET /exams/_search
{
  "size": 0,
  "runtime_mappings": {
    "grade.corrected": {
      "type": "double",
      "script": {
        "source": "emit(Math.min(100, doc['grade'].value * params.correction))",
        "params": {
          "correction": 1.2
        }
      }
    }
  },
  "aggs": {
    "grades_stats": {
      "extended_stats": { "field": "grade.corrected" }
    }
  }
}
----
// TEST[setup:exams]
// TEST[s/_search/_search?filter_path=aggregations/]

////
[source,console-result]
----
{
  "aggregations": {
    "grades_stats": {
      "count": 2,
      "min": 60.0,
      "max": 100.0,
      "avg": 80.0,
      "sum": 160.0,
      "sum_of_squares": 13600.0,
      "variance": 400.0,
      "variance_population": 400.0,
      "variance_sampling": 800.0,
      "std_deviation": 20.0,
      "std_deviation_population": 20.0,
      "std_deviation_sampling": 28.284271247461902,
      "std_deviation_bounds": {
        "upper": 120.0,
        "lower": 40.0,
        "upper_population": 120.0,
        "lower_population": 40.0,
        "upper_sampling": 136.5685424949238,
        "lower_sampling": 23.431457505076196
      }
    }
  }
}
----
////
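
The runtime script multiplies each grade by `params.correction` and caps the result at `100`. As a rough illustration of what that does to the statistics, the Python sketch below applies the same rule to the two grades from the first response (`50` and `100`). The function name `corrected_grade` is hypothetical, and the sketch only mirrors the arithmetic of the Painless script, not its execution.

```python
def corrected_grade(grade, correction=1.2):
    """Scale a grade by the correction factor, capped at 100."""
    return min(100.0, grade * correction)


grades = [50.0, 100.0]
corrected = [corrected_grade(g) for g in grades]

avg = sum(corrected) / len(corrected)
# Population variance, matching the default `variance` metric.
variance = sum((g - avg) ** 2 for g in corrected) / len(corrected)

print(corrected, avg, variance)
```

Because the higher grade is already at the cap, only the lower grade moves, so the corrected values sit closer together and the variance shrinks.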

==== Missing value

The `missing` parameter defines how documents that are missing a value should be treated. By default they are ignored, but it is also possible to treat them as if they had a value.

[source,console]
--------------------------------------------------
GET /exams/_search
{
  "size": 0,
  "aggs": {
    "grades_stats": {
      "extended_stats": {
        "field": "grade",
        "missing": 0 <1>
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:exams]
<1> Documents without a value in the `grade` field are treated as if they had the value `0`.
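
The difference between ignoring documents and substituting a value shows up directly in the statistics. The following Python sketch assumes three hypothetical documents, one of which has no `grade` field, and compares the values that would feed the aggregation with and without `missing`. The helper `values_for_aggregation` is an illustrative name, not part of any Elasticsearch API.

```python
def values_for_aggregation(docs, field, missing=None):
    """Collect the field values that feed the stats.

    Documents lacking the field are skipped unless a `missing`
    substitute is given, in which case the substitute is used.
    """
    values = []
    for doc in docs:
        value = doc.get(field)
        if value is None:
            if missing is None:
                continue  # default behaviour: ignore the document
            value = missing
        values.append(value)
    return values


# Hypothetical documents; the third one has no grade.
docs = [{"grade": 50}, {"grade": 100}, {"student": "no grade recorded"}]

ignored = values_for_aggregation(docs, "grade")
as_zero = values_for_aggregation(docs, "grade", missing=0)

avg_ignored = sum(ignored) / len(ignored)
avg_as_zero = sum(as_zero) / len(as_zero)

print(ignored, avg_ignored)
print(as_zero, avg_as_zero)
```

Substituting `0` increases `count` and pulls the average down, so choose a `missing` value only when it is a meaningful stand-in for the absent data.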