- [[search-aggregations-metrics-extendedstats-aggregation]]
- === Extended Stats Aggregation
- A `multi-value` metrics aggregation that computes stats over numeric values extracted from the aggregated documents. These values can be extracted either from specific numeric fields in the documents, or be generated by a provided script.
- The `extended_stats` aggregation is an extended version of the <<search-aggregations-metrics-stats-aggregation,`stats`>> aggregation that adds further metrics such as `sum_of_squares`, `variance`, `std_deviation` and `std_deviation_bounds`.
- Assuming the data consists of documents representing students' exam grades (between 0 and 100):
- [source,js]
- --------------------------------------------------
- {
- "aggs" : {
- "grades_stats" : { "extended_stats" : { "field" : "grade" } }
- }
- }
- --------------------------------------------------
- The above aggregation computes the grades statistics over all documents. The aggregation type is `extended_stats` and the `field` setting defines the numeric field of the documents the stats will be computed on. The above will return the following:
- [source,js]
- --------------------------------------------------
- {
- ...
- "aggregations": {
- "grades_stats": {
- "count": 9,
- "min": 72,
- "max": 99,
- "avg": 86,
- "sum": 774,
- "sum_of_squares": 67028,
- "variance": 51.55555555555556,
- "std_deviation": 7.180219742846005,
- "std_deviation_bounds": {
- "upper": 100.36043948569201,
- "lower": 71.63956051430799
- }
- }
- }
- }
- --------------------------------------------------
- The name of the aggregation (`grades_stats` above) also serves as the key by which the aggregation result can be retrieved from the returned response.
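The extended metrics in the response above are related by simple arithmetic. The following Python sketch re-derives them from `count`, `sum` and `sum_of_squares`, assuming a population variance (dividing by `count`), which is consistent with the sample values shown. The function name `derive_extended_stats` is hypothetical and not part of any Elasticsearch client.

```python
import math


def derive_extended_stats(count, total, sum_of_squares, sigma=2.0):
    """Re-derive avg, variance, std_deviation and bounds from the raw sums.

    Assumes a population variance (divide by count), which matches the
    sample response in this section; this is an illustration only.
    """
    avg = total / count
    # Mean of the squares minus the square of the mean.
    variance = sum_of_squares / count - avg ** 2
    std_deviation = math.sqrt(variance)
    return {
        "avg": avg,
        "variance": variance,
        "std_deviation": std_deviation,
        "std_deviation_bounds": {
            "upper": avg + sigma * std_deviation,
            "lower": avg - sigma * std_deviation,
        },
    }


# Values taken from the sample response above.
stats = derive_extended_stats(count=9, total=774, sum_of_squares=67028)
print(stats)
```

Because of floating-point rounding, the last digits printed may differ slightly from those in the sample response.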
- ==== Standard Deviation Bounds
- By default, the `extended_stats` metric will return an object called `std_deviation_bounds`, which provides an interval of plus/minus two standard
- deviations from the mean. This can be a useful way to visualize the variance of your data. If you want a different boundary, for example
- three standard deviations, you can set `sigma` in the request:
- [source,js]
- --------------------------------------------------
- {
- "aggs" : {
- "grades_stats" : {
- "extended_stats" : {
- "field" : "grade",
- "sigma" : 3 <1>
- }
- }
- }
- }
- --------------------------------------------------
- <1> `sigma` controls how many standard deviations +/- from the mean should be displayed
- `sigma` can be any non-negative double, meaning you can request non-integer values such as `1.5`. A value of `0` is valid, but will simply
- return the average for both `upper` and `lower` bounds.
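As a rough illustration of how `sigma` widens or narrows the interval, the sketch below computes the bounds client-side from the `avg` and `std_deviation` of the earlier sample response. The helper `bounds_for_sigma` is hypothetical; Elasticsearch computes these values itself when `sigma` is set in the request.

```python
def bounds_for_sigma(avg, std_deviation, sigma):
    """Return (lower, upper) as avg minus/plus sigma standard deviations."""
    if sigma < 0:
        raise ValueError("sigma must be non-negative")
    return avg - sigma * std_deviation, avg + sigma * std_deviation


AVG = 86.0
STD_DEVIATION = 7.180219742846005  # from the sample response

for sigma in (0, 1.5, 2, 3):
    lower, upper = bounds_for_sigma(AVG, STD_DEVIATION, sigma)
    print(f"sigma={sigma}: lower={lower:.2f}, upper={upper:.2f}")
```

With `sigma` set to `0`, both values collapse to the average, matching the behavior described above.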
- .Standard Deviation and Bounds require normality
- [NOTE]
- =====
- The standard deviation and its bounds are displayed by default, but they are not meaningful for every data set. Interpreting the bounds
- (for example, expecting roughly 95% of values to fall within two standard deviations of the mean) assumes normally distributed data, so
- if your data is heavily skewed left or right, the values returned will be misleading.
- =====
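To see why skew matters, consider a small made-up data set (not taken from the example index) with one large outlier. In this sketch the lower bound at two standard deviations falls well below the smallest value in the data, so the interval says little about where the values actually lie.

```python
import statistics

# Hypothetical, heavily right-skewed values: nine 1s and a single 100.
values = [1] * 9 + [100]

avg = statistics.mean(values)
std_deviation = statistics.pstdev(values)  # population standard deviation

lower = avg - 2 * std_deviation
upper = avg + 2 * std_deviation

print(f"min={min(values)}, max={max(values)}, avg={avg:.1f}")
print(f"std_deviation={std_deviation:.1f}, bounds=({lower:.1f}, {upper:.1f})")
```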
- ==== Script
- Computing the grades stats based on a script:
- [source,js]
- --------------------------------------------------
- {
- ...,
- "aggs" : {
- "grades_stats" : {
- "extended_stats" : {
- "script" : {
- "inline" : "doc['grade'].value",
- "lang" : "painless"
- }
- }
- }
- }
- }
- --------------------------------------------------
- This will interpret the `script` parameter as an `inline` script with the `painless` script language and no script parameters. To use a file script, use the following syntax:
- [source,js]
- --------------------------------------------------
- {
- ...,
- "aggs" : {
- "grades_stats" : {
- "extended_stats" : {
- "script" : {
- "file": "my_script",
- "params": {
- "field": "grade"
- }
- }
- }
- }
- }
- }
- --------------------------------------------------
- TIP: For indexed scripts, replace the `file` parameter with an `id` parameter.
- ===== Value Script
- It turned out that the exam was way above the level of the students and a grade correction needs to be applied. We can use a value script to get the new stats:
- [source,js]
- --------------------------------------------------
- {
- "aggs" : {
- ...
- "aggs" : {
- "grades_stats" : {
- "extended_stats" : {
- "field" : "grade",
- "script" : {
- "lang" : "painless",
- "inline": "_value * params.correction",
- "params" : {
- "correction" : 1.2
- }
- }
- }
- }
- }
- }
- }
- --------------------------------------------------
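The effect of a multiplicative value script can be previewed offline. This sketch applies the same `correction` factor to a hypothetical list of grades (invented for illustration, not the data behind the sample response) and shows that the average scales by the factor while the variance scales by its square.

```python
import statistics

CORRECTION = 1.2  # same factor as params.correction in the request

# Hypothetical grades, for illustration only.
grades = [72, 80, 85, 90, 99]
corrected = [g * CORRECTION for g in grades]

avg_before = statistics.mean(grades)
avg_after = statistics.mean(corrected)
var_before = statistics.pvariance(grades)  # population variance
var_after = statistics.pvariance(corrected)

print("avg:", avg_before, "->", avg_after)
print("variance:", var_before, "->", var_after)
```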
- ==== Missing value
- The `missing` parameter defines how documents that are missing a value should be treated.
- By default they are ignored, but it is also possible to treat them as if they
- had a value.
- [source,js]
- --------------------------------------------------
- {
- "aggs" : {
- "grades_stats" : {
- "extended_stats" : {
- "field" : "grade",
- "missing": 0 <1>
- }
- }
- }
- }
- --------------------------------------------------
- <1> Documents without a value in the `grade` field will be treated as if they had the value `0`.
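The difference between the default behavior and `"missing": 0` can be sketched in plain Python. The documents below are hypothetical, with `None` standing in for a document that has no `grade` value.

```python
import statistics

# Hypothetical documents; None marks a document without a grade value.
grades = [80, 90, None, 100]

# Default behavior: documents without a value are ignored.
ignored = [g for g in grades if g is not None]

# With "missing": 0, such documents are treated as having the value 0.
MISSING = 0
filled = [MISSING if g is None else g for g in grades]

print("ignored:", len(ignored), statistics.mean(ignored))
print("filled: ", len(filled), statistics.mean(filled))
```

Treating missing grades as `0` raises `count` and pulls `avg` and `min` down, so the choice of `missing` value should reflect what an absent grade actually means in the data.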