[[search-aggregations-matrix-stats-aggregation]]
=== Matrix stats aggregation
++++
<titleabbrev>Matrix stats</titleabbrev>
++++

The `matrix_stats` aggregation is a numeric aggregation that computes the following statistics over a set of document fields:

[horizontal]
`count`:: Number of samples per field included in the calculation.
`mean`:: The average value for each field.
`variance`:: Per-field measurement of how spread out the samples are from the mean.
`skewness`:: Per-field measurement quantifying the asymmetry of the distribution around the mean.
`kurtosis`:: Per-field measurement quantifying the shape of the distribution.
`covariance`:: A matrix that quantitatively describes how changes in one field are associated with another.
`correlation`:: The covariance matrix scaled to a range of -1 to 1, inclusive. Describes the relationship between field distributions.

IMPORTANT: Unlike other metric aggregations, the `matrix_stats` aggregation does
not support scripting.
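
To make the definitions of `covariance` and `correlation` above concrete, the following Python sketch computes them for a handful of documents. It is illustrative only and is not how Elasticsearch implements the aggregation: the function name `matrix_stats` is hypothetical, and the sample (`n - 1`) normalization is an assumption, since the text above does not state which normalization is used.

[source,python]
--------------------------------------------------
import math


def matrix_stats(rows, fields):
    """Per-field mean and variance plus covariance and correlation matrices.

    Assumption: sample (n - 1) normalization for variance and covariance.
    """
    n = len(rows)
    mean = {f: sum(r[f] for r in rows) / n for f in fields}
    # Covariance of every field pair; the diagonal holds the variances.
    cov = {
        a: {
            b: sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / (n - 1)
            for b in fields
        }
        for a in fields
    }
    variance = {f: cov[f][f] for f in fields}
    # Correlation is the covariance scaled into the range -1 to 1.
    corr = {
        a: {b: cov[a][b] / math.sqrt(variance[a] * variance[b]) for b in fields}
        for a in fields
    }
    return {"count": n, "mean": mean, "variance": variance,
            "covariance": cov, "correlation": corr}


# Three illustrative documents with `poverty` and `income` fields.
docs = [
    {"poverty": 24.0, "income": 50000.0},
    {"poverty": 13.0, "income": 95687.0},
    {"poverty": 69.0, "income": 7890.0},
]
stats = matrix_stats(docs, ["poverty", "income"])
--------------------------------------------------

In this sketch the diagonal of the correlation matrix is 1 (up to floating-point rounding) and the matrix is symmetric, which matches the shape of the `correlation` objects the aggregation returns.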

//////////////////////////

[source,js]
--------------------------------------------------
PUT /statistics/_doc/0
{"poverty": 24.0, "income": 50000.0}

PUT /statistics/_doc/1
{"poverty": 13.0, "income": 95687.0}

PUT /statistics/_doc/2
{"poverty": 69.0, "income": 7890.0}

POST /_refresh
--------------------------------------------------
// NOTCONSOLE
// TESTSETUP

//////////////////////////

The following example demonstrates the use of matrix stats to describe the relationship between income and poverty.

[source,console,id=stats-aggregation-example]
--------------------------------------------------
GET /_search
{
  "aggs": {
    "statistics": {
      "matrix_stats": {
        "fields": [ "poverty", "income" ]
      }
    }
  }
}
--------------------------------------------------
// TEST[s/_search/_search\?filter_path=aggregations/]

The aggregation type is `matrix_stats`, and the `fields` setting lists, as an array, the fields over which the statistics
are computed. The request above returns the following response:

[source,console-result]
--------------------------------------------------
{
  ...
  "aggregations": {
    "statistics": {
      "doc_count": 50,
      "fields": [ {
          "name": "income",
          "count": 50,
          "mean": 51985.1,
          "variance": 7.383377037755103E7,
          "skewness": 0.5595114003506483,
          "kurtosis": 2.5692365287787124,
          "covariance": {
            "income": 7.383377037755103E7,
            "poverty": -21093.65836734694
          },
          "correlation": {
            "income": 1.0,
            "poverty": -0.8352655256272504
          }
        }, {
          "name": "poverty",
          "count": 50,
          "mean": 12.732000000000001,
          "variance": 8.637730612244896,
          "skewness": 0.4516049811903419,
          "kurtosis": 2.8615929677997767,
          "covariance": {
            "income": -21093.65836734694,
            "poverty": 8.637730612244896
          },
          "correlation": {
            "income": -0.8352655256272504,
            "poverty": 1.0
          }
        } ]
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/\.\.\.//]
// TESTRESPONSE[s/: (\-)?[0-9\.E]+/: $body.$_path/]

The `doc_count` field indicates the number of documents involved in the computation of the statistics.
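
Because the response reports each field as an entry in the `fields` array, client code often re-indexes it by field name. The following Python sketch shows one way to do that on a trimmed copy of the response above. The helper name `correlation_matrix` is hypothetical and not part of any Elasticsearch client.

[source,python]
--------------------------------------------------
# Trimmed copy of the example response; only the keys used here are kept.
response = {
    "aggregations": {
        "statistics": {
            "doc_count": 50,
            "fields": [
                {"name": "income", "count": 50,
                 "correlation": {"income": 1.0, "poverty": -0.8352655256272504}},
                {"name": "poverty", "count": 50,
                 "correlation": {"income": -0.8352655256272504, "poverty": 1.0}},
            ],
        }
    }
}


def correlation_matrix(response, agg_name):
    """Index the per-field `correlation` objects by field name."""
    fields = response["aggregations"][agg_name]["fields"]
    return {entry["name"]: entry["correlation"] for entry in fields}


corr = correlation_matrix(response, "statistics")
income_vs_poverty = corr["income"]["poverty"]
--------------------------------------------------

The same pattern applies to the `covariance` objects, which have the same layout.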

==== Multi Value Fields

The `matrix_stats` aggregation treats each document field as an independent sample. For array or multi-valued fields,
the `mode` parameter controls which value the aggregation uses. This parameter can take one of the following values:

[horizontal]
`avg`:: (default) Use the average of all values.
`min`:: Pick the lowest value.
`max`:: Pick the highest value.
`sum`:: Use the sum of all values.
`median`:: Use the median of all values.
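
The effect of each `mode` value can be sketched in Python as a reducer that collapses one document's values for a field into a single sample. This is an illustration of the list above, not Elasticsearch's implementation: the names `MODES` and `reduce_multi_value` are hypothetical, and the handling of an even number of values by `statistics.median` (averaging the two middle values) is an assumption about the `median` mode.

[source,python]
--------------------------------------------------
from statistics import median

# One reducer per `mode` value listed above.
MODES = {
    "avg": lambda values: sum(values) / len(values),
    "min": min,
    "max": max,
    "sum": sum,
    "median": median,
}


def reduce_multi_value(values, mode="avg"):
    """Collapse one document's values for a field into a single sample."""
    return MODES[mode](values)


# Hypothetical multi-valued `income` field on a single document.
incomes = [40000.0, 50000.0, 90000.0]
samples = {mode: reduce_multi_value(incomes, mode) for mode in MODES}
--------------------------------------------------

For this document, `avg` yields 60000.0 and `median` yields 50000.0, so the choice of `mode` changes the sample that enters the statistics.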

==== Missing Values

The `missing` parameter defines how documents that are missing a value should be treated.
By default they are ignored, but it is also possible to treat them as if they had a value.
To do so, add a set of `fieldname: value` mappings that specify a default value per field.

[source,console,id=stats-aggregation-missing-example]
--------------------------------------------------
GET /_search
{
  "aggs": {
    "matrixstats": {
      "matrix_stats": {
        "fields": [ "poverty", "income" ],
        "missing": { "income": 50000 }   <1>
      }
    }
  }
}
--------------------------------------------------

<1> Documents without a value in the `income` field are treated as having the default value `50000`.
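
The behavior described in this section can be sketched in Python as follows. The helper `apply_missing` is hypothetical, and the sketch assumes that a document lacking a value for any requested field, with no default configured for that field, is skipped as a whole.

[source,python]
--------------------------------------------------
def apply_missing(doc, fields, missing):
    """Return the document's sample, or None if the document is ignored."""
    sample = {}
    for field in fields:
        if field in doc:
            sample[field] = doc[field]
        elif field in missing:
            # Substitute the configured default for the absent field.
            sample[field] = missing[field]
        else:
            # Assumption: no value and no default means the document is skipped.
            return None
    return sample


fields = ["poverty", "income"]
with_default = apply_missing({"poverty": 24.0}, fields, {"income": 50000})
without_default = apply_missing({"poverty": 24.0}, fields, {})
--------------------------------------------------

Here `with_default` contains `income` set to `50000`, while `without_default` is `None` because the document would be ignored.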