[[search-aggregations-matrix-stats-aggregation]]
=== Matrix Stats

The `matrix_stats` aggregation is a numeric aggregation that computes the following statistics over a set of document fields:

[horizontal]
`count`:: Number of per-field samples included in the calculation.
`mean`:: The average value for each field.
`variance`:: Per-field measurement of how spread out the samples are from the mean.
`skewness`:: Per-field measurement quantifying the asymmetry of the distribution around the mean.
`kurtosis`:: Per-field measurement quantifying the shape of the distribution.
`covariance`:: A matrix that quantitatively describes how changes in one field are associated with another.
`correlation`:: The covariance matrix scaled to a range of -1 to 1, inclusive. Describes the relationship between field distributions.

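
As a rough illustration of what these statistics measure, the sketch below computes them in plain Python for in-memory samples. The `matrix_stats` helper and its input layout are hypothetical, and the estimator choices (`n - 1` denominators for variance and covariance, non-excess kurtosis) are assumptions made for the example, not a description of how the aggregation is implemented.

```python
import math


def matrix_stats(samples):
    """Compute per-field and pairwise statistics.

    samples: dict mapping a field name to a list of numbers; all lists are
    assumed to have the same length (one value per document).
    """
    names = list(samples)
    n = len(samples[names[0]])
    mean = {f: sum(samples[f]) / n for f in names}

    def central_moment(f, k):
        # k-th central moment with a 1/n denominator (assumption).
        return sum((x - mean[f]) ** k for x in samples[f]) / n

    def covariance(a, b):
        # Sample covariance with an n - 1 denominator (assumption).
        pairs = zip(samples[a], samples[b])
        return sum((x - mean[a]) * (y - mean[b]) for x, y in pairs) / (n - 1)

    result = {}
    for f in names:
        m2 = central_moment(f, 2)
        cov = {g: covariance(f, g) for g in names}
        result[f] = {
            "count": n,
            "mean": mean[f],
            "variance": cov[f],  # covariance of a field with itself
            "skewness": central_moment(f, 3) / m2 ** 1.5,
            "kurtosis": central_moment(f, 4) / m2 ** 2,
            "covariance": cov,
            # Covariance scaled by both standard deviations, so it lies in [-1, 1].
            "correlation": {
                g: cov[g] / math.sqrt(cov[f] * covariance(g, g)) for g in names
            },
        }
    return result
```

The diagonal of the covariance matrix is each field's variance, which is why the diagonal of the correlation matrix is always 1.
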
The following example demonstrates the use of matrix stats to describe the relationship between income and poverty.

[source,js]
--------------------------------------------------
{
    "aggs": {
        "matrixstats": {
            "matrix_stats": {
                "fields": ["poverty", "income"]
            }
        }
    }
}
--------------------------------------------------

The aggregation type is `matrix_stats` and the `fields` setting defines the set of fields (as an array) for computing the statistics. The above request returns the following response:

[source,js]
--------------------------------------------------
{
    ...
    "aggregations": {
        "matrixstats": {
            "doc_count": 50,
            "fields": [{
                "name": "income",
                "count": 50,
                "mean": 51985.1,
                "variance": 7.383377037755103E7,
                "skewness": 0.5595114003506483,
                "kurtosis": 2.5692365287787124,
                "covariance": {
                    "income": 7.383377037755103E7,
                    "poverty": -21093.65836734694
                },
                "correlation": {
                    "income": 1.0,
                    "poverty": -0.8352655256272504
                }
            }, {
                "name": "poverty",
                "count": 50,
                "mean": 12.732000000000001,
                "variance": 8.637730612244896,
                "skewness": 0.4516049811903419,
                "kurtosis": 2.8615929677997767,
                "covariance": {
                    "income": -21093.65836734694,
                    "poverty": 8.637730612244896
                },
                "correlation": {
                    "income": -0.8352655256272504,
                    "poverty": 1.0
                }
            }]
        }
    }
}
--------------------------------------------------

The `doc_count` field indicates the number of documents involved in the computation of the statistics.
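
To show how the nested response is laid out, here is a minimal Python sketch that looks up one pairwise value. The `pairwise_correlation` helper is hypothetical, and the `response` dict is a trimmed copy of the response shown above, assumed to be already parsed from JSON.

```python
def pairwise_correlation(response, agg_name, field_a, field_b):
    """Return the correlation between two fields from a parsed response body."""
    for entry in response["aggregations"][agg_name]["fields"]:
        if entry["name"] == field_a:
            return entry["correlation"][field_b]
    raise KeyError(field_a)


# Trimmed copy of the example response; only the keys used here are kept.
response = {
    "aggregations": {
        "matrixstats": {
            "doc_count": 50,
            "fields": [
                {"name": "income",
                 "correlation": {"income": 1.0, "poverty": -0.8352655256272504}},
                {"name": "poverty",
                 "correlation": {"income": -0.8352655256272504, "poverty": 1.0}},
            ],
        }
    }
}

print(pairwise_correlation(response, "matrixstats", "income", "poverty"))
```

Each entry in `fields` carries its own row of the covariance and correlation matrices, so the same value can be read from either field's entry.
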

==== Multi Value Fields

The `matrix_stats` aggregation treats each document field as an independent sample. The `mode` parameter controls which value the aggregation uses for array or multi-valued fields. This parameter can take one of the following values:

[horizontal]
`avg`:: (default) Use the average of all values.
`min`:: Pick the lowest value.
`max`:: Pick the highest value.
`sum`:: Use the sum of all values.
`median`:: Use the median of all values.

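
The effect of each `mode` can be sketched in Python as a reduction from a list of values to a single sample. The `reduce_values` helper is hypothetical and only illustrates the semantics listed above.

```python
from statistics import median

# One reducer per documented mode value.
MODES = {
    "avg": lambda values: sum(values) / len(values),
    "min": min,
    "max": max,
    "sum": sum,
    "median": median,
}


def reduce_values(values, mode="avg"):
    """Collapse a multi-valued field into the single sample used for statistics."""
    return MODES[mode](values)


# A document whose field holds three values contributes one sample per mode.
for mode in MODES:
    print(mode, reduce_values([1, 2, 6], mode))
```
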

==== Missing Values

The `missing` parameter defines how documents that are missing a value should be treated.
By default they are ignored, but it is also possible to treat them as if they had a value.
This is done by adding a set of `fieldname : value` mappings that specify a default value per field.

[source,js]
--------------------------------------------------
{
    "aggs": {
        "matrixstats": {
            "matrix_stats": {
                "fields": ["poverty", "income"],
                "missing": {"income" : 50000} <1>
            }
        }
    }
}
--------------------------------------------------

<1> Documents without a value in the `income` field will have the default value `50000`.
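
The substitution described above can be sketched in Python as follows. The `collect_samples` helper is hypothetical; in particular, skipping a whole document when any requested field has neither a value nor a default is an assumption about what "ignored" means here.

```python
def collect_samples(docs, fields, missing=None):
    """Build one row of values per usable document.

    docs: list of dicts (field name -> value).
    missing: optional dict of per-field default values.
    """
    missing = missing or {}
    rows = []
    for doc in docs:
        row = []
        for field in fields:
            # Use the document's value, else the configured default, else None.
            value = doc.get(field, missing.get(field))
            if value is None:
                break  # no value and no default: ignore this document (assumption)
            row.append(value)
        else:
            rows.append(row)
    return rows


docs = [
    {"poverty": 10, "income": 40000},
    {"poverty": 15},       # no income value
    {"income": 60000},     # no poverty value
]
print(collect_samples(docs, ["poverty", "income"]))
print(collect_samples(docs, ["poverty", "income"], missing={"income": 50000}))
```

With the default in place, the second document contributes a sample with an income of `50000`, while the third is still skipped because `poverty` has no default.
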

==== Script

This aggregation family does not yet support scripting.