[role="xpack"]
[testenv="basic"]
[[search-aggregations-pipeline-normalize-aggregation]]
=== Normalize aggregation
++++
<titleabbrev>Normalize</titleabbrev>
++++

A parent pipeline aggregation that calculates the normalized/rescaled value of a specific bucket value.
Values that cannot be normalized are skipped using the <<gap-policy, skip gap policy>>.

==== Syntax

A `normalize` aggregation looks like this in isolation:

[source,js]
--------------------------------------------------
{
  "normalize": {
    "buckets_path": "normalized",
    "method": "percent_of_sum"
  }
}
--------------------------------------------------
// NOTCONSOLE

[[normalize_pipeline-params]]
.`normalize_pipeline` Parameters
[options="header"]
|===
|Parameter Name |Description |Required |Default Value
|`buckets_path` |The path to the buckets we wish to normalize (see <<buckets-path-syntax, `buckets_path` syntax>> for more details) |Required |
|`method` |The specific <<normalize_pipeline-method, method>> to apply |Required |
|`format` |Format to apply to the output value of this aggregation |Optional |`null`
|===
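
As a rough illustration of how these parameters fit together, the Python sketch below assembles a `normalize` body as a plain dictionary. The helper name `normalize_agg` and its `fmt` argument are hypothetical and not part of any client library; the accepted method names are the ones described under Methods on this page.

```python
import json

# Method names described in the Methods section of this page.
METHODS = {"rescale_0_1", "rescale_0_100", "percent_of_sum", "mean", "zscore", "softmax"}


def normalize_agg(buckets_path, method, fmt=None):
    """Build the body of a `normalize` aggregation (hypothetical helper)."""
    if not buckets_path:
        raise ValueError("buckets_path is required")
    if method not in METHODS:
        raise ValueError(f"unknown method: {method!r}")
    body = {"buckets_path": buckets_path, "method": method}
    if fmt is not None:
        # `format` is optional; leaving it out keeps the default.
        body["format"] = fmt
    return {"normalize": body}


print(json.dumps(normalize_agg("sales", "percent_of_sum", "00.00%"), indent=2))
```

Validating the two required parameters on the client side is only a convenience in this sketch; the server remains the authority on what it accepts.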

==== Methods
[[normalize_pipeline-method]]

The Normalize Aggregation supports multiple methods to transform the bucket values. Each method definition will use
the following original set of bucket values as examples: `[5, 5, 10, 50, 10, 20]`.

_rescale_0_1_::
This method rescales the data such that the minimum number is zero, and the maximum number is 1, with the rest normalized
linearly in-between.

    x' = (x - min_x) / (max_x - min_x)

    [0, 0, .1111, 1, .1111, .3333]

_rescale_0_100_::
This method rescales the data such that the minimum number is zero, and the maximum number is 100, with the rest normalized
linearly in-between.

    x' = 100 * (x - min_x) / (max_x - min_x)

    [0, 0, 11.11, 100, 11.11, 33.33]

_percent_of_sum_::
This method normalizes each value so that it represents its share of the sum of all values.

    x' = x / sum_x

    [5%, 5%, 10%, 50%, 10%, 20%]

_mean_::
This method expresses each value as its distance from the mean, relative to the range of the data.

    x' = (x - mean_x) / (max_x - min_x)

    [-0.26, -0.26, -0.15, 0.74, -0.15, 0.07]

_zscore_::
This method normalizes such that each value represents how far it is from the mean, relative to the standard deviation.

    x' = (x - mean_x) / stdev_x

    [-0.68, -0.68, -0.39, 1.94, -0.39, 0.19]

_softmax_::
This method exponentiates each value and divides it by the sum of the exponentials of all the original values.

    x' = e^x / sum_e_x

    [2.862E-20, 2.862E-20, 4.248E-18, 0.999, 4.248E-18, 9.357E-14]
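
The formulas above can be tried out with a few lines of Python. This is only a sketch of the arithmetic, not the aggregation's implementation: the function name `normalize` is made up for illustration, the `zscore` branch assumes the sample standard deviation (n - 1) because that reproduces the figures listed for it, and constant input (where `max_x == min_x`) is not handled.

```python
import math
import statistics


def normalize(values, method):
    """Apply one of the formulas above to a list of numbers (illustrative only)."""
    lo, hi = min(values), max(values)
    mean = statistics.fmean(values)
    if method == "rescale_0_1":
        return [(x - lo) / (hi - lo) for x in values]
    if method == "rescale_0_100":
        return [100 * (x - lo) / (hi - lo) for x in values]
    if method == "percent_of_sum":
        total = sum(values)
        return [x / total for x in values]
    if method == "mean":
        return [(x - mean) / (hi - lo) for x in values]
    if method == "zscore":
        # Assumption: sample standard deviation (n - 1).
        stdev = statistics.stdev(values)
        return [(x - mean) / stdev for x in values]
    if method == "softmax":
        # Subtracting the maximum leaves the ratios unchanged and avoids overflow.
        exps = [math.exp(x - hi) for x in values]
        total = sum(exps)
        return [e / total for e in exps]
    raise ValueError(f"unknown method: {method!r}")


values = [5, 5, 10, 50, 10, 20]
for name in ("rescale_0_1", "rescale_0_100", "percent_of_sum", "mean", "zscore", "softmax"):
    print(name, [f"{v:.4g}" for v in normalize(values, name)])
```

Running the loop on the example values is a convenient way to check any of the lists above against its formula.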

==== Example

The following snippet calculates the percent of total sales for each month:

[source,console]
--------------------------------------------------
POST /sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "sales": {
          "sum": {
            "field": "price"
          }
        },
        "percent_of_total_sales": {
          "normalize": {
            "buckets_path": "sales",          <1>
            "method": "percent_of_sum",       <2>
            "format": "00.00%"                <3>
          }
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:sales]
<1> `buckets_path` instructs this normalize aggregation to use the output of the `sales` aggregation for rescaling.
<2> `method` sets which rescaling to apply. In this case, `percent_of_sum` calculates each sales value as a percentage of all sales in the parent bucket.
<3> `format` controls how the metric is rendered as a string, using Java's `DecimalFormat` pattern. In this case, the value is multiplied by 100 and a `%` is appended.
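
To make the effect of the `00.00%` pattern concrete, the Python sketch below mimics it with ordinary string formatting. It approximates Java's `DecimalFormat` for this one pattern only; the function name `format_percent` is made up here, and rounding at exact ties may differ from Java's behavior.

```python
def format_percent(value):
    """Approximate the DecimalFormat pattern `00.00%` (hypothetical helper).

    The `%` suffix multiplies the value by 100; `00.00` asks for at least
    two integer digits and exactly two fraction digits.
    """
    return f"{value * 100:05.2f}%"


print(format_percent(0.06091370558375635))  # prints 06.09%
```

The leading zero in `06.09%` comes from the two mandatory integer digits in the pattern.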

The response may look like this:

[source,console-result]
--------------------------------------------------
{
   "took": 11,
   "timed_out": false,
   "_shards": ...,
   "hits": ...,
   "aggregations": {
      "sales_per_month": {
         "buckets": [
            {
               "key_as_string": "2015/01/01 00:00:00",
               "key": 1420070400000,
               "doc_count": 3,
               "sales": {
                  "value": 550.0
               },
               "percent_of_total_sales": {
                  "value": 0.5583756345177665,
                  "value_as_string": "55.84%"
               }
            },
            {
               "key_as_string": "2015/02/01 00:00:00",
               "key": 1422748800000,
               "doc_count": 2,
               "sales": {
                  "value": 60.0
               },
               "percent_of_total_sales": {
                  "value": 0.06091370558375635,
                  "value_as_string": "06.09%"
               }
            },
            {
               "key_as_string": "2015/03/01 00:00:00",
               "key": 1425168000000,
               "doc_count": 2,
               "sales": {
                  "value": 375.0
               },
               "percent_of_total_sales": {
                  "value": 0.38071065989847713,
                  "value_as_string": "38.07%"
               }
            }
         ]
      }
   }
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 11/"took": $body.took/]
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": $body._shards/]
// TESTRESPONSE[s/"hits": \.\.\./"hits": $body.hits/]
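
The percentages in this response follow directly from the three monthly `sales` sums. As a quick check, the Python sketch below recomputes `x / sum_x` from those sums; it uses only numbers shown in the response, and the variable names are arbitrary.

```python
# Monthly `sales` sums taken from the response above.
sales = {"2015/01/01": 550.0, "2015/02/01": 60.0, "2015/03/01": 375.0}

total = sum(sales.values())  # 550 + 60 + 375 = 985
percent_of_total = {month: value / total for month, value in sales.items()}

for month, share in percent_of_total.items():
    print(month, round(share, 4))
```

Because every bucket is divided by the same total, the three shares add up to 1.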