
[role="xpack"]
[[search-aggregations-pipeline-normalize-aggregation]]
=== Normalize aggregation
++++
<titleabbrev>Normalize</titleabbrev>
++++

A parent pipeline aggregation which calculates a normalized/rescaled value for each bucket value.
Values that cannot be normalized are skipped using the <<gap-policy, skip gap policy>>.

==== Syntax

A `normalize` aggregation looks like this in isolation:

[source,js]
--------------------------------------------------
{
  "normalize": {
    "buckets_path": "normalized",
    "method": "percent_of_sum"
  }
}
--------------------------------------------------
// NOTCONSOLE

[[normalize_pipeline-params]]
.`normalize_pipeline` Parameters
[options="header"]
|===
|Parameter Name |Description |Required |Default Value
|`buckets_path` |The path to the buckets we wish to normalize (see <<buckets-path-syntax, `buckets_path` syntax>> for more details) |Required |
|`method` |The specific <<normalize_pipeline-method, method>> to apply |Required |
|`format` |The format to apply to the output value of this aggregation |Optional |`null`
|===

[[normalize_pipeline-method]]
==== Methods

The normalize aggregation supports multiple methods to transform the bucket values. Each method definition below uses
the following original set of bucket values as an example: `[5, 5, 10, 50, 10, 20]`.

_rescale_0_1_::
This method rescales the data such that the minimum number is zero and the maximum number is 1, with the rest normalized
linearly in between.
+
 x' = (x - min_x) / (max_x - min_x)
+
 [0, 0, .1111, 1, .1111, .3333]

_rescale_0_100_::
This method rescales the data such that the minimum number is zero and the maximum number is 100, with the rest normalized
linearly in between.
+
 x' = 100 * (x - min_x) / (max_x - min_x)
+
 [0, 0, 11.11, 100, 11.11, 33.33]

_percent_of_sum_::
This method normalizes each value so that it represents a percentage of the total sum of all the values.
+
 x' = x / sum_x
+
 [5%, 5%, 10%, 50%, 10%, 20%]

_mean_::
This method normalizes each value by how much it differs from the mean of all the values, relative to the range.
+
 x' = (x - mean_x) / (max_x - min_x)
+
 [-0.2593, -0.2593, -0.1481, 0.7407, -0.1481, 0.0741]

_zscore_::
This method normalizes such that each value represents how far it is from the mean relative to the standard deviation.
+
 x' = (x - mean_x) / stdev_x
+
 [-0.68, -0.68, -0.39, 1.94, -0.39, 0.19]

_softmax_::
This method normalizes such that each value is exponentiated and taken relative to the sum of the exponents of all the original values.
+
 x' = e^x / sum_e_x
+
 [2.862E-20, 2.862E-20, 4.248E-18, 0.999, 4.248E-18, 9.357E-14]
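
The methods above can be sketched in a few lines each. The following is an illustrative Python sketch (Elasticsearch implements these in Java; the function names here are hypothetical), checked against the example bucket values `[5, 5, 10, 50, 10, 20]`. Note that reproducing the listed `zscore` values requires the sample standard deviation (dividing by `n - 1`).

```python
import math

values = [5, 5, 10, 50, 10, 20]

def rescale_0_1(xs):
    # x' = (x - min_x) / (max_x - min_x)
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def rescale_0_100(xs):
    # x' = 100 * (x - min_x) / (max_x - min_x)
    return [100 * v for v in rescale_0_1(xs)]

def percent_of_sum(xs):
    # x' = x / sum_x
    total = sum(xs)
    return [x / total for x in xs]

def mean_method(xs):
    # x' = (x - mean_x) / (max_x - min_x)
    mean = sum(xs) / len(xs)
    return [(x - mean) / (max(xs) - min(xs)) for x in xs]

def zscore(xs):
    # x' = (x - mean_x) / stdev_x, using the sample standard deviation
    mean = sum(xs) / len(xs)
    stdev = math.sqrt(sum((x - mean) ** 2 for x in xs) / (len(xs) - 1))
    return [(x - mean) / stdev for x in xs]

def softmax(xs):
    # x' = e^x / sum_e_x
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

print([round(v, 4) for v in rescale_0_1(values)])  # [0.0, 0.0, 0.1111, 1.0, 0.1111, 0.3333]
print([round(v, 2) for v in zscore(values)])       # [-0.68, -0.68, -0.39, 1.94, -0.39, 0.19]
```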

==== Example

The following snippet calculates the percent of total sales for each month:

[source,console]
--------------------------------------------------
POST /sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "sales": {
          "sum": {
            "field": "price"
          }
        },
        "percent_of_total_sales": {
          "normalize": {
            "buckets_path": "sales",       <1>
            "method": "percent_of_sum",    <2>
            "format": "00.00%"             <3>
          }
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:sales]

<1> `buckets_path` instructs this normalize aggregation to use the output of the `sales` aggregation for rescaling
<2> `method` sets which rescaling to apply. In this case, `percent_of_sum` will calculate the sales value as a percent of all sales
in the parent bucket
<3> `format` influences how to format the metric as a string using Java's `DecimalFormat` pattern. In this case, multiplying by 100
and adding a `%`
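
For context, `00.00%` follows Java's `DecimalFormat` conventions: the `%` suffix multiplies by 100, and the `00.00` pattern zero-pads the integer part to two digits with two decimal places. A rough, hypothetical Python stand-in for this one pattern (not the actual formatter Elasticsearch uses):

```python
def format_percent(value):
    # Mimic DecimalFormat "00.00%": multiply by 100, keep two decimals,
    # zero-pad the integer part to two digits, append "%".
    return f"{value * 100:05.2f}%"

print(format_percent(0.5583756345177665))   # 55.84%
print(format_percent(0.06091370558375635))  # 06.09%
```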

And the following may be the response:

[source,console-result]
--------------------------------------------------
{
  "took": 11,
  "timed_out": false,
  "_shards": ...,
  "hits": ...,
  "aggregations": {
    "sales_per_month": {
      "buckets": [
        {
          "key_as_string": "2015/01/01 00:00:00",
          "key": 1420070400000,
          "doc_count": 3,
          "sales": {
            "value": 550.0
          },
          "percent_of_total_sales": {
            "value": 0.5583756345177665,
            "value_as_string": "55.84%"
          }
        },
        {
          "key_as_string": "2015/02/01 00:00:00",
          "key": 1422748800000,
          "doc_count": 2,
          "sales": {
            "value": 60.0
          },
          "percent_of_total_sales": {
            "value": 0.06091370558375635,
            "value_as_string": "06.09%"
          }
        },
        {
          "key_as_string": "2015/03/01 00:00:00",
          "key": 1425168000000,
          "doc_count": 2,
          "sales": {
            "value": 375.0
          },
          "percent_of_total_sales": {
            "value": 0.38071065989847713,
            "value_as_string": "38.07%"
          }
        }
      ]
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 11/"took": $body.took/]
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": $body._shards/]
// TESTRESPONSE[s/"hits": \.\.\./"hits": $body.hits/]
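
As a sanity check, the `percent_of_total_sales` values in the response are each month's `sales` divided by the overall total (550 + 60 + 375 = 985). A quick Python sketch reproducing them (the month keys are just illustrative labels):

```python
# Monthly sums taken from the example response above.
sales = {"2015/01": 550.0, "2015/02": 60.0, "2015/03": 375.0}

total = sum(sales.values())  # 985.0
shares = {month: value / total for month, value in sales.items()}

print(round(shares["2015/01"], 4))  # 0.5584
print(round(shares["2015/02"], 4))  # 0.0609
print(round(shares["2015/03"], 4))  # 0.3807
```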