[role="xpack"]
[testenv="basic"]
[[search-aggregations-pipeline-normalize-aggregation]]
=== Normalize Aggregation

A parent pipeline aggregation which calculates a normalized/rescaled value for a specific bucket value.
Values that cannot be normalized are skipped using the <<gap-policy, skip gap policy>>.

==== Syntax

A `normalize` aggregation looks like this in isolation:

[source,js]
--------------------------------------------------
{
  "normalize": {
    "buckets_path": "normalized",
    "method": "percent_of_sum"
  }
}
--------------------------------------------------
// NOTCONSOLE

[[normalize_pipeline-params]]
.`normalize_pipeline` Parameters
[options="header"]
|===
|Parameter Name |Description |Required |Default Value
|`buckets_path` |The path to the buckets we wish to normalize (see <<buckets-path-syntax, `buckets_path` syntax>> for more details) |Required |
|`method` |The specific <<normalize_pipeline-method, method>> to apply |Required |
|`format` |Format to apply to the output value of this aggregation |Optional |`null`
|===

[[normalize_pipeline-method]]
==== Methods

The Normalize Aggregation supports multiple methods to transform the bucket values. Each method definition will use
the following original set of bucket values as examples: `[5, 5, 10, 50, 10, 20]`. A short standalone sketch that
recomputes these numbers follows the list of methods.

_rescale_0_1_::

This method rescales the data such that the minimum number is zero, and the maximum number is one, with the rest normalized
linearly in-between.

 x' = (x - min_x) / (max_x - min_x)

 [0, 0, .1111, 1, .1111, .3333]

_rescale_0_100_::

This method rescales the data such that the minimum number is zero, and the maximum number is 100, with the rest normalized
linearly in-between.

 x' = 100 * (x - min_x) / (max_x - min_x)

 [0, 0, 11.11, 100, 11.11, 33.33]

_percent_of_sum_::

This method normalizes each value so that it represents a percentage of the total sum it contributes to.

 x' = x / sum_x

 [5%, 5%, 10%, 50%, 10%, 20%]

_mean_::

This method normalizes each value by how much it differs from the average.

 x' = (x - mean_x) / (max_x - min_x)

 [-0.26, -0.26, -0.15, 0.74, -0.15, 0.07]

_zscore_::

This method normalizes such that each value represents how far it is from the mean relative to the standard deviation.

 x' = (x - mean_x) / stdev_x

 [-0.68, -0.68, -0.39, 1.94, -0.39, 0.19]

_softmax_::

This method normalizes such that each value is exponentiated and then divided by the sum of the exponentials of all the original values.

 x' = e^x / sum_e_x

 [2.862E-20, 2.862E-20, 4.248E-18, 0.999, 4.248E-18, 9.357E-14]

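The arithmetic behind these formulas can be checked outside of a search request. The following standalone Java sketch
(illustrative only, not the aggregation's internal implementation; the class and method names are invented for this
example) recomputes each method for the example bucket values. The z-score here uses the sample standard deviation,
which reproduces the numbers listed above.

[source,java]
--------------------------------------------------
import java.util.Arrays;
import java.util.stream.DoubleStream;

// Illustrative re-computation of the normalize methods for the example
// bucket values [5, 5, 10, 50, 10, 20]; not the aggregation's own code.
public class NormalizeMethodsExample {

    public static void main(String[] args) {
        double[] x = {5, 5, 10, 50, 10, 20};

        double min = Arrays.stream(x).min().getAsDouble();
        double max = Arrays.stream(x).max().getAsDouble();
        double sum = Arrays.stream(x).sum();
        double mean = sum / x.length;
        // Sample standard deviation (n - 1) reproduces the z-score values above.
        double stdev = Math.sqrt(
            Arrays.stream(x).map(v -> (v - mean) * (v - mean)).sum() / (x.length - 1));
        double expSum = Arrays.stream(x).map(Math::exp).sum();

        print("rescale_0_1",    Arrays.stream(x).map(v -> (v - min) / (max - min)));
        print("rescale_0_100",  Arrays.stream(x).map(v -> 100 * (v - min) / (max - min)));
        print("percent_of_sum", Arrays.stream(x).map(v -> v / sum));
        print("mean",           Arrays.stream(x).map(v -> (v - mean) / (max - min)));
        print("zscore",         Arrays.stream(x).map(v -> (v - mean) / stdev));
        print("softmax",        Arrays.stream(x).map(v -> Math.exp(v) / expSum));
    }

    private static void print(String method, DoubleStream values) {
        System.out.println(method + ": " + Arrays.toString(values.toArray()));
    }
}
--------------------------------------------------

Running the sketch prints one line per method; rounding the output to two decimal places (or to a few significant
figures for `softmax`) gives the example lists shown above.
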
==== Example

The following snippet calculates the percent of total sales for each month:

[source,console]
--------------------------------------------------
POST /sales/_search
{
  "size": 0,
  "aggs" : {
    "sales_per_month" : {
      "date_histogram" : {
        "field" : "date",
        "calendar_interval" : "month"
      },
      "aggs": {
        "sales": {
          "sum": {
            "field": "price"
          }
        },
        "percent_of_total_sales": {
          "normalize": {
            "buckets_path": "sales", <1>
            "method": "percent_of_sum", <2>
            "format": "00.00%" <3>
          }
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[setup:sales]

<1> `buckets_path` instructs this normalize aggregation to use the output of the `sales` aggregation for rescaling
<2> `method` sets which rescaling to apply. In this case, `percent_of_sum` will calculate the sales value as a percent of all sales
in the parent bucket
<3> `format` influences how to format the metric as a string using Java's `DecimalFormat` pattern. In this case, the pattern
multiplies the value by 100 and appends a `%`; a short standalone sketch of the pattern follows these notes
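
Because the pattern is a plain Java `DecimalFormat` pattern, the `value_as_string` fields in the response below can be
reproduced with a few lines of Java. A minimal sketch (class and variable names are illustrative):

[source,java]
--------------------------------------------------
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class NormalizeFormatExample {
    public static void main(String[] args) {
        // The '%' suffix in a DecimalFormat pattern multiplies the value by 100
        // while formatting, so 0.5583756345177665 renders as "55.84%".
        DecimalFormat percent =
            new DecimalFormat("00.00%", DecimalFormatSymbols.getInstance(Locale.ROOT));
        System.out.println(percent.format(0.5583756345177665));  // 55.84%
        System.out.println(percent.format(0.06091370558375635)); // 06.09%
    }
}
--------------------------------------------------
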
And the following may be the response:

[source,console-result]
--------------------------------------------------
{
  "took": 11,
  "timed_out": false,
  "_shards": ...,
  "hits": ...,
  "aggregations": {
    "sales_per_month": {
      "buckets": [
        {
          "key_as_string": "2015/01/01 00:00:00",
          "key": 1420070400000,
          "doc_count": 3,
          "sales": {
            "value": 550.0
          },
          "percent_of_total_sales": {
            "value": 0.5583756345177665,
            "value_as_string": "55.84%"
          }
        },
        {
          "key_as_string": "2015/02/01 00:00:00",
          "key": 1422748800000,
          "doc_count": 2,
          "sales": {
            "value": 60.0
          },
          "percent_of_total_sales": {
            "value": 0.06091370558375635,
            "value_as_string": "06.09%"
          }
        },
        {
          "key_as_string": "2015/03/01 00:00:00",
          "key": 1425168000000,
          "doc_count": 2,
          "sales": {
            "value": 375.0
          },
          "percent_of_total_sales": {
            "value": 0.38071065989847713,
            "value_as_string": "38.07%"
          }
        }
      ]
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 11/"took": $body.took/]
// TESTRESPONSE[s/"_shards": \.\.\./"_shards": $body._shards/]
// TESTRESPONSE[s/"hits": \.\.\./"hits": $body.hits/]