123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184 |
- [role="xpack"]
- [[search-aggregations-pipeline-normalize-aggregation]]
- === Normalize aggregation
- ++++
- <titleabbrev>Normalize</titleabbrev>
- ++++
- A parent pipeline aggregation which calculates the specific normalized/rescaled value for a specific bucket value.
- Values that cannot be normalized, will be skipped using the <<gap-policy, skip gap policy>>.
- ==== Syntax
- A `normalize` aggregation looks like this in isolation:
- [source,js]
- --------------------------------------------------
- {
- "normalize": {
- "buckets_path": "normalized",
- "method": "percent_of_sum"
- }
- }
- --------------------------------------------------
- // NOTCONSOLE
- [[normalize_pipeline-params]]
- .`normalize_pipeline` Parameters
- [options="header"]
- |===
- |Parameter Name |Description |Required |Default Value
- |`buckets_path` |The path to the buckets we wish to normalize (see <<buckets-path-syntax, `buckets_path` syntax>> for more details) |Required |
- |`method` | The specific <<normalize_pipeline-method, method>> to apply | Required |
- |`format` |format to apply to the output value of this aggregation |Optional |`null`
- |===
- ==== Methods
- [[normalize_pipeline-method]]
- The Normalize Aggregation supports multiple methods to transform the bucket values. Each method definition will use
- the following original set of bucket values as examples: `[5, 5, 10, 50, 10, 20]`.
- _rescale_0_1_::
- This method rescales the data such that the minimum number is zero, and the maximum number is 1, with the rest normalized
- linearly in-between.
- x' = (x - min_x) / (max_x - min_x)
- [0, 0, .1111, 1, .1111, .3333]
- _rescale_0_100_::
- This method rescales the data such that the minimum number is zero, and the maximum number is 100, with the rest normalized
- linearly in-between.
- x' = 100 * (x - min_x) / (max_x - min_x)
- [0, 0, 11.11, 100, 11.11, 33.33]
- _percent_of_sum_::
- This method normalizes each value so that it represents a percentage of the total sum it attributes to.
- x' = x / sum_x
- [5%, 5%, 10%, 50%, 10%, 20%]
- _mean_::
- This method normalizes such that each value is normalized by how much it differs from the average.
- x' = (x - mean_x) / (max_x - min_x)
- [4.63, 4.63, 9.63, 49.63, 9.63, 9.63, 19.63]
- _zscore_::
- This method normalizes such that each value represents how far it is from the mean relative to the standard deviation
- x' = (x - mean_x) / stdev_x
- [-0.68, -0.68, -0.39, 1.94, -0.39, 0.19]
- _softmax_::
- This method normalizes such that each value is exponentiated and relative to the sum of the exponents of the original values.
- x' = e^x / sum_e_x
- [2.862E-20, 2.862E-20, 4.248E-18, 0.999, 9.357E-14, 4.248E-18]
- ==== Example
- The following snippet calculates the percent of total sales for each month:
- [source,console]
- --------------------------------------------------
- POST /sales/_search
- {
- "size": 0,
- "aggs": {
- "sales_per_month": {
- "date_histogram": {
- "field": "date",
- "calendar_interval": "month"
- },
- "aggs": {
- "sales": {
- "sum": {
- "field": "price"
- }
- },
- "percent_of_total_sales": {
- "normalize": {
- "buckets_path": "sales", <1>
- "method": "percent_of_sum", <2>
- "format": "00.00%" <3>
- }
- }
- }
- }
- }
- }
- --------------------------------------------------
- // TEST[setup:sales]
- <1> `buckets_path` instructs this normalize aggregation to use the output of the `sales` aggregation for rescaling
- <2> `method` sets which rescaling to apply. In this case, `percent_of_sum` will calculate the sales value as a percent of all sales
- in the parent bucket
- <3> `format` influences how to format the metric as a string using Java's `DecimalFormat` pattern. In this case, multiplying by 100
- and adding a '%'
- And the following may be the response:
- [source,console-result]
- --------------------------------------------------
- {
- "took": 11,
- "timed_out": false,
- "_shards": ...,
- "hits": ...,
- "aggregations": {
- "sales_per_month": {
- "buckets": [
- {
- "key_as_string": "2015/01/01 00:00:00",
- "key": 1420070400000,
- "doc_count": 3,
- "sales": {
- "value": 550.0
- },
- "percent_of_total_sales": {
- "value": 0.5583756345177665,
- "value_as_string": "55.84%"
- }
- },
- {
- "key_as_string": "2015/02/01 00:00:00",
- "key": 1422748800000,
- "doc_count": 2,
- "sales": {
- "value": 60.0
- },
- "percent_of_total_sales": {
- "value": 0.06091370558375635,
- "value_as_string": "06.09%"
- }
- },
- {
- "key_as_string": "2015/03/01 00:00:00",
- "key": 1425168000000,
- "doc_count": 2,
- "sales": {
- "value": 375.0
- },
- "percent_of_total_sales": {
- "value": 0.38071065989847713,
- "value_as_string": "38.07%"
- }
- }
- ]
- }
- }
- }
- --------------------------------------------------
- // TESTRESPONSE[s/"took": 11/"took": $body.took/]
- // TESTRESPONSE[s/"_shards": \.\.\./"_shards": $body._shards/]
- // TESTRESPONSE[s/"hits": \.\.\./"hits": $body.hits/]
|