123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592593594595596597598599600601602603604605606607608609610611612613614615616617618619620621622623624625626627628629630631632633634635636637638639640641642643644645646647648649650651652653654655656657658659660661 |
- [[search-aggregations-pipeline-movavg-aggregation]]
- === Moving Average Aggregation
- deprecated[6.4.0, The Moving Average aggregation has been deprecated in favor of the more general
- <<search-aggregations-pipeline-movfn-aggregation,Moving Function Aggregation>>. The new Moving Function aggregation provides
- all the same functionality as the Moving Average aggregation, but also provides more flexibility.]
- Given an ordered series of data, the Moving Average aggregation will slide a window across the data and emit the average
- value of that window. For example, given the data `[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]`, we can calculate a simple moving
- average with windows size of `5` as follows:
- - (1 + 2 + 3 + 4 + 5) / 5 = 3
- - (2 + 3 + 4 + 5 + 6) / 5 = 4
- - (3 + 4 + 5 + 6 + 7) / 5 = 5
- - etc
- Moving averages are a simple method to smooth sequential data. Moving averages are typically applied to time-based data,
- such as stock prices or server metrics. The smoothing can be used to eliminate high frequency fluctuations or random noise,
- which allows the lower frequency trends to be more easily visualized, such as seasonality.
- ==== Syntax
- A `moving_avg` aggregation looks like this in isolation:
- [source,js]
- --------------------------------------------------
- {
- "moving_avg": {
- "buckets_path": "the_sum",
- "model": "holt",
- "window": 5,
- "gap_policy": "insert_zeros",
- "settings": {
- "alpha": 0.8
- }
- }
- }
- --------------------------------------------------
- // NOTCONSOLE
- .`moving_avg` Parameters
- |===
- |Parameter Name |Description |Required |Default Value
- |`buckets_path` |Path to the metric of interest (see <<buckets-path-syntax, `buckets_path` Syntax>> for more details |Required |
- |`model` |The moving average weighting model that we wish to use |Optional |`simple`
- |`gap_policy` |Determines what should happen when a gap in the data is encountered. |Optional |`insert_zeros`
- |`window` |The size of window to "slide" across the histogram. |Optional |`5`
- |`minimize` |If the model should be algorithmically minimized. See <<movavg-minimizer, Minimization>> for more
- details |Optional |`false` for most models
- |`settings` |Model-specific settings, contents which differ depending on the model specified. |Optional |
- |===
- `moving_avg` aggregations must be embedded inside of a `histogram` or `date_histogram` aggregation. They can be
- embedded like any other metric aggregation:
- [source,js]
- --------------------------------------------------
- POST /_search
- {
- "size": 0,
- "aggs": {
- "my_date_histo":{ <1>
- "date_histogram":{
- "field":"date",
- "interval":"1M"
- },
- "aggs":{
- "the_sum":{
- "sum":{ "field": "price" } <2>
- },
- "the_movavg":{
- "moving_avg":{ "buckets_path": "the_sum" } <3>
- }
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- // TEST[setup:sales]
- // TEST[warning:The moving_avg aggregation has been deprecated in favor of the moving_fn aggregation.]
- <1> A `date_histogram` named "my_date_histo" is constructed on the "timestamp" field, with one-day intervals
- <2> A `sum` metric is used to calculate the sum of a field. This could be any metric (sum, min, max, etc)
- <3> Finally, we specify a `moving_avg` aggregation which uses "the_sum" metric as its input.
- Moving averages are built by first specifying a `histogram` or `date_histogram` over a field. You can then optionally
- add normal metrics, such as a `sum`, inside of that histogram. Finally, the `moving_avg` is embedded inside the histogram.
- The `buckets_path` parameter is then used to "point" at one of the sibling metrics inside of the histogram (see
- <<buckets-path-syntax>> for a description of the syntax for `buckets_path`.
- An example response from the above aggregation may look like:
- [source,js]
- --------------------------------------------------
- {
- "took": 11,
- "timed_out": false,
- "_shards": ...,
- "hits": ...,
- "aggregations": {
- "my_date_histo": {
- "buckets": [
- {
- "key_as_string": "2015/01/01 00:00:00",
- "key": 1420070400000,
- "doc_count": 3,
- "the_sum": {
- "value": 550.0
- }
- },
- {
- "key_as_string": "2015/02/01 00:00:00",
- "key": 1422748800000,
- "doc_count": 2,
- "the_sum": {
- "value": 60.0
- },
- "the_movavg": {
- "value": 550.0
- }
- },
- {
- "key_as_string": "2015/03/01 00:00:00",
- "key": 1425168000000,
- "doc_count": 2,
- "the_sum": {
- "value": 375.0
- },
- "the_movavg": {
- "value": 305.0
- }
- }
- ]
- }
- }
- }
- --------------------------------------------------
- // TESTRESPONSE[s/"took": 11/"took": $body.took/]
- // TESTRESPONSE[s/"_shards": \.\.\./"_shards": $body._shards/]
- // TESTRESPONSE[s/"hits": \.\.\./"hits": $body.hits/]
- ==== Models
- The `moving_avg` aggregation includes four different moving average "models". The main difference is how the values in the
- window are weighted. As data-points become "older" in the window, they may be weighted differently. This will
- affect the final average for that window.
- Models are specified using the `model` parameter. Some models may have optional configurations which are specified inside
- the `settings` parameter.
- ===== Simple
- The `simple` model calculates the sum of all values in the window, then divides by the size of the window. It is effectively
- a simple arithmetic mean of the window. The simple model does not perform any time-dependent weighting, which means
- the values from a `simple` moving average tend to "lag" behind the real data.
- [source,js]
- --------------------------------------------------
- POST /_search
- {
- "size": 0,
- "aggs": {
- "my_date_histo":{
- "date_histogram":{
- "field":"date",
- "interval":"1M"
- },
- "aggs":{
- "the_sum":{
- "sum":{ "field": "price" }
- },
- "the_movavg":{
- "moving_avg":{
- "buckets_path": "the_sum",
- "window" : 30,
- "model" : "simple"
- }
- }
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- // TEST[setup:sales]
- // TEST[warning:The moving_avg aggregation has been deprecated in favor of the moving_fn aggregation.]
- A `simple` model has no special settings to configure
- The window size can change the behavior of the moving average. For example, a small window (`"window": 10`) will closely
- track the data and only smooth out small scale fluctuations:
- [[movavg_10window]]
- .Moving average with window of size 10
- image::images/pipeline_movavg/movavg_10window.png[]
- In contrast, a `simple` moving average with larger window (`"window": 100`) will smooth out all higher-frequency fluctuations,
- leaving only low-frequency, long term trends. It also tends to "lag" behind the actual data by a substantial amount:
- [[movavg_100window]]
- .Moving average with window of size 100
- image::images/pipeline_movavg/movavg_100window.png[]
- ==== Linear
- The `linear` model assigns a linear weighting to points in the series, such that "older" datapoints (e.g. those at
- the beginning of the window) contribute a linearly less amount to the total average. The linear weighting helps reduce
- the "lag" behind the data's mean, since older points have less influence.
- [source,js]
- --------------------------------------------------
- POST /_search
- {
- "size": 0,
- "aggs": {
- "my_date_histo":{
- "date_histogram":{
- "field":"date",
- "interval":"1M"
- },
- "aggs":{
- "the_sum":{
- "sum":{ "field": "price" }
- },
- "the_movavg": {
- "moving_avg":{
- "buckets_path": "the_sum",
- "window" : 30,
- "model" : "linear"
- }
- }
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- // TEST[setup:sales]
- // TEST[warning:The moving_avg aggregation has been deprecated in favor of the moving_fn aggregation.]
- A `linear` model has no special settings to configure
- Like the `simple` model, window size can change the behavior of the moving average. For example, a small window (`"window": 10`)
- will closely track the data and only smooth out small scale fluctuations:
- [[linear_10window]]
- .Linear moving average with window of size 10
- image::images/pipeline_movavg/linear_10window.png[]
- In contrast, a `linear` moving average with larger window (`"window": 100`) will smooth out all higher-frequency fluctuations,
- leaving only low-frequency, long term trends. It also tends to "lag" behind the actual data by a substantial amount,
- although typically less than the `simple` model:
- [[linear_100window]]
- .Linear moving average with window of size 100
- image::images/pipeline_movavg/linear_100window.png[]
- ==== EWMA (Exponentially Weighted)
- The `ewma` model (aka "single-exponential") is similar to the `linear` model, except older data-points become exponentially less important,
- rather than linearly less important. The speed at which the importance decays can be controlled with an `alpha`
- setting. Small values make the weight decay slowly, which provides greater smoothing and takes into account a larger
- portion of the window. Larger valuers make the weight decay quickly, which reduces the impact of older values on the
- moving average. This tends to make the moving average track the data more closely but with less smoothing.
- The default value of `alpha` is `0.3`, and the setting accepts any float from 0-1 inclusive.
- The EWMA model can be <<movavg-minimizer, Minimized>>
- [source,js]
- --------------------------------------------------
- POST /_search
- {
- "size": 0,
- "aggs": {
- "my_date_histo":{
- "date_histogram":{
- "field":"date",
- "interval":"1M"
- },
- "aggs":{
- "the_sum":{
- "sum":{ "field": "price" }
- },
- "the_movavg": {
- "moving_avg":{
- "buckets_path": "the_sum",
- "window" : 30,
- "model" : "ewma",
- "settings" : {
- "alpha" : 0.5
- }
- }
- }
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- // TEST[setup:sales]
- // TEST[warning:The moving_avg aggregation has been deprecated in favor of the moving_fn aggregation.]
- [[single_0.2alpha]]
- .EWMA with window of size 10, alpha = 0.2
- image::images/pipeline_movavg/single_0.2alpha.png[]
- [[single_0.7alpha]]
- .EWMA with window of size 10, alpha = 0.7
- image::images/pipeline_movavg/single_0.7alpha.png[]
- ==== Holt-Linear
- The `holt` model (aka "double exponential") incorporates a second exponential term which
- tracks the data's trend. Single exponential does not perform well when the data has an underlying linear trend. The
- double exponential model calculates two values internally: a "level" and a "trend".
- The level calculation is similar to `ewma`, and is an exponentially weighted view of the data. The difference is
- that the previously smoothed value is used instead of the raw value, which allows it to stay close to the original series.
- The trend calculation looks at the difference between the current and last value (e.g. the slope, or trend, of the
- smoothed data). The trend value is also exponentially weighted.
- Values are produced by multiplying the level and trend components.
- The default value of `alpha` is `0.3` and `beta` is `0.1`. The settings accept any float from 0-1 inclusive.
- The Holt-Linear model can be <<movavg-minimizer, Minimized>>
- [source,js]
- --------------------------------------------------
- POST /_search
- {
- "size": 0,
- "aggs": {
- "my_date_histo":{
- "date_histogram":{
- "field":"date",
- "interval":"1M"
- },
- "aggs":{
- "the_sum":{
- "sum":{ "field": "price" }
- },
- "the_movavg": {
- "moving_avg":{
- "buckets_path": "the_sum",
- "window" : 30,
- "model" : "holt",
- "settings" : {
- "alpha" : 0.5,
- "beta" : 0.5
- }
- }
- }
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- // TEST[setup:sales]
- // TEST[warning:The moving_avg aggregation has been deprecated in favor of the moving_fn aggregation.]
- In practice, the `alpha` value behaves very similarly in `holt` as `ewma`: small values produce more smoothing
- and more lag, while larger values produce closer tracking and less lag. The value of `beta` is often difficult
- to see. Small values emphasize long-term trends (such as a constant linear trend in the whole series), while larger
- values emphasize short-term trends. This will become more apparently when you are predicting values.
- [[double_0.2beta]]
- .Holt-Linear moving average with window of size 100, alpha = 0.5, beta = 0.2
- image::images/pipeline_movavg/double_0.2beta.png[]
- [[double_0.7beta]]
- .Holt-Linear moving average with window of size 100, alpha = 0.5, beta = 0.7
- image::images/pipeline_movavg/double_0.7beta.png[]
- ==== Holt-Winters
- The `holt_winters` model (aka "triple exponential") incorporates a third exponential term which
- tracks the seasonal aspect of your data. This aggregation therefore smooths based on three components: "level", "trend"
- and "seasonality".
- The level and trend calculation is identical to `holt` The seasonal calculation looks at the difference between
- the current point, and the point one period earlier.
- Holt-Winters requires a little more handholding than the other moving averages. You need to specify the "periodicity"
- of your data: e.g. if your data has cyclic trends every 7 days, you would set `period: 7`. Similarly if there was
- a monthly trend, you would set it to `30`. There is currently no periodicity detection, although that is planned
- for future enhancements.
- There are two varieties of Holt-Winters: additive and multiplicative.
- ===== "Cold Start"
- Unfortunately, due to the nature of Holt-Winters, it requires two periods of data to "bootstrap" the algorithm. This
- means that your `window` must always be *at least* twice the size of your period. An exception will be thrown if it
- isn't. It also means that Holt-Winters will not emit a value for the first `2 * period` buckets; the current algorithm
- does not backcast.
- [[holt_winters_cold_start]]
- .Holt-Winters showing a "cold" start where no values are emitted
- image::images/pipeline_movavg/triple_untruncated.png[]
- Because the "cold start" obscures what the moving average looks like, the rest of the Holt-Winters images are truncated
- to not show the "cold start". Just be aware this will always be present at the beginning of your moving averages!
- ===== Additive Holt-Winters
- Additive seasonality is the default; it can also be specified by setting `"type": "add"`. This variety is preferred
- when the seasonal affect is additive to your data. E.g. you could simply subtract the seasonal effect to "de-seasonalize"
- your data into a flat trend.
- The default values of `alpha` and `gamma` are `0.3` while `beta` is `0.1`. The settings accept any float from 0-1 inclusive.
- The default value of `period` is `1`.
- The additive Holt-Winters model can be <<movavg-minimizer, Minimized>>
- [source,js]
- --------------------------------------------------
- POST /_search
- {
- "size": 0,
- "aggs": {
- "my_date_histo":{
- "date_histogram":{
- "field":"date",
- "interval":"1M"
- },
- "aggs":{
- "the_sum":{
- "sum":{ "field": "price" }
- },
- "the_movavg": {
- "moving_avg":{
- "buckets_path": "the_sum",
- "window" : 30,
- "model" : "holt_winters",
- "settings" : {
- "type" : "add",
- "alpha" : 0.5,
- "beta" : 0.5,
- "gamma" : 0.5,
- "period" : 7
- }
- }
- }
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- // TEST[setup:sales]
- // TEST[warning:The moving_avg aggregation has been deprecated in favor of the moving_fn aggregation.]
- [[holt_winters_add]]
- .Holt-Winters moving average with window of size 120, alpha = 0.5, beta = 0.7, gamma = 0.3, period = 30
- image::images/pipeline_movavg/triple.png[]
- ===== Multiplicative Holt-Winters
- Multiplicative is specified by setting `"type": "mult"`. This variety is preferred when the seasonal affect is
- multiplied against your data. E.g. if the seasonal affect is x5 the data, rather than simply adding to it.
- The default values of `alpha` and `gamma` are `0.3` while `beta` is `0.1`. The settings accept any float from 0-1 inclusive.
- The default value of `period` is `1`.
- The multiplicative Holt-Winters model can be <<movavg-minimizer, Minimized>>
- [WARNING]
- ======
- Multiplicative Holt-Winters works by dividing each data point by the seasonal value. This is problematic if any of
- your data is zero, or if there are gaps in the data (since this results in a divid-by-zero). To combat this, the
- `mult` Holt-Winters pads all values by a very small amount (1*10^-10^) so that all values are non-zero. This affects
- the result, but only minimally. If your data is non-zero, or you prefer to see `NaN` when zero's are encountered,
- you can disable this behavior with `pad: false`
- ======
- [source,js]
- --------------------------------------------------
- POST /_search
- {
- "size": 0,
- "aggs": {
- "my_date_histo":{
- "date_histogram":{
- "field":"date",
- "interval":"1M"
- },
- "aggs":{
- "the_sum":{
- "sum":{ "field": "price" }
- },
- "the_movavg": {
- "moving_avg":{
- "buckets_path": "the_sum",
- "window" : 30,
- "model" : "holt_winters",
- "settings" : {
- "type" : "mult",
- "alpha" : 0.5,
- "beta" : 0.5,
- "gamma" : 0.5,
- "period" : 7,
- "pad" : true
- }
- }
- }
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- // TEST[setup:sales]
- // TEST[warning:The moving_avg aggregation has been deprecated in favor of the moving_fn aggregation.]
- ==== Prediction
- experimental[]
- All the moving average model support a "prediction" mode, which will attempt to extrapolate into the future given the
- current smoothed, moving average. Depending on the model and parameter, these predictions may or may not be accurate.
- Predictions are enabled by adding a `predict` parameter to any moving average aggregation, specifying the number of
- predictions you would like appended to the end of the series. These predictions will be spaced out at the same interval
- as your buckets:
- [source,js]
- --------------------------------------------------
- POST /_search
- {
- "size": 0,
- "aggs": {
- "my_date_histo":{
- "date_histogram":{
- "field":"date",
- "interval":"1M"
- },
- "aggs":{
- "the_sum":{
- "sum":{ "field": "price" }
- },
- "the_movavg": {
- "moving_avg":{
- "buckets_path": "the_sum",
- "window" : 30,
- "model" : "simple",
- "predict" : 10
- }
- }
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- // TEST[setup:sales]
- // TEST[warning:The moving_avg aggregation has been deprecated in favor of the moving_fn aggregation.]
- The `simple`, `linear` and `ewma` models all produce "flat" predictions: they essentially converge on the mean
- of the last value in the series, producing a flat:
- [[simple_prediction]]
- .Simple moving average with window of size 10, predict = 50
- image::images/pipeline_movavg/simple_prediction.png[]
- In contrast, the `holt` model can extrapolate based on local or global constant trends. If we set a high `beta`
- value, we can extrapolate based on local constant trends (in this case the predictions head down, because the data at the end
- of the series was heading in a downward direction):
- [[double_prediction_local]]
- .Holt-Linear moving average with window of size 100, predict = 20, alpha = 0.5, beta = 0.8
- image::images/pipeline_movavg/double_prediction_local.png[]
- In contrast, if we choose a small `beta`, the predictions are based on the global constant trend. In this series, the
- global trend is slightly positive, so the prediction makes a sharp u-turn and begins a positive slope:
- [[double_prediction_global]]
- .Double Exponential moving average with window of size 100, predict = 20, alpha = 0.5, beta = 0.1
- image::images/pipeline_movavg/double_prediction_global.png[]
- The `holt_winters` model has the potential to deliver the best predictions, since it also incorporates seasonal
- fluctuations into the model:
- [[holt_winters_prediction_global]]
- .Holt-Winters moving average with window of size 120, predict = 25, alpha = 0.8, beta = 0.2, gamma = 0.7, period = 30
- image::images/pipeline_movavg/triple_prediction.png[]
- [[movavg-minimizer]]
- ==== Minimization
- Some of the models (EWMA, Holt-Linear, Holt-Winters) require one or more parameters to be configured. Parameter choice
- can be tricky and sometimes non-intuitive. Furthermore, small deviations in these parameters can sometimes have a drastic
- effect on the output moving average.
- For that reason, the three "tunable" models can be algorithmically *minimized*. Minimization is a process where parameters
- are tweaked until the predictions generated by the model closely match the output data. Minimization is not fullproof
- and can be susceptible to overfitting, but it often gives better results than hand-tuning.
- Minimization is disabled by default for `ewma` and `holt_linear`, while it is enabled by default for `holt_winters`.
- Minimization is most useful with Holt-Winters, since it helps improve the accuracy of the predictions. EWMA and
- Holt-Linear are not great predictors, and mostly used for smoothing data, so minimization is less useful on those
- models.
- Minimization is enabled/disabled via the `minimize` parameter:
- [source,js]
- --------------------------------------------------
- POST /_search
- {
- "size": 0,
- "aggs": {
- "my_date_histo":{
- "date_histogram":{
- "field":"date",
- "interval":"1M"
- },
- "aggs":{
- "the_sum":{
- "sum":{ "field": "price" }
- },
- "the_movavg": {
- "moving_avg":{
- "buckets_path": "the_sum",
- "model" : "holt_winters",
- "window" : 30,
- "minimize" : true, <1>
- "settings" : {
- "period" : 7
- }
- }
- }
- }
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- // TEST[setup:sales]
- // TEST[warning:The moving_avg aggregation has been deprecated in favor of the moving_fn aggregation.]
- <1> Minimization is enabled with the `minimize` parameter
- When enabled, minimization will find the optimal values for `alpha`, `beta` and `gamma`. The user should still provide
- appropriate values for `window`, `period` and `type`.
- [WARNING]
- ======
- Minimization works by running a stochastic process called *simulated annealing*. This process will usually generate
- a good solution, but is not guaranteed to find the global optimum. It also requires some amount of additional
- computational power, since the model needs to be re-run multiple times as the values are tweaked. The run-time of
- minimization is linear to the size of the window being processed: excessively large windows may cause latency.
- Finally, minimization fits the model to the last `n` values, where `n = window`. This generally produces
- better forecasts into the future, since the parameters are tuned around the end of the series. It can, however, generate
- poorer fitting moving averages at the beginning of the series.
- ======
|