123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228 |
- [role="xpack"]
- [[tutorial-manage-data-stream-retention]]
- === Tutorial: Data stream retention
- In this tutorial, we are going to go over the data stream lifecycle retention; we will define it, go over how it can be configured
- and how it can gets applied. Keep in mind, the following options apply only to data streams that are managed by the data stream lifecycle.
- . <<what-is-retention>>
- . <<retention-configuration>>
- . <<effective-retention-calculation>>
- . <<effective-retention-application>>
- You can verify if a data steam is managed by the data stream lifecycle via the <<data-streams-get-lifecycle,get data stream lifecycle API>>:
- ////
- [source,console]
- ----
- PUT /_index_template/template
- {
- "index_patterns": ["my-data-stream*"],
- "template": {
- "lifecycle": {}
- },
- "data_stream": { }
- }
- PUT /_data_stream/my-data-stream
- ----
- // TESTSETUP
- ////
- ////
- [source,console]
- ----
- DELETE /_data_stream/my-data-stream*
- DELETE /_index_template/template
- PUT /_cluster/settings
- {
- "persistent" : {
- "data_streams.lifecycle.retention.*" : null
- }
- }
- ----
- // TEARDOWN
- ////
- [source,console]
- --------------------------------------------------
- GET _data_stream/my-data-stream/_lifecycle
- --------------------------------------------------
- The result should look like this:
- [source,console-result]
- --------------------------------------------------
- {
- "data_streams": [
- {
- "name": "my-data-stream", <1>
- "lifecycle": {
- "enabled": true <2>
- }
- }
- ]
- }
- --------------------------------------------------
- // TESTRESPONSE[skip:the result is for illustrating purposes only]
- <1> The name of your data stream.
- <2> Ensure that the lifecycle is enabled, meaning this should be `true`.
- [discrete]
- [[what-is-retention]]
- ==== What is data stream retention?
- We define retention as the least amount of time the data of a data stream are going to be kept in {es}. After this time period
- has passed, {es} is allowed to remove these data to free up space and/or manage costs.
- NOTE: Retention does not define the period that the data will be removed, but the minimum time period they will be kept.
- We define 4 different types of retention:
- * The data stream retention, or `data_retention`, which is the retention configured on the data stream level. It can be
- set via an <<index-templates,index template>> for future data streams or via the <<data-streams-put-lifecycle, PUT data
- stream lifecycle API>> for an existing data stream. When the data stream retention is not set, it implies that the data
- need to be kept forever.
- * The global default retention, let's call it `default_retention`, which is a retention configured via the cluster setting
- <<data-streams-lifecycle-retention-default, `data_streams.lifecycle.retention.default`>> and will be
- applied to all data streams managed by data stream lifecycle that do not have `data_retention` configured. Effectively,
- it ensures that there will be no data streams keeping their data forever. This can be set via the
- <<cluster-update-settings, update cluster settings API>>.
- * The global max retention, let's call it `max_retention`, which is a retention configured via the cluster setting
- <<data-streams-lifecycle-retention-max, `data_streams.lifecycle.retention.max`>> and will be applied to
- all data streams managed by data stream lifecycle. Effectively, it ensures that there will be no data streams whose retention
- will exceed this time period. This can be set via the <<cluster-update-settings, update cluster settings API>>.
- * The effective retention, or `effective_retention`, which is the retention applied at a data stream on a given moment.
- Effective retention cannot be set, it is derived by taking into account all the configured retention listed above and is
- calculated as it is described <<effective-retention-calculation,here>>.
- NOTE: Global default and max retention do not apply to data streams internal to elastic. Internal data streams are recognised
- either by having the `system` flag set to `true` or if their name is prefixed with a dot (`.`).
- [discrete]
- [[retention-configuration]]
- ==== How to configure retention?
- - By setting the `data_retention` on the data stream level. This retention can be configured in two ways:
- +
- -- For new data streams, it can be defined in the index template that would be applied during the data stream's creation.
- You can use the <<indices-put-template,create index template API>>, for example:
- +
- [source,console]
- --------------------------------------------------
- PUT _index_template/template
- {
- "index_patterns": ["my-data-stream*"],
- "data_stream": { },
- "priority": 500,
- "template": {
- "lifecycle": {
- "data_retention": "7d"
- }
- },
- "_meta": {
- "description": "Template with data stream lifecycle"
- }
- }
- --------------------------------------------------
- -- For an existing data stream, it can be set via the <<data-streams-put-lifecycle, PUT lifecycle API>>.
- +
- [source,console]
- ----
- PUT _data_stream/my-data-stream/_lifecycle
- {
- "data_retention": "30d" <1>
- }
- ----
- // TEST[continued]
- <1> The retention period of this data stream is set to 30 days.
- - By setting the global retention via the `data_streams.lifecycle.retention.default` and/or `data_streams.lifecycle.retention.max`
- that are set on a cluster level. You can be set via the <<cluster-update-settings, update cluster settings API>>. For example:
- +
- [source,console]
- --------------------------------------------------
- PUT /_cluster/settings
- {
- "persistent" : {
- "data_streams.lifecycle.retention.default" : "7d",
- "data_streams.lifecycle.retention.max" : "90d"
- }
- }
- --------------------------------------------------
- // TEST[continued]
- [discrete]
- [[effective-retention-calculation]]
- ==== How is the effective retention calculated?
- The effective is calculated in the following way:
- - The `effective_retention` is the `default_retention`, when `default_retention` is defined and the data stream does not
- have `data_retention`.
- - The `effective_retention` is the `data_retention`, when `data_retention` is defined and if `max_retention` is defined,
- it is less than the `max_retention`.
- - The `effective_retention` is the `max_retention`, when `max_retention` is defined, and the data stream has either no
- `data_retention` or its `data_retention` is greater than the `max_retention`.
- The above is demonstrated in the examples below:
- |===
- |`default_retention` |`max_retention` |`data_retention` |`effective_retention` |Retention determined by
- |Not set |Not set |Not set |Infinite |N/A
- |Not relevant |12 months |**30 days** |30 days |`data_retention`
- |Not relevant |Not set |**30 days** |30 days |`data_retention`
- |**30 days** |12 months |Not set |30 days |`default_retention`
- |**30 days** |30 days |Not set |30 days |`default_retention`
- |Not relevant |**30 days** |12 months |30 days |`max_retention`
- |Not set |**30 days** |Not set |30 days |`max_retention`
- |===
- Considering our example, if we retrieve the lifecycle of `my-data-stream`:
- [source,console]
- ----
- GET _data_stream/my-data-stream/_lifecycle
- ----
- // TEST[continued]
- We see that it will remain the same with what the user configured:
- [source,console-result]
- ----
- {
- "global_retention" : {
- "max_retention" : "90d", <1>
- "default_retention" : "7d" <2>
- },
- "data_streams": [
- {
- "name": "my-data-stream",
- "lifecycle": {
- "enabled": true,
- "data_retention": "30d", <3>
- "effective_retention": "30d", <4>
- "retention_determined_by": "data_stream_configuration" <5>
- }
- }
- ]
- }
- ----
- <1> The maximum retention configured in the cluster.
- <2> The default retention configured in the cluster.
- <3> The requested retention for this data stream.
- <4> The retention that is applied by the data stream lifecycle on this data stream.
- <5> The configuration that determined the effective retention. In this case it's the `data_configuration` because
- it is less than the `max_retention`.
- [discrete]
- [[effective-retention-application]]
- ==== How is the effective retention applied?
- Retention is applied to the remaining backing indices of a data stream as the last step of
- <<data-streams-lifecycle-how-it-works, a data stream lifecycle run>>. Data stream lifecycle will retrieve the backing indices
- whose `generation_time` is longer than the effective retention period and delete them. The `generation_time` is only
- applicable to rolled over backing indices and it is either the time since the backing index got rolled over, or the time
- optionally configured in the <<index-data-stream-lifecycle-origination-date,`index.lifecycle.origination_date`>> setting.
- IMPORTANT: We use the `generation_time` instead of the creation time because this ensures that all data in the backing
- index have passed the retention period. As a result, the retention period is not the exact time data get deleted, but
- the minimum time data will be stored.
|