[role="xpack"]
[[tutorial-manage-data-stream-retention]]
=== Tutorial: Data stream retention

preview::[]

This tutorial covers data stream lifecycle retention: what it is, how to configure it, and how it is applied.
Keep in mind that the following options apply only to data streams that are managed by the data stream lifecycle.

. <<what-is-retention>>
. <<retention-configuration>>
. <<effective-retention-calculation>>
. <<effective-retention-application>>

You can verify whether a data stream is managed by the data stream lifecycle via the <<data-streams-get-lifecycle,get data stream lifecycle API>>:

////
[source,console]
----
PUT /_index_template/template
{
  "index_patterns": ["my-data-stream*"],
  "template": {
    "lifecycle": {}
  },
  "data_stream": { }
}

PUT /_data_stream/my-data-stream
----
// TESTSETUP
////

////
[source,console]
----
DELETE /_data_stream/my-data-stream*
DELETE /_index_template/template
DELETE /_data_stream/_global_retention
----
// TEARDOWN
////

[source,console]
--------------------------------------------------
GET _data_stream/my-data-stream/_lifecycle
--------------------------------------------------

The result should look like this:

[source,console-result]
--------------------------------------------------
{
  "data_streams": [
    {
      "name": "my-data-stream",                                   <1>
      "lifecycle": {
        "enabled": true                                           <2>
      }
    }
  ]
}
--------------------------------------------------
// TESTRESPONSE[skip:the result is for illustrating purposes only]
<1> The name of your data stream.
<2> Ensure that the lifecycle is enabled, meaning this should be `true`.
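
The check described above can also be done programmatically. The following is a minimal sketch, assuming the response body has already been fetched and parsed into a Python dictionary shaped like the example result; the function name `lifecycle_enabled` is hypothetical and not part of any {es} client.

```python
from typing import Any, Dict


def lifecycle_enabled(response: Dict[str, Any], name: str) -> bool:
    """Return True if the named data stream reports an enabled lifecycle.

    `response` is assumed to look like the example result above:
    {"data_streams": [{"name": ..., "lifecycle": {"enabled": ...}}]}
    """
    for stream in response.get("data_streams", []):
        if stream.get("name") == name:
            # A missing or null "lifecycle" object is treated as not managed.
            lifecycle = stream.get("lifecycle") or {}
            return lifecycle.get("enabled") is True
    # The data stream was not present in the response at all.
    return False


# Parsed copy of the example result shown above.
example = {
    "data_streams": [
        {"name": "my-data-stream", "lifecycle": {"enabled": True}}
    ]
}
```

Calling `lifecycle_enabled(example, "my-data-stream")` on the example result evaluates to `True`; a data stream that is missing from the response, or that has no `lifecycle` object, is reported as not managed.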

[discrete]
[[what-is-retention]]
==== What is data stream retention?

We define retention as the minimum amount of time that the data of a data stream is kept in {es}. After this time period
has passed, {es} is allowed to remove the data to free up space and/or manage costs.

NOTE: Retention does not define the moment at which the data will be removed, but the minimum time period for which it will be kept.

We define four types of retention:

* The data stream retention, or `data_retention`, which is the retention configured on the data stream level. It can be
set via an <<index-templates,index template>> for future data streams or via the <<data-streams-put-lifecycle, PUT data
stream lifecycle API>> for an existing data stream. When the data stream retention is not set, it implies that the data
needs to be kept forever.
* The global default retention, or `default_retention`, which is a retention configured on the cluster level and applied
to all data streams managed by the data stream lifecycle that do not have `data_retention` configured. Effectively,
it ensures that no data stream keeps its data forever. This can be set via the
<<data-streams-put-global-retention, PUT global retention API>>.
* The global max retention, or `max_retention`, which is a retention configured on the cluster level and applied to
all data streams managed by the data stream lifecycle. Effectively, it ensures that no data stream has a retention
that exceeds this time period. This can be set via the <<data-streams-put-global-retention, PUT global retention API>>.
* The effective retention, or `effective_retention`, which is the retention applied to a data stream at a given moment.
Effective retention cannot be set. It is derived from all the configured retentions listed above and is
calculated as described in <<effective-retention-calculation>>.

[discrete]
[[retention-configuration]]
==== How to configure retention?

- By setting the `data_retention` on the data stream level. This retention can be configured in two ways:
+
-- For new data streams, it can be defined in the index template that is applied during the data stream's creation.
You can use the <<indices-put-template,create index template API>>, for example:
+
[source,console]
--------------------------------------------------
PUT _index_template/template
{
  "index_patterns": ["my-data-stream*"],
  "data_stream": { },
  "priority": 500,
  "template": {
    "lifecycle": {
      "data_retention": "7d"
    }
  },
  "_meta": {
    "description": "Template with data stream lifecycle"
  }
}
--------------------------------------------------
-- For an existing data stream, it can be set via the <<data-streams-put-lifecycle, PUT lifecycle API>>.
+
[source,console]
----
PUT _data_stream/my-data-stream/_lifecycle
{
  "data_retention": "30d" <1>
}
----
// TEST[continued]
<1> The retention period of this data stream is set to 30 days.

- By setting the global retention via the `default_retention` and `max_retention`, which are set on the cluster level. You
can set them via the <<data-streams-put-global-retention, PUT global retention API>>. For example:
+
[source,console]
--------------------------------------------------
PUT _data_stream/_global_retention
{
  "default_retention": "7d",
  "max_retention": "90d"
}
--------------------------------------------------
// TEST[continued]

[discrete]
[[effective-retention-calculation]]
==== How is the effective retention calculated?

The effective retention is calculated as follows:

- The `effective_retention` is the `default_retention` when `default_retention` is defined and the data stream does not
have `data_retention`.
- The `effective_retention` is the `data_retention` when `data_retention` is defined and, if `max_retention` is also defined,
`data_retention` is less than `max_retention`.
- The `effective_retention` is the `max_retention` when `max_retention` is defined and the data stream either has no
`data_retention` or has a `data_retention` greater than `max_retention`.

The following examples demonstrate these rules:

|===
|`default_retention` |`max_retention` |`data_retention` |`effective_retention` |Retention determined by

|Not set |Not set |Not set |Infinite |N/A
|Not relevant |12 months |**30 days** |30 days |`data_retention`
|Not relevant |Not set |**30 days** |30 days |`data_retention`
|**30 days** |12 months |Not set |30 days |`default_retention`
|**30 days** |30 days |Not set |30 days |`default_retention`
|Not relevant |**30 days** |12 months |30 days |`max_retention`
|Not set |**30 days** |Not set |30 days |`max_retention`
|===
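
The rules and table above can be expressed as a short function. This is an illustrative sketch in Python, not the {es} implementation: the function name and the use of `timedelta` values are assumptions, `None` stands for "Not set", and the order of the checks follows the table (a defined `default_retention` wins over `max_retention` when the data stream has no `data_retention`). The sketch also assumes that `default_retention` does not exceed `max_retention`.

```python
from datetime import timedelta
from typing import Optional, Tuple


def effective_retention(
    data_retention: Optional[timedelta],
    default_retention: Optional[timedelta] = None,
    max_retention: Optional[timedelta] = None,
) -> Tuple[Optional[timedelta], str]:
    """Return (effective retention, what determined it).

    A result of None means the data is kept forever ("Infinite" in the table).
    """
    if data_retention is None:
        # No retention on the data stream: fall back to the global settings.
        if default_retention is not None:
            return default_retention, "default_retention"
        if max_retention is not None:
            return max_retention, "max_retention"
        return None, "N/A"
    # The data stream has its own retention; the global max caps it.
    if max_retention is not None and data_retention > max_retention:
        return max_retention, "max_retention"
    return data_retention, "data_retention"


# Approximate durations used to mirror the rows of the table.
THIRTY_DAYS = timedelta(days=30)
TWELVE_MONTHS = timedelta(days=365)
```

For example, `effective_retention(TWELVE_MONTHS, max_retention=THIRTY_DAYS)` corresponds to the sixth row of the table: the data stream asks for 12 months, but the global maximum caps the effective retention at 30 days.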

[discrete]
[[effective-retention-application]]
==== How is the effective retention applied?

Retention is applied to the remaining backing indices of a data stream as the last step of
<<data-streams-lifecycle-how-it-works, a data stream lifecycle run>>. The data stream lifecycle retrieves the backing indices
whose `generation_time` is longer than the effective retention period and deletes them. The `generation_time` is only
applicable to rolled over backing indices, and it is either the time since the backing index was rolled over or the time
optionally configured in the <<index-data-stream-lifecycle-origination-date,`index.lifecycle.origination_date`>> setting.

IMPORTANT: We use the `generation_time` instead of the creation time because this ensures that all data in the backing
index has passed the retention period. As a result, the retention period is not the exact time at which data gets deleted, but
the minimum time for which data will be stored.
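
The selection step described above can be sketched as follows. This is a simplified model in Python, not the {es} implementation: `BackingIndex`, `rolled_over_at`, `origination_date`, `generation_time` and `indices_to_delete` are hypothetical names introduced for illustration, and the sketch assumes that a configured origination date replaces the rollover time as the starting point of the `generation_time`.

```python
from datetime import datetime, timedelta, timezone
from typing import List, NamedTuple, Optional


class BackingIndex(NamedTuple):
    name: str
    # None means the index has not been rolled over yet (for example, the write index).
    rolled_over_at: Optional[datetime]
    # Models the optional index.lifecycle.origination_date setting.
    origination_date: Optional[datetime] = None


def generation_time(index: BackingIndex, now: datetime) -> Optional[timedelta]:
    """Time since rollover (or since the origination date, if configured)."""
    if index.rolled_over_at is None:
        return None  # generation_time only applies to rolled over indices
    start = index.origination_date or index.rolled_over_at
    return now - start


def indices_to_delete(
    indices: List[BackingIndex],
    effective_retention: Optional[timedelta],
    now: datetime,
) -> List[str]:
    """Names of backing indices whose generation_time exceeds the retention."""
    if effective_retention is None:
        return []  # infinite retention: nothing is ever deleted
    expired = []
    for index in indices:
        age = generation_time(index, now)
        if age is not None and age > effective_retention:
            expired.append(index.name)
    return expired
```

Because the age is measured from the rollover rather than from the index creation, every document in a selected index is at least as old as the retention period, which is why retention is a minimum rather than an exact deletion time.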
  155. IMPORTANT: We use the `generation_time` instead of the creation time because this ensures that all data in the backing
  156. index have passed the retention period. As a result, the retention period is not the exact time data get deleted, but
  157. the minimum time data will be stored.