[[tsds-reindex]]
=== Reindex a time series data stream (TSDS)

++++
<titleabbrev>Reindex a TSDS</titleabbrev>
++++

[discrete]
[[tsds-reindex-intro]]
==== Introduction

With reindexing, you can copy documents from an old <<tsds,time-series data stream (TSDS)>> to a new one. Data streams support
reindexing in general, with a few <<reindex-with-a-data-stream, restrictions>>. Still, time-series data streams
introduce an additional challenge, because each of their backing indices accepts only a tight range of timestamps.
As a result, using the reindex API directly would likely fail when it attempts to insert documents with timestamps
outside the current acceptance window.

To avoid these limitations, use the process outlined below:
. Create an index template for the destination data stream that will contain the re-indexed data.
. Update the template to:
.. Set the `index.time_series.start_time` and `index.time_series.end_time` index settings to
match the lowest and highest `@timestamp` values in the old data stream.
.. Set the `index.number_of_shards` index setting to the sum of all primary shards of all backing
indices of the old data stream.
.. Set `index.number_of_replicas` to zero and unset the `index.lifecycle.name` index setting.
. Run the reindex operation to completion.
. Revert the overridden index settings in the destination index template.
. Invoke the `rollover` API to create a new backing index that can receive new documents.

NOTE: This process only applies to time-series data streams without a <<downsampling, downsampling>> configuration. Data
streams with downsampling can only be re-indexed by re-indexing their backing indexes individually and adding them to an
empty destination data stream.

In what follows, we elaborate on each step of the process with examples.
[discrete]
[[tsds-reindex-create-template]]
==== Create a TSDS template to accept old documents

Consider a TSDS with the following template:
[source,console]
----
POST /_component_template/source_template
{
  "template": {
    "settings": {
      "index": {
        "number_of_replicas": 2,
        "number_of_shards": 2,
        "mode": "time_series",
        "routing_path": [ "metricset" ]
      }
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "metricset": {
          "type": "keyword",
          "time_series_dimension": true
        },
        "k8s": {
          "properties": {
            "tx": { "type": "long" },
            "rx": { "type": "long" }
          }
        }
      }
    }
  }
}

POST /_index_template/1
{
  "index_patterns": [ "k8s*" ],
  "composed_of": [ "source_template" ],
  "data_stream": {}
}
----
// TEST[skip: not expected to match the sample below]
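With the index template in place, the source data stream is created automatically when the first document is indexed. It can also be created explicitly, for instance (shown here for completeness, assuming the `k8s` name from the template above):

[source,console]
----
PUT /_data_stream/k8s
----
// TEST[skip: illustrative example]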
A possible output of `GET /k8s/_settings` looks like:

[source,console-result]
----
{
  ".ds-k8s-2023.09.01-000002": {
    "settings": {
      "index": {
        "mode": "time_series",
        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_hot"
            }
          }
        },
        "hidden": "true",
        "number_of_shards": "2",
        "time_series": {
          "end_time": "2023-09-01T14:00:00.000Z",
          "start_time": "2023-09-01T10:00:00.000Z"
        },
        "provided_name": ".ds-k8s-2023.09.01-000002",
        "creation_date": "1694439857608",
        "number_of_replicas": "2",
        "routing_path": [
          "metricset"
        ],
        ...
      }
    }
  },
  ".ds-k8s-2023.09.01-000001": {
    "settings": {
      "index": {
        "mode": "time_series",
        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_hot"
            }
          }
        },
        "hidden": "true",
        "number_of_shards": "2",
        "time_series": {
          "end_time": "2023-09-01T10:00:00.000Z",
          "start_time": "2023-09-01T06:00:00.000Z"
        },
        "provided_name": ".ds-k8s-2023.09.01-000001",
        "creation_date": "1694439837126",
        "number_of_replicas": "2",
        "routing_path": [
          "metricset"
        ],
        ...
      }
    }
  }
}
----
// NOTCONSOLE
To reindex this TSDS, do not re-use its index template in the destination data stream, to avoid impacting its
functionality. Instead, clone the template of the source TSDS and apply the following modifications:

* Set the `index.time_series.start_time` and `index.time_series.end_time` index settings explicitly. Their values should be
based on the lowest and highest `@timestamp` values in the data stream to reindex. This way, the initial backing index can
load all data that is contained in the source data stream.
* Set the `index.number_of_shards` index setting to the sum of all primary shards of all backing indices of the source data
stream. This helps maintain the same level of search parallelism, as each shard is processed in a separate thread (or
more).
* Unset the `index.lifecycle.name` index setting, if any. This prevents ILM from modifying the destination data stream
during reindexing.
* (Optional) Set `index.number_of_replicas` to zero. This helps speed up the reindex operation. Since the data gets
copied, there is limited risk of data loss due to lack of replicas.
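If the timestamp bounds of the source data stream are not already known, one way to retrieve them is a search with `min` and `max` aggregations on the `@timestamp` field, for instance (the aggregation names below are arbitrary):

[source,console]
----
GET /k8s/_search?size=0
{
  "aggs": {
    "min_timestamp": { "min": { "field": "@timestamp" } },
    "max_timestamp": { "max": { "field": "@timestamp" } }
  }
}
----
// TEST[skip: illustrative example]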
Using the example above as source TSDS, the template for the destination TSDS would be:

[source,console]
----
POST /_component_template/destination_template
{
  "template": {
    "settings": {
      "index": {
        "number_of_replicas": 0,
        "number_of_shards": 4,
        "mode": "time_series",
        "routing_path": [ "metricset" ],
        "time_series": {
          "end_time": "2023-09-01T14:00:00.000Z",
          "start_time": "2023-09-01T06:00:00.000Z"
        }
      }
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "metricset": {
          "type": "keyword",
          "time_series_dimension": true
        },
        "k8s": {
          "properties": {
            "tx": { "type": "long" },
            "rx": { "type": "long" }
          }
        }
      }
    }
  }
}

POST /_index_template/2
{
  "index_patterns": [ "k9s*" ],
  "composed_of": [ "destination_template" ],
  "data_stream": {}
}
----
// TEST[continued]
[discrete]
[[tsds-reindex-op]]
==== Reindex

Invoke the reindex API, for instance:
[source,console]
----
POST /_reindex
{
  "source": {
    "index": "k8s"
  },
  "dest": {
    "index": "k9s",
    "op_type": "create"
  }
}
----
// TEST[continued]
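For large data streams, reindexing can take a while. One option is to run it asynchronously by passing `wait_for_completion=false`, for instance:

[source,console]
----
POST /_reindex?wait_for_completion=false
{
  "source": {
    "index": "k8s"
  },
  "dest": {
    "index": "k9s",
    "op_type": "create"
  }
}
----
// TEST[skip: illustrative example]

The response then contains a task id, and the progress of the operation can be tracked with the task management API (`GET /_tasks/<task_id>`) until it completes.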
[discrete]
[[tsds-reindex-restore]]
==== Restore the destination index template

Once the reindexing operation completes, restore the index template for the destination TSDS as follows:

* Remove the overrides for `index.time_series.start_time` and `index.time_series.end_time`.
* Restore the values of `index.number_of_shards`, `index.number_of_replicas` and `index.lifecycle.name`, as
applicable.

Using the previous example, the destination template is modified as follows:
[source,console]
----
POST /_component_template/destination_template
{
  "template": {
    "settings": {
      "index": {
        "number_of_replicas": 2,
        "number_of_shards": 2,
        "mode": "time_series",
        "routing_path": [ "metricset" ]
      }
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "metricset": {
          "type": "keyword",
          "time_series_dimension": true
        },
        "k8s": {
          "properties": {
            "tx": { "type": "long" },
            "rx": { "type": "long" }
          }
        }
      }
    }
  }
}
----
// TEST[continued]
Next, invoke the `rollover` API on the destination data stream, without any conditions set.

[source,console]
----
POST /k9s/_rollover/
----
// TEST[continued]

This creates a new backing index with the updated index settings. The destination data stream is now ready to accept new documents.

Note that the initial backing index can still accept documents within the range of timestamps derived from the source data
stream. If this is not desired, mark it as <<index-blocks-read-only, read-only>> explicitly.
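For instance, a read-only block can be added with the index blocks API; the backing index name below is illustrative and should be replaced with the actual name of the initial backing index of the destination data stream:

[source,console]
----
PUT /.ds-k9s-2023.09.01-000001/_block/read_only
----
// TEST[skip: illustrative example]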