[[tsds-reindex]]
=== Reindex a time series data stream (TSDS)

++++
<titleabbrev>Reindex a TSDS</titleabbrev>
++++

[discrete]
[[tsds-reindex-intro]]
==== Introduction

With reindexing, you can copy documents from an old <<tsds,time-series data stream (TSDS)>> to a new one. Data streams support
reindexing in general, with a few <<reindex-with-a-data-stream, restrictions>>. Still, time-series data streams
introduce additional challenges due to the tight control on the accepted timestamp range for each backing index they
contain. Direct use of the reindex API would likely error out due to attempting to insert documents with timestamps that are
outside the current acceptance window.

To avoid these limitations, use the process that is outlined below:

. Create an index template for the destination data stream that will contain the re-indexed data.
. Update the template to:
.. Set the `index.time_series.start_time` and `index.time_series.end_time` index settings to
match the lowest and highest `@timestamp` values in the old data stream.
.. Set the `index.number_of_shards` index setting to the sum of all primary shards of all backing
indices of the old data stream.
.. Set `index.number_of_replicas` to zero and unset the `index.lifecycle.name` index setting.
. Run the reindex operation to completion.
. Revert the overridden index settings in the destination index template.
. Invoke the `rollover` API to create a new backing index that can receive new documents.

NOTE: This process only applies to time-series data streams without a <<downsampling, downsampling>> configuration. Data
streams with downsampling can only be re-indexed by re-indexing their backing indexes individually and adding them to an
empty destination data stream.

In what follows, we elaborate on each step of the process with examples.
[discrete]
[[tsds-reindex-create-template]]
==== Create a TSDS template to accept old documents

Consider a TSDS with the following template:

[source,console]
----
POST /_component_template/source_template
{
  "template": {
    "settings": {
      "index": {
        "number_of_replicas": 2,
        "number_of_shards": 2,
        "mode": "time_series",
        "routing_path": [ "metricset" ]
      }
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "metricset": {
          "type": "keyword",
          "time_series_dimension": true
        },
        "k8s": {
          "properties": {
            "tx": { "type": "long" },
            "rx": { "type": "long" }
          }
        }
      }
    }
  }
}

POST /_index_template/1
{
  "index_patterns": [
    "k8s*"
  ],
  "composed_of": [
    "source_template"
  ],
  "data_stream": {}
}
----
// TEST[skip: not expected to match the sample below]
A possible output of `/k8s/_settings` looks like:

[source,console-result]
----
{
  ".ds-k8s-2023.09.01-000002": {
    "settings": {
      "index": {
        "mode": "time_series",
        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_hot"
            }
          }
        },
        "hidden": "true",
        "number_of_shards": "2",
        "time_series": {
          "end_time": "2023-09-01T14:00:00.000Z",
          "start_time": "2023-09-01T10:00:00.000Z"
        },
        "provided_name": ".ds-k8s-2023.09.01-000002",
        "creation_date": "1694439857608",
        "number_of_replicas": "2",
        "routing_path": [
          "metricset"
        ],
        ...
      }
    }
  },
  ".ds-k8s-2023.09.01-000001": {
    "settings": {
      "index": {
        "mode": "time_series",
        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_hot"
            }
          }
        },
        "hidden": "true",
        "number_of_shards": "2",
        "time_series": {
          "end_time": "2023-09-01T10:00:00.000Z",
          "start_time": "2023-09-01T06:00:00.000Z"
        },
        "provided_name": ".ds-k8s-2023.09.01-000001",
        "creation_date": "1694439837126",
        "number_of_replicas": "2",
        "routing_path": [
          "metricset"
        ],
        ...
      }
    }
  }
}
----
// NOTCONSOLE
To reindex this TSDS, do not re-use its index template in the destination data stream, to avoid impacting its
functionality. Instead, clone the template of the source TSDS and apply the following modifications:

* Set the `index.time_series.start_time` and `index.time_series.end_time` index settings explicitly. Their values should be
based on the lowest and highest `@timestamp` values in the data stream to reindex. This way, the initial backing index can
load all data that is contained in the source data stream.
* Set the `index.number_of_shards` index setting to the sum of all primary shards of all backing indices of the source data
stream. This helps maintain the same level of search parallelism, as each shard is processed in a separate thread (or
more).
* Unset the `index.lifecycle.name` index setting, if any. This prevents ILM from modifying the destination data stream
during reindexing.
* (Optional) Set `index.number_of_replicas` to zero. This helps speed up the reindex operation. Since the data gets
copied, there is limited risk of data loss due to lack of replicas.
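One way to obtain these values is to query the source data stream directly. As a sketch, a `min`/`max` aggregation returns the lowest and highest `@timestamp` values, and the cat indices API lists the primary shard count (`pri` column) of each backing index; the aggregation names below are illustrative:

[source,console]
----
GET /k8s/_search?size=0
{
  "aggs": {
    "min_timestamp": { "min": { "field": "@timestamp" } },
    "max_timestamp": { "max": { "field": "@timestamp" } }
  }
}

GET /_cat/indices/k8s?v=true&h=index,pri
----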
Using the example above as the source TSDS, the template for the destination TSDS would be:

[source,console]
----
POST /_component_template/destination_template
{
  "template": {
    "settings": {
      "index": {
        "number_of_replicas": 0,
        "number_of_shards": 4,
        "mode": "time_series",
        "routing_path": [ "metricset" ],
        "time_series": {
          "end_time": "2023-09-01T14:00:00.000Z",
          "start_time": "2023-09-01T06:00:00.000Z"
        }
      }
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "metricset": {
          "type": "keyword",
          "time_series_dimension": true
        },
        "k8s": {
          "properties": {
            "tx": { "type": "long" },
            "rx": { "type": "long" }
          }
        }
      }
    }
  }
}

POST /_index_template/2
{
  "index_patterns": [
    "k9s*"
  ],
  "composed_of": [
    "destination_template"
  ],
  "data_stream": {}
}
----
// TEST[continued]
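With the template in place, you can optionally create the destination data stream explicitly before copying any documents; otherwise, the first indexing operation performed by the reindex typically auto-creates it from the matching template:

[source,console]
----
PUT /_data_stream/k9s
----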
[discrete]
[[tsds-reindex-op]]
==== Reindex

Invoke the reindex API, for instance:

[source,console]
----
POST /_reindex
{
  "source": {
    "index": "k8s"
  },
  "dest": {
    "index": "k9s",
    "op_type": "create"
  }
}
----
// TEST[continued]
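For large data streams, the reindex call may run for a long time. In that case, you may prefer to run it asynchronously and monitor its progress through the tasks API; `<task_id>` below is a placeholder for the task id returned by the reindex request:

[source,console]
----
POST /_reindex?wait_for_completion=false
{
  "source": {
    "index": "k8s"
  },
  "dest": {
    "index": "k9s",
    "op_type": "create"
  }
}

GET /_tasks/<task_id>
----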
[discrete]
[[tsds-reindex-restore]]
==== Restore the destination index template

Once the reindexing operation completes, restore the index template for the destination TSDS as follows:

* Remove the overrides for `index.time_series.start_time` and `index.time_series.end_time`.
* Restore the values of `index.number_of_shards`, `index.number_of_replicas` and `index.lifecycle.name`, as
applicable.

Using the previous example, the destination template is modified as follows:

[source,console]
----
POST /_component_template/destination_template
{
  "template": {
    "settings": {
      "index": {
        "number_of_replicas": 2,
        "number_of_shards": 2,
        "mode": "time_series",
        "routing_path": [ "metricset" ]
      }
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "metricset": {
          "type": "keyword",
          "time_series_dimension": true
        },
        "k8s": {
          "properties": {
            "tx": { "type": "long" },
            "rx": { "type": "long" }
          }
        }
      }
    }
  }
}
----
// TEST[continued]
Next, invoke the `rollover` API on the destination data stream, without any conditions set.

[source,console]
----
POST /k9s/_rollover/
----
// TEST[continued]

This creates a new backing index with the updated index settings. The destination data stream is now ready to accept new documents.
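You can verify that the rollover created a new backing index by inspecting the destination data stream:

[source,console]
----
GET /_data_stream/k9s
----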
Note that the initial backing index can still accept documents within the range of timestamps derived from the source data
stream. If this is not desired, mark it as <<index-blocks-read-only, read-only>> explicitly.
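For instance, a `read_only` block can be added to the initial backing index with the add index block API; the backing index name below follows the naming pattern of the examples above and will differ in practice:

[source,console]
----
PUT /.ds-k9s-2023.09.01-000001/_block/read_only
----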
|