[role="xpack"]
[testenv="basic"]
[[ilm-with-existing-indices]]
== Using {ilm-init} with existing indices

While it is recommended to use {ilm-init} to manage the index lifecycle from
start to finish, it may be useful to use {ilm-init} with existing indices,
particularly when transitioning from an alternative method of managing the index
lifecycle such as Curator, or when migrating from daily indices to
rollover-based indices. Such use cases are fully supported, but there are some
configuration differences from when {ilm-init} can manage the complete index
lifecycle.

This section describes strategies to leverage {ilm-init} for existing periodic
indices when migrating to fully {ilm-init}-managed indices, which can be done in
a few different ways, each providing different tradeoffs. As an example, we'll
walk through a use case of a very simple logging index with just a field for the
log message and a timestamp.

First, we need to create a template for these indices:

[source,console]
-----------------------
PUT _template/mylogs_template
{
  "index_patterns": [
    "mylogs-*"
  ],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "message": {
        "type": "text"
      },
      "@timestamp": {
        "type": "date"
      }
    }
  }
}
-----------------------

And we'll ingest a few documents to create a few daily indices:

[source,console]
-----------------------
POST mylogs-pre-ilm-2019.06.24/_doc
{
  "@timestamp": "2019-06-24T10:34:00",
  "message": "this is one log message"
}
-----------------------
// TEST[continued]

[source,console]
-----------------------
POST mylogs-pre-ilm-2019.06.25/_doc
{
  "@timestamp": "2019-06-25T17:42:00",
  "message": "this is another log message"
}
-----------------------
// TEST[continued]

Now that we have these indices, we'll look at a few different ways of migrating
them to {ilm-init}.

[[ilm-with-existing-periodic-indices]]
=== Managing existing periodic indices with {ilm-init}

NOTE: The examples in this section assume daily indices as set up in
<<ilm-with-existing-indices,the previous section>>.

The simplest way to manage existing indices while transitioning to fully
{ilm-init}-managed indices is to allow all new indices to be fully managed by
{ilm-init} before attaching {ilm-init} policies to existing indices. To do this,
direct all new documents to {ilm-init}-managed indices. If you are using Beats
or Logstash data shippers, upgrading them to version 7.0.0 or higher takes care
of that part for you. If you are not using Beats or Logstash, you may need to
set up {ilm-init} for new indices yourself, as demonstrated in the
<<getting-started-index-lifecycle-management,getting started guide>>.

NOTE: If you are using Beats through Logstash, you may need to change your
Logstash output configuration and invoke the Beats setup to use {ilm-init} for
new data.

Once all new documents are being written to fully {ilm-init}-managed indices, it
is easy to add an {ilm-init} policy to existing indices. However, there are two
things to keep in mind when doing so, and a trick that makes them much easier to
handle.

The two biggest things to keep in mind are:

1. Existing periodic indices shouldn't use policies with rollover, because
rollover is used to manage where new data goes. Since existing indices should no
longer be receiving new documents, there is no point to using rollover for them.

2. {ilm-init} policies attached to existing indices will compare the `min_age`
for each phase to the original creation date of the index, and so might proceed
through multiple phases immediately.

The first one is the most important, because it makes it difficult to use the
same policy for new and existing periodic indices. But that's easy to solve
with one simple trick: Create a second policy for existing indices, in addition
to the one for new indices. {ilm-init} policies are cheap to create, so don't be
afraid to have more than one. Modifying a policy designed for new indices to be
used on existing indices is generally very simple: just remove the `rollover`
action.

For example, if you created a policy for your new indices with each phase
like so:

[source,console]
-----------------------
PUT _ilm/policy/mylogs_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "25GB"
          }
        }
      },
      "warm": {
        "min_age": "1d",
        "actions": {
          "forcemerge": {
            "max_num_segments": 1
          }
        }
      },
      "cold": {
        "min_age": "7d",
        "actions": {
          "freeze": {}
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
-----------------------
// TEST[continued]

You can create a policy for pre-existing indices by removing the `rollover`
action. In this case that leaves the `hot` phase empty, so we can remove it
too:

[source,console]
-----------------------
PUT _ilm/policy/mylogs_policy_existing
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "1d",
        "actions": {
          "forcemerge": {
            "max_num_segments": 1
          }
        }
      },
      "cold": {
        "min_age": "7d",
        "actions": {
          "freeze": {}
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
-----------------------
// TEST[continued]

Creating a separate policy for existing indices will also allow using different
`min_age` values. You may want to use higher values to prevent many indices from
running through the policy at once, which may be important if your policy
includes potentially resource-intensive operations like force merge.

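For example, a variant of the above policy for existing indices might stretch
the `min_age` values out. The policy name and thresholds below are only
illustrative; pick values that suit your own retention needs:

[source,console]
-----------------------
PUT _ilm/policy/mylogs_policy_existing_slow
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "3d",
        "actions": {
          "forcemerge": {
            "max_num_segments": 1
          }
        }
      },
      "delete": {
        "min_age": "60d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
-----------------------
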
You can configure the lifecycle for many indices at once by using wildcards in
the index name when calling the <<indices-update-settings,Update Settings API>>
to set the policy name, but be careful that you don't include any indices that
you don't want to change the policy for:

[source,console]
-----------------------
PUT mylogs-pre-ilm*/_settings <1>
{
  "index": {
    "lifecycle": {
      "name": "mylogs_policy_existing"
    }
  }
}
-----------------------
// TEST[continued]
<1> This pattern will match all indices with names that start with
`mylogs-pre-ilm`

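Because the `min_age` of each phase is compared to the original creation date of
these indices, some of them may move through several phases right away. If you
want to confirm which phase each index ended up in, one way is to check its
lifecycle state with the {ilm-init} explain API:

[source,console]
-----------------------
GET mylogs-pre-ilm*/_ilm/explain
-----------------------
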
Once all pre-{ilm-init} indices have aged out and been deleted, the policy for
older periodic indices can be deleted.

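For example, once none of the `mylogs-pre-ilm*` indices remain, the policy that
was only used for them can be removed:

[source,console]
-----------------------
DELETE _ilm/policy/mylogs_policy_existing
-----------------------
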
[[ilm-reindexing-into-rollover]]
=== Reindexing via {ilm-init}

NOTE: The examples in this section assume daily indices as set up in
<<ilm-with-existing-indices,the previous section>>.

In some cases, it may be useful to reindex data into {ilm-init}-managed indices.
This is more complex than simply attaching policies to existing indices as
described in <<ilm-with-existing-periodic-indices,the previous section>>, and
requires pausing indexing during the reindexing process. However, this technique
may be useful when periodic indices were created with very small amounts of
data, leading to excessive shard counts, or when indices that grow steadily over
time were never broken up into time-series indices, leading to shards which are
much too large. Both situations can cause significant performance problems.

Before getting started with reindexing data, the new index structure should be
set up. For this section, we'll be using the same setup described in
<<ilm-with-existing-indices,{ilm-init} with existing indices>>.

First, we'll set up a policy with rollover. The policy can include any
additional phases required, but for simplicity we'll just use rollover:

[source,console]
-----------------------
PUT _ilm/policy/sample_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "7d",
            "max_size": "50GB"
          }
        }
      }
    }
  }
}
-----------------------
// TEST[continued]

And now we'll update the index template for our indices to include the relevant
{ilm-init} settings:

[source,console]
-----------------------
PUT _template/mylogs_template
{
  "index_patterns": [
    "ilm-mylogs-*" <1>
  ],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "index": {
      "lifecycle": {
        "name": "sample_policy", <2>
        "rollover_alias": "mylogs" <3>
      }
    }
  },
  "mappings": {
    "properties": {
      "message": {
        "type": "text"
      },
      "@timestamp": {
        "type": "date"
      }
    }
  }
}
-----------------------
// TEST[continued]
<1> The new index pattern has a prefix compared to the old one; this will
make it easier to reindex later
<2> The name of the policy we defined above
<3> The name of the alias we'll use to write to and query

And create the first index with the alias specified in the `rollover_alias`
setting in the index template:

[source,console]
-----------------------
PUT ilm-mylogs-000001
{
  "aliases": {
    "mylogs": {
      "is_write_index": true
    }
  }
}
-----------------------
// TEST[continued]

All new documents should be indexed via the `mylogs` alias at this point. If
new data is added to the old indices during the reindexing process, that data
may never be reindexed into the new indices.

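For example, a new log entry would now be written through the alias rather than
to a dated index name:

[source,console]
-----------------------
POST mylogs/_doc
{
  "@timestamp": "2019-06-26T09:15:00",
  "message": "this log message is written via the alias"
}
-----------------------
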
NOTE: If you do not want to mix new data and old data in the new
{ilm-init}-managed indices, indexing of new data should be paused entirely while
the reindex completes. Mixing old and new data within one index is safe, but
keep in mind that the indices with mixed data should be retained in their
entirety until you are ready to delete both the old and new data.

By default, {ilm-init} only checks rollover conditions every 10 minutes. Under
normal indexing load, this usually works well, but during reindexing, indices
can grow very, very quickly. We'll need to set the poll interval to something
shorter to ensure that the new indices don't grow too large while waiting for
the rollover check:

[source,console]
-----------------------
PUT _cluster/settings
{
  "transient": {
    "indices.lifecycle.poll_interval": "1m" <1>
  }
}
-----------------------
// TEST[skip:don't want to overwrite this setting for other tests]
<1> This tells {ilm-init} to check for rollover conditions every minute

We're now ready to reindex our data using the <<docs-reindex,reindex API>>. If
you have a timestamp or date field in your documents, as in this example, it may
be useful to specify that the documents should be sorted by that field in
ascending order. This ensures that all documents in `ilm-mylogs-000001` have
timestamps earlier than those in `ilm-mylogs-000002`, and so on. However, if
this is not a requirement, omitting the sort will allow the data to be reindexed
more quickly.

NOTE: Sorting in reindex is deprecated, see
<<docs-reindex-api-request-body,reindex request body>>. Instead use timestamp
ranges to partition data in separate reindex runs.

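With the timestamp-range approach, each run reindexes one slice of the data
instead of relying on a sort. A minimal sketch of a single run, assuming the
data is partitioned by day, might look like this (the full, sorted reindex call
is shown below):

[source,console]
-----------------------
POST _reindex
{
  "source": {
    "index": "mylogs-*",
    "query": {
      "range": {
        "@timestamp": {
          "gte": "2019-06-24T00:00:00",
          "lt": "2019-06-25T00:00:00"
        }
      }
    }
  },
  "dest": {
    "index": "mylogs",
    "op_type": "create"
  }
}
-----------------------
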
IMPORTANT: If your data uses document IDs generated by means other than
Elasticsearch's automatic ID generation, you may need to do additional
processing to ensure that the document IDs don't conflict during the reindex, as
documents will retain their original IDs. One way to do this is to use a
<<reindex-scripts,script>> in the reindex call to append the original index name
to the document ID.

[source,console]
-----------------------
POST _reindex
{
  "source": {
    "index": "mylogs-*", <1>
    "sort": { "@timestamp": "asc" }
  },
  "dest": {
    "index": "mylogs", <2>
    "op_type": "create" <3>
  }
}
-----------------------
// TEST[continued]
<1> This index pattern matches our existing indices. Because the new indices
are prefixed with `ilm-`, they are not matched by this pattern, so the reindex
does not read its own output.
<2> The alias set up above
<3> This option will cause the reindex to abort if it encounters multiple
documents with the same ID. This is optional, but recommended to prevent
accidentally overwriting documents if two documents from different indices
have the same ID.

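If your document IDs could collide as described in the important note above, a
sketch of the script-based approach is shown here. It appends the original index
name to each document ID; the exact ID format is only an example:

[source,console]
-----------------------
POST _reindex
{
  "source": {
    "index": "mylogs-*"
  },
  "dest": {
    "index": "mylogs",
    "op_type": "create"
  },
  "script": {
    "lang": "painless",
    "source": "ctx._id = ctx._id + '-' + ctx._index"
  }
}
-----------------------
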
Once this completes, indexing new data can be resumed, as long as all new
documents are indexed into the alias used above. All data, existing and new, can
be queried using that alias as well. We should also be sure to set the
{ilm-init} poll interval back to its default value, because keeping it set too
low can cause unnecessary load on the current master node:

[source,console]
-----------------------
PUT _cluster/settings
{
  "transient": {
    "indices.lifecycle.poll_interval": null
  }
}
-----------------------
// TEST[skip:don't want to overwrite this setting for other tests]

All of the reindexed data should now be accessible via the alias set up above,
in this case `mylogs`. Once you have verified that all the data has been
reindexed and is available in the new indices, the existing indices can be
safely removed.

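For example, you might compare document counts between the old indices and the
alias before deleting anything; the index names below match the examples used
throughout this section:

[source,console]
-----------------------
GET mylogs-pre-ilm-*/_count

GET mylogs/_count
-----------------------

Once you are satisfied that everything is present and searchable through the
`mylogs` alias, the old daily indices can be deleted:

[source,console]
-----------------------
DELETE mylogs-pre-ilm-*
-----------------------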