|
@@ -3,283 +3,105 @@
|
|
|
[[ilm-with-existing-indices]]
|
|
|
== Manage existing indices
|
|
|
|
|
|
-NOTE: If migrating from Curator, ensure you are running Curator version 5.7 or greater
|
|
|
-so that Curator will ignore ILM managed indices.
|
|
|
+If you've been using Curator or some other mechanism to manage periodic indices,
|
|
|
+you have a couple of options when migrating to {ilm-init}:
|
|
|
|
|
|
-While it is recommended to use {ilm-init} to manage the index lifecycle from
|
|
|
-start to finish, it may be useful to use {ilm-init} with existing indices,
|
|
|
-for example, when migrating from daily indices to rollover-based indices.
|
|
|
-Such use cases are fully supported, but there are some configuration differences
|
|
|
-from when {ilm-init} can manage the complete index lifecycle.
|
|
|
+* Set up your index templates to use an {ilm-init} policy to manage your new indices.
|
|
|
+Once {ilm-init} is managing your current write index, you can apply an appropriate policy to your old indices.
|
|
|
|
|
|
-This section describes strategies to leverage {ilm-init} for existing periodic
|
|
|
-indices when migrating to fully {ilm-init}-manged indices, which can be done in
|
|
|
-a few different ways, each providing different tradeoffs. As an example, we'll
|
|
|
-walk through a use case of a very simple logging index with just a field for the
|
|
|
-log message and a timestamp.
|
|
|
+* Reindex into an {ilm-init}-managed index.
|
|
|
|
|
|
-First, we need to create a template for these indices:
|
|
|
+NOTE: Starting in Curator version 5.7, Curator ignores {ilm-init}-managed indices.
|
|
|
|
|
|
-[source,console]
|
|
|
------------------------
|
|
|
-PUT _template/mylogs_template
|
|
|
-{
|
|
|
- "index_patterns": [
|
|
|
- "mylogs-*"
|
|
|
- ],
|
|
|
- "settings": {
|
|
|
- "number_of_shards": 1,
|
|
|
- "number_of_replicas": 1
|
|
|
- },
|
|
|
- "mappings": {
|
|
|
- "properties": {
|
|
|
- "message": {
|
|
|
- "type": "text"
|
|
|
- },
|
|
|
- "@timestamp": {
|
|
|
- "type": "date"
|
|
|
- }
|
|
|
- }
|
|
|
- }
|
|
|
-}
|
|
|
------------------------
|
|
|
-
|
|
|
-And we'll ingest a few documents to create a few daily indices:
|
|
|
-
|
|
|
-[source,console]
|
|
|
------------------------
|
|
|
-POST mylogs-pre-ilm-2019.06.24/_doc
|
|
|
-{
|
|
|
- "@timestamp": "2019-06-24T10:34:00",
|
|
|
- "message": "this is one log message"
|
|
|
-}
|
|
|
------------------------
|
|
|
-// TEST[continued]
|
|
|
-
|
|
|
-[source,console]
|
|
|
------------------------
|
|
|
-POST mylogs-pre-ilm-2019.06.25/_doc
|
|
|
-{
|
|
|
- "@timestamp": "2019-06-25T17:42:00",
|
|
|
- "message": "this is another log message"
|
|
|
-}
|
|
|
------------------------
|
|
|
-// TEST[continued]
|
|
|
-
|
|
|
-//////////////////////////
|
|
|
-
|
|
|
-[source,console]
|
|
|
---------------------------------------------------
|
|
|
-DELETE _template/mylogs_template
|
|
|
---------------------------------------------------
|
|
|
-// TEST[continued]
|
|
|
-
|
|
|
-//////////////////////////
|
|
|
-
|
|
|
-Now that we have these indices, we'll look at a few different ways of migrating
|
|
|
-these indices to {ilm-init}.
|
|
|
-
|
|
|
-[[ilm-with-existing-periodic-indices]]
|
|
|
-=== Managing existing periodic indices with {ilm-init}
|
|
|
-
|
|
|
-NOTE: The examples in this section assume daily indices as set up in
|
|
|
-<<ilm-with-existing-indices,the previous section>>.
|
|
|
-
|
|
|
-The simplest way to manage existing indices while transitioning to fully
|
|
|
-{ilm-init}-managed indices is to allow all new indices to be fully managed by
|
|
|
-{ilm-init} before attaching {ilm-init} policies to existing indices. To do this,
|
|
|
-all new documents should be directed to {ilm-init}-managed indices - if you are
|
|
|
-using Beats or Logstash data shippers, upgrading all of those shippers to
|
|
|
-version 7.0.0 or higher will take care of that part for you. If you are not
|
|
|
-using Beats or Logstash, you may need to set up {ilm-init} for new indices yourself as
|
|
|
-demonstrated in the <<getting-started-index-lifecycle-management,{ilm-init} tutorial>>.
|
|
|
-
|
|
|
-NOTE: If you are using Beats through Logstash, you may need to change your
|
|
|
-Logstash output configuration and invoke the Beats setup to use {ilm-init} for new
|
|
|
-data.
|
|
|
-
|
|
|
-Once all new documents are being written to fully {ilm-init}-managed indices, it
|
|
|
-is easy to add an {ilm-init} policy to existing indices. However, there are two
|
|
|
-things to keep in mind when doing this, and a trick that makes those two things
|
|
|
-much easier to handle.
|
|
|
-
|
|
|
-The two biggest things to keep in mind are:
|
|
|
-
|
|
|
-1. Existing periodic indices shouldn't use policies with rollover, because
|
|
|
-rollover is used to manage where new data goes. Since existing indices should no
|
|
|
-longer be receiving new documents, there is no point to using rollover for them.
|
|
|
-
|
|
|
-2. {ilm-init} policies attached to existing indices will compare the `min_age`
|
|
|
-for each phase to the original creation date of the index, and so might proceed
|
|
|
-through multiple phases immediately.
|
|
|
-
|
|
|
-The first one is the most important, because it makes it difficult to use the
|
|
|
-same policy for new and existing periodic indices. But that's easy to solve
|
|
|
-with one simple trick: Create a second policy for existing indices, in addition
|
|
|
-to the one for new indices. {ilm-init} policies are cheap to create, so don't be
|
|
|
-afraid to have more than one. Modifying a policy designed for new indices to be
|
|
|
-used on existing indices is generally very simple: just remove the `rollover`
|
|
|
-action.
|
|
|
-
|
|
|
-For example, if you created a policy for your new indices with each phase
|
|
|
-like so:
|
|
|
-
|
|
|
-[source,console]
|
|
|
------------------------
|
|
|
-PUT _ilm/policy/mylogs_policy
|
|
|
-{
|
|
|
- "policy": {
|
|
|
- "phases": {
|
|
|
- "hot": {
|
|
|
- "actions": {
|
|
|
- "rollover": {
|
|
|
- "max_size": "25GB"
|
|
|
- }
|
|
|
- }
|
|
|
- },
|
|
|
- "warm": {
|
|
|
- "min_age": "1d",
|
|
|
- "actions": {
|
|
|
- "forcemerge": {
|
|
|
- "max_num_segments": 1
|
|
|
- }
|
|
|
- }
|
|
|
- },
|
|
|
- "cold": {
|
|
|
- "min_age": "7d",
|
|
|
- "actions": {
|
|
|
- "freeze": {}
|
|
|
- }
|
|
|
- },
|
|
|
- "delete": {
|
|
|
- "min_age": "30d",
|
|
|
- "actions": {
|
|
|
- "delete": {}
|
|
|
- }
|
|
|
- }
|
|
|
- }
|
|
|
- }
|
|
|
-}
|
|
|
------------------------
|
|
|
-// TEST[continued]
|
|
|
+[discrete]
|
|
|
+[[ilm-existing-indices-apply]]
|
|
|
+=== Apply policies to existing time series indices
|
|
|
|
|
|
-You can create a policy for pre-existing indices by removing the `rollover`
|
|
|
-action, and in this case, the `hot` phase is now empty so we can remove that
|
|
|
-too:
|
|
|
+The simplest way to transition to managing your periodic indices with {ilm-init} is
|
|
|
+to <<apply-policy-template, configure an index template>> to apply a lifecycle policy to new indices.
|
|
|
+Once the index you are writing to is being managed by {ilm-init},
|
|
|
+you can <<apply-policy-multiple, manually apply a policy>> to your older indices.
|
|
|
|
|
|
-[source,console]
|
|
|
------------------------
|
|
|
-PUT _ilm/policy/mylogs_policy_existing
|
|
|
-{
|
|
|
- "policy": {
|
|
|
- "phases": {
|
|
|
- "warm": {
|
|
|
- "min_age": "1d",
|
|
|
- "actions": {
|
|
|
- "forcemerge": {
|
|
|
- "max_num_segments": 1
|
|
|
- }
|
|
|
- }
|
|
|
- },
|
|
|
- "cold": {
|
|
|
- "min_age": "7d",
|
|
|
- "actions": {
|
|
|
- "freeze": {}
|
|
|
- }
|
|
|
- },
|
|
|
- "delete": {
|
|
|
- "min_age": "30d",
|
|
|
- "actions": {
|
|
|
- "delete": {}
|
|
|
- }
|
|
|
- }
|
|
|
- }
|
|
|
- }
|
|
|
-}
|
|
|
------------------------
|
|
|
-// TEST[continued]
|
|
|
+Define a separate policy for your older indices that omits the rollover action.
|
|
|
+Rollover manages where new data is written, so it doesn't apply to indices that no longer receive new documents.
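+
+As a minimal sketch, a policy for older indices might look like the following.
+The policy name and the `min_age` values are illustrative assumptions;
+adjust the phases and actions to match your own retention requirements.
+
+[source,console]
+-----------------------
+PUT _ilm/policy/mylogs_policy_existing <1>
+{
+  "policy": {
+    "phases": {
+      "warm": {
+        "min_age": "1d", <2>
+        "actions": {
+          "forcemerge": {
+            "max_num_segments": 1
+          }
+        }
+      },
+      "delete": {
+        "min_age": "30d",
+        "actions": {
+          "delete": {}
+        }
+      }
+    }
+  }
+}
+-----------------------
+// TEST[skip:illustrative example]
+<1> Hypothetical policy name. There is no `hot` phase because the policy has no rollover action.
+<2> Example value only. It is compared to the original creation date of each index.
+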
|
|
|
|
|
|
-Creating a separate policy for existing indices will also allow using different
|
|
|
-`min_age` values. You may want to use higher values to prevent many indices from
|
|
|
-running through the policy at once, which may be important if your policy
|
|
|
-includes potentially resource-intensive operations like force merge.
|
|
|
+Keep in mind that policies applied to existing indices compare the `min_age` for each phase to
|
|
|
+the original creation date of the index, and might proceed through multiple phases immediately.
|
|
|
+If your policy performs resource-intensive operations like force merge,
|
|
|
+you don't want to have a lot of indices performing those operations all at once
|
|
|
+when you switch over to {ilm-init}.
|
|
|
|
|
|
-You can configure the lifecycle for many indices at once by using wildcards in
|
|
|
-the index name when calling the <<indices-update-settings,Update Settings API>>
|
|
|
-to set the policy name, but be careful that you don't include any indices that
|
|
|
-you don't want to change the policy for:
|
|
|
+You can specify different `min_age` values in the policy you use for existing indices,
|
|
|
+or set <<index-lifecycle-origination-date, `index.lifecycle.origination_date`>>
|
|
|
+to control how the index age is calculated.
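+
+For example, the following request is one way to attach such a policy to a single older index
+and override the date used to calculate its age.
+The policy name `mylogs_policy_existing` and the timestamp are illustrative assumptions.
+
+[source,console]
+-----------------------
+PUT mylogs-pre-ilm-2019.06.24/_settings
+{
+  "index": {
+    "lifecycle": {
+      "name": "mylogs_policy_existing", <1>
+      "origination_date": 1561334400000 <2>
+    }
+  }
+}
+-----------------------
+// TEST[skip:illustrative example]
+<1> Hypothetical policy that omits the rollover action.
+<2> Milliseconds since the epoch, here 2019-06-24T00:00:00Z. Choose the value that gives the index the age you want {ilm-init} to use.
+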
|
|
|
|
|
|
-[source,console]
|
|
|
------------------------
|
|
|
-PUT mylogs-pre-ilm*/_settings <1>
|
|
|
-{
|
|
|
- "index": {
|
|
|
- "lifecycle": {
|
|
|
- "name": "mylogs_policy_existing"
|
|
|
- }
|
|
|
- }
|
|
|
-}
|
|
|
------------------------
|
|
|
-// TEST[continued]
|
|
|
+Once all pre-{ilm-init} indices have been aged out and removed,
|
|
|
+you can delete the policy you used to manage them.
|
|
|
|
|
|
-<1> This pattern will match all indices with names that start with
|
|
|
-`mylogs-pre-ilm`
|
|
|
+NOTE: If you are using {beats} or {ls}, enabling {ilm-init} in version 7.0 and onward
|
|
|
+sets up {ilm-init} to manage new indices automatically.
|
|
|
+If you are using {beats} through {ls},
|
|
|
+you might need to change your {ls} output configuration and invoke the {beats} setup
|
|
|
+to use {ilm-init} for new data.
|
|
|
|
|
|
-Once all pre-{ilm-init} indices have aged out and been deleted, the policy for
|
|
|
-older periodic indices can be deleted.
|
|
|
+[discrete]
|
|
|
+[[ilm-existing-indices-reindex]]
|
|
|
+=== Reindex into a managed index
|
|
|
|
|
|
-[[ilm-reindexing-into-rollover]]
|
|
|
-=== Reindexing via {ilm-init}
|
|
|
+An alternative to <<ilm-existing-indices-apply,applying policies to existing indices>> is to
|
|
|
+reindex your data into an {ilm-init}-managed index.
|
|
|
+You might want to do this if creating periodic indices with very small amounts of data
|
|
|
+has led to excessive shard counts, or if continually indexing into the same index has led to large shards
|
|
|
+and performance issues.
|
|
|
|
|
|
-NOTE: The examples in this section assume daily indices as set up in
|
|
|
-<<ilm-with-existing-indices,the previous section>>.
|
|
|
+First, you need to set up the new {ilm-init}-managed index:
|
|
|
|
|
|
-In some cases, it may be useful to reindex data into {ilm-init}-managed indices.
|
|
|
-This is more complex than simply attaching policies to existing indices as
|
|
|
-described in <<ilm-with-existing-periodic-indices,the previous section>>, and
|
|
|
-requires pausing indexing during the reindexing process. However, this technique
|
|
|
-may be useful in cases where periodic indices were created with very small
|
|
|
-amounts of data leading to excessive shard counts, or for indices which grow
|
|
|
-steadily over time, but have not been broken up into time-series indices leading
|
|
|
-to shards which are much too large, situations that cause significant
|
|
|
-performance problems.
|
|
|
+. Update your index template to include the necessary {ilm-init} settings.
|
|
|
+. Bootstrap an initial index as the write index.
|
|
|
+. Stop writing to the old indices and index new documents using the alias that points to the bootstrapped index.
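+
+The setup steps above can be sketched as follows.
+The names `mylogs_template`, `mylogs_policy`, `ilm-mylogs-*`, and `mylogs` are illustrative assumptions,
+and the policy is assumed to already exist and to include a rollover action.
+
+[source,console]
+-----------------------
+PUT _template/mylogs_template
+{
+  "index_patterns": ["ilm-mylogs-*"], <1>
+  "settings": {
+    "index.lifecycle.name": "mylogs_policy", <2>
+    "index.lifecycle.rollover_alias": "mylogs" <3>
+  }
+}
+
+PUT ilm-mylogs-000001
+{
+  "aliases": {
+    "mylogs": {
+      "is_write_index": true <4>
+    }
+  }
+}
+-----------------------
+// TEST[skip:illustrative example]
+<1> Hypothetical pattern. A prefix that the old index names don't share keeps the old and new indices easy to tell apart.
+<2> Hypothetical name of an existing policy that includes a rollover action.
+<3> Hypothetical alias used for writing and querying.
+<4> Marks the bootstrapped index as the write index for the alias.
+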
|
|
|
|
|
|
-Before getting started with reindexing data, the new index structure should be
|
|
|
-set up. For this section, we'll be using the same setup described in
|
|
|
-<<ilm-with-existing-indices,{ilm-imit} with existing indices>>.
|
|
|
+To reindex into the managed index:
|
|
|
|
|
|
-First, we'll set up a policy with rollover, and can include any additional
|
|
|
-phases required. For simplicity, we'll just use rollover:
|
|
|
+. Pause indexing new documents if you do not want to mix new and old data in the {ilm-init}-managed index.
|
|
|
+Mixing old and new data in one index is safe,
|
|
|
+but a combined index needs to be retained until you are ready to delete the new data.
|
|
|
|
|
|
+. Reduce the {ilm-init} poll interval to ensure that the index doesn't
|
|
|
+grow too large while waiting for the rollover check.
|
|
|
+By default, {ilm-init} checks rollover conditions every 10 minutes.
|
|
|
++
|
|
|
+--
|
|
|
[source,console]
|
|
|
-----------------------
|
|
|
-PUT _ilm/policy/mylogs_condensed_policy
|
|
|
+PUT _cluster/settings
|
|
|
{
|
|
|
- "policy": {
|
|
|
- "phases": {
|
|
|
- "hot": {
|
|
|
- "actions": {
|
|
|
- "rollover": {
|
|
|
- "max_age": "7d",
|
|
|
- "max_size": "50G"
|
|
|
- }
|
|
|
- }
|
|
|
- }
|
|
|
- }
|
|
|
+ "transient": {
|
|
|
+ "indices.lifecycle.poll_interval": "1m" <1>
|
|
|
}
|
|
|
}
|
|
|
-----------------------
|
|
|
-// TEST[continued]
|
|
|
-
|
|
|
-And now we'll update the index template for our indices to include the relevant
|
|
|
-{ilm-init} settings:
|
|
|
+// TEST[skip:don't want to overwrite this setting for other tests]
|
|
|
+<1> Check once a minute to see if {ilm-init} actions such as rollover need to be performed.
|
|
|
+--
|
|
|
+
|
|
|
+. Reindex your data using the <<docs-reindex,reindex API>>.
|
|
|
+If you want to partition the data in the order in which it was originally indexed,
|
|
|
+you can run separate reindex requests.
|
|
|
++
|
|
|
+--
|
|
|
+IMPORTANT: Documents retain their original IDs. If you don't use automatically generated document IDs,
|
|
|
+and are reindexing from multiple source indices, you might need to do additional processing to
|
|
|
+ensure that document IDs don't conflict. One way to do this is to use a
|
|
|
+<<reindex-scripts,script>> in the reindex call to append the original index name
|
|
|
+to the document ID.
|
|
|
|
|
|
+//////////////////////////
|
|
|
[source,console]
|
|
|
-----------------------
|
|
|
PUT _template/mylogs_template
|
|
|
{
|
|
|
"index_patterns": [
|
|
|
- "ilm-mylogs-*" <1>
|
|
|
+ "mylogs-*"
|
|
|
],
|
|
|
"settings": {
|
|
|
"number_of_shards": 1,
|
|
@@ -303,93 +125,41 @@ PUT _template/mylogs_template
|
|
|
}
|
|
|
}
|
|
|
-----------------------
|
|
|
-// TEST[continued]
|
|
|
-
|
|
|
-<1> The new index pattern has a prefix compared to the old one, this will
|
|
|
- make it easier to reindex later
|
|
|
-<2> The name of the policy we defined above
|
|
|
-<3> The name of the alias we'll use to write to and query
|
|
|
-
|
|
|
-And create the first index with the alias specified in the `rollover_alias`
|
|
|
-setting in the index template:
|
|
|
|
|
|
[source,console]
|
|
|
-----------------------
|
|
|
-PUT ilm-mylogs-000001
|
|
|
+POST mylogs-pre-ilm-2019.06.24/_doc
|
|
|
{
|
|
|
- "aliases": {
|
|
|
- "mylogs": {
|
|
|
- "is_write_index": true
|
|
|
- }
|
|
|
- }
|
|
|
+ "@timestamp": "2019-06-24T10:34:00",
|
|
|
+ "message": "this is one log message"
|
|
|
}
|
|
|
-----------------------
|
|
|
// TEST[continued]
|
|
|
|
|
|
-//////////////////////////
|
|
|
-
|
|
|
-[source,console]
|
|
|
---------------------------------------------------
|
|
|
-DELETE /_template/mylogs_template
|
|
|
---------------------------------------------------
|
|
|
-// TEST[continued]
|
|
|
-
|
|
|
-//////////////////////////
|
|
|
-
|
|
|
-All new documents should be indexed via the `mylogs` alias at this point. Adding
|
|
|
-new data to the old indices during the reindexing process can cause data to be
|
|
|
-added to the old indices, but not be reindexed into the new indices.
|
|
|
-
|
|
|
-NOTE: If you do not want to mix new data and old data in the new {ilm-init}-managed
|
|
|
-indices, indexing of new data should be paused entirely while the reindex
|
|
|
-completes. Mixing old and new data within one index is safe, but keep in mind
|
|
|
-that the indices with mixed data should be retained in their entirety until you
|
|
|
-are ready to delete both the old and new data.
|
|
|
-
|
|
|
-By default, {ilm-init} only checks rollover conditions every 10 minutes. Under
|
|
|
-normal indexing load, this usually works well, but during reindexing, indices
|
|
|
-can grow very, very quickly. We'll need to set the poll interval to something
|
|
|
-shorter to ensure that the new indices don't grow too large while waiting for
|
|
|
-the rollover check:
|
|
|
-
|
|
|
[source,console]
|
|
|
-----------------------
|
|
|
-PUT _cluster/settings
|
|
|
+POST mylogs-pre-ilm-2019.06.25/_doc
|
|
|
{
|
|
|
- "transient": {
|
|
|
- "indices.lifecycle.poll_interval": "1m" <1>
|
|
|
- }
|
|
|
+ "@timestamp": "2019-06-25T17:42:00",
|
|
|
+ "message": "this is another log message"
|
|
|
}
|
|
|
-----------------------
|
|
|
-// TEST[skip:don't want to overwrite this setting for other tests]
|
|
|
-
|
|
|
-<1> This tells {ilm-init} to check for rollover conditions every minute
|
|
|
-
|
|
|
-We're now ready to reindex our data using the <<docs-reindex,reindex API>>. If
|
|
|
-you have a timestamp or date field in your documents, as in this example, it may
|
|
|
-be useful to specify that the documents should be sorted by that field - this
|
|
|
-will mean that all documents in `ilm-mylogs-000001` come before all documents in
|
|
|
-`ilm-mylogs-000002`, and so on. However, if this is not a requirement, omitting
|
|
|
-the sort will allow the data to be reindexed more quickly.
|
|
|
+// TEST[continued]
|
|
|
|
|
|
-NOTE: Sorting in reindex is deprecated, see
|
|
|
-<<docs-reindex-api-request-body,reindex request body>>. Instead use timestamp
|
|
|
-ranges to partition data in separate reindex runs.
|
|
|
+[source,console]
|
|
|
+--------------------------------------------------
|
|
|
+DELETE _template/mylogs_template
|
|
|
+--------------------------------------------------
|
|
|
+// TEST[continued]
|
|
|
|
|
|
-IMPORTANT: If your data uses document IDs generated by means other than
|
|
|
-Elasticsearch's automatic ID generation, you may need to do additional
|
|
|
-processing to ensure that the document IDs don't conflict during the reindex, as
|
|
|
-documents will retain their original IDs. One way to do this is to use a
|
|
|
-<<reindex-scripts,script>> in the reindex call to append the original index name
|
|
|
-to the document ID.
|
|
|
+//////////////////////////
|
|
|
|
|
|
[source,console]
|
|
|
-----------------------
|
|
|
POST _reindex
|
|
|
{
|
|
|
"source": {
|
|
|
- "index": "mylogs-*", <1>
|
|
|
- "sort": { "@timestamp": "desc" }
|
|
|
+ "index": "mylogs-*" <1>
|
|
|
},
|
|
|
"dest": {
|
|
|
"index": "mylogs", <2>
|
|
@@ -399,20 +169,17 @@ POST _reindex
|
|
|
-----------------------
|
|
|
// TEST[continued]
|
|
|
|
|
|
-<1> This index pattern matches our existing indices. Using the prefix for
|
|
|
+<1> Matches your existing indices. Using the prefix for
|
|
|
the new indices makes using this index pattern much easier.
|
|
|
-<2> The alias set up above
|
|
|
-<3> This option will cause the reindex to abort if it encounters multiple
|
|
|
- documents with the same ID. This is optional, but recommended to prevent
|
|
|
- accidentally overwriting documents if two documents from different indices
|
|
|
- have the same ID.
|
|
|
-
|
|
|
-Once this completes, indexing new data can be resumed, as long as all new
|
|
|
-documents are indexed into the alias used above. All data, existing and new, can
|
|
|
-be queried using that alias as well. We should also be sure to set the
|
|
|
-{ilm-init} poll interval back to its default value, because keeping it set too
|
|
|
-low can cause unnecessary load on the current master node:
|
|
|
-
|
|
|
+<2> The alias that points to your bootstrapped index.
|
|
|
+<3> Halts reindexing if multiple documents have the same ID.
|
|
|
+ This is recommended to prevent accidentally overwriting documents
|
|
|
+ if documents in different source indices have the same ID.
|
|
|
+--
|
|
|
+
|
|
|
+. When reindexing is complete, set the {ilm-init} poll interval back to its default value to
|
|
|
+prevent unnecessary load on the master node:
|
|
|
++
|
|
|
[source,console]
|
|
|
-----------------------
|
|
|
PUT _cluster/settings
|
|
@@ -425,7 +192,9 @@ PUT _cluster/settings
|
|
|
-----------------------
|
|
|
// TEST[skip:don't want to overwrite this setting for other tests]
|
|
|
|
|
|
-All of the reindexed data should now be accessible via the alias set up above,
|
|
|
-in this case `mylogs`. Once you have verified that all the data has been
|
|
|
-reindexed and is available in the new indices, the existing indices can be
|
|
|
-safely removed.
|
|
|
+. Resume indexing new data using the same alias.
|
|
|
++
|
|
|
+Querying using this alias will now search your new data and all of the reindexed data.
|
|
|
+
|
|
|
+. Once you have verified that all of the reindexed data is available in the new managed indices,
|
|
|
+you can safely remove the old indices.
|