[role="xpack"]
[[index-lifecycle-error-handling]]
== Troubleshooting {ilm} errors

When <<index-lifecycle-management,{ilm-init}>> executes a lifecycle policy, it's possible for errors to occur
while performing the necessary index operations for a step. 
When this happens, {ilm-init} moves the index to an `ERROR` step. 
If {ilm-init} cannot resolve the error automatically, execution is halted  
until you resolve the underlying issues with the policy, index, or cluster.

See https://www.youtube.com/watch?v=VCIqkji3IwY[this video] 
for a walkthrough of troubleshooting current {ilm-init} health issues, and https://www.youtube.com/watch?v=onrnnwjYWSQ[this video]
for a walkthrough of troubleshooting historical {ilm-init} issues.

For example, you might have a `shrink-index` policy that shrinks an index to four shards once it
is at least five days old: 

[source,console]
--------------------------------------------------
PUT _ilm/policy/shrink-index
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "5d",
        "actions": {
          "shrink": {
            "number_of_shards": 4
          }
        }
      }
    }
  }
}
--------------------------------------------------
// TEST

Nothing prevents you from applying the `shrink-index` policy to a new
index that has only two shards:

[source,console]
--------------------------------------------------
PUT /my-index-000001
{
  "settings": {
    "index.number_of_shards": 2,
    "index.lifecycle.name": "shrink-index"
  }
}
--------------------------------------------------
// TEST[continued]

After five days, {ilm-init} attempts to shrink `my-index-000001` from two shards to four shards.
Because the shrink action cannot _increase_ the number of shards, this operation fails 
and {ilm-init} moves `my-index-000001` to the `ERROR` step. 

You can use the <<ilm-explain-lifecycle,{ilm-init} Explain API>> to get information about
what went wrong: 

[source,console]
--------------------------------------------------
GET /my-index-000001/_ilm/explain
--------------------------------------------------
// TEST[continued]

The API returns the following information:

[source,console-result]
--------------------------------------------------
{
  "indices" : {
    "my-index-000001" : {
      "index" : "my-index-000001",
      "managed" : true,                         
      "index_creation_date_millis" : 1541717265865,
      "time_since_index_creation": "5.1d",
      "policy" : "shrink-index",                <1>
      "lifecycle_date_millis" : 1541717265865,
      "age": "5.1d",                            <2>
      "phase" : "warm",                         <3>
      "phase_time_millis" : 1541717272601,
      "action" : "shrink",                      <4>
      "action_time_millis" : 1541717272601,
      "step" : "ERROR",                         <5>
      "step_time_millis" : 1541717272688,
      "failed_step" : "shrink",                 <6>
      "step_info" : {
        "type" : "illegal_argument_exception",  <7>
        "reason" : "the number of target shards [4] must be less that the number of source shards [2]"
      },
      "phase_execution" : {
        "policy" : "shrink-index",
        "phase_definition" : {                  <8>
          "min_age" : "5d",
          "actions" : {
            "shrink" : {
              "number_of_shards" : 4
            }
          }
        },
        "version" : 1,
        "modified_date_in_millis" : 1541717264230
      }
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[skip:no way to know if we will get this response immediately]

<1> The policy being used to manage the index: `shrink-index`
<2> The index age: 5.1 days
<3> The phase the index is currently in: `warm`
<4> The current action: `shrink`
<5> The step the index is currently in: `ERROR`
<6> The step that failed to execute: `shrink`
<7> The type of error and a description of that error.
<8> The definition of the current phase from the `shrink-index` policy

To resolve this, you could update the policy to shrink the index to a single shard after five days:

[source,console]
--------------------------------------------------
PUT _ilm/policy/shrink-index
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "5d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          }
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[continued]

[discrete]
=== Retrying failed lifecycle policy steps

Once you fix the problem that put an index in the `ERROR` step, 
you might need to explicitly tell {ilm-init} to retry the step:

[source,console]
--------------------------------------------------
POST /my-index-000001/_ilm/retry
--------------------------------------------------
// TEST[skip:we can't be sure the index is ready to be retried at this point]

{ilm-init} subsequently attempts to re-run the step that failed. 
You can use the <<ilm-explain-lifecycle,{ilm-init} Explain API>> to monitor the progress.
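
When several indices are affected, one way to watch the retry is to ask the Explain API for only the indices that are still in an error state.
The following is a minimal sketch: the `my-index-*` pattern is an assumption for illustration, and `only_errors` is a query parameter of the Explain API.

[source,console]
--------------------------------------------------
GET /my-index-*/_ilm/explain?only_errors=true
--------------------------------------------------
// TEST[skip:illustrative example]

An index that no longer appears in the response has left the `ERROR` step.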


[discrete]
=== Common {ilm-init} setting issues

[discrete]
[[min-age-calculation]]
==== How `min_age` is calculated

When setting up an <<set-up-lifecycle-policy,{ilm-init} policy>> or <<getting-started-index-lifecycle-management,automating rollover with {ilm-init}>>, be aware that `min_age` can be relative to either the rollover time or the index creation time.

If you use <<ilm-rollover,{ilm-init} rollover>>, `min_age` is calculated relative to the time the index was rolled over. This is because the <<indices-rollover-index,rollover API>> generates a new index and updates the `age` of the previous index to reflect the rollover time. If the index hasn't been rolled over, then the `age` is calculated relative to the `creation_date` of the index.

You can override how `min_age` is calculated using the `index.lifecycle.origination_date` and `index.lifecycle.parse_origination_date` <<ilm-settings,{ilm-init} settings>>.
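
As a sketch of the override described above, the following request sets an explicit origination date on an existing index.
The index name and the timestamp (epoch milliseconds for 2018-10-01T00:00:00Z) are hypothetical values chosen for illustration.

[source,console]
--------------------------------------------------
PUT /my-index-000001/_settings
{
  "index.lifecycle.origination_date": 1538352000000
}
--------------------------------------------------
// TEST[skip:illustrative example]

With this setting in place, {ilm-init} should calculate the index `age` from the given timestamp rather than from the creation date.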


[discrete]
=== Common {ilm-init} errors

Here's how to resolve the most common errors reported in the `ERROR` step.

TIP: Problems with rollover aliases are a common cause of errors.
Consider using <<data-streams, data streams>> instead of managing rollover with aliases.

[discrete]
==== Rollover alias [x] can point to multiple indices, found duplicated alias [x] in index template [z]

The target rollover alias is specified in an index template's `index.lifecycle.rollover_alias` setting.
You need to explicitly configure this alias _one time_ when you
<<ilm-gs-alias-bootstrap, bootstrap the initial index>>.
The rollover action then manages setting and updating the alias to
<<rollover-index-api-desc, roll over>> to each subsequent index.

Do not explicitly configure this same alias in the aliases section of an index template.
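
For reference, a minimal sketch of bootstrapping the initial index with its write alias might look like the following.
The names `my-index-000001` and `my-alias` are assumptions for illustration; the alias must match the value of `index.lifecycle.rollover_alias` in your index template.

[source,console]
--------------------------------------------------
PUT /my-index-000001
{
  "aliases": {
    "my-alias": {
      "is_write_index": true
    }
  }
}
--------------------------------------------------
// TEST[skip:illustrative example]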

See this https://www.youtube.com/watch?v=Ww5POq4zZtY[resolving `duplicate alias` video] for an example troubleshooting walkthrough.

[discrete]
==== index.lifecycle.rollover_alias [x] does not point to index [y]

Either the index is using the wrong alias or the alias does not exist.

Check the `index.lifecycle.rollover_alias` <<indices-get-settings, index setting>>.
To see what aliases are configured, use <<cat-alias, _cat/aliases>>.
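
The two checks above can be sketched as follows, assuming an index named `my-index-000001`:

[source,console]
--------------------------------------------------
GET /my-index-000001/_settings/index.lifecycle.rollover_alias

GET /_cat/aliases?v=true
--------------------------------------------------
// TEST[skip:illustrative example]

Compare the alias named in the setting with the aliases that actually point to the index.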

See this https://www.youtube.com/watch?v=NKSe67x7aw8[resolving `not point to index` video] for an example troubleshooting walkthrough.

[discrete]
==== Setting [index.lifecycle.rollover_alias] for index [y] is empty or not defined

The `index.lifecycle.rollover_alias` setting must be configured for the rollover action to work.

Update the index settings to set `index.lifecycle.rollover_alias`.
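
A minimal sketch of that update, assuming the index is `my-index-000001` and the rollover alias is the hypothetical `my-alias`:

[source,console]
--------------------------------------------------
PUT /my-index-000001/_settings
{
  "index.lifecycle.rollover_alias": "my-alias"
}
--------------------------------------------------
// TEST[skip:illustrative example]

The alias itself must also exist and point to the index, otherwise {ilm-init} is likely to report a different alias error instead.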

See this https://www.youtube.com/watch?v=LRpMC2GS_FQ[resolving `empty or not defined` video] for an example troubleshooting walkthrough.

[discrete]
==== Alias [x] has more than one write index [y,z]

Only one index can be designated as the write index for a particular alias.

Use the <<indices-aliases, aliases>> API to set `is_write_index:false` for all but one index.
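
As a sketch, assume the hypothetical alias `my-alias` currently marks both `my-index-000001` and `my-index-000002` as write indices, and that `my-index-000002` should remain the write index.
Both changes are made in a single aliases request:

[source,console]
--------------------------------------------------
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "my-index-000001",
        "alias": "my-alias",
        "is_write_index": false
      }
    },
    {
      "add": {
        "index": "my-index-000002",
        "alias": "my-alias",
        "is_write_index": true
      }
    }
  ]
}
--------------------------------------------------
// TEST[skip:illustrative example]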

See this https://www.youtube.com/watch?v=jCUvZCT5Hm4[resolving `more than one write index` video] for an example troubleshooting walkthrough.

[discrete]
==== index name [x] does not match pattern ^.*-\d+

The index name must match the regex pattern `^.*-\d+` for the rollover action to work.
The most common problem is that the index name does not contain trailing digits.
For example, `my-index` does not match the pattern requirement.

Append a numeric value to the index name, for example `my-index-000001`.

See this https://www.youtube.com/watch?v=9sp1zF6iL00[resolving `does not match pattern` video] for an example troubleshooting walkthrough.

[discrete]
==== CircuitBreakingException: [x] data too large, data for [y]

This indicates that the cluster is hitting resource limits.

Before continuing to set up {ilm-init}, you'll need to take steps to alleviate the resource issues.
For more information, see <<circuit-breaker-errors>>.
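
As a starting point for diagnosis, you could inspect the circuit breaker statistics for each node.
This is a sketch of one possible check, not a complete procedure:

[source,console]
--------------------------------------------------
GET /_nodes/stats/breaker
--------------------------------------------------
// TEST[skip:illustrative example]

Breakers whose estimated size is close to their limit indicate where memory pressure is coming from.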

[discrete]
==== High disk watermark [x] exceeded on [y]

This indicates that the cluster is running out of disk space.
This can happen when you don't have {ilm} set up to roll over from hot to warm nodes.
For more information, see <<fix-watermark-errors,Fix watermark errors>>.
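
To see how much disk space each node is using, one option is the cat allocation API, shown here as a minimal sketch:

[source,console]
--------------------------------------------------
GET /_cat/allocation?v=true
--------------------------------------------------
// TEST[skip:illustrative example]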

[discrete]
==== security_exception: action [<action-name>] is unauthorized for user [<user-name>] with roles [<role-name>], this action is granted by the index privileges [manage_follow_index,manage,all]

This indicates the {ilm-init} action cannot be executed because the user that {ilm-init} uses to perform the action doesn't have the correct privileges.
{ilm-init} actions are run as though they are performed by the last user who modified the policy, with the privileges that user had at that time.
The account used to create or modify the policy must have permissions to perform all operations that are part of that policy.
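
One way to check this is to authenticate as the user who last modified the policy and ask whether that user holds the required index privilege.
The following is a sketch under assumptions: the index name is hypothetical, and `manage` is taken from the list of privileges in the error message, so the privilege you need may differ depending on the action.

[source,console]
--------------------------------------------------
GET /_security/user/_has_privileges
{
  "index": [
    {
      "names": [ "my-index-000001" ],
      "privileges": [ "manage" ]
    }
  ]
}
--------------------------------------------------
// TEST[skip:illustrative example]

Because {ilm-init} uses the privileges captured when the policy was last modified, granting the privilege is probably not enough on its own; the policy likely needs to be saved again by a user who holds it.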