- [role="xpack"]
- [[index-lifecycle-error-handling]]
- == Troubleshooting {ilm} errors
- When <<index-lifecycle-management,{ilm-init}>> executes a lifecycle policy, it's possible for errors to occur
- while performing the necessary index operations for a step.
- When this happens, {ilm-init} moves the index to an `ERROR` step.
- If {ilm-init} cannot resolve the error automatically, execution is halted
- until you resolve the underlying issues with the policy, index, or cluster.
- See https://www.youtube.com/watch?v=VCIqkji3IwY[this video]
- for a walkthrough of troubleshooting current {ilm-init} health issues, and https://www.youtube.com/watch?v=onrnnwjYWSQ[this video]
- for a walkthrough of troubleshooting historical {ilm-init} issues.
- For example, you might have a `shrink-index` policy that shrinks an index to four shards once it
- is at least five days old:
- [source,console]
- --------------------------------------------------
- PUT _ilm/policy/shrink-index
- {
- "policy": {
- "phases": {
- "warm": {
- "min_age": "5d",
- "actions": {
- "shrink": {
- "number_of_shards": 4
- }
- }
- }
- }
- }
- }
- --------------------------------------------------
- // TEST
- Nothing prevents you from applying the `shrink-index` policy to a new
- index that has only two shards:
- [source,console]
- --------------------------------------------------
- PUT /my-index-000001
- {
- "settings": {
- "index.number_of_shards": 2,
- "index.lifecycle.name": "shrink-index"
- }
- }
- --------------------------------------------------
- // TEST[continued]
- After five days, {ilm-init} attempts to shrink `my-index-000001` from two shards to four shards.
- Because the shrink action cannot _increase_ the number of shards, this operation fails
- and {ilm-init} moves `my-index-000001` to the `ERROR` step.
- You can use the <<ilm-explain-lifecycle,{ilm-init} Explain API>> to get information about
- what went wrong:
- [source,console]
- --------------------------------------------------
- GET /my-index-000001/_ilm/explain
- --------------------------------------------------
- // TEST[continued]
- The API returns the following information:
- [source,console-result]
- --------------------------------------------------
- {
- "indices" : {
- "my-index-000001" : {
- "index" : "my-index-000001",
- "managed" : true,
- "index_creation_date_millis" : 1541717265865,
- "time_since_index_creation": "5.1d",
- "policy" : "shrink-index", <1>
- "lifecycle_date_millis" : 1541717265865,
- "age": "5.1d", <2>
- "phase" : "warm", <3>
- "phase_time_millis" : 1541717272601,
- "action" : "shrink", <4>
- "action_time_millis" : 1541717272601,
- "step" : "ERROR", <5>
- "step_time_millis" : 1541717272688,
- "failed_step" : "shrink", <6>
- "step_info" : {
- "type" : "illegal_argument_exception", <7>
- "reason" : "the number of target shards [4] must be less that the number of source shards [2]"
- },
- "phase_execution" : {
- "policy" : "shrink-index",
- "phase_definition" : { <8>
- "min_age" : "5d",
- "actions" : {
- "shrink" : {
- "number_of_shards" : 4
- }
- }
- },
- "version" : 1,
- "modified_date_in_millis" : 1541717264230
- }
- }
- }
- }
- --------------------------------------------------
- // TESTRESPONSE[skip:no way to know if we will get this response immediately]
- <1> The policy being used to manage the index: `shrink-index`
- <2> The index age: 5.1 days
- <3> The phase the index is currently in: `warm`
- <4> The current action: `shrink`
- <5> The step the index is currently in: `ERROR`
- <6> The step that failed to execute: `shrink`
- <7> The type of error and a description of that error.
- <8> The definition of the current phase from the `shrink-index` policy
- To resolve this, you could update the policy to shrink the index to a single shard after five days:
- [source,console]
- --------------------------------------------------
- PUT _ilm/policy/shrink-index
- {
- "policy": {
- "phases": {
- "warm": {
- "min_age": "5d",
- "actions": {
- "shrink": {
- "number_of_shards": 1
- }
- }
- }
- }
- }
- }
- --------------------------------------------------
- // TEST[continued]
- [discrete]
- === Retrying failed lifecycle policy steps
- Once you fix the problem that put an index in the `ERROR` step,
- you might need to explicitly tell {ilm-init} to retry the step:
- [source,console]
- --------------------------------------------------
- POST /my-index-000001/_ilm/retry
- --------------------------------------------------
- // TEST[skip:we can't be sure the index is ready to be retried at this point]
- {ilm-init} subsequently attempts to re-run the step that failed.
- You can use the <<ilm-explain-lifecycle,{ilm-init} Explain API>> to monitor the progress.
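- When several indices are involved, it can help to list only those that are currently in an error state. The request below is a minimal sketch that uses the explain API's `only_errors` query parameter; the `my-index-*` pattern is an assumption chosen to match the example index name used on this page.
- [source,console]
- --------------------------------------------------
- # Assumption: the managed indices match the pattern my-index-*
- GET /my-index-*/_ilm/explain?only_errors=true
- --------------------------------------------------
- If the retried step succeeded, the index should no longer appear in this filtered response.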
- [discrete]
- === Common {ilm-init} setting issues
- [discrete]
- [[min-age-calculation]]
- ==== How `min_age` is calculated
- When setting up an <<set-up-lifecycle-policy,{ilm-init} policy>> or <<getting-started-index-lifecycle-management,automating rollover with {ilm-init}>>, be aware that `min_age` can be relative to either the rollover time or the index creation time.
- If you use <<ilm-rollover,{ilm-init} rollover>>, `min_age` is calculated relative to the time the index was rolled over. This is because the <<indices-rollover-index,rollover API>> generates a new index and updates the `age` of the previous index to reflect the rollover time. If the index hasn't been rolled over, its `age` is calculated from the index's `creation_date`.
- You can override how `min_age` is calculated using the `index.lifecycle.origination_date` and `index.lifecycle.parse_origination_date` <<ilm-settings,{ilm-init} settings>>.
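- As a rough illustration, the following request sets an explicit origination date on an existing index. The index name and the timestamp (epoch milliseconds) are placeholder values, not recommendations.
- [source,console]
- --------------------------------------------------
- # Assumption: my-index-000001 exists; the timestamp is an illustrative value
- PUT /my-index-000001/_settings
- {
-   "index.lifecycle.origination_date": 1541717265865
- }
- --------------------------------------------------
- With this setting in place, {ilm-init} calculates the index age for phase transitions from the supplied timestamp.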
- [discrete]
- === Common {ilm-init} errors
- Here's how to resolve the most common errors reported in the `ERROR` step.
- TIP: Problems with rollover aliases are a common cause of errors.
- Consider using <<data-streams, data streams>> instead of managing rollover with aliases.
- [discrete]
- ==== Rollover alias [x] can point to multiple indices, found duplicated alias [x] in index template [z]
- The target rollover alias is specified in an index template's `index.lifecycle.rollover_alias` setting.
- You need to explicitly configure this alias _one time_ when you
- <<ilm-gs-alias-bootstrap, bootstrap the initial index>>.
- The rollover action then manages setting and updating the alias to
- <<rollover-index-api-desc, roll over>> to each subsequent index.
- Do not explicitly configure this same alias in the aliases section of an index template.
- See this https://www.youtube.com/watch?v=Ww5POq4zZtY[resolving `duplicate alias` video] for an example troubleshooting walkthrough.
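- A minimal sketch of the one-time bootstrap step might look like the following. The alias name `my-alias` is hypothetical and would need to match the value of `index.lifecycle.rollover_alias` in your index template.
- [source,console]
- --------------------------------------------------
- # Assumption: my-alias is the rollover alias named in the template settings
- PUT /my-index-000001
- {
-   "aliases": {
-     "my-alias": {
-       "is_write_index": true
-     }
-   }
- }
- --------------------------------------------------
- In this arrangement the alias is defined on the bootstrap index only, and the index template carries just the `index.lifecycle.rollover_alias` setting.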
- [discrete]
- ==== index.lifecycle.rollover_alias [x] does not point to index [y]
- Either the index is using the wrong alias or the alias does not exist.
- Check the `index.lifecycle.rollover_alias` <<indices-get-settings, index setting>>.
- To see what aliases are configured, use <<cat-alias, _cat/aliases>>.
- See this https://www.youtube.com/watch?v=NKSe67x7aw8[resolving `not point to index` video] for an example troubleshooting walkthrough.
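- The two checks above could be run as follows. The index name is the example used on this page; substitute the index named in your error message.
- [source,console]
- --------------------------------------------------
- # Show the rollover alias the index expects
- GET /my-index-000001/_settings/index.lifecycle.rollover_alias
-
- # List the aliases that exist and the indices they point to
- GET _cat/aliases?v=true
- --------------------------------------------------
- Comparing the two responses should show whether the alias is missing or points to a different index.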
- [discrete]
- ==== Setting [index.lifecycle.rollover_alias] for index [y] is empty or not defined
- The `index.lifecycle.rollover_alias` setting must be configured for the rollover action to work.
- Update the index settings to set `index.lifecycle.rollover_alias`.
- See this https://www.youtube.com/watch?v=LRpMC2GS_FQ[resolving `empty or not defined` video] for an example troubleshooting walkthrough.
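- For instance, assuming the index should roll over through a hypothetical alias named `my-alias`, the setting could be applied like this:
- [source,console]
- --------------------------------------------------
- # Assumption: my-alias already exists and points to my-index-000001
- PUT /my-index-000001/_settings
- {
-   "index.lifecycle.rollover_alias": "my-alias"
- }
- --------------------------------------------------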
- [discrete]
- ==== Alias [x] has more than one write index [y,z]
- Only one index can be designated as the write index for a particular alias.
- Use the <<indices-aliases, aliases>> API to set `is_write_index:false` for all but one index.
- See this https://www.youtube.com/watch?v=jCUvZCT5Hm4[resolving `more than one write index` video] for an example troubleshooting walkthrough.
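- The following sketch assumes two indices, `my-index-000001` and `my-index-000002`, that are both marked as the write index for a hypothetical alias `my-alias`. It leaves only the newer index as the write index.
- [source,console]
- --------------------------------------------------
- # Assumption: both index names and the alias name are illustrative
- POST _aliases
- {
-   "actions": [
-     {
-       "add": {
-         "index": "my-index-000001",
-         "alias": "my-alias",
-         "is_write_index": false
-       }
-     },
-     {
-       "add": {
-         "index": "my-index-000002",
-         "alias": "my-alias",
-         "is_write_index": true
-       }
-     }
-   ]
- }
- --------------------------------------------------
- Both actions are sent in one request so that the alias is updated in a single step.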
- [discrete]
- ==== index name [x] does not match pattern ^.*-\d+
- The index name must match the regex pattern `^.*-\d+` for the rollover action to work.
- The most common problem is that the index name does not contain trailing digits.
- For example, `my-index` does not match the pattern requirement.
- Append a numeric value to the index name, for example `my-index-000001`.
- See this https://www.youtube.com/watch?v=9sp1zF6iL00[resolving `does not match pattern` video] for an example troubleshooting walkthrough.
- [discrete]
- ==== CircuitBreakingException: [x] data too large, data for [y]
- This indicates that the cluster is hitting resource limits.
- Before continuing to set up {ilm-init}, you'll need to take steps to alleviate the resource issues.
- For more information, see <<circuit-breaker-errors>>.
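- One possible starting point for investigating is the node stats API, which reports each circuit breaker's limit, estimated size, and trip count per node:
- [source,console]
- --------------------------------------------------
- # Returns only the breaker section of the node statistics
- GET _nodes/stats/breaker
- --------------------------------------------------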
- [discrete]
- ==== High disk watermark [x] exceeded on [y]
- This indicates that the cluster is running out of disk space.
- This can happen when you don't have {ilm} set up to roll over from hot to warm nodes.
- For more information, see <<fix-watermark-errors,Fix watermark errors>>.
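- To see how much disk space each node is using, one option is the cat allocation API:
- [source,console]
- --------------------------------------------------
- # Shows shard counts and disk usage per data node
- GET _cat/allocation?v=true
- --------------------------------------------------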
- [discrete]
- ==== security_exception: action [<action-name>] is unauthorized for user [<user-name>] with roles [<role-name>], this action is granted by the index privileges [manage_follow_index,manage,all]
- This indicates that the {ilm-init} action cannot be executed because the user that {ilm-init} uses to perform the action doesn't have the required privileges.
- {ilm-init} actions run as though they were performed by the last user who modified the policy, with the privileges that user had at that time.
- The account used to create or modify the policy must have permissions to perform all operations that are part of that policy.