[role="xpack"]
[testenv="basic"]
[[index-lifecycle-error-handling]]
== Resolve lifecycle policy execution errors

When {ilm-init} executes a lifecycle policy, errors can occur
while performing the index operations that a step requires.
When this happens, {ilm-init} moves the index to an `ERROR` step.
If {ilm-init} cannot resolve the error automatically, execution is halted
until you resolve the underlying issues with the policy, index, or cluster.

For example, you might have a `shrink-index` policy that shrinks an index to four shards once it
is at least five days old:

[source,console]
--------------------------------------------------
PUT _ilm/policy/shrink-index
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "5d",
        "actions": {
          "shrink": {
            "number_of_shards": 4
          }
        }
      }
    }
  }
}
--------------------------------------------------
// TEST

Nothing prevents you from applying the `shrink-index` policy to a new
index that has only two shards:

[source,console]
--------------------------------------------------
PUT /myindex
{
  "settings": {
    "index.number_of_shards": 2,
    "index.lifecycle.name": "shrink-index"
  }
}
--------------------------------------------------
// TEST[continued]

After five days, {ilm-init} attempts to shrink `myindex` from two shards to four shards.
Because the shrink action cannot _increase_ the number of shards, this operation fails
and {ilm-init} moves `myindex` to the `ERROR` step.
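
One way to catch this kind of mismatch before the policy runs is to check how many
primary shards the index has. The request below is a sketch rather than part of the
walkthrough itself; it assumes the index is named `myindex`, as in the example above.
A shrink target must be a factor of the source shard count, so a two-shard index
can only be shrunk to one shard.

[source,console]
--------------------------------------------------
GET /myindex/_settings/index.number_of_shards
--------------------------------------------------
// TEST[skip:illustrative sketch]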

You can use the <<ilm-explain-lifecycle,{ilm-init} Explain API>> to get information about
what went wrong:

[source,console]
--------------------------------------------------
GET /myindex/_ilm/explain
--------------------------------------------------
// TEST[continued]

The API returns the following information:

[source,console-result]
--------------------------------------------------
{
  "indices" : {
    "myindex" : {
      "index" : "myindex",
      "managed" : true,
      "policy" : "shrink-index", <1>
      "lifecycle_date_millis" : 1541717265865,
      "age": "5.1d", <2>
      "phase" : "warm", <3>
      "phase_time_millis" : 1541717272601,
      "action" : "shrink", <4>
      "action_time_millis" : 1541717272601,
      "step" : "ERROR", <5>
      "step_time_millis" : 1541717272688,
      "failed_step" : "shrink", <6>
      "step_info" : {
        "type" : "illegal_argument_exception", <7>
        "reason" : "the number of target shards [4] must be less that the number of source shards [2]"
      },
      "phase_execution" : {
        "policy" : "shrink-index",
        "phase_definition" : { <8>
          "min_age" : "5d",
          "actions" : {
            "shrink" : {
              "number_of_shards" : 4
            }
          }
        },
        "version" : 1,
        "modified_date_in_millis" : 1541717264230
      }
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[skip:no way to know if we will get this response immediately]

<1> The policy being used to manage the index: `shrink-index`
<2> The index age: 5.1 days
<3> The phase the index is currently in: `warm`
<4> The current action: `shrink`
<5> The step the index is currently in: `ERROR`
<6> The step that failed to execute: `shrink`
<7> The type of error and a description of that error
<8> The definition of the current phase from the `shrink-index` policy
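
If you manage many indices, it can help to list only those that are stuck.
As a sketch, and assuming your {es} version supports the `only_errors` query parameter
of the explain API, the following request limits the response to indices in the
`ERROR` step. The `my*` index pattern is illustrative.

[source,console]
--------------------------------------------------
GET /my*/_ilm/explain?only_errors=true
--------------------------------------------------
// TEST[skip:illustrative sketch]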

To resolve this, you could update the policy to shrink the index to a single shard after five days:

[source,console]
--------------------------------------------------
PUT _ilm/policy/shrink-index
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "5d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          }
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[continued]
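
To confirm that the update was stored, you can retrieve the policy.
This is a minimal check rather than a required step: the `version` in the response
can be compared with the `version` reported under `phase_execution` in the explain
output to see which policy version the index is currently executing.

[source,console]
--------------------------------------------------
GET _ilm/policy/shrink-index
--------------------------------------------------
// TEST[skip:illustrative sketch]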

[discrete]
=== Retrying failed lifecycle policy steps

Once you fix the problem that put an index in the `ERROR` step,
you might need to explicitly tell {ilm-init} to retry the step:

[source,console]
--------------------------------------------------
POST /myindex/_ilm/retry
--------------------------------------------------
// TEST[skip:we can't be sure the index is ready to be retried at this point]

{ilm-init} subsequently attempts to re-run the step that failed.
You can use the <<ilm-explain-lifecycle,{ilm-init} Explain API>> to monitor the progress.
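
For a more compact view while monitoring, one option is the common `filter_path`
response-filtering parameter. The sketch below assumes you only care about the step
fields shown in the explain output earlier in this section; adjust the paths as needed.

[source,console]
--------------------------------------------------
GET /myindex/_ilm/explain?filter_path=indices.*.step,indices.*.failed_step,indices.*.step_info
--------------------------------------------------
// TEST[skip:illustrative sketch]

If the retry succeeded, `step` no longer reports `ERROR`.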