[[cluster-allocation-explain]]
=== Cluster Allocation Explain API

The purpose of the cluster allocation explain API is to provide
explanations for shard allocations in the cluster. For unassigned shards,
the explain API provides an explanation for why the shard is unassigned.
For assigned shards, the explain API provides an explanation for why the
shard is remaining on its current node and has not moved or rebalanced to
another node. This API can be very useful when attempting to diagnose why
a shard is unassigned or why a shard continues to remain on its current node
when you might expect otherwise.

[float]
==== Explain API Request

To explain the allocation of a shard, first an index should exist:

[source,js]
--------------------------------------------------
PUT /myindex
--------------------------------------------------
// CONSOLE
// TESTSETUP

And then the allocation for shards of that index can be explained:

[source,js]
--------------------------------------------------
GET /_cluster/allocation/explain
{
  "index": "myindex",
  "shard": 0,
  "primary": true
}
--------------------------------------------------
// CONSOLE

Specify the `index` and `shard` id of the shard you would like an explanation
for, as well as the `primary` flag to indicate whether to explain the primary
shard for the given shard id or one of its replica shards. These three request
parameters are required.

You may also specify an optional `current_node` request parameter to only explain
a shard that is currently located on `current_node`. The `current_node` can be
specified as either the node id or node name.

[source,js]
--------------------------------------------------
GET /_cluster/allocation/explain
{
  "index": "myindex",
  "shard": 0,
  "primary": false,
  "current_node": "nodeA" <1>
}
--------------------------------------------------
// CONSOLE
// TEST[skip:no way of knowing the current_node]
<1> The node on which shard 0 currently holds a replica

You can also have Elasticsearch explain the allocation of the first unassigned
shard that it finds by sending an empty body for the request:

[source,js]
--------------------------------------------------
GET /_cluster/allocation/explain
--------------------------------------------------
// CONSOLE

[float]
==== Explain API Response

This section includes examples of the cluster allocation explain API response output
under various scenarios.

//////////////////////////
[source,js]
--------------------------------------------------
PUT /idx?master_timeout=1s&timeout=1s
{"settings": {"index.routing.allocation.include._name": "non_existent_node"} }

GET /_cluster/allocation/explain
{
  "index": "idx",
  "shard": 0,
  "primary": true
}
--------------------------------------------------
// CONSOLE
//////////////////////////

The API response for an unassigned shard:

[source,js]
--------------------------------------------------
{
  "index" : "idx",
  "shard" : 0,
  "primary" : true,
  "current_state" : "unassigned", <1>
  "unassigned_info" : {
    "reason" : "INDEX_CREATED", <2>
    "at" : "2017-01-04T18:08:16.600Z",
    "last_allocation_status" : "no"
  },
  "can_allocate" : "no", <3>
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "8qt2rY-pT6KNZB3-hGfLnw",
      "node_name" : "node-0",
      "transport_address" : "127.0.0.1:9401",
      "node_attributes" : {},
      "node_decision" : "no", <4>
      "weight_ranking" : 1,
      "deciders" : [
        {
          "decider" : "filter", <5>
          "decision" : "NO",
          "explanation" : "node does not match index setting [index.routing.allocation.include] filters [_name:\"non_existent_node\"]" <6>
        }
      ]
    }
  ]
}
--------------------------------------------------
// TESTRESPONSE[s/"at" : "[^"]*"/"at" : $body.$_path/]
// TESTRESPONSE[s/"node_id" : "[^"]*"/"node_id" : $body.$_path/]
// TESTRESPONSE[s/"transport_address" : "[^"]*"/"transport_address" : $body.$_path/]
// TESTRESPONSE[s/"node_attributes" : \{\}/"node_attributes" : $body.$_path/]
<1> The current state of the shard
<2> The reason for the shard originally becoming unassigned
<3> Whether to allocate the shard
<4> Whether to allocate the shard to the particular node
<5> The decider which led to the `no` decision for the node
<6> An explanation as to why the decider returned a `no` decision, with a helpful hint pointing to the setting that led to the decision

You can return information gathered by the cluster info service about disk usage
and shard sizes by setting the `include_disk_info` parameter to `true`:

[source,js]
--------------------------------------------------
GET /_cluster/allocation/explain?include_disk_info=true
--------------------------------------------------
// CONSOLE

Additionally, if you would like to include all decisions that were factored into the final
decision, the `include_yes_decisions` parameter will return all decisions for each node:

[source,js]
--------------------------------------------------
GET /_cluster/allocation/explain?include_yes_decisions=true
--------------------------------------------------
// CONSOLE

The default value for `include_yes_decisions` is `false`, which will only
include the `no` decisions in the response. This is generally what you would
want, as the `no` decisions indicate why a shard is unassigned or cannot be moved,
and including all decisions, including the `yes` ones, adds a lot of verbosity to the
API's response output.
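
Both query parameters can be combined with an explicit request body. As a minimal
sketch (assuming the `myindex` index created above), a fully verbose explanation
for its primary shard could be requested like this:

[source,js]
--------------------------------------------------
GET /_cluster/allocation/explain?include_yes_decisions=true&include_disk_info=true
{
  "index": "myindex",
  "shard": 0,
  "primary": true
}
--------------------------------------------------
// CONSOLE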

The API response output for an unassigned primary shard that had previously been
allocated to a node in the cluster:

[source,js]
--------------------------------------------------
{
  "index" : "idx",
  "shard" : 0,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "NODE_LEFT",
    "at" : "2017-01-04T18:03:28.464Z",
    "details" : "node_left[OIWe8UhhThCK0V5XfmdrmQ]",
    "last_allocation_status" : "no_valid_shard_copy"
  },
  "can_allocate" : "no_valid_shard_copy",
  "allocate_explanation" : "cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster"
}
--------------------------------------------------
// NOTCONSOLE

The API response output for a replica that is unassigned due to delayed allocation:

[source,js]
--------------------------------------------------
{
  "index" : "idx",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "NODE_LEFT",
    "at" : "2017-01-04T18:53:59.498Z",
    "details" : "node_left[G92ZwuuaRY-9n8_tc-IzEg]",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "allocation_delayed",
  "allocate_explanation" : "cannot allocate because the cluster is still waiting 59.8s for the departed node holding a replica to rejoin, despite being allowed to allocate the shard to at least one other node",
  "configured_delay" : "1m", <1>
  "configured_delay_in_millis" : 60000,
  "remaining_delay" : "59.8s", <2>
  "remaining_delay_in_millis" : 59824,
  "node_allocation_decisions" : [
    {
      "node_id" : "pmnHu_ooQWCPEFobZGbpWw",
      "node_name" : "node_t2",
      "transport_address" : "127.0.0.1:9402",
      "node_decision" : "yes"
    },
    {
      "node_id" : "3sULLVJrRneSg0EfBB-2Ew",
      "node_name" : "node_t0",
      "transport_address" : "127.0.0.1:9400",
      "node_decision" : "no",
      "store" : { <3>
        "matching_size" : "4.2kb",
        "matching_size_in_bytes" : 4325
      },
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[idx][0], node[3sULLVJrRneSg0EfBB-2Ew], [P], s[STARTED], a[id=eV9P8BN1QPqRc3B4PLx6cg]]"
        }
      ]
    }
  ]
}
--------------------------------------------------
// NOTCONSOLE
<1> The configured delay before allocating a replica shard that does not exist due to the node holding it leaving the cluster
<2> The remaining delay before allocating the replica shard
<3> Information about the shard data found on a node
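
The configured delay comes from the `index.unassigned.node_left.delayed_timeout`
index setting, which defaults to `1m`. As a minimal sketch, assuming you want the
replicas of `myindex` to wait five minutes for a departed node to return before
being reallocated, the delay could be adjusted like this:

[source,js]
--------------------------------------------------
PUT /myindex/_settings
{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "5m"
  }
}
--------------------------------------------------
// CONSOLE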

The API response output for an assigned shard that is not allowed to
remain on its current node and is required to move:

[source,js]
--------------------------------------------------
{
  "index" : "idx",
  "shard" : 0,
  "primary" : true,
  "current_state" : "started",
  "current_node" : {
    "id" : "8lWJeJ7tSoui0bxrwuNhTA",
    "name" : "node_t1",
    "transport_address" : "127.0.0.1:9401"
  },
  "can_remain_on_current_node" : "no", <1>
  "can_remain_decisions" : [ <2>
    {
      "decider" : "filter",
      "decision" : "NO",
      "explanation" : "node does not match index setting [index.routing.allocation.include] filters [_name:\"non_existent_node\"]"
    }
  ],
  "can_move_to_other_node" : "no", <3>
  "move_explanation" : "cannot move shard to another node, even though it is not allowed to remain on its current node",
  "node_allocation_decisions" : [
    {
      "node_id" : "_P8olZS8Twax9u6ioN-GGA",
      "node_name" : "node_t0",
      "transport_address" : "127.0.0.1:9400",
      "node_decision" : "no",
      "weight_ranking" : 1,
      "deciders" : [
        {
          "decider" : "filter",
          "decision" : "NO",
          "explanation" : "node does not match index setting [index.routing.allocation.include] filters [_name:\"non_existent_node\"]"
        }
      ]
    }
  ]
}
--------------------------------------------------
// NOTCONSOLE
<1> Whether the shard is allowed to remain on its current node
<2> The deciders that factored into the decision of why the shard is not allowed to remain on its current node
<3> Whether the shard is allowed to be allocated to another node
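
The `filter` decisions in this example come from allocation filtering on the index.
As a minimal sketch, a situation like the one above could be reproduced by pointing
the filter at a node name that does not exist in the cluster (here the hypothetical
`non_existent_node`):

[source,js]
--------------------------------------------------
PUT /myindex/_settings
{
  "index.routing.allocation.include._name": "non_existent_node"
}
--------------------------------------------------
// CONSOLE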

The API response output for an assigned shard that remains on its current node
because moving the shard to another node does not form a better cluster balance:

[source,js]
--------------------------------------------------
{
  "index" : "idx",
  "shard" : 0,
  "primary" : true,
  "current_state" : "started",
  "current_node" : {
    "id" : "wLzJm4N4RymDkBYxwWoJsg",
    "name" : "node_t0",
    "transport_address" : "127.0.0.1:9400",
    "weight_ranking" : 1
  },
  "can_remain_on_current_node" : "yes",
  "can_rebalance_cluster" : "yes", <1>
  "can_rebalance_to_other_node" : "no", <2>
  "rebalance_explanation" : "cannot rebalance as no target node exists that can both allocate this shard and improve the cluster balance",
  "node_allocation_decisions" : [
    {
      "node_id" : "oE3EGFc8QN-Tdi5FFEprIA",
      "node_name" : "node_t1",
      "transport_address" : "127.0.0.1:9401",
      "node_decision" : "worse_balance", <3>
      "weight_ranking" : 1
    }
  ]
}
--------------------------------------------------
// NOTCONSOLE
<1> Whether rebalancing is allowed on the cluster
<2> Whether the shard can be rebalanced to another node
<3> The reason the shard cannot be rebalanced to the node, in this case indicating that it offers no better balance than the current node
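
Cluster-wide rebalancing itself is governed by the `cluster.routing.rebalance.enable`
setting. As a minimal sketch, disabling rebalancing entirely would be expected to
change the `can_rebalance_cluster` value above to `no`:

[source,js]
--------------------------------------------------
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.rebalance.enable": "none"
  }
}
--------------------------------------------------
// NOTCONSOLE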