Browse Source

Troubleshooting guide for diagnosing unassigned shards (#86996)

Co-authored-by: Leaf-Lin <39002973+Leaf-Lin@users.noreply.github.com>
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
Andrei Dan 3 years ago
parent
commit
490f417efd

+ 40 - 0
docs/reference/tab-widgets/troubleshooting/data/diagnose-unassigned-shards-widget.asciidoc

@@ -0,0 +1,40 @@
+++++
+<div class="tabs" data-tab-group="host">
+  <div role="tablist" aria-label="Diagnose unassigned shards">
+    <button role="tab"
+            aria-selected="true"
+            aria-controls="cloud-tab-diagnose-unassigned-shards"
+            id="cloud-diagnose-unassigned-shards">
+      Elasticsearch Service
+    </button>
+    <button role="tab"
+            aria-selected="false"
+            aria-controls="self-managed-tab-diagnose-unassigned-shards"
+            id="self-managed-diagnose-unassigned-shards"
+            tabindex="-1">
+      Self-managed
+    </button>
+  </div>
+  <div tabindex="0"
+       role="tabpanel"
+       id="cloud-tab-diagnose-unassigned-shards"
+       aria-labelledby="cloud-diagnose-unassigned-shards">
+++++
+
+include::diagnose-unassigned-shards.asciidoc[tag=cloud]
+
+++++
+  </div>
+  <div tabindex="0"
+       role="tabpanel"
+       id="self-managed-tab-diagnose-unassigned-shards"
+       aria-labelledby="self-managed-diagnose-unassigned-shards"
+       hidden="">
+++++
+
+include::diagnose-unassigned-shards.asciidoc[tag=cloud]
+
+++++
+  </div>
+</div>
+++++

+ 283 - 0
docs/reference/tab-widgets/troubleshooting/data/diagnose-unassigned-shards.asciidoc

@@ -0,0 +1,283 @@
+////
+[source,console]
+----
+PUT my-index-000001?master_timeout=1s&timeout=1s
+{
+  "settings": {
+    "index.routing.allocation.include._name": "nonexistent_node",
+    "index.routing.allocation.include._tier_preference": null
+  }
+}
+----
+////
+
+// tag::cloud[]
+In order to diagnose the unassigned shards, follow the next steps:
+
+**Use {kib}**
+
+//tag::kibana-api-ex[]
+. Log in to the {ess-console}[{ecloud} console].
++
+
+. On the **Elasticsearch Service** panel, click the name of your deployment. 
++
+
+NOTE:
+If the name of your deployment is disabled your {kib} instances might be
+unhealthy, in which case please contact https://support.elastic.co[Elastic Support],
+or your deployment doesn't include {kib}, in which case all you need to do is 
+{cloud}/ec-access-kibana.html[enable Kibana first].
+
+. Open your deployment's side navigation menu (placed under the Elastic logo in the upper left corner)
+and go to **Dev Tools > Console**.
++
+[role="screenshot"]
+image::images/kibana-console.png[{kib} Console,align="center"]
+
+. View the unassigned shards using the <<cat-shards,cat shards API>>.
++
+[source,console]
+----
+GET _cat/shards?v=true&h=index,shard,prirep,state,node,unassigned.reason&s=state
+----
++
+The response will look like this:
++
+[source,console-result]
+----
+[
+  {
+    "index": "my-index-000001",
+    "shard": "0",
+    "prirep": "p",
+    "state": "UNASSIGNED",
+    "node": null,
+    "unassigned.reason": "INDEX_CREATED"
+  }
+]
+----
+// TEST[skip:illustration purposes only]
+
++
+Unassigned shards have a `state` of `UNASSIGNED`. The `prirep` value is `p` for
+primary shards and `r` for replicas.
++
+The index in the example has a primary shard unassigned.
+
+. To understand why an unassigned shard is not being assigned and what action
+you must take to allow {es} to assign it, use the
+<<cluster-allocation-explain,cluster allocation explanation API>>.
++
+[source,console]
+----
+GET _cluster/allocation/explain
+{
+  "index": "my-index-000001", <1>
+  "shard": 0, <2>
+  "primary": true <3>
+}
+----
+// TEST[skip:illustration purposes only]
++
+<1> The index we want to diagnose.
++
+<2> The unassigned shard ID.
++
+<3> Indicates that we are diagnosing a primary shard.
++
+The response will look like this:
++
+[source,console-result]
+----
+{
+  "index" : "my-index-000001",
+  "shard" : 0,
+  "primary" : true,
+  "current_state" : "unassigned",                 <1>
+  "unassigned_info" : {
+    "reason" : "INDEX_CREATED",                   <2>
+    "at" : "2022-01-04T18:08:16.600Z",
+    "last_allocation_status" : "no"
+  },
+  "can_allocate" : "no",                          <3>
+  "allocate_explanation" : "Elasticsearch isn't allowed to allocate this shard to any of the nodes in the cluster. Choose a node to which you expect this shard to be allocated, find this node in the node-by-node explanation, and address the reasons which prevent Elasticsearch from allocating this shard there.",
+  "node_allocation_decisions" : [
+    {
+      "node_id" : "8qt2rY-pT6KNZB3-hGfLnw",
+      "node_name" : "node-0",
+      "transport_address" : "127.0.0.1:9401",
+      "node_attributes" : {},
+      "node_decision" : "no",                     <4>
+      "weight_ranking" : 1,
+      "deciders" : [
+        {
+          "decider" : "filter",                   <5>
+          "decision" : "NO",
+          "explanation" : "node does not match index setting [index.routing.allocation.include] filters [_name:\"nonexistent_node\"]"  <6>
+        }
+      ]
+    }
+  ]
+}
+----
+// TEST[skip:illustration purposes only]
++
+<1> The current state of the shard.
++
+<2> The reason for the shard originally becoming unassigned.
++
+<3> Whether to allocate the shard.
++
+<4> Whether to allocate the shard to the particular node.
++
+<5> The decider which led to the `no` decision for the node.
++
+<6> An explanation as to why the decider returned a `no` decision, with a helpful hint pointing to the setting that led to the decision.
+
+. The explanation in our case indicates the index allocation configurations are not correct.
+To review your allocation settings, use the <<indices-get-settings,get index
+settings>> and <<cluster-get-settings,cluster get settings>> APIs.
++
+[source,console]
+----
+GET my-index-000001/_settings?flat_settings=true&include_defaults=true
+
+GET _cluster/settings?flat_settings=true&include_defaults=true
+----
+// TEST[s/^/PUT my-index-000001\n/]
+
+. Change the settings using the <<indices-update-settings,update index
+settings>> and <<cluster-update-settings,cluster update settings>> APIs to the
+correct values in order to allow the index to be allocated.
+
+For more guidance on fixing the most common causes for unassinged shards please follow
+<<fix-red-yellow-cluster-status, this guide>> or contact https://support.elastic.co[Elastic Support]. 
+
+//end::kibana-api-ex[]
+// end::cloud[]
+
+// tag::self-managed[]
+In order to diagnose the unassigned shards follow the next steps:
+
+. View the unassigned shards using the <<cat-shards,cat shards API>>.
++
+[source,console]
+----
+GET _cat/shards?v=true&h=index,shard,prirep,state,node,unassigned.reason&s=state
+----
++
+The response will look like this:
++
+[source,console-result]
+----
+[
+  {
+    "index": "my-index-000001",
+    "shard": "0",
+    "prirep": "p",
+    "state": "UNASSIGNED",
+    "node": null,
+    "unassigned.reason": "INDEX_CREATED"
+  }
+]
+----
+// TEST[skip:illustration purposes only]
+
++
+Unassigned shards have a `state` of `UNASSIGNED`. The `prirep` value is `p` for
+primary shards and `r` for replicas.
++
+The index in the example has a primary shard unassigned.
+
+. To understand why an unassigned shard is not being assigned and what action
+you must take to allow {es} to assign it, use the
+<<cluster-allocation-explain,cluster allocation explanation API>>.
++
+[source,console]
+----
+GET _cluster/allocation/explain
+{
+  "index": "my-index-000001", <1>
+  "shard": 0, <2>
+  "primary": true <3>
+}
+----
+// TEST[skip:illustration purposes only]
++
+<1> The index we want to diagnose.
++
+<2> The unassigned shard ID.
++
+<3> Indicates that we are diagnosing a primary shard.
++
+The response will look like this:
++
+[source,console-result]
+----
+{
+  "index" : "my-index-000001",
+  "shard" : 0,
+  "primary" : true,
+  "current_state" : "unassigned",                 <1>
+  "unassigned_info" : {
+    "reason" : "INDEX_CREATED",                   <2>
+    "at" : "2022-01-04T18:08:16.600Z",
+    "last_allocation_status" : "no"
+  },
+  "can_allocate" : "no",                          <3>
+  "allocate_explanation" : "Elasticsearch isn't allowed to allocate this shard to any of the nodes in the cluster. Choose a node to which you expect this shard to be allocated, find this node in the node-by-node explanation, and address the reasons which prevent Elasticsearch from allocating this shard there.",
+  "node_allocation_decisions" : [
+    {
+      "node_id" : "8qt2rY-pT6KNZB3-hGfLnw",
+      "node_name" : "node-0",
+      "transport_address" : "127.0.0.1:9401",
+      "node_attributes" : {},
+      "node_decision" : "no",                     <4>
+      "weight_ranking" : 1,
+      "deciders" : [
+        {
+          "decider" : "filter",                   <5>
+          "decision" : "NO",
+          "explanation" : "node does not match index setting [index.routing.allocation.include] filters [_name:\"nonexistent_node\"]"  <6>
+        }
+      ]
+    }
+  ]
+}
+----
+// TEST[skip:illustration purposes only]
++
+<1> The current state of the shard.
++
+<2> The reason for the shard originally becoming unassigned.
++
+<3> Whether to allocate the shard.
++
+<4> Whether to allocate the shard to the particular node.
++
+<5> The decider which led to the `no` decision for the node.
++
+<6> An explanation as to why the decider returned a `no` decision, with a helpful hint pointing to the setting that led to the decision.
+
+. The explanation in our case indicates the index allocation configurations are not correct.
+To review your allocation settings, use the <<indices-get-settings,get index
+settings>> and <<cluster-get-settings,cluster get settings>> APIs.
++
+[source,console]
+----
+GET my-index-000001/_settings?flat_settings=true&include_defaults=true
+
+GET _cluster/settings?flat_settings=true&include_defaults=true
+----
+// TEST[s/^/PUT my-index-000001\n/]
+
+. Change the settings using the <<indices-update-settings,update index
+settings>> and <<cluster-update-settings,cluster update settings>> APIs to the
+correct values in order to allow the index to be allocated.
+
+For more guidance on fixing the most common causes for unassinged shards please follow
+<<fix-red-yellow-cluster-status, this guide>>. 
+
+// end::self-managed[]
+

+ 1 - 0
docs/reference/troubleshooting.asciidoc

@@ -14,4 +14,5 @@ include::troubleshooting/data/increase-shard-limit.asciidoc[]
 
 include::troubleshooting/data/increase-cluster-shard-limit.asciidoc[]
 
+include::troubleshooting/data/diagnose-unassigned-shards.asciidoc[]
 

+ 12 - 0
docs/reference/troubleshooting/data/diagnose-unassigned-shards.asciidoc

@@ -0,0 +1,12 @@
+[[diagnose-unassigned-shards]]
+== Diagnose unassigned shards
+
+There are multiple reasons why shards might get unassigned, ranging from misconfigured
+allocation settings to lack of disk space.
+
+In order to diagnose the unassigned shards in your deployment use the following steps:
+
+include::{es-repo-dir}/tab-widgets/troubleshooting/data/diagnose-unassigned-shards-widget.asciidoc[]
+
+
+