@@ -78,35 +78,31 @@ A shard can become unassigned for several reasons. The following tips outline the
most common causes and their solutions.

[discrete]
-[[fix-cluster-status-reenable-allocation]]
-===== Re-enable shard allocation
+[[fix-cluster-status-only-one-node]]
+===== Single node cluster

-You typically disable allocation during a <<restart-cluster,restart>> or other
-cluster maintenance. If you forgot to re-enable allocation afterward, {es} will
-be unable to assign shards. To re-enable allocation, reset the
-`cluster.routing.allocation.enable` cluster setting.
+{es} will never assign a replica to the same node as the primary shard. A single-node cluster will always have yellow status. To change to green, set <<dynamic-index-number-of-replicas,number_of_replicas>> to 0 for all indices.

-[source,console]
-----
-PUT _cluster/settings
-{
-  "persistent" : {
-    "cluster.routing.allocation.enable" : null
-  }
-}
-----
-
-See https://www.youtube.com/watch?v=MiKKUdZvwnI[this video] for walkthrough of troubleshooting "no allocations are allowed".
+Therefore, if the number of replicas equals or exceeds the number of nodes, some shards won't be allocated.

[discrete]
[[fix-cluster-status-recover-nodes]]
===== Recover lost nodes

Shards often become unassigned when a data node leaves the cluster. This can
-occur for several reasons, ranging from connectivity issues to hardware failure.
+occur for several reasons:
+
+* A manual node restart will cause a temporary unhealthy cluster state until the node recovers.
+
+* When a node becomes overloaded or fails, it can temporarily disrupt the cluster’s health, leading to an unhealthy state. Prolonged garbage collection (GC) pauses, caused by out-of-memory errors or high memory usage during intensive searches, can trigger this state. See <<fix-cluster-status-jvm,Reduce JVM memory pressure>> for more JVM-related issues.
+
+* Network issues can prevent reliable node communication, causing shards to become out of sync. Check the logs for repeated messages about nodes leaving and rejoining the cluster.
+
After you resolve the issue and recover the node, it will rejoin the cluster.
{es} will then automatically allocate any unassigned shards.

+You can monitor this process by <<cluster-health,checking your cluster health>>. The number of unallocated shards should progressively decrease until green status is reached.
+
To avoid wasting resources on temporary issues, {es} <<delayed-allocation,delays
allocation>> by one minute by default. If you've recovered a node and don’t want
to wait for the delay period, you can call the <<cluster-reroute,cluster reroute
@@ -155,7 +151,7 @@ replica, it remains unassigned. To fix this, you can:

* Change the `index.number_of_replicas` index setting to reduce the number of
replicas for each primary shard. We recommend keeping at least one replica per
-primary.
+primary for high availability.

[source,console]
----
@@ -166,7 +162,6 @@ PUT _settings
----
// TEST[s/^/PUT my-index\n/]

-
[discrete]
[[fix-cluster-status-disk-space]]
===== Free up or increase disk space
@@ -187,6 +182,8 @@ If your nodes are running low on disk space, you have a few options:

* Upgrade your nodes to increase disk space.

+* Add more nodes to the cluster.
+
* Delete unneeded indices to free up space. If you use {ilm-init}, you can
update your lifecycle policy to use <<ilm-searchable-snapshot,searchable
snapshots>> or add a delete phase. If you no longer need to search the data, you
@@ -219,11 +216,39 @@ watermark or set it to an explicit byte value.
PUT _cluster/settings
{
  "persistent": {
-    "cluster.routing.allocation.disk.watermark.low": "30gb"
+    "cluster.routing.allocation.disk.watermark.low": "90%",
+    "cluster.routing.allocation.disk.watermark.high": "95%"
  }
}
----
-// TEST[s/"30gb"/null/]
+// TEST[s/"90%"/null/]
+// TEST[s/"95%"/null/]
+
+[IMPORTANT]
+====
+This is usually a temporary solution and may cause instability if disk space is not freed up.
+====
+
+[discrete]
+[[fix-cluster-status-reenable-allocation]]
+===== Re-enable shard allocation
+
+You typically disable allocation during a <<restart-cluster,restart>> or other
+cluster maintenance. If you forgot to re-enable allocation afterward, {es} will
+be unable to assign shards. To re-enable allocation, reset the
+`cluster.routing.allocation.enable` cluster setting.
+
+[source,console]
+----
+PUT _cluster/settings
+{
+  "persistent" : {
+    "cluster.routing.allocation.enable" : null
+  }
+}
+----
+
+See https://www.youtube.com/watch?v=MiKKUdZvwnI[this video] for a walkthrough of troubleshooting "no allocations are allowed".

[discrete]
[[fix-cluster-status-jvm]]
@@ -271,4 +296,4 @@ POST _cluster/reroute?metric=none
// TEST[s/^/PUT my-index\n/]
// TEST[catch:bad_request]

-See https://www.youtube.com/watch?v=6OAg9IyXFO4[this video] for a walkthrough of troubleshooting `no_valid_shard_copy`.
+See https://www.youtube.com/watch?v=6OAg9IyXFO4[this video] for a walkthrough of troubleshooting `no_valid_shard_copy`.