[[red-yellow-cluster-status]]
=== Red or yellow cluster status

A red or yellow cluster status indicates one or more shards are missing or
unallocated. These unassigned shards increase your risk of data loss and can
degrade cluster performance.

[discrete]
[[diagnose-cluster-status]]
==== Diagnose your cluster status

**Check your cluster status**

Use the <<cluster-health,cluster health API>>.

[source,console]
----
GET _cluster/health?filter_path=status,*_shards
----

A healthy cluster has a green `status` and zero `unassigned_shards`. A yellow
status means only replicas are unassigned. A red status means one or
more primary shards are unassigned.
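
For example, a response like the following (values are illustrative) indicates
a yellow cluster with two unassigned replica shards:

[source,console-result]
----
{
  "status": "yellow",
  "active_primary_shards": 10,
  "active_shards": 18,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 2,
  "delayed_unassigned_shards": 0
}
----
// TESTRESPONSE[skip:illustrative example]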

**View unassigned shards**

To view unassigned shards, use the <<cat-shards,cat shards API>>.

[source,console]
----
GET _cat/shards?v=true&h=index,shard,prirep,state,node,unassigned.reason&s=state
----

Unassigned shards have a `state` of `UNASSIGNED`. The `prirep` value is `p` for
primary shards and `r` for replicas.
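
The output might look something like the following (index and node names are
illustrative); here the replica of shard `0` is unassigned because its node
left the cluster:

[source,txt]
----
index    shard prirep state      node   unassigned.reason
my-index 0     p      STARTED    node-1
my-index 0     r      UNASSIGNED        NODE_LEFT
----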

To understand why an unassigned shard is not being assigned and what action
you must take to allow {es} to assign it, use the
<<cluster-allocation-explain,cluster allocation explanation API>>.

[source,console]
----
GET _cluster/allocation/explain?filter_path=index,node_allocation_decisions.node_name,node_allocation_decisions.deciders.*
{
  "index": "my-index",
  "shard": 0,
  "primary": false,
  "current_node": "my-node"
}
----
// TEST[s/^/PUT my-index\n/]
// TEST[s/"primary": false,/"primary": false/]
// TEST[s/"current_node": "my-node"//]

[discrete]
[[fix-red-yellow-cluster-status]]
==== Fix a red or yellow cluster status

A shard can become unassigned for several reasons. The following tips outline the
most common causes and their solutions.

**Re-enable shard allocation**

You typically disable allocation during a <<restart-cluster,restart>> or other
cluster maintenance. If you forgot to re-enable allocation afterward, {es} will
be unable to assign shards. To re-enable allocation, reset the
`cluster.routing.allocation.enable` cluster setting.

[source,console]
----
PUT _cluster/settings
{
  "persistent" : {
    "cluster.routing.allocation.enable" : null
  }
}
----
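
To confirm the override has been removed, you can retrieve the setting
explicitly. The `filter_path` shown is optional and only narrows the response:

[source,console]
----
GET _cluster/settings?flat_settings=true&filter_path=*.cluster.routing.allocation.enable
----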

**Recover lost nodes**

Shards often become unassigned when a data node leaves the cluster. This can
occur for several reasons, ranging from connectivity issues to hardware failure.
After you resolve the issue and recover the node, it will rejoin the cluster.
{es} will then automatically allocate any unassigned shards.

To avoid wasting resources on temporary issues, {es} <<delayed-allocation,delays
allocation>> by one minute by default. If you've recovered a node and don't want
to wait for the delay period, you can call the <<cluster-reroute,cluster reroute
API>> with no arguments to start the allocation process. The process runs
asynchronously in the background.

[source,console]
----
POST _cluster/reroute?metric=none
----
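
If you're not sure whether a recovered node has rejoined, the <<cat-nodes,cat
nodes API>> lists the current cluster members:

[source,console]
----
GET _cat/nodes?v=true&h=name,node.role,ip
----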

**Fix allocation settings**

Misconfigured allocation settings can result in an unassigned primary shard.
These settings include:

* <<shard-allocation-filtering,Shard allocation>> index settings
* <<cluster-shard-allocation-filtering,Allocation filtering>> cluster settings
* <<shard-allocation-awareness,Allocation awareness>> cluster settings

To review your allocation settings, use the <<indices-get-settings,get index
settings>> and <<cluster-get-settings,cluster get settings>> APIs.

[source,console]
----
GET my-index/_settings?flat_settings=true&include_defaults=true

GET _cluster/settings?flat_settings=true&include_defaults=true
----
// TEST[s/^/PUT my-index\n/]

You can change the settings using the <<indices-update-settings,update index
settings>> and <<cluster-update-settings,cluster update settings>> APIs.
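
For example, if a stale index-level filter pins shards to a node that no longer
exists, clearing the filter lets {es} assign the shards again. The `_name`
filter below is illustrative; substitute whichever setting your review turned
up:

[source,console]
----
PUT my-index/_settings
{
  "index.routing.allocation.require._name": null
}
----
// TEST[s/^/PUT my-index\n/]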

**Allocate or reduce replicas**

To protect against hardware failure, {es} will not assign a replica to the same
node as its primary shard. If no other data nodes are available to host the
replica, it remains unassigned. To fix this, you can:

* Add a data node to the same tier to host the replica.

* Change the `index.number_of_replicas` index setting to reduce the number of
replicas for each primary shard. We recommend keeping at least one replica per
primary.

[source,console]
----
PUT _settings
{
  "index.number_of_replicas": 1
}
----
// TEST[s/^/PUT my-index\n/]
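
To review the current replica count and health of each index, you can use the
<<cat-indices,cat indices API>>:

[source,console]
----
GET _cat/indices?v=true&h=index,pri,rep,health
----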

**Free up or increase disk space**

{es} uses a <<disk-based-shard-allocation,low disk watermark>> to ensure data
nodes have enough disk space for incoming shards. By default, {es} does not
allocate shards to nodes using more than 85% of disk space.

To check the current disk space of your nodes, use the <<cat-allocation,cat
allocation API>>.

[source,console]
----
GET _cat/allocation?v=true&h=node,shards,disk.*
----

If your nodes are running low on disk space, you have a few options:

* Upgrade your nodes to increase disk space.

* Delete unneeded indices to free up space. If you use {ilm-init}, you can
update your lifecycle policy to use <<ilm-searchable-snapshot,searchable
snapshots>> or add a delete phase. If you no longer need to search the data, you
can use a <<snapshot-restore,snapshot>> to store it off-cluster.

* If you no longer write to an index, use the <<indices-forcemerge,force merge
API>> or {ilm-init}'s <<ilm-forcemerge,force merge action>> to merge its
segments into larger ones.
+
[source,console]
----
POST my-index/_forcemerge
----
// TEST[s/^/PUT my-index\n/]

* If an index is read-only, use the <<indices-shrink-index,shrink index API>> or
{ilm-init}'s <<ilm-shrink,shrink action>> to reduce its primary shard count.
+
[source,console]
----
POST my-index/_shrink/my-shrunken-index
----
// TEST[s/^/PUT my-index\n{"settings":{"index.number_of_shards":2,"blocks.write":true}}\n/]

* If your node has a large disk capacity, you can increase the low disk
watermark or set it to an explicit byte value.
+
[source,console]
----
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "30gb"
  }
}
----
// TEST[s/"30gb"/null/]

**Reduce JVM memory pressure**

Shard allocation requires JVM heap memory. High JVM memory pressure can trigger
<<circuit-breaker,circuit breakers>> that stop allocation and leave shards
unassigned. See <<high-jvm-memory-pressure>>.
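
One way to check heap usage across nodes is the <<cluster-nodes-stats,nodes
stats API>>; the `filter_path` below narrows the response to each node's heap
percentage:

[source,console]
----
GET _nodes/stats?filter_path=nodes.*.jvm.mem.heap_used_percent
----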

**Recover data for a lost primary shard**

If a node containing a primary shard is lost, {es} can typically replace it
using a replica on another node. If you can't recover the node and replicas
don't exist or are irrecoverable, you'll need to re-add the missing data from a
<<snapshot-restore,snapshot>> or the original data source.

WARNING: Only use this option if node recovery is no longer possible. This
process allocates an empty primary shard. If the node later rejoins the cluster,
{es} will overwrite its primary shard with data from this newer empty shard,
resulting in data loss.

Use the <<cluster-reroute,cluster reroute API>> to manually allocate the
unassigned primary shard to another data node in the same tier. Set
`accept_data_loss` to `true`.

[source,console]
----
POST _cluster/reroute?metric=none
{
  "commands": [
    {
      "allocate_empty_primary": {
        "index": "my-index",
        "shard": 0,
        "node": "my-node",
        "accept_data_loss": true
      }
    }
  ]
}
----
// TEST[s/^/PUT my-index\n/]
// TEST[catch:bad_request]

If you backed up the missing index data to a snapshot, use the
<<restore-snapshot-api,restore snapshot API>> to restore the individual index.
Alternatively, you can index the missing data from the original data source.
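
For example, the following request restores `my-index` from a snapshot; the
repository and snapshot names are illustrative. If a partial copy of the index
still exists in the cluster, delete or close it before restoring.

[source,console]
----
POST _snapshot/my_repository/my_snapshot/_restore
{
  "indices": "my-index"
}
----
// TEST[skip:illustrative example]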