| 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141 | [[disk-based-shard-allocation]]==== Disk-based shard allocation settings[[disk-based-shard-allocation-description]]// tag::disk-based-shard-allocation-description-tag[]The disk-based shard allocator ensures that all nodes have enough disk spacewithout performing more shard movements than necessary. It allocates shardsbased on a pair of thresholds known as the _low watermark_ and the _highwatermark_. Its primary goal is to ensure that no node exceeds the highwatermark, or at least that any such overage is only temporary. If a nodeexceeds the high watermark then {es} will solve this by moving some of itsshards onto other nodes in the cluster.NOTE: It is normal for nodes to temporarily exceed the high watermark from timeto time.The allocator also tries to keep nodes clear of the high watermark byforbidding the allocation of more shards to a node that exceeds the lowwatermark. Importantly, if all of your nodes have exceeded the low watermarkthen no new shards can be allocated and {es} will not be able to move anyshards between nodes in order to keep the disk usage below the high watermark.You must ensure that your cluster has enough disk space in total and that thereare always some nodes below the low watermark.Shard movements triggered by the disk-based shard allocator must also satisfyall other shard allocation rules such as<<cluster-shard-allocation-filtering,allocation filtering>> and<<forced-awareness,forced awareness>>. If these rules are too strict then theycan also prevent the shard movements needed to keep the nodes' disk usage undercontrol. If you are using <<data-tiers,data tiers>> then {es} automaticallyconfigures allocation filtering rules to place shards within the appropriatetier, which means that the disk-based shard allocator works independentlywithin each tier.If a node is filling up its disk faster than {es} can move shards elsewherethen there is a risk that the disk will completely fill up. To prevent this, asa last resort, once the disk usage reaches the _flood-stage_ watermark {es}will block writes to indices with a shard on the affected node. It will alsocontinue to move shards onto the other nodes in the cluster. When disk usageon the affected node drops below the high watermark, {es} automatically removesthe write block.[[disk-based-shard-allocation-does-not-balance]][TIP]====It is normal for the nodes in your cluster to be using very different amountsof disk space. The <<shards-rebalancing-settings,balance>> of the clusterdepends only on the number of shards on each node and the indices to whichthose shards belong. It considers neither the sizes of these shards nor theavailable disk space on each node, for the following reasons:* Disk usage changes over time. Balancing the disk usage of individual nodeswould require a lot more shard movements, perhaps even wastefully undoingearlier movements. Moving a shard consumes resources such as I/O and networkbandwidth and may evict data from the filesystem cache. These resources arebetter spent handling your searches and indexing where possible.* A cluster with equal disk usage on every node typically performs no betterthan one that has unequal disk usage, as long as no disk is too full.====You can use the following settings to control disk-based allocation:[[cluster-routing-disk-threshold]]// tag::cluster-routing-disk-threshold-tag[]`cluster.routing.allocation.disk.threshold_enabled` {ess-icon}::(<<dynamic-cluster-setting,Dynamic>>)Defaults to `true`. Set to `false` to disable the disk allocation decider.// end::cluster-routing-disk-threshold-tag[][[cluster-routing-watermark-low]]// tag::cluster-routing-watermark-low-tag[]`cluster.routing.allocation.disk.watermark.low` {ess-icon}::(<<dynamic-cluster-setting,Dynamic>>)Controls the low watermark for disk usage. It defaults to `85%`, meaning that {es} will not allocate shards to nodes that have more than 85% disk used. It can also be set to an absolute byte value (like `500mb`) to prevent {es} from allocating shards if less than the specified amount of space is available. This setting has no effect on the primary shards of newly-created indices but will prevent their replicas from being allocated.// end::cluster-routing-watermark-low-tag[][[cluster-routing-watermark-high]]// tag::cluster-routing-watermark-high-tag[]`cluster.routing.allocation.disk.watermark.high` {ess-icon}::(<<dynamic-cluster-setting,Dynamic>>)Controls the high watermark. It defaults to `90%`, meaning that {es} will attempt to relocate shards away from a node whose disk usage is above 90%. It can also be set to an absolute byte value (similarly to the low watermark) to relocate shards away from a node if it has less than the specified amount of free space. This setting affects the allocation of all shards, whether previously allocated or not.// end::cluster-routing-watermark-high-tag[]`cluster.routing.allocation.disk.watermark.enable_for_single_data_node`::    (<<static-cluster-setting,Static>>)    For a single data node, the default is to disregard disk watermarks when    making an allocation decision. This is deprecated behavior and will be    changed in 8.0. This setting can be set to `true` to enable the    disk watermarks for a single data node cluster (will become default in 8.0).[[cluster-routing-flood-stage]]// tag::cluster-routing-flood-stage-tag[]`cluster.routing.allocation.disk.watermark.flood_stage` {ess-icon}::+--(<<dynamic-cluster-setting,Dynamic>>)Controls the flood stage watermark, which defaults to 95%. {es} enforces a read-only index block (`index.blocks.read_only_allow_delete`) on every index that has one or more shards allocated on the node, and that has at least one disk exceeding the flood stage. This setting is a last resort to prevent nodes from running out of disk space. The index block is automatically released when the disk utilization falls below the high watermark.NOTE: You cannot mix the usage of percentage values and byte values withinthese settings. Either all values are set to percentage values, or all are set to byte values. This enforcement is so that {es} can validate that the settings are internally consistent, ensuring that the low disk threshold is less than the high disk threshold, and the high disk threshold is less than the flood stage threshold.An example of resetting the read-only index block on the `my-index-000001` index:[source,console]--------------------------------------------------PUT /my-index-000001/_settings{  "index.blocks.read_only_allow_delete": null}--------------------------------------------------// TEST[setup:my_index]--// end::cluster-routing-flood-stage-tag[]`cluster.info.update.interval`::    (<<dynamic-cluster-setting,Dynamic>>)    How often {es} should check on disk usage for each node in the    cluster. Defaults to `30s`.NOTE: Percentage values refer to used disk space, while byte values refer tofree disk space. This can be confusing, since it flips the meaning of high andlow. For example, it makes sense to set the low watermark to 10gb and the highwatermark to 5gb, but not the other way around.An example of updating the low watermark to at least 100 gigabytes free, a highwatermark of at least 50 gigabytes free, and a flood stage watermark of 10gigabytes free, and updating the information about the cluster every minute:[source,console]--------------------------------------------------PUT _cluster/settings{  "transient": {    "cluster.routing.allocation.disk.watermark.low": "100gb",    "cluster.routing.allocation.disk.watermark.high": "50gb",    "cluster.routing.allocation.disk.watermark.flood_stage": "10gb",    "cluster.info.update.interval": "1m"  }}--------------------------------------------------
 |