[[modules-discovery-voting]]
=== Voting configurations

Each {es} cluster has a _voting configuration_, which is the set of
<<master-node,master-eligible nodes>> whose responses are counted when making
decisions such as electing a new master or committing a new cluster state.
Decisions are made only after a majority (more than half) of the nodes in the
voting configuration respond.

Usually the voting configuration is the same as the set of all the
master-eligible nodes that are currently in the cluster. However, there are some
situations in which they may be different.

IMPORTANT: To ensure the cluster remains available, you **must not stop half or
more of the nodes in the voting configuration at the same time**. As long as more
than half of the voting nodes are available, the cluster can work normally. For
example, if there are three or four master-eligible nodes, the cluster
can tolerate one unavailable node. If there are two or fewer master-eligible
nodes, they must all remain available.

After a node joins or leaves the cluster, {es} reacts by automatically making
corresponding changes to the voting configuration in order to ensure that the
cluster is as resilient as possible. It is important to wait for this adjustment
to complete before you remove more nodes from the cluster. For more information,
see <<modules-discovery-adding-removing-nodes>>.

The current voting configuration is stored in the cluster state so you can
inspect its current contents as follows:

[source,js]
--------------------------------------------------
GET /_cluster/state?filter_path=metadata.cluster_coordination.last_committed_config
--------------------------------------------------
// CONSOLE

NOTE: The current voting configuration is not necessarily the same as the set of
all available master-eligible nodes in the cluster. Altering the voting
configuration involves taking a vote, so it takes some time to adjust the
configuration as nodes join or leave the cluster. Also, there are situations
where the most resilient configuration includes unavailable nodes or does not
include some available nodes. In these situations, the voting configuration
differs from the set of available master-eligible nodes in the cluster.

Larger voting configurations are usually more resilient, so Elasticsearch
normally prefers to add master-eligible nodes to the voting configuration after
they join the cluster. Similarly, if a node in the voting configuration
leaves the cluster and there is another master-eligible node in the cluster that
is not in the voting configuration then it is preferable to swap these two nodes
over. The size of the voting configuration is thus unchanged but its
resilience increases.

It is not so straightforward to automatically remove nodes from the voting
configuration after they have left the cluster. Different strategies have
different benefits and drawbacks, so the right choice depends on how the cluster
will be used. You can control whether the voting configuration automatically
shrinks by using the
<<modules-discovery-settings,`cluster.auto_shrink_voting_configuration` setting>>.

NOTE: If `cluster.auto_shrink_voting_configuration` is set to `true` (which is
the default and recommended value) and there are at least three master-eligible
nodes in the cluster, Elasticsearch remains capable of processing cluster state
updates as long as all but one of its master-eligible nodes are healthy.

There are situations in which Elasticsearch might tolerate the loss of multiple
nodes, but this is not guaranteed under all sequences of failures. If the
`cluster.auto_shrink_voting_configuration` setting is `false`, you must remove
departed nodes from the voting configuration manually. Use the
<<voting-config-exclusions,voting exclusions API>> to achieve the desired level
of resilience.

No matter how it is configured, Elasticsearch will not suffer from a
"split-brain" inconsistency. The `cluster.auto_shrink_voting_configuration`
setting affects only its availability in the event of the failure of some of its
nodes and the administrative tasks that must be performed as nodes join and
leave the cluster.

[float]
==== Even numbers of master-eligible nodes

There should normally be an odd number of master-eligible nodes in a cluster.
If there is an even number, Elasticsearch leaves one of them out of the voting
configuration to ensure that it has an odd size. This omission does not decrease
the failure-tolerance of the cluster. In fact, it improves it slightly: if the
cluster suffers from a network partition that divides it into two equally-sized
halves then one of the halves will contain a majority of the voting
configuration and will be able to keep operating. If all of the votes from
master-eligible nodes were counted, neither side would contain a strict majority
of the nodes and so the cluster would not be able to make any progress.

For instance, if there were four master-eligible nodes in the cluster and the
voting configuration contained all of them, any quorum-based decision would
require votes from at least three of them. This would mean that the cluster
could tolerate the loss of only a single master-eligible node. If this cluster
were split into two equal halves, neither half would contain three
master-eligible nodes and the cluster would not be able to make any progress.
If the voting configuration contains only three of the four master-eligible
nodes, however, the cluster is still only fully tolerant to the loss of one
node, but quorum-based decisions require votes from two of the three voting
nodes. In the event of an even split, one half will contain two of the three
voting nodes so that half will remain available.

[float]
==== Setting the initial voting configuration

When a brand-new cluster starts up for the first time, it must elect its first
master node. To do this election, it needs to know the set of master-eligible
nodes whose votes should count. This initial voting configuration is known as
the _bootstrap configuration_ and is set in the
<<modules-discovery-bootstrap-cluster,cluster bootstrapping process>>.

It is important that the bootstrap configuration identifies exactly which nodes
should vote in the first election. It is not sufficient to configure each node
with an expectation of how many nodes there should be in the cluster. It is also
important to note that the bootstrap configuration must come from outside the
cluster: there is no safe way for the cluster to determine the bootstrap
configuration correctly on its own.

If the bootstrap configuration is not set correctly, when you start a brand-new
cluster there is a risk that you will accidentally form two separate clusters
instead of one. This situation can lead to data loss: you might start using both
clusters before you notice that anything has gone wrong and it is impossible to
merge them together later.

NOTE: To illustrate the problem with configuring each node to expect a certain
cluster size, imagine starting up a three-node cluster in which each node knows
that it is going to be part of a three-node cluster. A majority of three nodes
is two, so normally the first two nodes to discover each other form a cluster
and the third node joins them a short time later. However, imagine that four
nodes were erroneously started instead of three. In this case, there are enough
nodes to form two separate clusters. Of course, if each node is started manually
then it's unlikely that too many nodes are started. If you're using an automated
orchestrator, however, it's certainly possible to get into this situation,
particularly if the orchestrator is not resilient to failures such as network
partitions.

The initial quorum is only required the very first time a whole cluster starts
up. New nodes joining an established cluster can safely obtain all the
information they need from the elected master. Nodes that have previously been
part of a cluster will have stored to disk all the information that is required
when they restart.
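The majority arithmetic behind the examples in this section can be sketched in a
few lines of Python. This is an illustrative model only, not {es} code: all of
the function names are hypothetical. In particular, `clusters_that_could_form`
models the count-based bootstrapping scheme that this section warns against,
which {es} does not implement.

```python
from typing import List


def quorum(voting_nodes: int) -> int:
    """Smallest number of votes that is more than half of the voting configuration."""
    return voting_nodes // 2 + 1


def tolerated_failures(voting_nodes: int) -> int:
    """Number of voting nodes that can be unavailable while a majority remains."""
    return voting_nodes - quorum(voting_nodes)


def sides_with_quorum(voters_per_side: List[int]) -> List[bool]:
    """For a network partition, report which sides hold a majority of the votes.

    Each entry of voters_per_side is the number of nodes from the voting
    configuration that ended up on that side of the partition.
    """
    needed = quorum(sum(voters_per_side))
    return [voters >= needed for voters in voters_per_side]


def clusters_that_could_form(started_nodes: int, expected_cluster_size: int) -> int:
    """Hypothetical count-based bootstrapping (the unsafe scheme).

    If every node only knows the expected cluster size, any disjoint group
    containing a majority of that size could elect its own master.
    """
    return started_nodes // quorum(expected_cluster_size)


if __name__ == "__main__":
    # Three or four voting nodes both tolerate the loss of one node.
    print(tolerated_failures(3), tolerated_failures(4))
    # Four voters split evenly: neither half has the three votes it needs.
    print(sides_with_quorum([2, 2]))
    # Three voters (one master-eligible node left out): one half keeps working.
    print(sides_with_quorum([2, 1]))
    # Four nodes started while each expects a three-node cluster.
    print(clusters_that_could_form(4, 3))
```

Under this model, voting configurations of three and four nodes each tolerate
one failure, an even split of four voters leaves neither side with a quorum,
and an even split of four master-eligible nodes with only three voters leaves
one side able to continue. The last call shows that four nodes each expecting a
three-node cluster contain enough nodes for two independent majorities of two,
which is why the bootstrap configuration must name the voting nodes explicitly.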