[[modules-discovery-quorums]]
=== Quorum-based decision making

Electing a master node and changing the cluster state are the two fundamental
tasks that master-eligible nodes must work together to perform. It is important
that these activities work robustly even if some nodes have failed.
Elasticsearch achieves this robustness by considering each action to have
succeeded on receipt of responses from a _quorum_, which is a subset of the
master-eligible nodes in the cluster. The advantage of requiring only a subset
of the nodes to respond is that some of the nodes can fail without preventing
the cluster from making progress. The quorums are carefully chosen so that the
cluster cannot end up in a "split brain" scenario, where it is partitioned into
two pieces, each of which may make decisions that are inconsistent with those of
the other piece.

Elasticsearch allows you to add master-eligible nodes to, and remove them from,
a running cluster. In many cases you can do this simply by starting or stopping
the nodes as required. See <<modules-discovery-adding-removing-nodes>>.

As nodes are added or removed, Elasticsearch maintains an optimal level of fault
tolerance by updating the cluster's
<<modules-discovery-voting,voting configuration>>, which is the set of
master-eligible nodes whose responses are counted when making decisions such as
electing a new master or committing a new cluster state. A decision is made
only after more than half of the nodes in the voting configuration have
responded. Usually the voting configuration is the same as the set of all the
master-eligible nodes that are currently in the cluster. However, there are
some situations in which they may be different.

To be sure that the cluster remains available, you **must not stop half or more
of the nodes in the voting configuration at the same time**.
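
The majority rule described above can be illustrated with a short sketch. This
is a simplified model under stated assumptions, not Elasticsearch's own
implementation: the helpers `has_quorum` and `quorum_size` and the node names
are hypothetical, and the voting configuration is modelled as a plain set of
node names.

```python
from __future__ import annotations


def has_quorum(voting_config: set[str], responders: set[str]) -> bool:
    """Return True if strictly more than half of the voting configuration responded.

    Hypothetical helper for illustration: responses from nodes outside the
    voting configuration are ignored, because only voting nodes are counted.
    """
    counted = voting_config & responders
    return 2 * len(counted) > len(voting_config)


def quorum_size(voting_config: set[str]) -> int:
    """Smallest number of voting nodes that forms a strict majority."""
    return len(voting_config) // 2 + 1


voting = {"node-a", "node-b", "node-c", "node-d"}
print(quorum_size(voting))                                 # 3
print(has_quorum(voting, {"node-a", "node-b", "node-c"}))  # True
# A partition into two equal halves: neither side has more than half of the
# votes, so neither side can elect a master or commit a cluster state alone.
print(has_quorum(voting, {"node-a", "node-b"}))            # False
print(has_quorum(voting, {"node-c", "node-d"}))            # False
```

In this model, any two strict majorities of the same voting configuration must
share at least one node, so two disjoint partitions can never both reach a
quorum. That overlap is what rules out a split brain.
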
As long as more than half of the voting nodes are available, the cluster can
still work normally. This means that if there are three or four master-eligible
nodes, the cluster can tolerate one of them being unavailable. If there are two
or fewer master-eligible nodes, they must all remain available.

After a node has joined or left the cluster, the elected master must issue a
cluster-state update that adjusts the voting configuration to match, and this
can take a short time to complete. It is important to wait for this adjustment
to complete before removing more nodes from the cluster.

[float]
==== Master elections

Elasticsearch uses an election process to agree on an elected master node, both
at startup and if the existing elected master fails. Any master-eligible node
can start an election, and normally the first election that takes place will
succeed. Elections usually fail only when two nodes happen to start their
elections at about the same time, so elections are scheduled randomly on each
node to reduce the probability of this happening. Nodes will retry elections
until a master is elected, backing off on failure, so that eventually an
election will succeed (with arbitrarily high probability). The scheduling of
master elections is controlled by the
<<master-election-settings,master election settings>>.

[float]
==== Cluster maintenance, rolling restarts and migrations

Many cluster maintenance tasks involve temporarily shutting down one or more
nodes and then starting them back up again. By default, Elasticsearch can remain
available if one of its master-eligible nodes is taken offline, such as during a
<<rolling-upgrades,rolling restart>>. Furthermore, if multiple nodes are stopped
and then started again, the cluster will automatically recover, such as during a
<<restart-upgrade,full cluster restart>>. In these cases there is no need to
take any further action with the APIs described here, because the set of master
nodes is not changing permanently.
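
Before taking master-eligible nodes offline, the fault-tolerance arithmetic
given earlier (three or four nodes tolerate one failure, two or fewer tolerate
none) can be checked with a small sketch. It assumes the voting configuration is
already known as a set of node names; the helpers `tolerated_failures` and
`safe_to_stop` are hypothetical and do not query a real cluster. Remember that
the voting configuration may differ from the set of master-eligible nodes.

```python
from __future__ import annotations


def tolerated_failures(voting_nodes: int) -> int:
    """Number of voting nodes that may be unavailable while a majority remains."""
    return (voting_nodes - 1) // 2


def safe_to_stop(voting_config: set[str], to_stop: set[str]) -> bool:
    """Hypothetical pre-maintenance check: True if the voting nodes left
    running still form a strict majority of the voting configuration."""
    remaining = voting_config - to_stop
    return 2 * len(remaining) > len(voting_config)


# Prints one line per size: "1 0", "2 0", "3 1", "4 1", "5 2".
for n in range(1, 6):
    print(n, tolerated_failures(n))

voting = {"master-1", "master-2", "master-3"}
print(safe_to_stop(voting, {"master-1"}))              # True: 2 of 3 remain
print(safe_to_stop(voting, {"master-1", "master-2"}))  # False: 1 of 3 remains
```

This check looks at a single snapshot of the voting configuration. It does not
model the cluster-state update that adjusts the configuration after a node joins
or leaves, which is why you should still wait for that adjustment before
removing more nodes.
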