[[modules-discovery-quorums]]
=== Quorum-based decision making

Electing a master node and changing the cluster state are the two fundamental
tasks that master-eligible nodes must work together to perform. It is important
that these activities work robustly even if some nodes have failed.
Elasticsearch achieves this robustness by considering each action to have
succeeded on receipt of responses from a _quorum_, which is a subset of the
master-eligible nodes in the cluster. The advantage of requiring only a subset
of the nodes to respond is that some of the nodes can fail without preventing
the cluster from making progress. The quorums are carefully chosen so the
cluster does not have a "split brain" scenario where it's partitioned into two
pieces such that each piece may make decisions that are inconsistent with those
of the other piece.

Elasticsearch allows you to add master-eligible nodes to and remove them from a
running cluster. In many cases you can do this simply by starting or stopping
the nodes as required. See <<add-elasticsearch-nodes>> for more information.

As nodes are added or removed Elasticsearch maintains an optimal level of fault
tolerance by updating the cluster's <<modules-discovery-voting,voting
configuration>>, which is the set of master-eligible nodes whose responses are
counted when making decisions such as electing a new master or committing a new
cluster state. A decision is made only after more than half of the nodes in the
voting configuration have responded. Usually the voting configuration is the
same as the set of all the master-eligible nodes that are currently in the
cluster. However, there are some situations in which they may be different.

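One way to see which nodes' votes currently count is to retrieve the
last-committed voting configuration from the cluster state. For example, the
following request filters the cluster state response down to just that field:

[source,console]
----
GET /_cluster/state?filter_path=metadata.cluster_coordination.last_committed_config
----
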
// tag::quorums-and-availability[]
[IMPORTANT]
====
To be sure that the cluster remains available you **must not stop half or more
of the nodes in the voting configuration at the same time**. As long as more
than half of the voting nodes are available the cluster can still work
normally. This means that if there are three or four master-eligible nodes, the
cluster can tolerate one of them being unavailable. If there are two or fewer
master-eligible nodes, they must all remain available.

If you stop half or more of the nodes in the voting configuration at the same
time then the cluster will be unavailable until you bring enough nodes back
online to form a quorum again. While the cluster is unavailable, any remaining
nodes will report in their logs that they cannot discover or elect a master
node. See <<discovery-troubleshooting>> for more information.
====
// end::quorums-and-availability[]

After a master-eligible node has joined or left the cluster the elected master
may issue a cluster-state update that adjusts the voting configuration to
match, and this can take a short time to complete. It is important to wait for
this adjustment to complete before removing more nodes from the cluster. See
<<modules-discovery-removing-nodes>> for more information.

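If you must remove half or more of the master-eligible nodes at once, the page
linked above describes how to withdraw the departing nodes' votes first using
the voting configuration exclusions API. As a sketch, assuming a departing node
named `node-1` (a hypothetical name):

[source,console]
----
# Exclude the hypothetical node "node-1" from the voting configuration,
# waiting up to one minute for its vote to be withdrawn.
POST /_cluster/voting_config_exclusions?node_names=node-1&timeout=1m
----
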
[discrete]
==== Master elections

Elasticsearch uses an election process to agree on an elected master node, both
at startup and if the existing elected master fails. Any master-eligible node
can start an election, and normally the first election that takes place will
succeed. Elections usually fail only when two nodes happen to start their
elections at about the same time, so elections are scheduled randomly on each
node to reduce the probability of this happening. Nodes retry elections until a
master is elected, backing off on failure, so that eventually an election will
succeed (with arbitrarily high probability). The scheduling of master elections
is controlled by the <<master-election-settings,master election settings>>.

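To check which node won the most recent election, you can ask for the currently
elected master, for example with the cat master API:

[source,console]
----
# Returns the id, host, IP address and name of the elected master node.
GET /_cat/master?v=true
----
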
[discrete]
==== Cluster maintenance, rolling restarts and migrations

Many cluster maintenance tasks involve temporarily shutting down one or more
nodes and then starting them back up again. By default {es} can remain
available if one of its master-eligible nodes is taken offline, such as during
a rolling upgrade. Furthermore, if multiple nodes are stopped and then started
again, the cluster will automatically recover, such as during a full cluster
restart. There is no need to take any further action with the APIs described
here in these cases, because the set of master nodes is not changing
permanently.

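After restarting nodes during such maintenance, one way to confirm that the
cluster has recovered is to wait for the cluster health API to report the
status you expect, for example:

[source,console]
----
# Wait up to 30 seconds for the cluster health to reach green.
GET /_cluster/health?wait_for_status=green&timeout=30s
----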