[[modules-discovery-voting]]
=== Voting configurations

Each {es} cluster has a _voting configuration_, which is the set of
<<master-node,master-eligible nodes>> whose responses are counted when making
decisions such as electing a new master or committing a new cluster state.
Decisions are made only after a majority (more than half) of the nodes in the
voting configuration respond.

Usually the voting configuration is the same as the set of all the
master-eligible nodes that are currently in the cluster. However, there are some
situations in which they may be different.

IMPORTANT: To ensure the cluster remains available, you **must not stop half or
more of the nodes in the voting configuration at the same time**. As long as more
than half of the voting nodes are available, the cluster can work normally. For
example, if there are three or four master-eligible nodes, the cluster
can tolerate one unavailable node. If there are two or fewer master-eligible
nodes, they must all remain available.
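
The arithmetic behind these numbers can be sketched in a few lines of Python. The helper names `quorum` and `tolerated_failures` are hypothetical and only illustrate the "more than half" rule described above; they are not part of any {es} API.

```python
def quorum(voting_config_size: int) -> int:
    """Smallest number of votes that is strictly more than half of the voting configuration."""
    return voting_config_size // 2 + 1


def tolerated_failures(voting_config_size: int) -> int:
    """How many voting nodes can be unavailable while a quorum is still reachable."""
    return voting_config_size - quorum(voting_config_size)


# Voting configurations of one to five nodes.
for size in range(1, 6):
    print(f"{size} voting nodes: quorum {quorum(size)}, "
          f"tolerates {tolerated_failures(size)} unavailable")
```

Three and four voting nodes both tolerate one unavailable node, while one or two voting nodes tolerate none, which matches the example above.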

After a node joins or leaves the cluster, {es} reacts by automatically making
corresponding changes to the voting configuration to ensure that the
cluster is as resilient as possible. It is important to wait for this adjustment
to complete before you remove more nodes from the cluster. For more information,
see <<modules-discovery-adding-removing-nodes>>.

The current voting configuration is stored in the cluster state, so you can
inspect it as follows:

[source,console]
--------------------------------------------------
GET /_cluster/state?filter_path=metadata.cluster_coordination.last_committed_config
--------------------------------------------------
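
If you want to process the response programmatically, the following Python sketch extracts the node IDs from a response body and computes how many votes a decision needs. It assumes that `last_committed_config` is returned as a JSON array of node IDs nested under the path given in `filter_path`; the sample body and the node IDs in it are made up for illustration.

```python
import json

# Hypothetical response body; the node IDs are invented for this example.
SAMPLE_RESPONSE = """
{
  "metadata": {
    "cluster_coordination": {
      "last_committed_config": ["nodeA", "nodeB", "nodeC"]
    }
  }
}
"""


def voting_node_ids(response_body: str) -> list[str]:
    """Return the node IDs in the last committed voting configuration (assumed response shape)."""
    doc = json.loads(response_body)
    return doc["metadata"]["cluster_coordination"]["last_committed_config"]


ids = voting_node_ids(SAMPLE_RESPONSE)
majority = len(ids) // 2 + 1  # strictly more than half of the voting nodes
print(f"{len(ids)} voting nodes, {majority} votes needed for a decision")
```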

NOTE: The current voting configuration is not necessarily the same as the set of
all available master-eligible nodes in the cluster. Altering the voting
configuration involves taking a vote, so it takes some time to adjust the
configuration as nodes join or leave the cluster. Also, there are situations
where the most resilient configuration includes unavailable nodes or does not
include some available nodes. In these situations, the voting configuration
differs from the set of available master-eligible nodes in the cluster.

Larger voting configurations are usually more resilient, so Elasticsearch
normally prefers to add master-eligible nodes to the voting configuration after
they join the cluster. Similarly, if a node in the voting configuration
leaves the cluster and there is another master-eligible node in the cluster that
is not in the voting configuration, then it is preferable to swap these two
nodes. The size of the voting configuration is thus unchanged, but its
resilience increases.
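
The effect of such a swap can be illustrated with a small Python model. The functions `failure_margin` and `swap_departed_voters` are hypothetical simplifications written for this example; they only count votes and do not reproduce how {es} actually chooses its voting configuration.

```python
def failure_margin(config: set[str], live: set[str]) -> int:
    """How many more voting nodes may fail before the configuration loses its quorum.

    A negative value means the quorum is already lost.
    """
    quorum = len(config) // 2 + 1
    available_voters = len(config & live)
    return available_voters - quorum


def swap_departed_voters(config: set[str], live: set[str]) -> set[str]:
    """Replace each departed voter with a live master-eligible node outside the configuration."""
    departed = sorted(config - live)
    spares = sorted(live - config)
    new_config = set(config)
    # zip() stops at the shorter list, so departed voters without a spare stay in place.
    for old, new in zip(departed, spares):
        new_config.remove(old)
        new_config.add(new)
    return new_config


config = {"node-1", "node-2", "node-3"}
live = {"node-1", "node-2", "node-4"}  # node-3 has left; node-4 is not yet a voter

swapped = swap_departed_voters(config, live)
print(len(config), failure_margin(config, live))    # 3 0
print(len(swapped), failure_margin(swapped, live))  # 3 1
```

Both configurations contain three voting nodes, but before the swap the loss of any further voter would stop the cluster from making decisions, whereas afterwards it can lose one more.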

It is not so straightforward to automatically remove nodes from the voting
configuration after they have left the cluster. Different strategies have
different benefits and drawbacks, so the right choice depends on how the cluster
will be used. You can control whether the voting configuration automatically
shrinks by using the
<<modules-discovery-settings,`cluster.auto_shrink_voting_configuration` setting>>.

NOTE: If `cluster.auto_shrink_voting_configuration` is set to `true` (which is
the default and recommended value) and there are at least three master-eligible
nodes in the cluster, Elasticsearch remains capable of processing cluster state
updates as long as all but one of its master-eligible nodes are healthy.
There are situations in which Elasticsearch might tolerate the loss of multiple
nodes, but this is not guaranteed under all sequences of failures. If the
`cluster.auto_shrink_voting_configuration` setting is `false`, you must remove
departed nodes from the voting configuration manually. Use the
<<voting-config-exclusions,voting exclusions API>> to achieve the desired level
of resilience.
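
The difference between the two settings can be illustrated with a toy Python model of five master-eligible nodes failing one at a time. The rule in the hypothetical `reconfigure` function is an assumption made for this sketch (keep the surviving voters, round down to an odd size, never go below three voting nodes), and the model assumes each reconfiguration completes before the next failure. It is not {es}'s actual reconfiguration algorithm.

```python
def reconfigure(config: list[str], live: set[str], auto_shrink: bool) -> list[str]:
    """Toy reconfiguration rule; an assumption for illustration, not the real algorithm."""
    if not auto_shrink:
        # Departed nodes stay in the configuration until an operator removes them.
        return config
    survivors = [node for node in config if node in live]
    departed = [node for node in config if node not in live]
    # Round down to an odd size, but (assumed) never shrink below three voting nodes.
    target = len(survivors) if len(survivors) % 2 == 1 else len(survivors) - 1
    target = max(target, 3)
    # Prefer live nodes and fill any remaining places with departed ones.
    return (survivors + departed)[:target]


def has_quorum(config: list[str], live: set[str]) -> bool:
    """True if strictly more than half of the voting configuration is live."""
    return 2 * sum(node in live for node in config) > len(config)


def survives(nodes: list[str], failure_order: list[str], auto_shrink: bool) -> bool:
    """Fail nodes one at a time, reconfiguring after each failure while a quorum exists."""
    config = list(nodes)
    live = set(nodes)
    for failed in failure_order:
        live.discard(failed)
        if not has_quorum(config, live):
            return False
        config = reconfigure(config, live, auto_shrink)
    return True


nodes = ["node-1", "node-2", "node-3", "node-4", "node-5"]
failures = ["node-5", "node-4", "node-3"]
print(survives(nodes, failures, auto_shrink=True))   # True
print(survives(nodes, failures, auto_shrink=False))  # False
```

In this model, shrinking lets the cluster keep working with two of its five nodes, whereas without shrinking the three departed nodes still count towards the size of the voting configuration and the third failure loses the quorum.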

No matter how it is configured, Elasticsearch will not suffer from a
"split-brain" inconsistency. The `cluster.auto_shrink_voting_configuration`
setting affects only the cluster's availability when some of its nodes fail,
and the administrative tasks that you must perform as nodes join and
leave the cluster.
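
One ingredient of this guarantee is that any two majorities of the same voting configuration share at least one node, so two disjoint groups of nodes can never both collect enough votes to make conflicting decisions. The following Python sketch checks this property by brute force for small configurations. It illustrates the counting argument only; it is not a proof of {es}'s safety and does not cover what happens while the voting configuration itself is changing.

```python
from itertools import combinations


def majorities(config: set[str]):
    """Yield every subset of the configuration that holds strictly more than half of the votes."""
    members = sorted(config)
    needed = len(members) // 2 + 1
    for size in range(needed, len(members) + 1):
        for group in combinations(members, size):
            yield frozenset(group)


def majorities_always_overlap(config: set[str]) -> bool:
    """True if every pair of majorities has at least one node in common."""
    groups = list(majorities(config))
    return all(a & b for a in groups for b in groups)


for size in range(1, 8):
    config = {f"node-{i}" for i in range(size)}
    print(size, majorities_always_overlap(config))
```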

[float]
==== Even numbers of master-eligible nodes

There should normally be an odd number of master-eligible nodes in a cluster.
If there is an even number, Elasticsearch leaves one of them out of the voting
configuration to ensure that it has an odd size. This omission does not decrease
the failure tolerance of the cluster. In fact, it improves it slightly: if the
cluster suffers from a network partition that divides it into two equally sized
halves, then one of the halves will contain a majority of the voting
configuration and will be able to keep operating. If all of the votes from
master-eligible nodes were counted, neither side would contain a strict majority
of the nodes, so the cluster would not be able to make any progress.

For instance, if there are four master-eligible nodes in the cluster and the
voting configuration contained all of them, any quorum-based decision would
require votes from at least three of them. This means that the cluster
can tolerate the loss of only a single master-eligible node. If this cluster
were split into two equal halves, neither half would contain three
master-eligible nodes and the cluster would not be able to make any progress.

If the voting configuration contains only three of the four master-eligible
nodes, however, the cluster is still only fully tolerant to the loss of one
node, but quorum-based decisions require votes from only two of the three voting
nodes. In the event of an even split, one half will contain two of the three
voting nodes, so that half will remain available.
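
The four-node example can be checked mechanically. The Python sketch below enumerates every way of splitting four hypothetical nodes `A` to `D` into two halves and counts the splits in which one half still holds a majority of the voting configuration, first with all four nodes voting and then with `D` left out.

```python
NODES = {"A", "B", "C", "D"}


def has_quorum(side: set[str], config: set[str]) -> bool:
    """A side can make progress only if it holds strictly more than half of the voting configuration."""
    return 2 * len(side & config) > len(config)


def even_splits():
    """Every way of dividing the four nodes into two halves of two (A is always in the first half)."""
    for partner in sorted(NODES - {"A"}):
        side_a = {"A", partner}
        yield side_a, NODES - side_a


def splits_with_a_working_half(config: set[str]) -> int:
    """Count the even splits in which one of the two halves can still make progress."""
    return sum(
        has_quorum(side_a, config) or has_quorum(side_b, config)
        for side_a, side_b in even_splits()
    )


print(splits_with_a_working_half(NODES))            # 0: no half reaches 3 of 4 votes
print(splits_with_a_working_half({"A", "B", "C"}))  # 3: one half always holds 2 of 3 votes
```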

[float]
==== Setting the initial voting configuration

When a brand-new cluster starts up for the first time, it must elect its first
master node. To hold this election, it needs to know the set of master-eligible
nodes whose votes should count. This initial voting configuration is known as
the _bootstrap configuration_ and is set in the
<<modules-discovery-bootstrap-cluster,cluster bootstrapping process>>.

It is important that the bootstrap configuration identifies exactly which nodes
should vote in the first election. It is not sufficient to configure each node
with an expectation of how many nodes there should be in the cluster. The
bootstrap configuration must also come from outside the cluster: there is no
safe way for the cluster to determine the bootstrap configuration correctly on
its own.

If the bootstrap configuration is not set correctly, when you start a brand-new
cluster there is a risk that you will accidentally form two separate clusters
instead of one. This situation can lead to data loss: you might start using both
clusters before you notice that anything has gone wrong, and it is impossible to
merge them later.

NOTE: To illustrate the problem with configuring each node to expect a certain
cluster size, imagine starting up a three-node cluster in which each node knows
that it is going to be part of a three-node cluster. A majority of three nodes
is two, so normally the first two nodes to discover each other form a cluster
and the third node joins them a short time later. However, imagine that four
nodes were erroneously started instead of three. In this case, there are enough
nodes to form two separate clusters. Of course, if each node is started manually,
then it is unlikely that too many nodes are started. If you're using an automated
orchestrator, however, it is certainly possible to get into this situation,
particularly if the orchestrator is not resilient to failures such as network
partitions.
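
The scenario in this note can be modelled in a few lines of Python. Both rules below are simplified, hypothetical stand-ins: `can_elect_by_count` represents a node that only knows the expected cluster size, and `can_elect_by_bootstrap` represents a node that knows exactly which nodes belong to the bootstrap configuration. The node names are made up.

```python
EXPECTED_CLUSTER_SIZE = 3  # the size every node was told to expect


def can_elect_by_count(group: set[str]) -> bool:
    """Count-based rule: any group reaching a majority of the expected size elects a master."""
    return len(group) > EXPECTED_CLUSTER_SIZE // 2


def can_elect_by_bootstrap(group: set[str], bootstrap: set[str]) -> bool:
    """Bootstrap rule: a group needs a majority of the explicitly listed voting nodes."""
    return 2 * len(group & bootstrap) > len(bootstrap)


# Four nodes were started by mistake and discovered each other in two pairs.
first_pair = {"node-1", "node-2"}
second_pair = {"node-3", "node-4"}
bootstrap = {"node-1", "node-2", "node-3"}

print(can_elect_by_count(first_pair), can_elect_by_count(second_pair))  # True True
print(can_elect_by_bootstrap(first_pair, bootstrap),
      can_elect_by_bootstrap(second_pair, bootstrap))                   # True False
```

With the count-based rule both pairs elect a master and two separate clusters form; with an explicit bootstrap configuration only the pair holding two of the three listed nodes can win the first election.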

The initial quorum is only required the very first time a whole cluster starts
up. New nodes joining an established cluster can safely obtain all the
information they need from the elected master. Nodes that have previously been
part of a cluster have already stored on disk all the information they need
when they restart.