[[modules-discovery-voting]]
=== Voting configurations

Each {es} cluster has a _voting configuration_, which is the set of
<<master-node,master-eligible nodes>> whose responses are counted when making
decisions such as electing a new master or committing a new cluster state.
Decisions are made only after a majority (more than half) of the nodes in the
voting configuration respond.

Usually the voting configuration is the same as the set of all the
master-eligible nodes that are currently in the cluster. However, there are some
situations in which they may be different.

IMPORTANT: To ensure the cluster remains available, you **must not stop half or
more of the nodes in the voting configuration at the same time**. As long as more
than half of the voting nodes are available, the cluster can work normally. For
example, if there are three or four master-eligible nodes, the cluster
can tolerate one unavailable node. If there are two or fewer master-eligible
nodes, they must all remain available.
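
The majority rule above reduces to simple arithmetic. The following Python
sketch is illustrative only: the function names `quorum_size` and
`tolerated_failures` are hypothetical helpers and are not part of {es}.

```python
def quorum_size(voting_nodes: int) -> int:
    """Smallest number of votes that is more than half of the configuration."""
    if voting_nodes < 1:
        raise ValueError("a voting configuration needs at least one node")
    return voting_nodes // 2 + 1


def tolerated_failures(voting_nodes: int) -> int:
    """Number of voting nodes that can be unavailable while a majority remains."""
    return voting_nodes - quorum_size(voting_nodes)


# Print the quorum and the tolerated failures for small configurations.
for n in range(1, 6):
    print(f"{n} voting nodes: quorum {quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} unavailable")
```

Under this model, three and four voting nodes both tolerate one unavailable
node, and one or two voting nodes tolerate none, which matches the examples
given above.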

After a node joins or leaves the cluster, {es} reacts by automatically making
corresponding changes to the voting configuration in order to ensure that the
cluster is as resilient as possible. It is important to wait for this adjustment
to complete before you remove more nodes from the cluster. For more information,
see <<modules-discovery-adding-removing-nodes>>.

The current voting configuration is stored in the cluster state so you can
inspect its current contents as follows:

[source,js]
--------------------------------------------------
GET /_cluster/state?filter_path=metadata.cluster_coordination.last_committed_config
--------------------------------------------------
// CONSOLE
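
As a rough sketch of how the result might be consumed, the following Python
snippet extracts the voting node IDs from a response body. It assumes the
response mirrors the `filter_path` used in the request and that
`last_committed_config` holds a list of node IDs. The sample body and its node
IDs are invented for illustration and are not real output.

```python
import json

# Hypothetical response body; the node IDs are made up for this example.
sample_response = """
{
  "metadata": {
    "cluster_coordination": {
      "last_committed_config": ["node-id-1", "node-id-2", "node-id-3"]
    }
  }
}
"""


def voting_node_ids(response_text: str) -> list:
    """Return the node IDs listed in the last committed voting configuration."""
    body = json.loads(response_text)
    return body["metadata"]["cluster_coordination"]["last_committed_config"]


ids = voting_node_ids(sample_response)
print(len(ids), "voting nodes:", ids)
```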

NOTE: The current voting configuration is not necessarily the same as the set of
all available master-eligible nodes in the cluster. Altering the voting
configuration involves taking a vote, so it takes some time to adjust the
configuration as nodes join or leave the cluster. Also, there are situations
where the most resilient configuration includes unavailable nodes or does not
include some available nodes. In these situations, the voting configuration
differs from the set of available master-eligible nodes in the cluster.

Larger voting configurations are usually more resilient, so Elasticsearch
normally prefers to add master-eligible nodes to the voting configuration after
they join the cluster. Similarly, if a node in the voting configuration
leaves the cluster and there is another master-eligible node in the cluster that
is not in the voting configuration, then it is preferable to swap these two
nodes over. The size of the voting configuration is thus unchanged but its
resilience increases.
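
To see why the swap helps, consider a toy model in Python. This is a sketch of
the arithmetic only, not of how {es} performs reconfiguration. The node names
and the helper `further_failures_tolerated` are hypothetical.

```python
def further_failures_tolerated(voting_config: set, live_nodes: set) -> int:
    """How many more live voting nodes can fail while a majority still responds.

    Returns 0 when the live voters only just form a majority, or when they
    no longer form one at all.
    """
    quorum = len(voting_config) // 2 + 1
    live_voters = len(voting_config & live_nodes)
    return max(live_voters - quorum, 0)


live = {"a", "b", "d"}           # "c" has left; "d" is master-eligible but not voting
before_swap = {"a", "b", "c"}    # configuration still contains the departed node
after_swap = (before_swap - {"c"}) | {"d"}

print("before swap:", further_failures_tolerated(before_swap, live))
print("after swap: ", further_failures_tolerated(after_swap, live))
```

Both configurations contain three nodes, but only the swapped one can lose a
further node and still reach a majority.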

It is not so straightforward to automatically remove nodes from the voting
configuration after they have left the cluster. Different strategies have
different benefits and drawbacks, so the right choice depends on how the cluster
will be used. You can control whether the voting configuration automatically
shrinks by using the
<<modules-discovery-settings,`cluster.auto_shrink_voting_configuration` setting>>.

NOTE: If `cluster.auto_shrink_voting_configuration` is set to `true` (which is
the default and recommended value) and there are at least three master-eligible
nodes in the cluster, Elasticsearch remains capable of processing cluster state
updates as long as all but one of its master-eligible nodes are healthy.
There are situations in which Elasticsearch might tolerate the loss of multiple
nodes, but this is not guaranteed under all sequences of failures. If the
`cluster.auto_shrink_voting_configuration` setting is `false`, you must remove
departed nodes from the voting configuration manually. Use the
<<voting-config-exclusions,voting exclusions API>> to achieve the desired level
of resilience.

No matter how it is configured, Elasticsearch will not suffer from a
"split-brain" inconsistency. The `cluster.auto_shrink_voting_configuration`
setting affects only the cluster's availability when some of its nodes fail
and the administrative tasks that must be performed as nodes join and
leave the cluster.

[float]
==== Even numbers of master-eligible nodes

There should normally be an odd number of master-eligible nodes in a cluster.
If there is an even number, Elasticsearch leaves one of them out of the voting
configuration to ensure that it has an odd size. This omission does not decrease
the failure-tolerance of the cluster. In fact, it improves it slightly: if the
cluster suffers from a network partition that divides it into two equally-sized
halves, then one of the halves will contain a majority of the voting
configuration and will be able to keep operating. If all of the votes from
master-eligible nodes were counted, neither side would contain a strict majority
of the nodes and so the cluster would not be able to make any progress.

For instance, if there are four master-eligible nodes in the cluster and the
voting configuration contains all of them, any quorum-based decision
requires votes from at least three of them. This situation means that the cluster
can tolerate the loss of only a single master-eligible node. If this cluster
were split into two equal halves, neither half would contain three
master-eligible nodes and the cluster would not be able to make any progress.
If the voting configuration contains only three of the four master-eligible
nodes, however, the cluster is still only fully tolerant to the loss of one
node, but quorum-based decisions require votes from two of the three voting
nodes. In the event of an even split, one half will contain two of the three
voting nodes so that half will remain available.
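
The four-node example can be checked with a few lines of Python. This is a
minimal sketch under the assumption that a side of a partition can make
progress exactly when it holds more than half of the voting configuration. The
node names and the helper `has_majority` are hypothetical.

```python
def has_majority(side: set, voting_config: set) -> bool:
    """True if this side of a partition holds more than half of the voters."""
    return len(side & voting_config) > len(voting_config) / 2


# Four master-eligible nodes split evenly by a network partition.
left, right = {"a", "b"}, {"c", "d"}

all_four = {"a", "b", "c", "d"}     # every master-eligible node votes
three_of_four = {"a", "b", "c"}     # one node is left out of the configuration

for name, config in [("all four", all_four), ("three of four", three_of_four)]:
    print(name, "-> left:", has_majority(left, config),
          "right:", has_majority(right, config))
```

With all four nodes voting, neither half has a majority. With three voters,
the half that holds two of them can keep operating.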

[float]
==== Setting the initial voting configuration

When a brand-new cluster starts up for the first time, it must elect its first
master node. To do this election, it needs to know the set of master-eligible
nodes whose votes should count. This initial voting configuration is known as
the _bootstrap configuration_ and is set in the
<<modules-discovery-bootstrap-cluster,cluster bootstrapping process>>.

It is important that the bootstrap configuration identifies exactly which nodes
should vote in the first election. It is not sufficient to configure each node
with an expectation of how many nodes there should be in the cluster. It is also
important to note that the bootstrap configuration must come from outside the
cluster: there is no safe way for the cluster to determine the bootstrap
configuration correctly on its own.

If the bootstrap configuration is not set correctly, when you start a brand-new
cluster there is a risk that you will accidentally form two separate clusters
instead of one. This situation can lead to data loss: you might start using both
clusters before you notice that anything has gone wrong, and it is impossible to
merge them together later.

NOTE: To illustrate the problem with configuring each node to expect a certain
cluster size, imagine starting up a three-node cluster in which each node knows
that it is going to be part of a three-node cluster. A majority of three nodes
is two, so normally the first two nodes to discover each other form a cluster
and the third node joins them a short time later. However, imagine that four
nodes were erroneously started instead of three. In this case, there are enough
nodes to form two separate clusters. Of course, if each node is started manually
then it's unlikely that too many nodes are started. If you're using an automated
orchestrator, however, it's certainly possible to get into this situation,
particularly if the orchestrator is not resilient to failures such as network
partitions.
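
The failure mode described in this note can be sketched as a toy model in
Python. It is not how {es} runs elections. It only contrasts counting nodes
against an expected cluster size with checking membership in an explicit set
of voters. The node names `n1` to `n4` and both helper functions are
hypothetical.

```python
def clusters_by_expected_size(groups: list, expected_size: int) -> list:
    """Groups that would form a cluster if each node only counts its peers."""
    needed = expected_size // 2 + 1
    return [g for g in groups if len(g) >= needed]


def clusters_by_bootstrap_config(groups: list, bootstrap_config: set) -> list:
    """Groups that hold a majority of an explicitly named set of voters."""
    needed = len(bootstrap_config) // 2 + 1
    return [g for g in groups if len(g & bootstrap_config) >= needed]


# Four nodes were started by mistake and discovered each other in two pairs.
groups = [{"n1", "n2"}, {"n3", "n4"}]

by_count = clusters_by_expected_size(groups, expected_size=3)
by_config = clusters_by_bootstrap_config(groups, {"n1", "n2", "n3"})

print("clusters formed when counting nodes:", len(by_count))
print("clusters formed with a bootstrap configuration:", len(by_config))
```

When nodes only count peers, both pairs reach the expected majority of two and
two clusters form. With a named set of three voters, any two majorities must
share a node, so only one of the pairs can form a cluster.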

The initial quorum is only required the very first time a whole cluster starts
up. New nodes joining an established cluster can safely obtain all the
information they need from the elected master. Nodes that have previously been
part of a cluster will have stored to disk all the information that is required
when they restart.