[[modules-discovery-voting]]
=== Voting configurations

Each {es} cluster has a _voting configuration_, which is the set of
<<master-node,master-eligible nodes>> whose responses are counted when making
decisions such as electing a new master or committing a new cluster state.
Decisions are made only after a majority (more than half) of the nodes in the
voting configuration respond.
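
The majority rule can be sketched in a few lines of Python. This is an
illustration only, not {es} code: the function name `has_quorum` and the node
names are hypothetical.

```python
def has_quorum(voting_config, responders):
    """Return True if more than half of the voting configuration responded."""
    voting_config = set(voting_config)
    # Only responses from nodes inside the voting configuration are counted.
    votes = len(voting_config & set(responders))
    # "More than half" is a strict majority, so compare 2 * votes to the size.
    return 2 * votes > len(voting_config)


config = {"node-a", "node-b", "node-c"}
print(has_quorum(config, {"node-a", "node-b"}))  # two of three voters responded
print(has_quorum(config, {"node-a", "node-d"}))  # node-d is not a voter
```

In this sketch a response from a node outside the voting configuration does not
count, which is why the second call does not reach a majority.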

Usually the voting configuration is the same as the set of all the
master-eligible nodes that are currently in the cluster. However, there are some
situations in which they may be different.

include::quorums.asciidoc[tag=quorums-and-availability]

After a node joins or leaves the cluster, {es} reacts by automatically making
corresponding changes to the voting configuration in order to ensure that the
cluster is as resilient as possible. It is important to wait for this adjustment
to complete before you remove more nodes from the cluster. For more information,
see <<add-elasticsearch-nodes>>.

The current voting configuration is stored in the cluster state so you can
inspect its current contents as follows:

[source,console]
--------------------------------------------------
GET /_cluster/state?filter_path=metadata.cluster_coordination.last_committed_config
--------------------------------------------------
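
The following Python sketch shows one way a client might read that response. It
assumes the response body nests the list of node IDs under the same keys as the
`filter_path` above. The sample body, the node IDs, and the helper names
`voting_node_ids` and `not_voting` are all hypothetical.

```python
import json

# Hypothetical response body; the node IDs are made up for illustration.
SAMPLE_RESPONSE = """
{
  "metadata": {
    "cluster_coordination": {
      "last_committed_config": ["id-1", "id-2", "id-3"]
    }
  }
}
"""


def voting_node_ids(response_body):
    """Extract the committed voting configuration as a set of node IDs."""
    state = json.loads(response_body)
    # Assumption: the keys mirror the filter_path used in the request.
    config = state["metadata"]["cluster_coordination"]["last_committed_config"]
    return set(config)


def not_voting(response_body, master_eligible_ids):
    """Master-eligible node IDs that are absent from the voting configuration."""
    return set(master_eligible_ids) - voting_node_ids(response_body)


print(sorted(voting_node_ids(SAMPLE_RESPONSE)))
print(sorted(not_voting(SAMPLE_RESPONSE, ["id-1", "id-4"])))
```

Comparing the result with the IDs of the master-eligible nodes you expect makes
it easier to spot a node that has joined but is not yet counted in votes.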

NOTE: The current voting configuration is not necessarily the same as the set of
all available master-eligible nodes in the cluster. Altering the voting
configuration involves taking a vote, so it takes some time to adjust the
configuration as nodes join or leave the cluster. Also, there are situations
where the most resilient configuration includes unavailable nodes or does not
include some available nodes. In these situations, the voting configuration
differs from the set of available master-eligible nodes in the cluster.

Larger voting configurations are usually more resilient, so Elasticsearch
normally prefers to add master-eligible nodes to the voting configuration after
they join the cluster. Similarly, if a node in the voting configuration
leaves the cluster and there is another master-eligible node in the cluster that
is not in the voting configuration then it is preferable to swap these two nodes
over. The size of the voting configuration is thus unchanged but its
resilience increases.

It is not so straightforward to automatically remove nodes from the voting
configuration after they have left the cluster. Different strategies have
different benefits and drawbacks, so the right choice depends on how the cluster
will be used. You can control whether the voting configuration automatically
shrinks by using the
<<modules-discovery-settings,`cluster.auto_shrink_voting_configuration` setting>>.

NOTE: If `cluster.auto_shrink_voting_configuration` is set to `true` (which is
the default and recommended value) and there are at least three master-eligible
nodes in the cluster, Elasticsearch remains capable of processing cluster state
updates as long as all but one of its master-eligible nodes are healthy.

There are situations in which Elasticsearch might tolerate the loss of multiple
nodes, but this is not guaranteed under all sequences of failures. If the
`cluster.auto_shrink_voting_configuration` setting is `false`, you must remove
departed nodes from the voting configuration manually. Use the
<<voting-config-exclusions,voting exclusions API>> to achieve the desired level
of resilience.

No matter how it is configured, Elasticsearch will not suffer from a
"{wikipedia}/Split-brain_(computing)[split-brain]" inconsistency.
The `cluster.auto_shrink_voting_configuration` setting affects only the
cluster's availability if some of its nodes fail, and the administrative tasks
that must be performed as nodes join and leave the cluster.

[discrete]
==== Even numbers of master-eligible nodes

There should normally be an odd number of master-eligible nodes in a cluster.
If there is an even number, Elasticsearch leaves one of them out of the voting
configuration to ensure that it has an odd size. This omission does not decrease
the failure-tolerance of the cluster. In fact, it improves it slightly: if the
cluster suffers from a network partition that divides it into two equally-sized
halves then one of the halves will contain a majority of the voting
configuration and will be able to keep operating. If all of the votes from
master-eligible nodes were counted, neither side would contain a strict majority
of the nodes and so the cluster would not be able to make any progress.

For instance, if there are four master-eligible nodes in the cluster and the
voting configuration contains all of them, any quorum-based decision requires
votes from at least three of them. This situation means that the cluster
can tolerate the loss of only a single master-eligible node. If this cluster
were split into two equal halves, neither half would contain three
master-eligible nodes and the cluster would not be able to make any progress.
If the voting configuration contains only three of the four master-eligible
nodes, however, the cluster is still only fully tolerant to the loss of one
node, but quorum-based decisions require votes from two of the three voting
nodes. In the event of an even split, one half will contain two of the three
voting nodes so that half will remain available.
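
The four-node example can be checked with a short Python sketch. It is a toy
model of the arithmetic only, and the names `majority`, `available_sides`, and
`n1` to `n4` are hypothetical.

```python
def majority(size):
    """Smallest number of votes that is more than half of `size`."""
    return size // 2 + 1


def available_sides(voting_config, partition_sides):
    """Return the sides of a partition that hold a majority of the voters."""
    needed = majority(len(voting_config))
    return [side for side in partition_sides
            if len(side & voting_config) >= needed]


# Four master-eligible nodes, split evenly by a network partition.
even_split = [{"n1", "n2"}, {"n3", "n4"}]

# All four nodes vote: a quorum needs three votes, so neither half qualifies.
print(majority(4), available_sides({"n1", "n2", "n3", "n4"}, even_split))

# Only three of the four nodes vote: a quorum needs two votes, so the half
# that holds n1 and n2 can keep operating.
print(majority(3), available_sides({"n1", "n2", "n3"}, even_split))
```

In both cases the cluster tolerates the loss of only one voting node, but only
the three-node voting configuration leaves one side of an even split available.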

[discrete]
==== Setting the initial voting configuration

When a brand-new cluster starts up for the first time, it must elect its first
master node. To do this election, it needs to know the set of master-eligible
nodes whose votes should count. This initial voting configuration is known as
the _bootstrap configuration_ and is set in the
<<modules-discovery-bootstrap-cluster,cluster bootstrapping process>>.

It is important that the bootstrap configuration identifies exactly which nodes
should vote in the first election. It is not sufficient to configure each node
with an expectation of how many nodes there should be in the cluster. It is also
important to note that the bootstrap configuration must come from outside the
cluster: there is no safe way for the cluster to determine the bootstrap
configuration correctly on its own.

If the bootstrap configuration is not set correctly, when you start a brand-new
cluster there is a risk that you will accidentally form two separate clusters
instead of one. This situation can lead to data loss: you might start using both
clusters before you notice that anything has gone wrong and it is impossible to
merge them together later.

NOTE: To illustrate the problem with configuring each node to expect a certain
cluster size, imagine starting up a three-node cluster in which each node knows
that it is going to be part of a three-node cluster. A majority of three nodes
is two, so normally the first two nodes to discover each other form a cluster
and the third node joins them a short time later. However, imagine that four
nodes were erroneously started instead of three. In this case, there are enough
nodes to form two separate clusters. Of course, if each node is started manually
then it's unlikely that too many nodes are started. If you're using an automated
orchestrator, however, it's certainly possible to get into this situation,
particularly if the orchestrator is not resilient to failures such as network
partitions.
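
The scenario in the note can be modeled with a small Python sketch. It is a
deliberately simplified toy model, not a description of how {es} bootstraps a
cluster: the function names, the node names, and the idea of fixed "discovery
groups" (nodes that happen to find each other first) are assumptions made for
illustration.

```python
def clusters_formed_by_count(discovery_groups, expected_size):
    """Unsafe rule: a group forms a cluster once it holds a majority of the
    expected cluster size, regardless of which nodes are in the group."""
    needed = expected_size // 2 + 1
    return [group for group in discovery_groups if len(group) >= needed]


def clusters_formed_by_config(discovery_groups, bootstrap_config):
    """Safe rule: a group forms a cluster only if it holds a majority of an
    explicit, externally supplied set of voting nodes."""
    needed = len(bootstrap_config) // 2 + 1
    return [group for group in discovery_groups
            if len(group & bootstrap_config) >= needed]


# Four nodes were started by mistake and discovered each other in pairs.
groups = [{"a", "b"}, {"c", "d"}]

# Each node only expects "three nodes": both pairs reach the majority of two.
print(len(clusters_formed_by_count(groups, expected_size=3)))

# Each node is told exactly which nodes vote: only one pair holds a majority.
print(len(clusters_formed_by_config(groups, bootstrap_config={"a", "b", "c"})))
```

Under the count-based rule both pairs qualify, so two clusters form. With an
explicit set of voters, any two majorities must share at least one node, so at
most one group can win the first election.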

The initial quorum is only required the very first time a whole cluster starts
up. New nodes joining an established cluster can safely obtain all the
information they need from the elected master. Nodes that have previously been
part of a cluster will have stored to disk all the information that is required
when they restart.