[[add-elasticsearch-nodes]]
== Add and remove nodes in your cluster

When you start an instance of {es}, you are starting a _node_. An {es}
_cluster_ is a group of nodes that have the same `cluster.name` attribute. As
nodes join or leave a cluster, the cluster automatically reorganizes itself to
evenly distribute the data across the available nodes.

If you are running a single instance of {es}, you have a cluster of one node.
All primary shards reside on the single node. No replica shards can be
allocated, so the cluster health remains yellow. The cluster is fully
functional but is at risk of data loss in the event of a failure.

image::setup/images/elas_0202.png["A cluster with one node and three primary shards"]
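
You can check the health of the cluster at any time with the cluster health
API. For example, on a single-node cluster with unassigned replicas a request
like the following reports a `yellow` status (a minimal illustration; the
exact response depends on your cluster):

[source,console]
--------------------------------------------------
GET /_cluster/health
--------------------------------------------------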

You add nodes to a cluster to increase its capacity and reliability. By
default, a node is both a data node and eligible to be elected as the master
node that controls the cluster. You can also configure a new node for a
specific purpose, such as handling ingest requests. For more information, see
<<modules-node,Nodes>>.

When you add more nodes to a cluster, it automatically allocates replica
shards. When all primary and replica shards are active, the cluster health
changes to green.

image::setup/images/elas_0204.png["A cluster with three nodes"]

You can run multiple nodes on your local machine in order to experiment with
how an {es} cluster of multiple nodes behaves. To add a node to a cluster
running on your local machine:

. Set up a new {es} instance.
. Specify the name of the cluster with the `cluster.name` setting in
`elasticsearch.yml`. For example, to add a node to the `logging-prod` cluster,
add the line `cluster.name: "logging-prod"` to `elasticsearch.yml`.
. Start {es}. The node automatically discovers and joins the specified cluster.

To add a node to a cluster running on multiple machines, you must also
<<unicast.hosts,set `discovery.seed_hosts`>> so that the new node can discover
the rest of its cluster.
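
For example, the `elasticsearch.yml` of a new node might contain entries like
the following. The cluster name, addresses, and node name shown here are
placeholders; substitute the values for your own cluster:

[source,yaml]
--------------------------------------------------
# Join the existing cluster by name
cluster.name: "logging-prod"

# A descriptive name for this node (placeholder)
node.name: "logging-prod-node-4"

# Addresses of existing nodes so this node can discover the rest of the cluster
discovery.seed_hosts:
   - 192.168.1.10:9300
   - 192.168.1.11
   - seeds.mydomain.com
--------------------------------------------------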

For more information about discovery and shard allocation, see
<<modules-discovery>> and <<modules-cluster>>.

[discrete]
[[add-elasticsearch-nodes-master-eligible]]
=== Master-eligible nodes

As nodes are added or removed, Elasticsearch maintains an optimal level of
fault tolerance by automatically updating the cluster's _voting
configuration_, which is the set of <<master-node,master-eligible nodes>>
whose responses are counted when making decisions such as electing a new
master or committing a new cluster state.

It is recommended to have a small and fixed number of master-eligible nodes in
a cluster, and to scale the cluster up and down by adding and removing
master-ineligible nodes only. However, there are situations in which it may be
desirable to add or remove some master-eligible nodes to or from a cluster.

[discrete]
[[modules-discovery-adding-nodes]]
==== Adding master-eligible nodes

If you wish to add some nodes to your cluster, simply configure the new nodes
to find the existing cluster and start them up. Elasticsearch adds the new
nodes to the voting configuration if it is appropriate to do so.

During master election or when joining an existing formed cluster, a node
sends a join request to the master in order to be officially added to the
cluster. You can use the `cluster.join.timeout` setting to configure how long
a node waits after sending a request to join a cluster. Its default value is
`30s`. See <<modules-discovery-settings>>.
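
For example, if joins are slow on your network, you could allow more time by
setting a longer timeout in `elasticsearch.yml` (the value below is only an
illustration):

[source,yaml]
--------------------------------------------------
# Wait up to 60 seconds for a join request before considering it failed
cluster.join.timeout: 60s
--------------------------------------------------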

[discrete]
[[modules-discovery-removing-nodes]]
==== Removing master-eligible nodes

When removing master-eligible nodes, it is important not to remove too many
all at the same time. For instance, if there are currently seven
master-eligible nodes and you wish to reduce this to three, it is not possible
simply to stop four of the nodes at once: to do so would leave only three
nodes remaining, which is fewer than half of the voting configuration, so the
cluster cannot take any further actions.

More precisely, if you shut down half or more of the master-eligible nodes all
at the same time then the cluster will normally become unavailable. If this
happens then you can bring the cluster back online by starting the removed
nodes again.

As long as there are at least three master-eligible nodes in the cluster, as a
general rule it is best to remove nodes one-at-a-time, allowing enough time
for the cluster to <<modules-discovery-quorums,automatically adjust>> the
voting configuration and adapt the fault tolerance level to the new set of
nodes.

If there are only two master-eligible nodes remaining then neither node can be
safely removed since both are required to reliably make progress. To remove
one of these nodes you must first inform {es} that it should not be part of
the voting configuration, and that the voting power should instead be given to
the other node. You can then take the excluded node offline without preventing
the other node from making progress. A node which is added to a voting
configuration exclusion list still works normally, but {es} tries to remove it
from the voting configuration so its vote is no longer required. Importantly,
{es} will never automatically move a node on the voting exclusions list back
into the voting configuration. Once an excluded node has been successfully
auto-reconfigured out of the voting configuration, it is safe to shut it down
without affecting the cluster's master-level availability. A node can be added
to the voting configuration exclusion list using the
<<voting-config-exclusions>> API. For example:

[source,console]
--------------------------------------------------
# Add node to voting configuration exclusions list and wait for the system
# to auto-reconfigure the node out of the voting configuration up to the
# default timeout of 30 seconds
POST /_cluster/voting_config_exclusions?node_names=node_name

# Add node to voting configuration exclusions list and wait for
# auto-reconfiguration up to one minute
POST /_cluster/voting_config_exclusions?node_names=node_name&timeout=1m
--------------------------------------------------
// TEST[skip:this would break the test cluster if executed]

The nodes that should be added to the exclusions list are specified by name
using the `?node_names` query parameter, or by their persistent node IDs using
the `?node_ids` query parameter. If a call to the voting configuration
exclusions API fails, you can safely retry it. Only a successful response
guarantees that the node has actually been removed from the voting
configuration and will not be reinstated. If the elected master node is
excluded from the voting configuration then it will abdicate to another
master-eligible node that is still in the voting configuration if such a node
is available.
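
For instance, with a hypothetical persistent node ID, the same exclusion could
be added by ID instead of by name:

[source,console]
--------------------------------------------------
# Exclude a node by its persistent node ID; the ID shown is a placeholder
POST /_cluster/voting_config_exclusions?node_ids=kQ9LrsDfT1uHxYtN0e2Xyz
--------------------------------------------------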

Although the voting configuration exclusions API is most useful for
down-scaling a two-node to a one-node cluster, it is also possible to use it
to remove multiple master-eligible nodes all at the same time. Adding multiple
nodes to the exclusions list causes the system to try to auto-reconfigure all
of these nodes out of the voting configuration, allowing them to be safely
shut down while keeping the cluster available. In the example described above,
shrinking a seven-master-node cluster down to only three master-eligible
nodes, you could add four nodes to the exclusions list, wait for confirmation,
and then shut them down simultaneously.
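
For instance, with hypothetical node names, the four nodes could be excluded
in a single request:

[source,console]
--------------------------------------------------
# Exclude four master-eligible nodes at once; the node names are placeholders
POST /_cluster/voting_config_exclusions?node_names=master-4,master-5,master-6,master-7&timeout=1m
--------------------------------------------------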

NOTE: Voting exclusions are only required when removing at least half of the
master-eligible nodes from a cluster in a short time period. They are not
required when removing master-ineligible nodes, nor are they required when
removing fewer than half of the master-eligible nodes.

Adding an exclusion for a node creates an entry for that node in the voting
configuration exclusions list, which causes the system to automatically try to
reconfigure the voting configuration to remove that node and prevents it from
returning to the voting configuration once it has been removed. The current
list of exclusions is stored in the cluster state and can be inspected as
follows:

[source,console]
--------------------------------------------------
GET /_cluster/state?filter_path=metadata.cluster_coordination.voting_config_exclusions
--------------------------------------------------

This list is limited in size by the `cluster.max_voting_config_exclusions`
setting, which defaults to `10`. See <<modules-discovery-settings>>. Since
voting configuration exclusions are persistent and limited in number, they
must be cleaned up. Normally an exclusion is added when performing some
maintenance on the cluster, and the exclusions should be cleaned up when the
maintenance is complete. Clusters should have no voting configuration
exclusions in normal operation.
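
If a maintenance operation requires more exclusions than the default limit
allows, the limit can be raised with the cluster settings API, assuming the
setting is dynamic in your version; the value below is only an illustration:

[source,console]
--------------------------------------------------
PUT /_cluster/settings
{
  "persistent": {
    "cluster.max_voting_config_exclusions": 20
  }
}
--------------------------------------------------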

If a node is excluded from the voting configuration because it is to be shut
down permanently, its exclusion can be removed after it is shut down and
removed from the cluster. Exclusions can also be cleared, for instance if they
were created in error or were only required temporarily, by specifying
`?wait_for_removal=false`.

[source,console]
--------------------------------------------------
# Wait for all the nodes with voting configuration exclusions to be removed
# from the cluster and then remove all the exclusions, allowing any node to
# return to the voting configuration in the future.
DELETE /_cluster/voting_config_exclusions

# Immediately remove all the voting configuration exclusions, allowing any
# node to return to the voting configuration in the future.
DELETE /_cluster/voting_config_exclusions?wait_for_removal=false
--------------------------------------------------