More improvements to cluster coordination docs (#42799)

This commit addresses a few more frequently-asked questions:

* clarifies that bootstrapping is not repeated, even after a full cluster
  restart.

* removes the example that uses IP addresses, to further encourage the use of
  node names for bootstrapping.

* clarifies that auto-bootstrapping might form different clusters on different
  hosts, and gives a process for starting again if this wasn't what you wanted.

* adds the "do not stop half-or-more of the master-eligible nodes" slogan that
  was notably absent.

* reformats one of the console examples to a narrower width
David Turner committed 6 years ago
commit ec427ff55e

+ 19 - 13
docs/reference/modules/discovery/adding-removing-nodes.asciidoc

@@ -35,30 +35,36 @@ four of the nodes at once: to do so would leave only three nodes remaining,
 which is less than half of the voting configuration, which means the cluster
 cannot take any further actions.
 
+More precisely, if you shut down half or more of the master-eligible nodes all
+at the same time then the cluster will normally become unavailable. If this
+happens then you can bring the cluster back online by starting the removed
+nodes again.
+
 As long as there are at least three master-eligible nodes in the cluster, as a
 general rule it is best to remove nodes one-at-a-time, allowing enough time for
 the cluster to <<modules-discovery-quorums,automatically adjust>> the voting
 configuration and adapt the fault tolerance level to the new set of nodes.
 
 If there are only two master-eligible nodes remaining then neither node can be
-safely removed since both are required to reliably make progress. You must first
-inform Elasticsearch that one of the nodes should not be part of the voting
-configuration, and that the voting power should instead be given to other nodes.
-You can then take the excluded node offline without preventing the other node
-from making progress. A node which is added to a voting configuration exclusion
-list still works normally, but Elasticsearch tries to remove it from the voting
-configuration so its vote is no longer required.  Importantly, Elasticsearch
-will never automatically move a node on the voting exclusions list back into the
-voting configuration. Once an excluded node has been successfully
+safely removed since both are required to reliably make progress. To remove one
+of these nodes you must first inform {es} that it should not be part of the
+voting configuration, and that the voting power should instead be given to the
+other node. You can then take the excluded node offline without preventing the
+other node from making progress. A node which is added to a voting
+configuration exclusion list still works normally, but {es} tries to remove it
+from the voting configuration so its vote is no longer required. Importantly,
+{es} will never automatically move a node on the voting exclusions list back
+into the voting configuration. Once an excluded node has been successfully
 auto-reconfigured out of the voting configuration, it is safe to shut it down
 without affecting the cluster's master-level availability. A node can be added
-to the voting configuration exclusion list using the <<voting-config-exclusions>> API. For example:
+to the voting configuration exclusion list using the
+<<voting-config-exclusions>> API. For example:
 
 [source,js]
 --------------------------------------------------
-# Add node to voting configuration exclusions list and wait for the system to
-# auto-reconfigure the node out of the voting configuration up to the default
-# timeout of 30 seconds
+# Add node to voting configuration exclusions list and wait for the system
+# to auto-reconfigure the node out of the voting configuration up to the
+# default timeout of 30 seconds
 POST /_cluster/voting_config_exclusions/node_name
 
 # Add node to voting configuration exclusions list and wait for

+ 29 - 15
docs/reference/modules/discovery/bootstrapping.asciidoc

@@ -6,8 +6,9 @@ set of <<master-node,master-eligible nodes>> to be explicitly defined on one or
 more of the master-eligible nodes in the cluster. This is known as _cluster
 bootstrapping_.  This is only required the very first time the cluster starts
 up: nodes that have already joined a cluster store this information in their
-data folder and freshly-started nodes that are joining an existing cluster
-obtain this information from the cluster's elected master. 
+data folder for use in a <<restart-upgrade,full cluster restart>>, and
+freshly-started nodes that are joining a running cluster obtain this
+information from the cluster's elected master.
 
 The initial set of master-eligible nodes is defined in the
 <<initial_master_nodes,`cluster.initial_master_nodes` setting>>. This should be
@@ -58,19 +59,6 @@ cluster.initial_master_nodes:
   - master-c
 --------------------------------------------------
 
-If it is not possible to use the names of the nodes then you can also use IP
-addresses, or IP addresses and ports, or even a mix of IP addresses and node
-names:
-
-[source,yaml]
---------------------------------------------------
-cluster.initial_master_nodes:
-  - 10.0.10.101
-  - 10.0.10.102:9300
-  - 10.0.10.102:9301
-  - master-node-name
---------------------------------------------------
-
 Like all node settings, it is also possible to specify the initial set of master
 nodes on the command-line that is used to start Elasticsearch:
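
(The command-line example itself falls outside this hunk.) As a sketch, assuming
the same `master-a`, `master-b` and `master-c` node names used above and the
standard `-E` flag for passing node settings on the command line, it would look
something like:

[source,bash]
--------------------------------------------------
$ bin/elasticsearch -E cluster.initial_master_nodes=master-a,master-b,master-c
--------------------------------------------------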
 
@@ -139,3 +127,29 @@ in the <<modules-discovery-bootstrap-cluster,section on cluster bootstrapping>>:
 * `discovery.seed_providers`
 * `discovery.seed_hosts`
 * `cluster.initial_master_nodes`
+
+[NOTE]
+==================================================
+
+[[modules-discovery-bootstrap-cluster-joining]] If you start an {es} node
+without configuring these settings then it will start up in development mode and
+auto-bootstrap itself into a new cluster. If you start some {es} nodes on
+different hosts then by default they will not discover each other and will form
+a different cluster on each host. {es} will not merge separate clusters together
+after they have formed, even if you subsequently try and configure all the nodes
+into a single cluster. This is because there is no way to merge these separate
+clusters together without a risk of data loss. You can tell that you have formed
+separate clusters by checking the cluster UUID reported by `GET /` on each node.
+If you intended to form a single cluster then you should start again:
+
+* Take a <<modules-snapshots,snapshot>> of each of the single-host clusters if
+  you do not want to lose any data that they hold. Note that each cluster must
+  use its own snapshot repository.
+* Shut down all the nodes.
+* Completely wipe each node by deleting the contents of their
+  <<data-path,data folders>>.
+* Configure `cluster.initial_master_nodes` as described above.
+* Restart all the nodes and verify that they have formed a single cluster.
+* <<modules-snapshots,Restore>> any snapshots as required.
+
+==================================================