added docs

Boaz Leskes 10 years ago
parent
commit
f70ed876d6
2 changed files with 27 additions and 6 deletions
  1. 12 6
      docs/reference/modules/discovery/zen.asciidoc
  2. 15 0
      docs/resiliency/index.asciidoc

+ 12 - 6
docs/reference/modules/discovery/zen.asciidoc

@@ -108,12 +108,18 @@ considered failed. Defaults to `3`.
 The master node is the only node in a cluster that can make changes to the
 cluster state. The master node processes one cluster state update at a time,
 applies the required changes and publishes the updated cluster state to all
-the other nodes in the cluster. Each node receives the publish message,
-updates its own cluster state and replies to the master node, which waits for
-all nodes to respond, up to a timeout, before going ahead processing the next
-updates in the queue. The `discovery.zen.publish_timeout` is set by default
-to 30 seconds and can be changed dynamically through the
-<<cluster-update-settings,cluster update settings api>>
+the other nodes in the cluster. Each node receives the publish message and acknowledges
+it, but does *not* yet apply it. If the master does not receive acknowledgement from
+at least `discovery.zen.minimum_master_nodes` nodes within a certain time (controlled by
+the `discovery.zen.commit_timeout` setting, which defaults to 30 seconds), the cluster state
+change is rejected.
+
+Once enough nodes have responded, the cluster state is committed and a commit message is
+sent to all the nodes. The nodes then apply the new cluster state to their
+internal state. The master node waits for all nodes to respond, up to a timeout, before
+processing the next update in the queue. The `discovery.zen.publish_timeout` is
+set by default to 30 seconds and is measured from the moment publishing started. Both
+timeout settings can be changed dynamically through the <<cluster-update-settings,cluster update settings api>>.
 
 [float]
 [[no-master-block]]

+ 15 - 0
docs/resiliency/index.asciidoc

@@ -55,6 +55,21 @@ If you encounter an issue, https://github.com/elasticsearch/elasticsearch/issues
 
 We are committed to tracking down and fixing all the issues that are posted.
 
+[float]
+=== Use two-phase commit for Cluster State publishing (STATUS: ONGOING)
+
+A master node in Elasticsearch continuously https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery-zen.html#fault-detection[monitors the cluster nodes]
+and removes any node from the cluster that doesn't respond to its pings in a timely
+fashion. If the master is left with fewer nodes than the `discovery.zen.minimum_master_nodes`
+setting, it will step down and a new master election will start.
+
+When a network partition causes a master to lose many followers, there is a
+short window of time before the master detects it and steps down. During that window, the
+master may erroneously accept and ack cluster state changes. To avoid this, we introduce
+a new phase to cluster state publishing where the proposed cluster state is sent to all nodes
+but is not yet committed. Only once enough nodes (`minimum_master_nodes`) actively acknowledge
+the change is it committed and are commit messages sent to the nodes. See {GIT}13062[#13062].
+
 [float]
 === Make index creation more user friendly (STATUS: ONGOING)
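
Reviewer note: the two-phase publishing flow described in this commit can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions and is not Elasticsearch code. The names `Node`, `receive_publish`, `receive_commit` and `publish` are hypothetical. Both timeouts are left out: an unreachable node simply never acknowledges. Every node in the list is treated as counting toward `minimum_master_nodes`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a cluster node (not an Elasticsearch class)."""
    name: str
    reachable: bool = True
    pending: Optional[int] = None  # proposed cluster state version, not yet applied
    applied: int = 0               # last committed cluster state version

    def receive_publish(self, version: int) -> bool:
        # Phase 1: store the proposed state and acknowledge it, but do not apply it.
        if not self.reachable:
            return False
        self.pending = version
        return True

    def receive_commit(self) -> None:
        # Phase 2: apply the previously acknowledged state.
        if self.reachable and self.pending is not None:
            self.applied = self.pending
            self.pending = None


def publish(nodes: List[Node], version: int, minimum_master_nodes: int) -> bool:
    """Return True if the change was committed, False if it was rejected."""
    acks = sum(1 for node in nodes if node.receive_publish(version))
    if acks < minimum_master_nodes:
        # Too few acknowledgements: reject the change and drop the proposal.
        for node in nodes:
            node.pending = None
        return False
    # Enough acknowledgements: commit and tell the nodes to apply the state.
    for node in nodes:
        node.receive_commit()
    return True
```

With three reachable nodes and `minimum_master_nodes` set to 2, `publish(nodes, 1, 2)` commits the change. If two of the nodes then become unreachable, a later `publish(nodes, 2, 2)` is rejected and the remaining node keeps its previously applied state, which is the behaviour the resiliency entry is aiming for during a partition.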