|
@@ -1,38 +1,40 @@
|
|
|
[[cluster-state-publishing]]
|
|
|
=== Publishing the cluster state
|
|
|
|
|
|
-The master node is the only node in a cluster that can make changes to the
|
|
|
-cluster state. The master node processes one batch of cluster state updates at
|
|
|
-a time, computing the required changes and publishing the updated cluster state
|
|
|
-to all the other nodes in the cluster. Each publication starts with the master
|
|
|
-broadcasting the updated cluster state to all nodes in the cluster. Each node
|
|
|
-responds with an acknowledgement but does not yet apply the newly-received
|
|
|
-state. Once the master has collected acknowledgements from enough
|
|
|
-master-eligible nodes, the new cluster state is said to be _committed_ and the
|
|
|
-master broadcasts another message instructing nodes to apply the now-committed
|
|
|
-state. Each node receives this message, applies the updated state, and then
|
|
|
-sends a second acknowledgement back to the master.
|
|
|
+The elected master node is the only node in a cluster that can make changes to
|
|
|
+the cluster state. The elected master node processes one batch of cluster state
|
|
|
+updates at a time, computing the required changes and publishing the updated
|
|
|
+cluster state to all the other nodes in the cluster. Each publication starts
|
|
|
+with the elected master broadcasting the updated cluster state to all nodes in
|
|
|
+the cluster. Each node responds with an acknowledgement but does not yet apply
|
|
|
+the newly-received state. Once the elected master has collected
|
|
|
+acknowledgements from enough master-eligible nodes, the new cluster state is
|
|
|
+said to be _committed_ and the master broadcasts another message instructing
|
|
|
+nodes to apply the now-committed state. Each node receives this message,
|
|
|
+applies the updated state, and then sends a second acknowledgement back to the
|
|
|
+master.
|
|
|
|
|
|
-The master allows a limited amount of time for each cluster state update to be
|
|
|
-completely published to all nodes. It is defined by the
|
|
|
+The elected master allows a limited amount of time for each cluster state
|
|
|
+update to be completely published to all nodes. It is defined by the
|
|
|
`cluster.publish.timeout` setting, which defaults to `30s`, measured from the
|
|
|
time the publication started. If this time is reached before the new cluster
|
|
|
-state is committed then the cluster state change is rejected and the master
|
|
|
-considers itself to have failed. It stands down and starts trying to elect a
|
|
|
-new master.
|
|
|
+state is committed then the cluster state change is rejected and the elected
|
|
|
+master considers itself to have failed. It stands down and starts trying to
|
|
|
+elect a new master node.
|
|
|
|
|
|
If the new cluster state is committed before `cluster.publish.timeout` has
|
|
|
-elapsed, the master node considers the change to have succeeded. It waits until
|
|
|
-the timeout has elapsed or until it has received acknowledgements that each
|
|
|
-node in the cluster has applied the updated state, and then starts processing
|
|
|
-and publishing the next cluster state update. If some acknowledgements have not
|
|
|
-been received (i.e. some nodes have not yet confirmed that they have applied
|
|
|
-the current update), these nodes are said to be _lagging_ since their cluster
|
|
|
-states have fallen behind the master's latest state. The master waits for the
|
|
|
-lagging nodes to catch up for a further time, `cluster.follower_lag.timeout`,
|
|
|
-which defaults to `90s`. If a node has still not successfully applied the
|
|
|
-cluster state update within this time then it is considered to have failed and
|
|
|
-is removed from the cluster.
|
|
|
+elapsed, the elected master node considers the change to have succeeded. It
|
|
|
+waits until the timeout has elapsed or until it has received acknowledgements
|
|
|
+that each node in the cluster has applied the updated state, and then starts
|
|
|
+processing and publishing the next cluster state update. If some
|
|
|
+acknowledgements have not been received (i.e. some nodes have not yet confirmed
|
|
|
+that they have applied the current update), these nodes are said to be
|
|
|
+_lagging_ since their cluster states have fallen behind the elected master's
|
|
|
+latest state. The elected master waits for the lagging nodes to catch up for a
|
|
|
+further time, `cluster.follower_lag.timeout`, which defaults to `90s`. If a
|
|
|
+node has still not successfully applied the cluster state update within this
|
|
|
+time then it is considered to have failed and the elected master removes it
|
|
|
+from the cluster.
|
|
|
|
|
|
Cluster state updates are typically published as diffs to the previous cluster
|
|
|
state, which reduces the time and network bandwidth needed to publish a cluster
|
|
@@ -40,12 +42,19 @@ state update. For example, when updating the mappings for only a subset of the
|
|
|
indices in the cluster state, only the updates for those indices need to be
|
|
|
published to the nodes in the cluster, as long as those nodes have the previous
|
|
|
cluster state. If a node is missing the previous cluster state, for example
|
|
|
-when rejoining a cluster, the master will publish the full cluster state to
|
|
|
-that node so that it can receive future updates as diffs.
|
|
|
+when rejoining a cluster, the elected master will publish the full cluster
|
|
|
+state to that node so that it can receive future updates as diffs.
|
|
|
|
|
|
NOTE: {es} is a peer to peer based system, in which nodes communicate with one
|
|
|
another directly. The high-throughput APIs (index, delete, search) do not
|
|
|
-normally interact with the master node. The responsibility of the master node
|
|
|
-is to maintain the global cluster state and reassign shards when nodes join or
|
|
|
-leave the cluster. Each time the cluster state is changed, the new state is
|
|
|
-published to all nodes in the cluster as described above.
|
|
|
+normally interact with the elected master node. The responsibility of the
|
|
|
+elected master node is to maintain the global cluster state which includes
|
|
|
+reassigning shards when nodes join or leave the cluster. Each time the cluster
|
|
|
+state is changed, the new state is published to all nodes in the cluster as
|
|
|
+described above.
|
|
|
+
|
|
|
+The performance characteristics of cluster state updates are a function of the
|
|
|
+speed of the storage on each master-eligible node, as well as the reliability
|
|
|
+and latency of the network interconnections between all nodes in the cluster.
|
|
|
+You must therefore ensure that the storage and networking available to the
|
|
|
+nodes in your cluster are good enough to meet your performance goals.
|