
Fully initialize cluster state on ephemeral nodes (#71466)

Today ephemeral nodes (i.e., nodes that are neither master-eligible nor
hold data) start with an initial "persisted" state that is almost empty. In
particular, it contains no cluster blocks and not even the local node.
This violates the assumption, relied on elsewhere, that the local node is
always included in the cluster state, and breaks components such as the
`ClusterFormationFailureHelper`:

    [DEBUG][o.e.c.c.ClusterFormationFailureHelper] unexpected exception scheduling cluster formation warning
      java.lang.NullPointerException: Cannot invoke "org.elasticsearch.cluster.node.DiscoveryNode.isMasterNode()" because the return value of "org.elasticsearch.cluster.node.DiscoveryNodes.getLocalNode()" is null
        at org.elasticsearch.cluster.coordination.ClusterFormationFailureHelper$ClusterFormationState.getDescription(ClusterFormationFailureHelper.java:147) ~[elasticsearch-7.11.0.jar:7.11.0]
        at org.elasticsearch.cluster.coordination.ClusterFormationFailureHelper$WarningScheduler$1.doRun(ClusterFormationFailureHelper.java:92) [elasticsearch-7.11.0.jar:7.11.0]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:732) [elasticsearch-7.11.0.jar:7.11.0]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) [elasticsearch-7.11.0.jar:7.11.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
        at java.lang.Thread.run(Thread.java:832) [?:?]
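
The failure mode can be reproduced outside Elasticsearch with a few stand-in types. This is a minimal sketch, not Elasticsearch code: `NodeSketch`, `NodesSketch` and `describeFormationState` are hypothetical names that only mimic the shape of `DiscoveryNode`, `DiscoveryNodes.getLocalNode()` and `ClusterFormationState.getDescription()`. It shows that a state built without a local node makes `getLocalNode()` return `null`, and any caller that assumes otherwise dereferences it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for DiscoveryNode: just an id and a master-eligibility flag.
final class NodeSketch {
    final String id;
    final boolean masterEligible;

    NodeSketch(String id, boolean masterEligible) {
        this.id = id;
        this.masterEligible = masterEligible;
    }

    boolean isMasterNode() {
        return masterEligible;
    }
}

// Hypothetical stand-in for DiscoveryNodes: the known nodes plus the id of the local node.
final class NodesSketch {
    final Map<String, NodeSketch> nodes;
    final String localNodeId; // null when the state was built without a local node

    NodesSketch(Map<String, NodeSketch> nodes, String localNodeId) {
        this.nodes = Collections.unmodifiableMap(new HashMap<>(nodes));
        this.localNodeId = localNodeId;
    }

    // Models the old initial state of an ephemeral node: no nodes, no local node.
    static NodesSketch empty() {
        return new NodesSketch(Collections.emptyMap(), null);
    }

    // Returns null if the state has no local node, matching the null in the stack trace.
    NodeSketch getLocalNode() {
        return localNodeId == null ? null : nodes.get(localNodeId);
    }
}

final class FormationWarningSketch {
    // Assumes the local node is always present, so an "empty" state
    // triggers a NullPointerException on the isMasterNode() call.
    static String describeFormationState(NodesSketch nodes) {
        if (nodes.getLocalNode().isMasterNode()) {
            return "master-eligible node is still discovering";
        }
        return "non-master-eligible node is waiting for a master";
    }
}
```

With a `NodesSketch` that contains the local node, `describeFormationState` returns normally; with `NodesSketch.empty()` it throws `NullPointerException`, which corresponds to the situation the ephemeral node was in.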

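This commit addresses this by building the initial persisted state of
ephemeral nodes with `prepareInitialClusterState`, so that it contains the
local node and the initial cluster blocks, and by asserting in
`Coordinator` that the last-accepted state always contains the local node.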
David Turner, 4 years ago
commit ee2818b796

+ 1 - 0
server/src/main/java/org/elasticsearch/cluster/coordination/Coordinator.java

@@ -1006,6 +1006,7 @@ public class Coordinator extends AbstractLifecycleComponent implements Discovery
             // expose last accepted cluster state as base state upon which the master service
             // speculatively calculates the next cluster state update
             final ClusterState clusterState = coordinationState.get().getLastAcceptedState();
+            assert clusterState.nodes().getLocalNode() != null;
             if (mode != Mode.LEADER || clusterState.term() != getCurrentTerm()) {
                 // the master service checks if the local node is the master node in order to fail execution of the state update early
                 return clusterStateWithNoMasterBlock(clusterState);
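
The line added to `Coordinator` is a plain Java `assert`, so it is evaluated only when the JVM runs with assertions enabled (`-ea`); with assertions disabled it is skipped entirely. The sketch below uses hypothetical names (`LastAcceptedStateSketch`, `localNodeForMasterService`) and a bare node id in place of a real cluster state, to show both behaviours under that simplifying assumption.

```java
// Hypothetical stand-in for the Coordinator code path that hands the
// last-accepted state to the master service.
final class LastAcceptedStateSketch {

    // Returns the given local node id; the real code works with a whole ClusterState.
    static String localNodeForMasterService(String localNodeId) {
        // Checked only when the JVM runs with -ea; skipped entirely otherwise.
        assert localNodeId != null : "last-accepted state must contain the local node";
        return localNodeId;
    }

    // Common idiom: the assignment inside the assert only runs if assertions are enabled.
    static boolean assertionsEnabled() {
        boolean enabled = false;
        assert enabled = true;
        return enabled;
    }
}
```

Run with `java -ea`, passing `null` raises an `AssertionError` carrying the message; run without it, the call simply returns `null`, and a later dereference would fail with a `NullPointerException` further from the cause. The assertion therefore surfaces a missing local node at its source when assertions are on, such as during testing.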

+ 2 - 1
server/src/main/java/org/elasticsearch/gateway/GatewayMetaState.java

@@ -146,7 +146,8 @@ public class GatewayMetaState implements Closeable {
             }
         } else {
             final long currentTerm = 0L;
-            final ClusterState clusterState = ClusterState.builder(ClusterName.CLUSTER_NAME_SETTING.get(settings)).build();
+            final ClusterState clusterState = prepareInitialClusterState(transportService, clusterService,
+                    ClusterState.builder(ClusterName.CLUSTER_NAME_SETTING.get(settings)).build());
             if (persistedClusterStateService.getDataPaths().length > 0) {
                 // write empty cluster state just so that we have a persistent node id. There is no need to write out global metadata with
                 // cluster uuid as coordinating-only nodes do not snap into a cluster as they carry no state