
Improve node-{join,left} logging for troubleshooting (#92742)

Today, to troubleshoot an unstable cluster, we ask users to parse the
rather complex `node-join` and `node-left` messages emitted by the
`MasterService`. These messages may refer to many nodes, may be
truncated, and are generally pretty hard to work with.

This commit starts emitting a simplified log message for each node
added or removed. It also renames the respective executor classes:

- `JoinTaskExecutor` -> `NodeJoinExecutor`
- `NodeRemovalClusterStateTaskExecutor` -> `NodeLeftExecutor`

This brings their names in line with each other and with the messages
that they emit, whilst preserving the older `node-join` and `node-left`
terminology as reported by the `MasterService`.

Finally, it updates the troubleshooting docs to reflect these new,
simplified log messages.

Relates #92741
David Turner 2 years ago
parent
commit
5182748318
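
As a rough illustration of why one message per node is easier to work with, the sketch below pulls the task name, node description, and reason out of a single log line. It assumes the message layout given by the format strings in the executor changes (`node-left: [{}] with reason [{}]`), and it also tolerates the extra `added`/`removed` word shown in the documentation example. `NodeMembershipLogParser` and `Event` are hypothetical names, not part of Elasticsearch.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Elasticsearch: parses the simplified membership messages. */
final class NodeMembershipLogParser {

    /** One parsed message: the task name, the node description, and the reason. */
    record Event(String task, String node, String reason) {}

    // Assumed layout: "node-left: [<node>] with reason [<reason>]" (or node-join).
    // The node group is reluctant, so it stops at the first "] with reason [".
    // The reason group is greedy, because a reason may itself contain brackets,
    // e.g. "joining after restart, removed [24s] ago with reason [disconnected]".
    private static final Pattern MESSAGE = Pattern.compile(
        "(node-join|node-left):\\s+(?:added |removed )?\\[(.*?)\\]\\s+with reason \\[(.*)\\]\\s*$"
    );

    /** Returns the parsed event, or empty if the line is not a membership message. */
    static Optional<Event> parse(String logLine) {
        Matcher m = MESSAGE.matcher(logLine);
        if (m.find() == false) {
            return Optional.empty();
        }
        return Optional.of(new Event(m.group(1), m.group(2), m.group(3)));
    }

    public static void main(String[] args) {
        String line = "[2022-03-21T11:02:59,892][INFO ][o.e.c.c.NodeJoinExecutor] [instance-0000000000] "
            + "node-join: [{instance-0000000004}{m}] "
            + "with reason [joining after restart, removed [24s] ago with reason [disconnected]]";
        parse(line).ifPresent(System.out::println);
    }
}
```

The pattern is only a sketch: node descriptions that themselves contain `] with reason [` would confuse it, and the logging layout around the message depends on the cluster's logging configuration.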

+ 5 - 0
docs/changelog/92742.yaml

@@ -0,0 +1,5 @@
+pr: 92742
+summary: "Improve node-{join,left} logging for troubleshooting"
+area: Cluster Coordination
+type: enhancement
+issues: []

+ 41 - 26
docs/reference/modules/discovery/fault-detection.asciidoc

@@ -52,7 +52,7 @@ logs.
 * The master may appear busy due to frequent cluster state updates.
 
 To troubleshoot a cluster in this state, first ensure the cluster has a
-<<modules-discovery-troubleshooting,stable master>>. Next, focus on the nodes
+<<discovery-troubleshooting,stable master>>. Next, focus on the nodes
 unexpectedly leaving the cluster ahead of all other issues. It will not be
 possible to solve other issues until the cluster has a stable master node and
 stable node membership.
@@ -62,23 +62,33 @@ tools only offer a view of the state of the cluster at a single point in time.
 Instead, look at the cluster logs to see the pattern of behaviour over time.
 Focus particularly on logs from the elected master. When a node leaves the
 cluster, logs for the elected master include a message like this (with line
-breaks added for clarity):
+breaks added to make it easier to read):
 
 [source,text]
 ----
-[2022-03-21T11:02:35,513][INFO ][o.e.c.s.MasterService    ]
-    [instance-0000000000] node-left[
-        {instance-0000000004}{bfcMDTiDRkietFb9v_di7w}{aNlyORLASam1ammv2DzYXA}{172.27.47.21}{172.27.47.21:19054}{m}
-            reason: disconnected,
-        {tiebreaker-0000000003}{UNw_RuazQCSBskWZV8ID_w}{bltyVOQ-RNu20OQfTHSLtA}{172.27.161.154}{172.27.161.154:19251}{mv}
-            reason: disconnected
-        ], term: 14, version: 1653415, ...
+[2022-03-21T11:02:35,513][INFO ][o.e.c.c.NodeLeftExecutor] [instance-0000000000] node-left:
+    removed [{instance-0000000004}{bfcMDTiDRkietFb9v_di7w}{aNlyORLASam1ammv2DzYXA}{172.27.47.21}{172.27.47.21:19054}{m}]
+    with reason [test reason]
+----
+
+This message says that the `NodeLeftExecutor` on the elected master
+(`instance-0000000000`) processed a `node-left` task, identifying the node that
+was removed and the reason for its removal. When the node joins the cluster
+again, logs for the elected master will include a message like this (with line
+breaks added to make it easier to read):
+
+[source,text]
+----
+[2022-03-21T11:02:59,892][INFO ][o.e.c.c.NodeJoinExecutor] [instance-0000000000] node-join:
+    added [{instance-0000000004}{bfcMDTiDRkietFb9v_di7w}{UNw_RuazQCSBskWZV8ID_w}{172.27.47.21}{172.27.47.21:19054}{m}]
+    with reason [joining after restart, removed [24s] ago with reason [disconnected]]
 ----
 
-This message says that the `MasterService` on the elected master
-(`instance-0000000000`) is processing a `node-left` task. It lists the nodes
-that are being removed and the reasons for their removal. Other nodes may log
-similar messages, but report fewer details:
+This message says that the `NodeJoinExecutor` on the elected master
+(`instance-0000000000`) processed a `node-join` task, identifying the node that
+was added to the cluster and the reason for the task.
+
+Other nodes may log similar messages, but report fewer details:
 
 [source,text]
 ----
@@ -89,9 +99,10 @@ similar messages, but report fewer details:
     }, term: 14, version: 1653415, reason: Publication{term=14, version=1653415}
 ----
 
-Focus on the one from the `MasterService` which is only emitted on the elected
-master, since it contains more details. If you don't see the messages from the
-`MasterService`, check that:
+These messages are not especially useful for troubleshooting, so focus on the
+ones from the `NodeLeftExecutor` and `NodeJoinExecutor` which are only emitted
+on the elected master and which contain more details. If you don't see the
+messages from the `NodeLeftExecutor` and `NodeJoinExecutor`, check that:
 
 * You're looking at the logs for the elected master node.
 
@@ -104,18 +115,14 @@ start or stop following the elected master. You can use these messages to
 determine each node's view of the state of the master over time.
 
 If a node restarts, it will leave the cluster and then join the cluster again.
-When it rejoins, the `MasterService` will log that it is processing a
-`node-join` task. You can tell from the master logs that the node was restarted
-because the `node-join` message will indicate that it is
-`joining after restart`. In older {es} versions, you can also determine that a
-node restarted by looking at the second "ephemeral" ID in the `node-left` and
-subsequent `node-join` messages. This ephemeral ID is different each time the
-node starts up. If a node is unexpectedly restarting, you'll need to look at
-the node's logs to see why it is shutting down.
+When it rejoins, the `NodeJoinExecutor` will log that it processed a
+`node-join` task indicating that the node is `joining after restart`. If a node
+is unexpectedly restarting, look at the node's logs to see why it is shutting
+down.
 
 If the node did not restart then you should look at the reason for its
-departure in the `node-left` message, which is reported after each node. There
-are three possible reasons:
+departure more closely. Each reason has different troubleshooting steps,
+described below. There are three possible reasons:
 
 * `disconnected`: The connection from the master node to the removed node was
 closed.
@@ -134,6 +141,10 @@ control this mechanism.
 
 ===== Diagnosing `disconnected` nodes
 
+Nodes typically leave the cluster with reason `disconnected` when they shut
+down, but if they rejoin the cluster without restarting then there is some
+other problem.
+
 {es} is designed to run on a fairly reliable network. It opens a number of TCP
 connections between nodes and expects these connections to remain open forever.
 If a connection is closed then {es} will try and reconnect, so the occasional
@@ -194,6 +205,10 @@ the logs on the elected master.
 
 ===== Diagnosing `follower check retry count exceeded` nodes
 
+Nodes sometimes leave the cluster with reason `follower check retry count
+exceeded` when they shut down, but if they rejoin the cluster without
+restarting then there is some other problem.
+
 {es} needs every node to respond to network messages successfully and
 reasonably quickly. If a node rejects requests or does not respond at all then
 it can be harmful to the cluster. If enough consecutive checks fail then the
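
The two example messages in the documentation change above carry timestamps that can be compared directly. The sketch below computes the gap between the `node-left` and `node-join` examples, which comes out at a little over 24 seconds and so agrees with the `removed [24s] ago` text in the join reason. It assumes the timestamp is the first bracketed field and uses the `2022-03-21T11:02:35,513` layout from the examples. `RejoinGap` is a hypothetical name, and the master derives its own figure from a relative clock rather than from log timestamps, so the two values need not match exactly.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper, not part of Elasticsearch: measures the gap between two log lines. */
final class RejoinGap {

    // Assumed timestamp layout, taken from the example log lines: 2022-03-21T11:02:35,513
    private static final DateTimeFormatter LOG_TIME = DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss,SSS");

    /** Extracts the leading "[timestamp]" field of a log line. */
    static LocalDateTime timestampOf(String logLine) {
        int end = logLine.indexOf(']');
        if (logLine.startsWith("[") == false || end < 0) {
            throw new IllegalArgumentException("no leading [timestamp] in: " + logLine);
        }
        return LocalDateTime.parse(logLine.substring(1, end), LOG_TIME);
    }

    /** Time elapsed between the node-left line and the later node-join line. */
    static Duration between(String nodeLeftLine, String nodeJoinLine) {
        return Duration.between(timestampOf(nodeLeftLine), timestampOf(nodeJoinLine));
    }

    public static void main(String[] args) {
        Duration gap = between(
            "[2022-03-21T11:02:35,513][INFO ][o.e.c.c.NodeLeftExecutor] [instance-0000000000] node-left: ...",
            "[2022-03-21T11:02:59,892][INFO ][o.e.c.c.NodeJoinExecutor] [instance-0000000000] node-join: ..."
        );
        System.out.println(gap.toMillis() + " ms"); // prints "24379 ms"
    }
}
```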

+ 6 - 11
server/src/main/java/org/elasticsearch/cluster/coordination/Coordinator.java

@@ -137,7 +137,7 @@ public class Coordinator extends AbstractLifecycleComponent implements ClusterSt
     private final AllocationService allocationService;
     private final JoinHelper joinHelper;
     private final JoinValidationService joinValidationService;
-    private final NodeRemovalClusterStateTaskExecutor nodeRemovalExecutor;
+    private final NodeLeftExecutor nodeLeftExecutor;
     private final Supplier<CoordinationState.PersistedState> persistedStateSupplier;
     private final NoMasterBlockService noMasterBlockService;
     final Object mutex = new Object(); // package-private to allow tests to call methods that assert that the mutex is held
@@ -205,7 +205,7 @@ public class Coordinator extends AbstractLifecycleComponent implements ClusterSt
         this.transportService = transportService;
         this.masterService = masterService;
         this.allocationService = allocationService;
-        this.onJoinValidators = JoinTaskExecutor.addBuiltInJoinValidators(onJoinValidators);
+        this.onJoinValidators = NodeJoinExecutor.addBuiltInJoinValidators(onJoinValidators);
         this.singleNodeDiscovery = DiscoveryModule.isSingleNodeDiscovery(settings);
         this.electionStrategy = electionStrategy;
         this.joinReasonService = new JoinReasonService(transportService.getThreadPool()::relativeTimeInMillis);
@@ -272,7 +272,7 @@ public class Coordinator extends AbstractLifecycleComponent implements ClusterSt
             this::removeNode,
             nodeHealthService
         );
-        this.nodeRemovalExecutor = new NodeRemovalClusterStateTaskExecutor(allocationService);
+        this.nodeLeftExecutor = new NodeLeftExecutor(allocationService);
         this.clusterApplier = clusterApplier;
         masterService.setClusterStateSupplier(this::getStateForMasterService);
         this.reconfigurator = new Reconfigurator(settings, clusterSettings);
@@ -339,16 +339,11 @@ public class Coordinator extends AbstractLifecycleComponent implements ClusterSt
     private void removeNode(DiscoveryNode discoveryNode, String reason) {
         synchronized (mutex) {
             if (mode == Mode.LEADER) {
-                var task = new NodeRemovalClusterStateTaskExecutor.Task(
-                    discoveryNode,
-                    reason,
-                    () -> joinReasonService.onNodeRemoved(discoveryNode, reason)
-                );
                 masterService.submitStateUpdateTask(
                     "node-left",
-                    task,
+                    new NodeLeftExecutor.Task(discoveryNode, reason, () -> joinReasonService.onNodeRemoved(discoveryNode, reason)),
                     ClusterStateTaskConfig.build(Priority.IMMEDIATE),
-                    nodeRemovalExecutor
+                    nodeLeftExecutor
                 );
             }
         }
@@ -664,7 +659,7 @@ public class Coordinator extends AbstractLifecycleComponent implements ClusterSt
             if (stateForJoinValidation.getBlocks().hasGlobalBlock(STATE_NOT_RECOVERED_BLOCK) == false) {
                 // We do this in a couple of places including the cluster update thread. This one here is really just best effort to ensure
                 // we fail as fast as possible.
-                JoinTaskExecutor.ensureVersionBarrier(
+                NodeJoinExecutor.ensureVersionBarrier(
                     joinRequest.getSourceNode().getVersion(),
                     stateForJoinValidation.getNodes().getMinNodeVersion()
                 );

+ 13 - 9
server/src/main/java/org/elasticsearch/cluster/coordination/JoinHelper.java

@@ -67,7 +67,7 @@ public class JoinHelper {
     private final MasterService masterService;
     private final ClusterApplier clusterApplier;
     private final TransportService transportService;
-    private final JoinTaskExecutor joinTaskExecutor;
+    private final NodeJoinExecutor nodeJoinExecutor;
     private final LongSupplier currentTermSupplier;
     private final NodeHealthService nodeHealthService;
     private final JoinReasonService joinReasonService;
@@ -94,7 +94,7 @@ public class JoinHelper {
         this.clusterApplier = clusterApplier;
         this.transportService = transportService;
         this.circuitBreakerService = circuitBreakerService;
-        this.joinTaskExecutor = new JoinTaskExecutor(allocationService, rerouteService);
+        this.nodeJoinExecutor = new NodeJoinExecutor(allocationService, rerouteService);
         this.currentTermSupplier = currentTermSupplier;
         this.nodeHealthService = nodeHealthService;
         this.joinReasonService = joinReasonService;
@@ -389,13 +389,17 @@ public class JoinHelper {
     class LeaderJoinAccumulator implements JoinAccumulator {
         @Override
         public void handleJoinRequest(DiscoveryNode sender, ActionListener<Void> joinListener) {
-            final JoinTask task = JoinTask.singleNode(
-                sender,
-                joinReasonService.getJoinReason(sender, Mode.LEADER),
-                joinListener,
-                currentTermSupplier.getAsLong()
+            masterService.submitStateUpdateTask(
+                "node-join",
+                JoinTask.singleNode(
+                    sender,
+                    joinReasonService.getJoinReason(sender, Mode.LEADER),
+                    joinListener,
+                    currentTermSupplier.getAsLong()
+                ),
+                ClusterStateTaskConfig.build(Priority.URGENT),
+                nodeJoinExecutor
             );
-            masterService.submitStateUpdateTask("node-join", task, ClusterStateTaskConfig.build(Priority.URGENT), joinTaskExecutor);
         }
 
         @Override
@@ -461,7 +465,7 @@ public class JoinHelper {
                     "elected-as-master ([" + joinTask.nodeCount() + "] nodes joined)",
                     joinTask,
                     ClusterStateTaskConfig.build(Priority.URGENT),
-                    joinTaskExecutor
+                    nodeJoinExecutor
 
                 );
             } else {

+ 11 - 4
server/src/main/java/org/elasticsearch/cluster/coordination/JoinTaskExecutor.java → server/src/main/java/org/elasticsearch/cluster/coordination/NodeJoinExecutor.java

@@ -36,14 +36,14 @@ import java.util.stream.Collectors;
 
 import static org.elasticsearch.gateway.GatewayService.STATE_NOT_RECOVERED_BLOCK;
 
-public class JoinTaskExecutor implements ClusterStateTaskExecutor<JoinTask> {
+public class NodeJoinExecutor implements ClusterStateTaskExecutor<JoinTask> {
 
-    private static final Logger logger = LogManager.getLogger(JoinTaskExecutor.class);
+    private static final Logger logger = LogManager.getLogger(NodeJoinExecutor.class);
 
     private final AllocationService allocationService;
     private final RerouteService rerouteService;
 
-    public JoinTaskExecutor(AllocationService allocationService, RerouteService rerouteService) {
+    public NodeJoinExecutor(AllocationService allocationService, RerouteService rerouteService) {
         this.allocationService = allocationService;
         this.rerouteService = rerouteService;
     }
@@ -135,7 +135,14 @@ public class JoinTaskExecutor implements ClusterStateTaskExecutor<JoinTask> {
                         continue;
                     }
                 }
-                onTaskSuccess.add(() -> nodeJoinTask.listener().onResponse(null));
+                onTaskSuccess.add(() -> {
+                    logger.info(
+                        "node-join: [{}] with reason [{}]",
+                        nodeJoinTask.node().descriptionWithoutAttributes(),
+                        nodeJoinTask.reason()
+                    );
+                    nodeJoinTask.listener().onResponse(null);
+                });
             }
             joinTaskContext.success(() -> {
                 for (Runnable joinCompleter : onTaskSuccess) {

+ 12 - 4
server/src/main/java/org/elasticsearch/cluster/coordination/NodeRemovalClusterStateTaskExecutor.java → server/src/main/java/org/elasticsearch/cluster/coordination/NodeLeftExecutor.java

@@ -19,9 +19,9 @@ import org.elasticsearch.cluster.routing.allocation.AllocationService;
 import org.elasticsearch.cluster.service.MasterService;
 import org.elasticsearch.persistent.PersistentTasksCustomMetadata;
 
-public class NodeRemovalClusterStateTaskExecutor implements ClusterStateTaskExecutor<NodeRemovalClusterStateTaskExecutor.Task> {
+public class NodeLeftExecutor implements ClusterStateTaskExecutor<NodeLeftExecutor.Task> {
 
-    private static final Logger logger = LogManager.getLogger(NodeRemovalClusterStateTaskExecutor.class);
+    private static final Logger logger = LogManager.getLogger(NodeLeftExecutor.class);
 
     private final AllocationService allocationService;
 
@@ -41,7 +41,7 @@ public class NodeRemovalClusterStateTaskExecutor implements ClusterStateTaskExec
         }
     }
 
-    public NodeRemovalClusterStateTaskExecutor(AllocationService allocationService) {
+    public NodeLeftExecutor(AllocationService allocationService) {
         this.allocationService = allocationService;
     }
 
@@ -52,13 +52,21 @@ public class NodeRemovalClusterStateTaskExecutor implements ClusterStateTaskExec
         boolean removed = false;
         for (final var taskContext : batchExecutionContext.taskContexts()) {
             final var task = taskContext.getTask();
+            final String reason;
             if (initialState.nodes().nodeExists(task.node())) {
                 remainingNodesBuilder.remove(task.node());
                 removed = true;
+                reason = task.reason();
             } else {
                 logger.debug("node [{}] does not exist in cluster state, ignoring", task);
+                reason = null;
             }
-            taskContext.success(task.onClusterStateProcessed::run);
+            taskContext.success(() -> {
+                if (reason != null) {
+                    logger.info("node-left: [{}] with reason [{}]", task.node().descriptionWithoutAttributes(), reason);
+                }
+                task.onClusterStateProcessed.run();
+            });
         }
 
         if (removed == false) {
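
The `NodeLeftExecutor` change above decides while executing the batch whether a task gets a log message, but only emits that message from the callback passed to `taskContext.success(...)`. A minimal standalone sketch of that pattern follows. It uses plain collections instead of the Elasticsearch task and cluster-state types, and `DeferredLogSketch` and its `Task` record are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch, not Elasticsearch code: decide the reason now, log it later. */
final class DeferredLogSketch {

    /** A removal request: which node to remove, and why. */
    record Task(String node, String reason) {}

    /**
     * Processes a batch of removal tasks against the set of known nodes and returns one
     * callback per task. Nothing is written to {@code log} until the callbacks are run.
     */
    static List<Runnable> execute(Set<String> knownNodes, List<Task> tasks, List<String> log) {
        List<Runnable> onSuccess = new ArrayList<>();
        for (Task task : tasks) {
            // The reason stays null for a node that is not in the cluster,
            // so no message is emitted for a removal that did nothing.
            final String reason = knownNodes.contains(task.node()) ? task.reason() : null;
            onSuccess.add(() -> {
                if (reason != null) {
                    log.add("node-left: [" + task.node() + "] with reason [" + reason + "]");
                }
            });
        }
        return onSuccess;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        List<Runnable> callbacks = execute(
            Set.of("a"),
            List.of(new Task("a", "disconnected"), new Task("b", "lagging")),
            log
        );
        System.out.println(log.size()); // prints "0": nothing is logged before the callbacks run
        callbacks.forEach(Runnable::run);
        System.out.println(log); // prints "[node-left: [a] with reason [disconnected]]"
    }
}
```

Deferring the message this way keeps the log limited to removals that actually took effect, at the cost of capturing the reason in a local variable for each task.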

+ 2 - 2
server/src/main/java/org/elasticsearch/cluster/metadata/DesiredNodeWithStatus.java

@@ -47,7 +47,7 @@ public record DesiredNodeWithStatus(DesiredNode desiredNode, Status status)
             ),
             // An unknown status is expected during upgrades to versions >= STATUS_TRACKING_SUPPORT_VERSION
             // the desired node status would be populated when a node in the newer version is elected as
-            // master, the desired nodes status update happens in JoinTaskExecutor.
+            // master, the desired nodes status update happens in NodeJoinExecutor.
             args[6] == null ? Status.PENDING : (Status) args[6]
         )
     );
@@ -84,7 +84,7 @@ public record DesiredNodeWithStatus(DesiredNode desiredNode, Status status)
             // since it's impossible to know if a node that was supposed to
             // join the cluster, it joined. The status will be updated
             // once the master node is upgraded to a version >= STATUS_TRACKING_SUPPORT_VERSION
-            // in JoinTaskExecutor or when the desired nodes are upgraded to a new version.
+            // in NodeJoinExecutor or when the desired nodes are upgraded to a new version.
             status = Status.PENDING;
         }
         return new DesiredNodeWithStatus(desiredNode, status);

+ 3 - 2
server/src/main/java/org/elasticsearch/cluster/metadata/DesiredNodes.java

@@ -8,7 +8,9 @@
 
 package org.elasticsearch.cluster.metadata;
 
+import org.elasticsearch.action.admin.cluster.desirednodes.TransportUpdateDesiredNodesAction;
 import org.elasticsearch.cluster.ClusterState;
+import org.elasticsearch.cluster.coordination.NodeJoinExecutor;
 import org.elasticsearch.cluster.node.DiscoveryNode;
 import org.elasticsearch.cluster.node.DiscoveryNodes;
 import org.elasticsearch.common.io.stream.StreamInput;
@@ -97,8 +99,7 @@ import static org.elasticsearch.node.Node.NODE_EXTERNAL_ID_SETTING;
  *  </ul>
  *
  * <p>
- *  See {@code JoinTaskExecutor} and {@code TransportUpdateDesiredNodesAction} for more details about
- *  desired nodes status tracking.
+ *  See {@link NodeJoinExecutor} and {@link TransportUpdateDesiredNodesAction} for more details about desired nodes status tracking.
  * </p>
  *
  * <p>

+ 111 - 53
server/src/test/java/org/elasticsearch/cluster/coordination/JoinTaskExecutorTests.java → server/src/test/java/org/elasticsearch/cluster/coordination/NodeJoinExecutorTests.java

@@ -7,10 +7,13 @@
  */
 package org.elasticsearch.cluster.coordination;
 
+import org.apache.logging.log4j.Level;
 import org.elasticsearch.Version;
 import org.elasticsearch.action.ActionListener;
+import org.elasticsearch.action.support.PlainActionFuture;
 import org.elasticsearch.cluster.ClusterName;
 import org.elasticsearch.cluster.ClusterState;
+import org.elasticsearch.cluster.ClusterStateTaskConfig;
 import org.elasticsearch.cluster.NotMasterException;
 import org.elasticsearch.cluster.block.ClusterBlocks;
 import org.elasticsearch.cluster.metadata.DesiredNodeWithStatus;
@@ -24,16 +27,22 @@ import org.elasticsearch.cluster.node.DiscoveryNodes;
 import org.elasticsearch.cluster.routing.RerouteService;
 import org.elasticsearch.cluster.routing.allocation.AllocationService;
 import org.elasticsearch.cluster.service.ClusterStateTaskExecutorUtils;
+import org.elasticsearch.common.Priority;
 import org.elasticsearch.common.UUIDs;
 import org.elasticsearch.common.settings.Settings;
+import org.elasticsearch.test.ClusterServiceUtils;
 import org.elasticsearch.test.ESTestCase;
+import org.elasticsearch.test.MockLogAppender;
 import org.elasticsearch.test.VersionUtils;
+import org.elasticsearch.threadpool.TestThreadPool;
+import org.elasticsearch.threadpool.ThreadPool;
 
 import java.util.Collections;
 import java.util.HashSet;
 import java.util.List;
 import java.util.Map;
 import java.util.Set;
+import java.util.concurrent.TimeUnit;
 import java.util.stream.Stream;
 
 import static org.elasticsearch.cluster.metadata.DesiredNodesTestCase.assertDesiredNodesStatusIsCorrect;
@@ -54,7 +63,7 @@ import static org.mockito.ArgumentMatchers.anyBoolean;
 import static org.mockito.Mockito.mock;
 import static org.mockito.Mockito.when;
 
-public class JoinTaskExecutorTests extends ESTestCase {
+public class NodeJoinExecutorTests extends ESTestCase {
 
     private static final ActionListener<Void> NOT_COMPLETED_LISTENER = ActionListener.wrap(
         () -> { throw new AssertionError("should not complete publication"); }
@@ -70,11 +79,11 @@ public class JoinTaskExecutorTests extends ESTestCase {
             .build();
         metaBuilder.put(indexMetadata, false);
         Metadata metadata = metaBuilder.build();
-        JoinTaskExecutor.ensureIndexCompatibility(Version.CURRENT, metadata);
+        NodeJoinExecutor.ensureIndexCompatibility(Version.CURRENT, metadata);
 
         expectThrows(
             IllegalStateException.class,
-            () -> JoinTaskExecutor.ensureIndexCompatibility(VersionUtils.getPreviousVersion(Version.CURRENT), metadata)
+            () -> NodeJoinExecutor.ensureIndexCompatibility(VersionUtils.getPreviousVersion(Version.CURRENT), metadata)
         );
     }
 
@@ -88,7 +97,7 @@ public class JoinTaskExecutorTests extends ESTestCase {
             .build();
         metaBuilder.put(indexMetadata, false);
         Metadata metadata = metaBuilder.build();
-        expectThrows(IllegalStateException.class, () -> JoinTaskExecutor.ensureIndexCompatibility(Version.CURRENT, metadata));
+        expectThrows(IllegalStateException.class, () -> NodeJoinExecutor.ensureIndexCompatibility(Version.CURRENT, metadata));
     }
 
     public void testPreventJoinClusterWithUnsupportedNodeVersions() {
@@ -104,9 +113,9 @@ public class JoinTaskExecutorTests extends ESTestCase {
         final Version tooLow = Version.fromId(maxNodeVersion.minimumCompatibilityVersion().id - 100);
         expectThrows(IllegalStateException.class, () -> {
             if (randomBoolean()) {
-                JoinTaskExecutor.ensureNodesCompatibility(tooLow, nodes);
+                NodeJoinExecutor.ensureNodesCompatibility(tooLow, nodes);
             } else {
-                JoinTaskExecutor.ensureNodesCompatibility(tooLow, minNodeVersion, maxNodeVersion);
+                NodeJoinExecutor.ensureNodesCompatibility(tooLow, minNodeVersion, maxNodeVersion);
             }
         });
 
@@ -114,7 +123,7 @@ public class JoinTaskExecutorTests extends ESTestCase {
             v -> v.onOrAfter(minNodeVersion),
             () -> rarely() ? Version.fromId(minNodeVersion.id - 1) : randomVersion(random())
         );
-        expectThrows(IllegalStateException.class, () -> JoinTaskExecutor.ensureVersionBarrier(oldVersion, minNodeVersion));
+        expectThrows(IllegalStateException.class, () -> NodeJoinExecutor.ensureVersionBarrier(oldVersion, minNodeVersion));
 
         final Version minGoodVersion = maxNodeVersion.major == minNodeVersion.major ?
         // we have to stick with the same major
@@ -122,9 +131,9 @@ public class JoinTaskExecutorTests extends ESTestCase {
         final Version justGood = randomVersionBetween(random(), minGoodVersion, maxCompatibleVersion(minNodeVersion));
 
         if (randomBoolean()) {
-            JoinTaskExecutor.ensureNodesCompatibility(justGood, nodes);
+            NodeJoinExecutor.ensureNodesCompatibility(justGood, nodes);
         } else {
-            JoinTaskExecutor.ensureNodesCompatibility(justGood, minNodeVersion, maxNodeVersion);
+            NodeJoinExecutor.ensureNodesCompatibility(justGood, minNodeVersion, maxNodeVersion);
         }
     }
 
@@ -144,7 +153,7 @@ public class JoinTaskExecutorTests extends ESTestCase {
             .build();
         metaBuilder.put(indexMetadata, false);
         Metadata metadata = metaBuilder.build();
-        JoinTaskExecutor.ensureIndexCompatibility(Version.CURRENT, metadata);
+        NodeJoinExecutor.ensureIndexCompatibility(Version.CURRENT, metadata);
     }
 
     public static Settings.Builder randomCompatibleVersionSettings() {
@@ -165,6 +174,8 @@ public class JoinTaskExecutorTests extends ESTestCase {
         return VersionUtils.randomVersionBetween(random(), Version.CURRENT.minimumIndexCompatibilityVersion(), Version.CURRENT);
     }
 
+    private static final String TEST_REASON = "test";
+
     public void testUpdatesNodeWithNewRoles() throws Exception {
         // Node roles vary by version, and new roles are suppressed for BWC. This means we can receive a join from a node that's already
         // in the cluster but with a different set of roles: the node didn't change roles, but the cluster state came via an older master.
@@ -174,7 +185,7 @@ public class JoinTaskExecutorTests extends ESTestCase {
         when(allocationService.adaptAutoExpandReplicas(any())).then(invocationOnMock -> invocationOnMock.getArguments()[0]);
         final RerouteService rerouteService = (reason, priority, listener) -> listener.onResponse(null);
 
-        final JoinTaskExecutor joinTaskExecutor = new JoinTaskExecutor(allocationService, rerouteService);
+        final NodeJoinExecutor executor = new NodeJoinExecutor(allocationService, rerouteService);
 
         final DiscoveryNode masterNode = new DiscoveryNode(UUIDs.base64UUID(), buildNewFakeTransportAddress(), Version.CURRENT);
 
@@ -196,8 +207,8 @@ public class JoinTaskExecutorTests extends ESTestCase {
 
         final var resultingState = ClusterStateTaskExecutorUtils.executeAndAssertSuccessful(
             clusterState,
-            joinTaskExecutor,
-            List.of(JoinTask.singleNode(actualNode, "test", NOT_COMPLETED_LISTENER, 0L))
+            executor,
+            List.of(JoinTask.singleNode(actualNode, TEST_REASON, NOT_COMPLETED_LISTENER, 0L))
         );
 
         assertThat(resultingState.getNodes().get(actualNode.getId()).getRoles(), equalTo(actualNode.getRoles()));
@@ -208,7 +219,7 @@ public class JoinTaskExecutorTests extends ESTestCase {
         final RerouteService rerouteService = (reason, priority, listener) -> listener.onResponse(null);
 
         final long executorTerm = randomLongBetween(0L, Long.MAX_VALUE - 1);
-        final var joinTaskExecutor = new JoinTaskExecutor(allocationService, rerouteService);
+        final var executor = new NodeJoinExecutor(allocationService, rerouteService);
 
         final var masterNode = new DiscoveryNode(UUIDs.randomBase64UUID(random()), buildNewFakeTransportAddress(), Version.CURRENT);
         final var clusterState = ClusterState.builder(ClusterName.DEFAULT)
@@ -225,12 +236,12 @@ public class JoinTaskExecutorTests extends ESTestCase {
                 NotMasterException.class,
                 () -> ClusterStateTaskExecutorUtils.executeHandlingResults(
                     clusterState,
-                    joinTaskExecutor,
+                    executor,
                     randomBoolean()
-                        ? List.of(JoinTask.singleNode(masterNode, "test", NOT_COMPLETED_LISTENER, executorTerm))
+                        ? List.of(JoinTask.singleNode(masterNode, TEST_REASON, NOT_COMPLETED_LISTENER, executorTerm))
                         : List.of(
                             JoinTask.completingElection(
-                                Stream.of(new JoinTask.NodeJoinTask(masterNode, "test", NOT_COMPLETED_LISTENER)),
+                                Stream.of(new JoinTask.NodeJoinTask(masterNode, TEST_REASON, NOT_COMPLETED_LISTENER)),
                                 executorTerm
                             )
                         ),
@@ -247,7 +258,7 @@ public class JoinTaskExecutorTests extends ESTestCase {
         final RerouteService rerouteService = (reason, priority, listener) -> listener.onResponse(null);
 
         final long executorTerm = randomNonNegativeLong();
-        final var joinTaskExecutor = new JoinTaskExecutor(allocationService, rerouteService);
+        final var executor = new NodeJoinExecutor(allocationService, rerouteService);
 
         final var masterNode = new DiscoveryNode(UUIDs.randomBase64UUID(random()), buildNewFakeTransportAddress(), Version.CURRENT);
         final var localNode = new DiscoveryNode(UUIDs.randomBase64UUID(random()), buildNewFakeTransportAddress(), Version.CURRENT);
@@ -272,12 +283,12 @@ public class JoinTaskExecutorTests extends ESTestCase {
                 NotMasterException.class,
                 () -> ClusterStateTaskExecutorUtils.executeHandlingResults(
                     clusterState,
-                    joinTaskExecutor,
+                    executor,
                     randomBoolean()
-                        ? List.of(JoinTask.singleNode(masterNode, "test", NOT_COMPLETED_LISTENER, executorTerm))
+                        ? List.of(JoinTask.singleNode(masterNode, TEST_REASON, NOT_COMPLETED_LISTENER, executorTerm))
                         : List.of(
                             JoinTask.completingElection(
-                                Stream.of(new JoinTask.NodeJoinTask(masterNode, "test", NOT_COMPLETED_LISTENER)),
+                                Stream.of(new JoinTask.NodeJoinTask(masterNode, TEST_REASON, NOT_COMPLETED_LISTENER)),
                                 executorTerm
                             )
                         ),
@@ -294,7 +305,7 @@ public class JoinTaskExecutorTests extends ESTestCase {
         final RerouteService rerouteService = (reason, priority, listener) -> listener.onResponse(null);
 
         final long executorTerm = randomNonNegativeLong();
-        final var joinTaskExecutor = new JoinTaskExecutor(allocationService, rerouteService);
+        final var executor = new NodeJoinExecutor(allocationService, rerouteService);
 
         final var masterNode = new DiscoveryNode(UUIDs.base64UUID(), buildNewFakeTransportAddress(), Version.CURRENT);
         final var clusterState = ClusterState.builder(ClusterName.DEFAULT)
@@ -311,8 +322,8 @@ public class JoinTaskExecutorTests extends ESTestCase {
                 NotMasterException.class,
                 () -> ClusterStateTaskExecutorUtils.executeHandlingResults(
                     clusterState,
-                    joinTaskExecutor,
-                    List.of(JoinTask.singleNode(masterNode, "test", NOT_COMPLETED_LISTENER, executorTerm)),
+                    executor,
+                    List.of(JoinTask.singleNode(masterNode, TEST_REASON, NOT_COMPLETED_LISTENER, executorTerm)),
                     t -> fail("should not succeed"),
                     (t, e) -> assertThat(e, instanceOf(NotMasterException.class))
                 )
@@ -326,7 +337,7 @@ public class JoinTaskExecutorTests extends ESTestCase {
         final RerouteService rerouteService = (reason, priority, listener) -> listener.onResponse(null);
 
         final long executorTerm = randomLongBetween(1, Long.MAX_VALUE);
-        final var joinTaskExecutor = new JoinTaskExecutor(allocationService, rerouteService);
+        final var executor = new NodeJoinExecutor(allocationService, rerouteService);
 
         final var masterNode = new DiscoveryNode(UUIDs.randomBase64UUID(random()), buildNewFakeTransportAddress(), Version.CURRENT);
         final var otherNodeOld = new DiscoveryNode(UUIDs.randomBase64UUID(random()), buildNewFakeTransportAddress(), Version.CURRENT);
@@ -352,12 +363,12 @@ public class JoinTaskExecutorTests extends ESTestCase {
                         .build()
                 )
                 .build(),
-            joinTaskExecutor,
+            executor,
             List.of(
                 JoinTask.completingElection(
                     Stream.of(
-                        new JoinTask.NodeJoinTask(masterNode, "test", NOT_COMPLETED_LISTENER),
-                        new JoinTask.NodeJoinTask(otherNodeNew, "test", NOT_COMPLETED_LISTENER)
+                        new JoinTask.NodeJoinTask(masterNode, TEST_REASON, NOT_COMPLETED_LISTENER),
+                        new JoinTask.NodeJoinTask(otherNodeNew, TEST_REASON, NOT_COMPLETED_LISTENER)
                     ),
                     executorTerm
                 )
@@ -376,10 +387,10 @@ public class JoinTaskExecutorTests extends ESTestCase {
             "existing node should not be replaced if not completing an election",
             ClusterStateTaskExecutorUtils.executeAndAssertSuccessful(
                 afterElectionClusterState,
-                joinTaskExecutor,
+                executor,
                 List.of(
-                    JoinTask.singleNode(masterNode, "test", NOT_COMPLETED_LISTENER, executorTerm),
-                    JoinTask.singleNode(otherNodeOld, "test", NOT_COMPLETED_LISTENER, executorTerm)
+                    JoinTask.singleNode(masterNode, TEST_REASON, NOT_COMPLETED_LISTENER, executorTerm),
+                    JoinTask.singleNode(otherNodeOld, TEST_REASON, NOT_COMPLETED_LISTENER, executorTerm)
                 )
             ).nodes().get(otherNodeNew.getId()).getEphemeralId(),
             equalTo(otherNodeNew.getEphemeralId())
@@ -391,7 +402,7 @@ public class JoinTaskExecutorTests extends ESTestCase {
         final RerouteService rerouteService = (reason, priority, listener) -> listener.onResponse(null);
 
         final long executorTerm = randomLongBetween(1, Long.MAX_VALUE);
-        final var joinTaskExecutor = new JoinTaskExecutor(allocationService, rerouteService);
+        final var executor = new NodeJoinExecutor(allocationService, rerouteService);
 
         final var masterNode = new DiscoveryNode(UUIDs.randomBase64UUID(random()), buildNewFakeTransportAddress(), Version.CURRENT);
         final var otherNode = new DiscoveryNode(
@@ -429,12 +440,12 @@ public class JoinTaskExecutorTests extends ESTestCase {
         if (randomBoolean()) {
             clusterState = ClusterStateTaskExecutorUtils.executeAndAssertSuccessful(
                 clusterState,
-                joinTaskExecutor,
+                executor,
                 List.of(
                     JoinTask.completingElection(
                         Stream.of(
-                            new JoinTask.NodeJoinTask(masterNode, "test", NOT_COMPLETED_LISTENER),
-                            new JoinTask.NodeJoinTask(otherNode, "test", NOT_COMPLETED_LISTENER)
+                            new JoinTask.NodeJoinTask(masterNode, TEST_REASON, NOT_COMPLETED_LISTENER),
+                            new JoinTask.NodeJoinTask(otherNode, TEST_REASON, NOT_COMPLETED_LISTENER)
                         ),
                         executorTerm
                     )
@@ -443,18 +454,18 @@ public class JoinTaskExecutorTests extends ESTestCase {
         } else {
             clusterState = ClusterStateTaskExecutorUtils.executeAndAssertSuccessful(
                 clusterState,
-                joinTaskExecutor,
+                executor,
                 List.of(
                     JoinTask.completingElection(
-                        Stream.of(new JoinTask.NodeJoinTask(masterNode, "test", NOT_COMPLETED_LISTENER)),
+                        Stream.of(new JoinTask.NodeJoinTask(masterNode, TEST_REASON, NOT_COMPLETED_LISTENER)),
                         executorTerm
                     )
                 )
             );
             clusterState = ClusterStateTaskExecutorUtils.executeAndAssertSuccessful(
                 clusterState,
-                joinTaskExecutor,
-                List.of(JoinTask.singleNode(otherNode, "test", NOT_COMPLETED_LISTENER, executorTerm))
+                executor,
+                List.of(JoinTask.singleNode(otherNode, TEST_REASON, NOT_COMPLETED_LISTENER, executorTerm))
             );
         }
 
@@ -471,7 +482,7 @@ public class JoinTaskExecutorTests extends ESTestCase {
         final RerouteService rerouteService = (reason, priority, listener) -> listener.onResponse(null);
 
         final long currentTerm = randomLongBetween(100, 1000);
-        final var joinTaskExecutor = new JoinTaskExecutor(allocationService, rerouteService);
+        final var executor = new NodeJoinExecutor(allocationService, rerouteService);
 
         final var masterNode = new DiscoveryNode(UUIDs.randomBase64UUID(random()), buildNewFakeTransportAddress(), Version.CURRENT);
         final var clusterState = ClusterState.builder(ClusterName.DEFAULT)
@@ -480,14 +491,13 @@ public class JoinTaskExecutorTests extends ESTestCase {
             .build();
 
         var tasks = Stream.concat(
-            Stream.generate(() -> createRandomTask(masterNode, "outdated", randomLongBetween(0, currentTerm - 1)))
-                .limit(randomLongBetween(1, 10)),
-            Stream.of(createRandomTask(masterNode, "current", currentTerm))
+            Stream.generate(() -> createRandomTask(masterNode, randomLongBetween(0, currentTerm - 1))).limit(randomLongBetween(1, 10)),
+            Stream.of(createRandomTask(masterNode, currentTerm))
         ).toList();
 
         ClusterStateTaskExecutorUtils.executeHandlingResults(
             clusterState,
-            joinTaskExecutor,
+            executor,
             tasks,
             t -> assertThat(t.term(), equalTo(currentTerm)),
             (t, e) -> {
@@ -500,7 +510,7 @@ public class JoinTaskExecutorTests extends ESTestCase {
     public void testDesiredNodesMembershipIsUpgradedWhenNewNodesJoin() throws Exception {
         final var allocationService = createAllocationService();
         final RerouteService rerouteService = (reason, priority, listener) -> listener.onResponse(null);
-        final var joinTaskExecutor = new JoinTaskExecutor(allocationService, rerouteService);
+        final var executor = new NodeJoinExecutor(allocationService, rerouteService);
 
         final var actualizedDesiredNodes = randomList(0, 5, this::createActualizedDesiredNode);
         final var pendingDesiredNodes = randomList(0, 5, this::createPendingDesiredNode);
@@ -519,9 +529,9 @@ public class JoinTaskExecutorTests extends ESTestCase {
         );
         final var desiredNodes = DesiredNodes.latestFromClusterState(clusterState);
 
-        var tasks = joiningNodes.stream().map(node -> JoinTask.singleNode(node, "join", NOT_COMPLETED_LISTENER, 0L)).toList();
+        var tasks = joiningNodes.stream().map(node -> JoinTask.singleNode(node, TEST_REASON, NOT_COMPLETED_LISTENER, 0L)).toList();
 
-        final var updatedClusterState = ClusterStateTaskExecutorUtils.executeAndAssertSuccessful(clusterState, joinTaskExecutor, tasks);
+        final var updatedClusterState = ClusterStateTaskExecutorUtils.executeAndAssertSuccessful(clusterState, executor, tasks);
 
         final var updatedDesiredNodes = DesiredNodes.latestFromClusterState(clusterState);
         assertThat(updatedDesiredNodes, is(notNullValue()));
@@ -537,7 +547,7 @@ public class JoinTaskExecutorTests extends ESTestCase {
     public void testDesiredNodesMembershipIsUpgradedWhenANewMasterIsElected() throws Exception {
         final var allocationService = createAllocationService();
         final RerouteService rerouteService = (reason, priority, listener) -> listener.onResponse(null);
-        final var joinTaskExecutor = new JoinTaskExecutor(allocationService, rerouteService);
+        final var executor = new NodeJoinExecutor(allocationService, rerouteService);
 
         final var actualizedDesiredNodes = randomList(1, 5, this::createPendingDesiredNode);
         final var pendingDesiredNodes = randomList(0, 5, this::createPendingDesiredNode);
@@ -552,13 +562,13 @@ public class JoinTaskExecutorTests extends ESTestCase {
         final var desiredNodes = DesiredNodes.latestFromClusterState(clusterState);
 
         final var completingElectionTask = JoinTask.completingElection(
-            clusterState.nodes().stream().map(node -> new JoinTask.NodeJoinTask(node, "test", NOT_COMPLETED_LISTENER)),
+            clusterState.nodes().stream().map(node -> new JoinTask.NodeJoinTask(node, TEST_REASON, NOT_COMPLETED_LISTENER)),
             1L
         );
 
         final var updatedClusterState = ClusterStateTaskExecutorUtils.executeAndAssertSuccessful(
             clusterState,
-            joinTaskExecutor,
+            executor,
             List.of(completingElectionTask)
         );
 
@@ -573,6 +583,50 @@ public class JoinTaskExecutorTests extends ESTestCase {
         );
     }
 
+    public void testPerNodeLogging() {
+        final AllocationService allocationService = createAllocationService();
+        when(allocationService.adaptAutoExpandReplicas(any())).then(invocationOnMock -> invocationOnMock.getArguments()[0]);
+        final RerouteService rerouteService = (reason, priority, listener) -> listener.onResponse(null);
+
+        final NodeJoinExecutor executor = new NodeJoinExecutor(allocationService, rerouteService);
+
+        final DiscoveryNode masterNode = new DiscoveryNode(UUIDs.base64UUID(), buildNewFakeTransportAddress(), Version.CURRENT);
+        final ClusterState clusterState = ClusterState.builder(ClusterName.DEFAULT)
+            .nodes(DiscoveryNodes.builder().add(masterNode).localNodeId(masterNode.getId()).masterNodeId(masterNode.getId()))
+            .build();
+
+        final MockLogAppender appender = new MockLogAppender();
+        final ThreadPool threadPool = new TestThreadPool("test");
+        try (
+            var ignored = appender.capturing(NodeJoinExecutor.class);
+            var clusterService = ClusterServiceUtils.createClusterService(clusterState, threadPool)
+        ) {
+            final var node1 = new DiscoveryNode(UUIDs.base64UUID(), buildNewFakeTransportAddress(), Version.CURRENT);
+            appender.addExpectation(
+                new MockLogAppender.SeenEventExpectation(
+                    "info message",
+                    LOGGER_NAME,
+                    Level.INFO,
+                    "node-join: [" + node1.descriptionWithoutAttributes() + "] with reason [" + TEST_REASON + "]"
+                )
+            );
+            assertNull(
+                PlainActionFuture.<Void, RuntimeException>get(
+                    future -> clusterService.getMasterService()
+                        .submitStateUpdateTask(
+                            "test",
+                            JoinTask.singleNode(node1, TEST_REASON, future, 0L),
+                            ClusterStateTaskConfig.build(Priority.NORMAL),
+                            executor
+                        )
+                )
+            );
+            appender.assertAllExpectationsMatched();
+        } finally {
+            TestThreadPool.terminate(threadPool, 10, TimeUnit.SECONDS);
+        }
+    }
+
     private DesiredNodeWithStatus createActualizedDesiredNode() {
         return new DesiredNodeWithStatus(randomDesiredNode(), DesiredNodeWithStatus.Status.ACTUALIZED);
     }
@@ -581,10 +635,10 @@ public class JoinTaskExecutorTests extends ESTestCase {
         return new DesiredNodeWithStatus(randomDesiredNode(), DesiredNodeWithStatus.Status.PENDING);
     }
 
-    private static JoinTask createRandomTask(DiscoveryNode node, String reason, long term) {
+    private static JoinTask createRandomTask(DiscoveryNode node, long term) {
         return randomBoolean()
-            ? JoinTask.singleNode(node, reason, NOT_COMPLETED_LISTENER, term)
-            : JoinTask.completingElection(Stream.of(new JoinTask.NodeJoinTask(node, reason, NOT_COMPLETED_LISTENER)), term);
+            ? JoinTask.singleNode(node, TEST_REASON, NOT_COMPLETED_LISTENER, term)
+            : JoinTask.completingElection(Stream.of(new JoinTask.NodeJoinTask(node, TEST_REASON, NOT_COMPLETED_LISTENER)), term);
     }
 
     private static AllocationService createAllocationService() {
@@ -595,4 +649,8 @@ public class JoinTaskExecutorTests extends ESTestCase {
         );
         return allocationService;
     }
+
+    // Hard-coding the class name here because it is also mentioned in the troubleshooting docs, so should not be renamed without care.
+    private static final String LOGGER_NAME = "org.elasticsearch.cluster.coordination.NodeJoinExecutor";
+
 }
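
The message shape that both `testPerNodeLogging` tests assert on can be sketched in isolation. This is only an illustration under stated assumptions, not the code in `NodeJoinExecutor` or `NodeLeftExecutor`: the class `PerNodeLogMessages` and its two methods are hypothetical, and the node description used in `main` is a made-up placeholder for whatever `descriptionWithoutAttributes()` returns on a real `DiscoveryNode`.

```java
// Hypothetical helper, not part of Elasticsearch. It only mirrors the message
// shape asserted by the testPerNodeLogging tests in this commit.
final class PerNodeLogMessages {

    private PerNodeLogMessages() {}

    // Shape expected from the NodeJoinExecutor logger at INFO level.
    static String nodeJoin(String nodeDescription, String reason) {
        return "node-join: [" + nodeDescription + "] with reason [" + reason + "]";
    }

    // Shape expected from the NodeLeftExecutor logger at INFO level.
    static String nodeLeft(String nodeDescription, String reason) {
        return "node-left: [" + nodeDescription + "] with reason [" + reason + "]";
    }

    public static void main(String[] args) {
        // Placeholder description; a real one comes from DiscoveryNode.
        String description = "{example-node}";
        System.out.println(nodeJoin(description, "test reason"));
        System.out.println(nodeLeft(description, "test reason"));
    }
}
```

Each message names exactly one node and one reason, which is what makes these lines easier to search for than the batched `node-join` and `node-left` task descriptions reported by the `MasterService`.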

+ 71 - 9
server/src/test/java/org/elasticsearch/cluster/coordination/NodeRemovalClusterStateTaskExecutorTests.java → server/src/test/java/org/elasticsearch/cluster/coordination/NodeLeftExecutorTests.java

@@ -8,17 +8,26 @@
 
 package org.elasticsearch.cluster.coordination;
 
+import org.apache.logging.log4j.Level;
 import org.elasticsearch.Version;
+import org.elasticsearch.action.support.PlainActionFuture;
 import org.elasticsearch.cluster.ClusterName;
 import org.elasticsearch.cluster.ClusterState;
+import org.elasticsearch.cluster.ClusterStateTaskConfig;
 import org.elasticsearch.cluster.node.DiscoveryNode;
 import org.elasticsearch.cluster.node.DiscoveryNodes;
 import org.elasticsearch.cluster.routing.allocation.AllocationService;
 import org.elasticsearch.cluster.service.ClusterStateTaskExecutorUtils;
+import org.elasticsearch.common.Priority;
+import org.elasticsearch.test.ClusterServiceUtils;
 import org.elasticsearch.test.ESTestCase;
+import org.elasticsearch.test.MockLogAppender;
+import org.elasticsearch.threadpool.TestThreadPool;
+import org.elasticsearch.threadpool.ThreadPool;
 
 import java.util.ArrayList;
 import java.util.List;
+import java.util.concurrent.TimeUnit;
 import java.util.concurrent.atomic.AtomicReference;
 
 import static org.mockito.ArgumentMatchers.any;
@@ -27,10 +36,10 @@ import static org.mockito.Mockito.mock;
 import static org.mockito.Mockito.verify;
 import static org.mockito.Mockito.when;
 
-public class NodeRemovalClusterStateTaskExecutorTests extends ESTestCase {
+public class NodeLeftExecutorTests extends ESTestCase {
 
     public void testRemovingNonExistentNodes() throws Exception {
-        final NodeRemovalClusterStateTaskExecutor executor = new NodeRemovalClusterStateTaskExecutor(null);
+        final NodeLeftExecutor executor = new NodeLeftExecutor(null);
         final DiscoveryNodes.Builder builder = DiscoveryNodes.builder();
         final int nodes = randomIntBetween(2, 16);
         for (int i = 0; i < nodes; i++) {
@@ -42,9 +51,9 @@ public class NodeRemovalClusterStateTaskExecutorTests extends ESTestCase {
         for (int i = nodes; i < nodes + randomIntBetween(1, 16); i++) {
             removeBuilder.add(node(i));
         }
-        final List<NodeRemovalClusterStateTaskExecutor.Task> tasks = removeBuilder.build()
+        final List<NodeLeftExecutor.Task> tasks = removeBuilder.build()
             .stream()
-            .map(node -> new NodeRemovalClusterStateTaskExecutor.Task(node, randomBoolean() ? "left" : "failed", () -> {}))
+            .map(node -> new NodeLeftExecutor.Task(node, randomBoolean() ? "left" : "failed", () -> {}))
             .toList();
 
         assertSame(clusterState, ClusterStateTaskExecutorUtils.executeAndAssertSuccessful(clusterState, executor, tasks));
@@ -57,7 +66,7 @@ public class NodeRemovalClusterStateTaskExecutorTests extends ESTestCase {
         );
 
         final AtomicReference<ClusterState> remainingNodesClusterState = new AtomicReference<>();
-        final NodeRemovalClusterStateTaskExecutor executor = new NodeRemovalClusterStateTaskExecutor(allocationService) {
+        final NodeLeftExecutor executor = new NodeLeftExecutor(allocationService) {
             @Override
             protected ClusterState remainingNodesClusterState(ClusterState currentState, DiscoveryNodes.Builder remainingNodesBuilder) {
                 remainingNodesClusterState.set(super.remainingNodesClusterState(currentState, remainingNodesBuilder));
@@ -67,14 +76,14 @@ public class NodeRemovalClusterStateTaskExecutorTests extends ESTestCase {
 
         final DiscoveryNodes.Builder builder = DiscoveryNodes.builder();
         final int nodes = randomIntBetween(2, 16);
-        final List<NodeRemovalClusterStateTaskExecutor.Task> tasks = new ArrayList<>();
+        final List<NodeLeftExecutor.Task> tasks = new ArrayList<>();
         // to ensure that there is at least one removal
         boolean first = true;
         for (int i = 0; i < nodes; i++) {
             final DiscoveryNode node = node(i);
             builder.add(node);
             if (first || randomBoolean()) {
-                tasks.add(new NodeRemovalClusterStateTaskExecutor.Task(node, randomBoolean() ? "left" : "failed", () -> {}));
+                tasks.add(new NodeLeftExecutor.Task(node, randomBoolean() ? "left" : "failed", () -> {}));
             }
             first = false;
         }
@@ -84,13 +93,66 @@ public class NodeRemovalClusterStateTaskExecutorTests extends ESTestCase {
 
         verify(allocationService).disassociateDeadNodes(eq(remainingNodesClusterState.get()), eq(true), any(String.class));
 
-        for (final NodeRemovalClusterStateTaskExecutor.Task task : tasks) {
+        for (final NodeLeftExecutor.Task task : tasks) {
             assertNull(resultingState.nodes().get(task.node().getId()));
         }
     }
 
-    private DiscoveryNode node(final int id) {
+    public void testPerNodeLogging() {
+        final AllocationService allocationService = mock(AllocationService.class);
+        when(allocationService.disassociateDeadNodes(any(ClusterState.class), eq(true), any(String.class))).thenAnswer(
+            im -> im.getArguments()[0]
+        );
+        final var executor = new NodeLeftExecutor(allocationService);
+
+        final DiscoveryNode masterNode = new DiscoveryNode("master", buildNewFakeTransportAddress(), Version.CURRENT);
+        final ClusterState clusterState = ClusterState.builder(ClusterName.DEFAULT)
+            .nodes(
+                DiscoveryNodes.builder()
+                    .add(masterNode)
+                    .localNodeId("master")
+                    .masterNodeId("master")
+                    .add(new DiscoveryNode("other", buildNewFakeTransportAddress(), Version.CURRENT))
+            )
+            .build();
+
+        final MockLogAppender appender = new MockLogAppender();
+        final ThreadPool threadPool = new TestThreadPool("test");
+        try (
+            var ignored = appender.capturing(NodeLeftExecutor.class);
+            var clusterService = ClusterServiceUtils.createClusterService(clusterState, threadPool)
+        ) {
+            final var nodeToRemove = clusterState.nodes().get("other");
+            appender.addExpectation(
+                new MockLogAppender.SeenEventExpectation(
+                    "info message",
+                    LOGGER_NAME,
+                    Level.INFO,
+                    "node-left: [" + nodeToRemove.descriptionWithoutAttributes() + "] with reason [test reason]"
+                )
+            );
+            assertNull(
+                PlainActionFuture.<Void, RuntimeException>get(
+                    future -> clusterService.getMasterService()
+                        .submitStateUpdateTask(
+                            "test",
+                            new NodeLeftExecutor.Task(nodeToRemove, "test reason", () -> future.onResponse(null)),
+                            ClusterStateTaskConfig.build(Priority.NORMAL),
+                            executor
+                        )
+                )
+            );
+            appender.assertAllExpectationsMatched();
+        } finally {
+            TestThreadPool.terminate(threadPool, 10, TimeUnit.SECONDS);
+        }
+    }
+
+    private static DiscoveryNode node(final int id) {
         return new DiscoveryNode(Integer.toString(id), buildNewFakeTransportAddress(), Version.CURRENT);
     }
 
+    // Hard-coding the class name here because it is also mentioned in the troubleshooting docs, so should not be renamed without care.
+    private static final String LOGGER_NAME = "org.elasticsearch.cluster.coordination.NodeLeftExecutor";
+
 }

+ 12 - 10
server/src/test/java/org/elasticsearch/indices/cluster/ClusterStateChanges.java

@@ -46,8 +46,8 @@ import org.elasticsearch.cluster.action.shard.ShardStateAction.StartedShardEntry
 import org.elasticsearch.cluster.action.shard.ShardStateAction.StartedShardUpdateTask;
 import org.elasticsearch.cluster.block.ClusterBlock;
 import org.elasticsearch.cluster.coordination.JoinTask;
-import org.elasticsearch.cluster.coordination.JoinTaskExecutor;
-import org.elasticsearch.cluster.coordination.NodeRemovalClusterStateTaskExecutor;
+import org.elasticsearch.cluster.coordination.NodeJoinExecutor;
+import org.elasticsearch.cluster.coordination.NodeLeftExecutor;
 import org.elasticsearch.cluster.metadata.IndexMetadata;
 import org.elasticsearch.cluster.metadata.IndexMetadataVerifier;
 import org.elasticsearch.cluster.metadata.IndexNameExpressionResolver;
@@ -136,7 +136,7 @@ public class ClusterStateChanges {
     private final TransportClusterRerouteAction transportClusterRerouteAction;
     private final TransportCreateIndexAction transportCreateIndexAction;
 
-    private final NodeRemovalClusterStateTaskExecutor nodeRemovalExecutor;
+    private final NodeLeftExecutor nodeLeftExecutor;
 
     @SuppressWarnings("unchecked")
     public ClusterStateChanges(NamedXContentRegistry xContentRegistry, ThreadPool threadPool) {
@@ -347,7 +347,7 @@ public class ClusterStateChanges {
             EmptySystemIndices.INSTANCE
         );
 
-        nodeRemovalExecutor = new NodeRemovalClusterStateTaskExecutor(allocationService);
+        nodeLeftExecutor = new NodeLeftExecutor(allocationService);
     }
 
     private void resetMasterService() {
@@ -396,14 +396,16 @@ public class ClusterStateChanges {
         return execute(transportClusterRerouteAction, request, state);
     }
 
+    private static final String DUMMY_REASON = "dummy reason";
+
     public ClusterState addNode(ClusterState clusterState, DiscoveryNode discoveryNode) {
         return runTasks(
-            new JoinTaskExecutor(allocationService, (s, p, r) -> {}),
+            new NodeJoinExecutor(allocationService, (s, p, r) -> {}),
             clusterState,
             List.of(
                 JoinTask.singleNode(
                     discoveryNode,
-                    "dummy reason",
+                    DUMMY_REASON,
                     ActionListener.wrap(() -> { throw new AssertionError("should not complete publication"); }),
                     clusterState.term()
                 )
@@ -413,7 +415,7 @@ public class ClusterStateChanges {
 
     public ClusterState joinNodesAndBecomeMaster(ClusterState clusterState, List<DiscoveryNode> nodes) {
         return runTasks(
-            new JoinTaskExecutor(allocationService, (s, p, r) -> {}),
+            new NodeJoinExecutor(allocationService, (s, p, r) -> {}),
             clusterState,
             List.of(
                 JoinTask.completingElection(
@@ -421,7 +423,7 @@ public class ClusterStateChanges {
                         .map(
                             node -> new JoinTask.NodeJoinTask(
                                 node,
-                                "dummy reason",
+                                DUMMY_REASON,
                                 ActionListener.wrap(() -> { throw new AssertionError("should not complete publication"); })
                             )
                         ),
@@ -433,9 +435,9 @@ public class ClusterStateChanges {
 
     public ClusterState removeNodes(ClusterState clusterState, List<DiscoveryNode> nodes) {
         return runTasks(
-            nodeRemovalExecutor,
+            nodeLeftExecutor,
             clusterState,
-            nodes.stream().map(n -> new NodeRemovalClusterStateTaskExecutor.Task(n, "dummy reason", () -> {})).toList()
+            nodes.stream().map(n -> new NodeLeftExecutor.Task(n, "dummy reason", () -> {})).toList()
         );
     }