
Fix Test SharedClusterSnapshotRestoreIT.testDataFileFailureDuringRestore (#80515)

This is a test/assertion-only issue. We were removing the tracking of a
shard restore only after invoking the listener for the restore. However,
the `ongoingRestores` mechanism is used to wait for the blobstore to go
idle during node shutdown.
The problem with removing the tracking for the shard after resolving the
listener is that if the restore is retried very quickly (e.g. due to a
reroute), we have a race in which the retry starts before the failed
restore has been removed from `ongoingRestores`, which trips the
assertion that a shard does not already have an existing restore.
=> Fixed by removing the tracking before resolving the listener, which is
more correct anyway since we are done with the blobstore at this point.

closes #80477
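To make the ordering issue concrete, here is a minimal, self-contained Java sketch. It does not use Elasticsearch: `Listener`, `runAfter`, `runBefore` and `retrySeesStaleEntry` are hypothetical, simplified stand-ins that only mimic the ordering idea behind `ActionListener.runAfter`/`runBefore`. The retry is modeled as happening synchronously inside the failure callback, which deterministically forces the worst-case interleaving of the race described above.

```java
import java.util.HashSet;
import java.util.Set;

public class RestoreTrackingDemo {

    // Hypothetical, simplified listener; NOT the Elasticsearch ActionListener API.
    interface Listener {
        void onResponse();
        void onFailure(Exception e);
    }

    // Cleanup runs after the delegate has been notified (the old ordering).
    static Listener runAfter(Listener delegate, Runnable cleanup) {
        return new Listener() {
            public void onResponse() {
                try { delegate.onResponse(); } finally { cleanup.run(); }
            }
            public void onFailure(Exception e) {
                try { delegate.onFailure(e); } finally { cleanup.run(); }
            }
        };
    }

    // Cleanup runs before the delegate is notified (the fixed ordering).
    static Listener runBefore(Listener delegate, Runnable cleanup) {
        return new Listener() {
            public void onResponse() { cleanup.run(); delegate.onResponse(); }
            public void onFailure(Exception e) { cleanup.run(); delegate.onFailure(e); }
        };
    }

    /** Simulates a failed restore whose listener retries immediately.
     *  Returns true if the retry still found the old restore tracked. */
    static boolean retrySeesStaleEntry(boolean cleanupFirst) {
        Set<String> ongoingRestores = new HashSet<>();
        String shardId = "[index][0]"; // hypothetical shard id
        boolean[] stale = new boolean[1];

        ongoingRestores.add(shardId); // restore starts and is tracked

        // The caller's listener reacts to the failure by retrying right away.
        Listener retryingListener = new Listener() {
            public void onResponse() {}
            public void onFailure(Exception e) {
                // add() returns false if the failed restore is still tracked
                boolean added = ongoingRestores.add(shardId);
                stale[0] = !added;
            }
        };
        Runnable untrack = () -> ongoingRestores.remove(shardId);
        Listener wrapped = cleanupFirst
                ? runBefore(retryingListener, untrack)
                : runAfter(retryingListener, untrack);
        wrapped.onFailure(new RuntimeException("simulated data file failure"));
        return stale[0];
    }

    public static void main(String[] args) {
        System.out.println("runAfter  -> retry sees stale entry: " + retrySeesStaleEntry(false));
        System.out.println("runBefore -> retry sees stale entry: " + retrySeesStaleEntry(true));
    }
}
```

With the cleanup run after the listener, the retry's `add` returns `false`, which is exactly the condition the `assert added` line in the diff guards against; running the cleanup first lets the retry register cleanly.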
Armin Braun 3 years ago
commit ea93bdb049

server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java

@@ -2980,7 +2980,7 @@ public abstract class BlobStoreRepository extends AbstractLifecycleComponent imp
             final boolean added = ongoingRestores.add(shardId);
             assert added : "add restore for [" + shardId + "] that already has an existing restore";
         }
-        executor.execute(ActionRunnable.wrap(ActionListener.runAfter(restoreListener, () -> {
+        executor.execute(ActionRunnable.wrap(ActionListener.runBefore(restoreListener, () -> {
             final List<ActionListener<Void>> onEmptyListeners;
             synchronized (ongoingRestores) {
                 if (ongoingRestores.remove(shardId) && ongoingRestores.isEmpty() && emptyListeners != null) {