1
0
Эх сурвалжийг харах

Fix Deadlock in TcpTransport#openConnection (#78073)

This seems to only affect `MockNioTransport` but I think it's a clean fix regardless.
We should not be blockingly locking on the transport thread, otherwise if the close operation
on the transport waits for all io loops to complete (like `MockNioTransport` does)
we will deadlock if closing and opening a connection coincide.

Co-authored-by: David Turner <david.turner@elastic.co>
Armin Braun 4 жил өмнө
parent
commit
064ab2f6c3

+ 9 - 2
server/src/main/java/org/elasticsearch/transport/TcpTransport.java

@@ -272,7 +272,11 @@ public abstract class TcpTransport extends AbstractLifecycleComponent implements
             throw new ConnectTransportException(null, "can't open connection to a null node");
         }
         ConnectionProfile finalProfile = maybeOverrideConnectionProfile(profile);
-        closeLock.readLock().lock(); // ensure we don't open connections while we are closing
+        if (closeLock.readLock().tryLock() == false) {
+            ensureOpen();
+            assert false : "should not get here ever because close-write-lock should only be held on shutdown";
+            throw new ConnectTransportException(node, "failed to acquire close-read-lock");
+        }
         try {
             ensureOpen();
             initiateConnection(node, finalProfile, listener);
@@ -379,7 +383,10 @@ public abstract class TcpTransport extends AbstractLifecycleComponent implements
         PortsRange portsRange = new PortsRange(port);
         final AtomicReference<Exception> lastException = new AtomicReference<>();
         final AtomicReference<InetSocketAddress> boundSocket = new AtomicReference<>();
-        closeLock.writeLock().lock();
+        if (closeLock.writeLock().tryLock() == false) {
+            assert false; // can't be concurrently stopping and mustn't be opening any connections yet
+            throw new IllegalStateException("failed to acquire close-write-lock");
+        }
         try {
             // No need for locking here since Lifecycle objects can't move from STARTED to INITIALIZED
             if (lifecycle.initialized() == false && lifecycle.started() == false) {