
[ML] fixing potential network thread lockup on PyTorch model load (#75784)

After the model is loaded via the search request, the thread the listener is called back on MAY be a network thread.

Consequently, we should not block it while creating the named pipes. This commit addresses that
by forcing named pipe creation to run on the ML_UTILITY executor threadpool.
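
The fix follows a common pattern in Elasticsearch: a callback that may run on a transport (network) thread must hand blocking work off to a dedicated executor rather than performing it inline. The sketch below is a minimal, self-contained illustration of that hand-off; the class, executor, and method names (UtilityDispatchSketch, utilityExecutor, onConfigLoaded) are placeholders for illustration, not the actual DeploymentManager code.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Minimal sketch of the "do not block the calling thread" pattern.
// All names here are illustrative placeholders, not real Elasticsearch classes.
public class UtilityDispatchSketch {

    private final ExecutorService utilityExecutor = Executors.newFixedThreadPool(2);

    // This callback may be invoked on a network thread, so only cheap,
    // non-blocking work is done inline; the blocking part is forked.
    public void onConfigLoaded(String config, Consumer<String> blockingStart) {
        // cheap work: validate the config on the calling thread
        String validated = config.trim();

        // blocking work (e.g. creating and opening named pipes) is forced
        // onto the utility executor so the calling thread is freed quickly
        utilityExecutor.execute(() -> blockingStart.accept(validated));
    }

    public static void main(String[] args) {
        UtilityDispatchSketch sketch = new UtilityDispatchSketch();
        sketch.onConfigLoaded("model-config", cfg ->
            System.out.println("started deployment for " + cfg));
        sketch.utilityExecutor.shutdown();
    }
}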
Benjamin Trent, 4 years ago
commit 52de9d169b

+ 5 - 2
x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/inference/deployment/DeploymentManager.java

@@ -78,7 +78,7 @@ public class DeploymentManager {
     }
 
     public void startDeployment(TrainedModelDeploymentTask task) {
-        executorServiceForDeployment.execute(() -> doStartDeployment(task));
+        doStartDeployment(task);
     }
 
     private void doStartDeployment(TrainedModelDeploymentTask task) {
@@ -118,7 +118,10 @@ public class DeploymentManager {
                 NlpTask nlpTask = NlpTask.fromConfig(config);
                 NlpTask.Processor processor = nlpTask.createProcessor();
                 processContext.nlpTaskProcessor.set(processor);
-                startAndLoad(task, processContext, modelLoadedListener);
+                // here, we are being called back on the searching thread, which MAY be a network thread
+                // `startAndLoad` creates named pipes, blocking the calling thread, better to execute that in our utility
+                // executor.
+                executorServiceForDeployment.execute(() -> startAndLoad(task, processContext, modelLoadedListener));
             },
             e -> failTask(task,
                 String.format(Locale.ROOT, "[%s] creating NLP task from configuration failed with error [%s]", task.getModelId(), e))