
[ML] Load pytorch models from the utility thread pool (#91661)

This commit changes the thread pool used for loading PyTorch
models from the inference pool to the utility pool. The inference
pool is sized to exactly 3 threads per process:
one each for input, output, and logging.

Loading the model itself should be done on the utility thread pool,
as it is a short-lived operation.
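The idea above can be sketched with plain `java.util.concurrent` executors. This is a hypothetical illustration, not Elasticsearch code: a small fixed pool stands in for the inference process pool whose three threads are reserved for long-lived input, output, and logging work, while a cached pool stands in for the utility pool that absorbs short-lived one-off tasks such as streaming model state.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the pool-selection principle described above.
public class PoolChoiceSketch {
    public static void main(String[] args) throws Exception {
        // Fixed pool: three dedicated threads, one each for input,
        // output, and logging. These must never be tied up by
        // unrelated work, or the process stalls.
        ExecutorService processPool = Executors.newFixedThreadPool(3);

        // Utility pool: creates threads on demand and reclaims idle
        // ones, a natural fit for short-lived tasks.
        ExecutorService utilityPool = Executors.newCachedThreadPool();

        // A short-lived task like loading model state belongs on the
        // utility pool, not on one of the three dedicated threads.
        utilityPool.submit(() -> System.out.println("loading model state"));

        utilityPool.shutdown();
        utilityPool.awaitTermination(5, TimeUnit.SECONDS);
        processPool.shutdown();
    }
}
```

Submitting the load to the fixed pool instead would occupy one of the three dedicated threads for the duration of the load, which is exactly the contention the commit avoids.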

Relates #91422
Dimitris Athanasiou 2 years ago
parent
commit
0c5edf7564

+ 3 - 1
x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/inference/deployment/DeploymentManager.java

@@ -401,7 +401,9 @@ public class DeploymentManager {
                 this.numThreadsPerAllocation = threadSettings.numThreadsPerAllocation();
                 this.numAllocations = threadSettings.numAllocations();
             });
-            this.stateStreamer = new PyTorchStateStreamer(client, executorServiceForProcess, xContentRegistry);
+            // We want to use the utility thread pool to load the model and not one of the process
+            // threads that are dedicated to processing done throughout the lifetime of the process.
+            this.stateStreamer = new PyTorchStateStreamer(client, executorServiceForDeployment, xContentRegistry);
             this.priorityProcessWorker = new PriorityProcessWorkerExecutorService(
                 threadPool.getThreadContext(),
                 "inference process",