[ML] adds new params to the start trained model deployment docs (#80016)

Benjamin Trent 4 years ago
parent
commit
f9bf4e57b9

+ 21 - 1
docs/reference/ml/df-analytics/apis/start-trained-model-deployment.asciidoc

@@ -48,11 +48,31 @@ any node. The value `started` indicates the model has started on at least one
 node. The value `fully_allocated` indicates the deployment has started on all
 valid nodes.
 
+`model_threads`::
+(Optional, integer)
+Indicates how many threads are used when sending inference requests to
+the model. Increasing this value generally increases the throughput. Defaults to
+1.
+
+`inference_threads`::
+(Optional, integer)
+Sets the number of threads used by the inference process. This generally increases
+the inference speed. The inference process is a compute-bound process; any number 
+greater than the number of available CPU cores on the machine does not increase the 
+inference speed.
+Defaults to 1.
+
+`queue_capacity`::
+(Optional, integer)
+Controls how many inference requests are allowed in the queue at a time. Once the
+number of requests exceeds this value, new requests are rejected with a 429 error.
+Defaults to 1024.
+
 [[start-trained-model-deployment-example]]
 == {api-examples-title}
 
 The following example starts a new deployment for a
-`elastic__d`istilbert-base-uncased-finetuned-conll03-english` trained model: 
+`elastic__distilbert-base-uncased-finetuned-conll03-english` trained model:
 
 [source,console]
 --------------------------------------------------

+ 2 - 0
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/action/StartTrainedModelDeploymentAction.java

@@ -274,7 +274,9 @@ public class StartTrainedModelDeploymentAction extends ActionType<CreateTrainedM
 
         private final String modelId;
         private final long modelBytes;
+        // How many threads are used by the model during inference. Used to increase inference speed.
         private final int inferenceThreads;
+        // How many threads are used when forwarding the request to the model. Used to increase throughput.
         private final int modelThreads;
         private final int queueCapacity;
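
Note on `queue_capacity`: the documented behaviour (requests beyond the queue limit are rejected with a 429 error) can be illustrated with a minimal, self-contained sketch. This is not the code from this commit or from Elasticsearch; the class `BoundedInferenceQueue`, its methods, and the status constants are hypothetical names chosen for illustration, and the sketch assumes a plain bounded in-memory queue.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration of a bounded request queue; not the Elasticsearch implementation.
public class BoundedInferenceQueue {
    public static final int OK = 200;
    public static final int TOO_MANY_REQUESTS = 429;

    private final BlockingQueue<String> queue;

    public BoundedInferenceQueue(int queueCapacity) {
        if (queueCapacity < 1) {
            throw new IllegalArgumentException("queueCapacity must be at least 1");
        }
        // The queue holds at most queueCapacity pending requests.
        this.queue = new ArrayBlockingQueue<>(queueCapacity);
    }

    // Tries to enqueue a request without blocking.
    // Returns OK if it was queued, TOO_MANY_REQUESTS if the queue is already full.
    public int submit(String request) {
        return queue.offer(request) ? OK : TOO_MANY_REQUESTS;
    }

    // Removes and returns the oldest pending request, or null if none is waiting.
    public String takeNext() {
        return queue.poll();
    }

    public static void main(String[] args) {
        BoundedInferenceQueue q = new BoundedInferenceQueue(2);
        System.out.println(q.submit("first"));   // queued
        System.out.println(q.submit("second"));  // queued
        System.out.println(q.submit("third"));   // rejected: queue is full
        q.takeNext();                            // a worker drains one request
        System.out.println(q.submit("fourth"));  // queued again
    }
}
```

In this sketch a rejected request is simply dropped by the caller; a client of the real API would typically retry after a delay or raise `queue_capacity`.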