[role="xpack"]
[[start-trained-model-deployment]]
= Start trained model deployment API
[subs="attributes"]
++++
<titleabbrev>Start trained model deployment</titleabbrev>
++++

Starts a new trained model deployment.

preview::[]

[[start-trained-model-deployment-request]]
== {api-request-title}

`POST _ml/trained_models/<model_id>/deployment/_start`
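
The endpoint takes the model ID as a path segment and the optional settings
described under the query parameters as a query string. The following Python
sketch, which uses only the standard library's `urllib.parse`, shows one way a
client might assemble that path. The `start_deployment_path` helper is
hypothetical and not part of any Elasticsearch client; it only builds the
string and does not send a request.

[source,python]
----
from urllib.parse import quote, urlencode


def start_deployment_path(model_id, **query_params):
    """Build the path and query string for the start trained model
    deployment API. Hypothetical helper for illustration only."""
    # Percent-encode the model ID so it is safe to use as a path segment.
    path = f"/_ml/trained_models/{quote(model_id, safe='')}/deployment/_start"
    # Drop parameters left as None so the server applies its defaults.
    params = {k: v for k, v in query_params.items() if v is not None}
    if params:
        path += "?" + urlencode(params)
    return path


print(start_deployment_path(
    "elastic__distilbert-base-uncased-finetuned-conll03-english",
    wait_for="started",
    timeout="1m",
))
# → /_ml/trained_models/elastic__distilbert-base-uncased-finetuned-conll03-english/deployment/_start?wait_for=started&timeout=1m
----

Parameters passed as `None` are left out of the query string, so Elasticsearch
applies its documented defaults for them.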

[[start-trained-model-deployment-prereq]]
== {api-prereq-title}

Requires the `manage_ml` cluster privilege. This privilege is included in the
`machine_learning_admin` built-in role.

[[start-trained-model-deployment-desc]]
== {api-description-title}

Currently only `pytorch` models are supported for deployment. When deployed,
the model attempts allocation to every machine learning node. Once deployed,
the model can be used by the <<inference-processor,{infer-cap} processor>>
in an ingest pipeline or directly in the <<infer-trained-model>> API.

[[start-trained-model-deployment-path-params]]
== {api-path-parms-title}

`<model_id>`::
(Required, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=model-id]

[[start-trained-model-deployment-query-params]]
== {api-query-parms-title}

`number_of_allocations`::
(Optional, integer)
The number of model allocations on each node where the model is deployed.
All allocations on a node share the same copy of the model in memory, but each
uses a separate set of threads to evaluate the model. Increasing this value
generally increases throughput. If this setting is greater than the number of
hardware threads, it is automatically reduced to a value less than the number
of hardware threads. Defaults to 1.
+
--
[NOTE]
=============================================
If the sum of `threads_per_allocation` and `number_of_allocations` is greater
than the number of hardware threads, the `threads_per_allocation` value is
reduced.
=============================================
--

`queue_capacity`::
(Optional, integer)
Controls how many inference requests are allowed in the queue at a time.
Every machine learning node in the cluster where the model can be allocated
has a queue of this size; when the number of queued requests exceeds the total
queue capacity, new requests are rejected with a `429` error. Defaults to 1024.

`threads_per_allocation`::
(Optional, integer)
Sets the number of threads used by each model allocation during inference.
Increasing this value generally increases inference speed. Inference is
compute-bound, so a value greater than the number of available hardware
threads on the machine does not increase inference speed. If this setting is
greater than the number of hardware threads, it is automatically reduced to a
value less than the number of hardware threads. Defaults to 1.

`timeout`::
(Optional, time)
Controls the amount of time to wait for the model to deploy. Defaults to 20
seconds.

`wait_for`::
(Optional, string)
Specifies the allocation status to wait for before returning. Defaults to
`started`. The value `starting` indicates that the deployment is starting but
is not yet on any node. The value `started` indicates that the model has
started on at least one node. The value `fully_allocated` indicates that the
deployment has started on all valid nodes.
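
When the number of queued inference requests exceeds the capacity set by
`queue_capacity`, new requests are rejected with a `429` error. A common
client-side response is to retry with exponential backoff. The Python sketch
below assumes a hypothetical `send` callable that performs one inference
request and raises the hypothetical `TooManyRequests` exception on a `429`;
neither name comes from an Elasticsearch client, and the delay values are
illustrative rather than recommendations from this API.

[source,python]
----
import time


class TooManyRequests(Exception):
    """Hypothetical error a client raises when a request returns HTTP 429."""


def call_with_backoff(send, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send() and retry on TooManyRequests, doubling the delay each time.

    send is a hypothetical zero-argument callable that performs one request.
    The sleep function is injectable so the retry logic can be tested quickly.
    """
    for attempt in range(max_attempts):
        try:
            return send()
        except TooManyRequests:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the 429 to the caller.
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ... seconds.
            sleep(base_delay * 2 ** attempt)
----

Injecting `sleep` keeps the backoff logic testable without real delays. In
practice you may also want to add random jitter and cap the maximum delay.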

[[start-trained-model-deployment-example]]
== {api-examples-title}

The following example starts a new deployment for the
`elastic__distilbert-base-uncased-finetuned-conll03-english` trained model:

[source,console]
--------------------------------------------------
POST _ml/trained_models/elastic__distilbert-base-uncased-finetuned-conll03-english/deployment/_start?wait_for=started&timeout=1m
--------------------------------------------------
// TEST[skip:TBD]

The API returns the following results:

[source,console-result]
----
{
  "assignment": {
    "task_parameters": {
      "model_id": "elastic__distilbert-base-uncased-finetuned-conll03-english",
      "model_bytes": 265632637,
      "threads_per_allocation": 1,
      "number_of_allocations": 1,
      "queue_capacity": 1024
    },
    "routing_table": {
      "uckeG3R8TLe2MMNBQ6AGrw": {
        "routing_state": "started",
        "reason": ""
      }
    },
    "assignment_state": "started",
    "start_time": "2022-11-02T11:50:34.766591Z"
  }
}
----
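
To check the outcome programmatically, a client can read `assignment_state`
and the per-node `routing_state` values in `routing_table`. The following
Python sketch parses the response shown above with the standard `json` module.
The helper names are hypothetical, and treating any node whose `routing_state`
is `started` as satisfying the `started` condition of `wait_for` is an
assumption based on the parameter description, not a description of how
Elasticsearch evaluates `wait_for`.

[source,python]
----
import json

# The example response above, trimmed to the fields this sketch reads.
RESPONSE_BODY = """
{
  "assignment": {
    "routing_table": {
      "uckeG3R8TLe2MMNBQ6AGrw": {"routing_state": "started", "reason": ""}
    },
    "assignment_state": "started"
  }
}
"""


def routing_states(response):
    """Map each node ID in the routing table to its routing_state."""
    table = response["assignment"]["routing_table"]
    return {node_id: route["routing_state"] for node_id, route in table.items()}


def started_on_any_node(response):
    """Return True if at least one node reports routing_state "started"
    (assumed to match the wait_for=started condition)."""
    return "started" in routing_states(response).values()


response = json.loads(RESPONSE_BODY)
print(response["assignment"]["assignment_state"])  # → started
print(started_on_any_node(response))  # → True
----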