
[role="xpack"]
[[start-trained-model-deployment]]
= Start trained model deployment API
[subs="attributes"]
++++
<titleabbrev>Start trained model deployment</titleabbrev>
++++

experimental::[]

Starts a new trained model deployment.
[[start-trained-model-deployment-request]]
== {api-request-title}

`POST _ml/trained_models/<model_id>/deployment/_start`
[[start-trained-model-deployment-prereq]]
== {api-prereq-title}

Requires the `manage_ml` cluster privilege. This privilege is included in the
`machine_learning_admin` built-in role.
[[start-trained-model-deployment-desc]]
== {api-description-title}

Currently only `pytorch` models are supported for deployment. When deployed,
the model attempts to allocate to every machine learning node.
[[start-trained-model-deployment-path-params]]
== {api-path-parms-title}

`<model_id>`::
(Required, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=model-id]
[[start-trained-model-deployment-query-params]]
== {api-query-parms-title}

`inference_threads`::
(Optional, integer)
Sets the number of threads used by the inference process. Increasing this
value generally increases inference speed. The inference process is
compute-bound, so any value greater than the number of available CPU cores on
the machine does not increase inference speed. Defaults to `1`.

`model_threads`::
(Optional, integer)
The number of threads used when sending inference requests to the model.
Increasing this value generally increases throughput. Defaults to `1`.

`queue_capacity`::
(Optional, integer)
Controls how many inference requests are allowed in the queue at a time. When
the number of requests exceeds this value, new requests are rejected with a
429 error. Defaults to `1024`.

`timeout`::
(Optional, time)
Controls the amount of time to wait for the model to deploy. Defaults to
20 seconds.

`wait_for`::
(Optional, string)
Specifies the allocation status to wait for before returning. Defaults to
`started`. The value `starting` indicates deployment is starting but not yet
on any node. The value `started` indicates the model has started on at least
one node. The value `fully_allocated` indicates the deployment has started on
all valid nodes.
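To illustrate how these query parameters combine with the endpoint, here is a
minimal Python sketch that builds the request URL. The helper function name is
hypothetical; the endpoint path and parameter names are those documented above.

```python
from urllib.parse import urlencode

def start_deployment_path(model_id, **params):
    """Build the _start endpoint path with optional query parameters.

    Hypothetical helper for illustration only; in practice you would send
    this as a POST request to your cluster.
    """
    base = f"_ml/trained_models/{model_id}/deployment/_start"
    return f"{base}?{urlencode(params)}" if params else base

path = start_deployment_path(
    "elastic__distilbert-base-uncased-finetuned-conll03-english",
    wait_for="started",
    timeout="1m",
    inference_threads=2,
)
print(path)
```

Any parameter left out simply falls back to its documented default on the
server side.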
[[start-trained-model-deployment-example]]
== {api-examples-title}

The following example starts a new deployment for the
`elastic__distilbert-base-uncased-finetuned-conll03-english` trained model:

[source,console]
--------------------------------------------------
POST _ml/trained_models/elastic__distilbert-base-uncased-finetuned-conll03-english/deployment/_start?wait_for=started&timeout=1m
--------------------------------------------------
// TEST[skip:TBD]
The API returns the following results:

[source,console-result]
----
{
    "allocation": {
        "task_parameters": {
            "model_id": "elastic__distilbert-base-uncased-finetuned-conll03-english",
            "model_bytes": 265632637
        },
        "routing_table": {
            "uckeG3R8TLe2MMNBQ6AGrw": {
                "routing_state": "started",
                "reason": ""
            }
        },
        "allocation_state": "started",
        "start_time": "2021-11-02T11:50:34.766591Z"
    }
}
----
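As a sketch of how a client might interpret this response, the following Python
snippet parses the example body above and collects the nodes whose
`routing_state` is `started`. The response structure is taken from the example;
the helper logic is illustrative, not part of the API.

```python
import json

# The example response body from this page; in practice this JSON would come
# from the cluster's HTTP response.
body = """
{
  "allocation": {
    "task_parameters": {
      "model_id": "elastic__distilbert-base-uncased-finetuned-conll03-english",
      "model_bytes": 265632637
    },
    "routing_table": {
      "uckeG3R8TLe2MMNBQ6AGrw": {"routing_state": "started", "reason": ""}
    },
    "allocation_state": "started",
    "start_time": "2021-11-02T11:50:34.766591Z"
  }
}
"""

allocation = json.loads(body)["allocation"]

# A deployment is usable once at least one node reports "started"; with
# wait_for=fully_allocated, every valid node would appear here as "started".
started_nodes = [
    node_id
    for node_id, route in allocation["routing_table"].items()
    if route["routing_state"] == "started"
]

print(allocation["allocation_state"])  # started
print(started_nodes)                   # ['uckeG3R8TLe2MMNBQ6AGrw']
```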