[[infer-service-elasticsearch]]
=== Elasticsearch {infer} service

Creates an {infer} endpoint to perform an {infer} task with the `elasticsearch` service.

[discrete]
[[infer-service-elasticsearch-api-request]]
==== {api-request-title}

`PUT /_inference/<task_type>/<inference_id>`

[discrete]
[[infer-service-elasticsearch-api-path-params]]
==== {api-path-parms-title}

`<inference_id>`::
(Required, string)
include::inference-shared.asciidoc[tag=inference-id]

`<task_type>`::
(Required, string)
include::inference-shared.asciidoc[tag=task-type]
+
--
Available task types:

* `rerank`
* `text_embedding`
--
[discrete]
[[infer-service-elasticsearch-api-request-body]]
==== {api-request-body-title}

`service`::
(Required, string)
The type of service supported for the specified task type. In this case,
`elasticsearch`.

`service_settings`::
(Required, object)
include::inference-shared.asciidoc[tag=service-settings]
+
--
These settings are specific to the `elasticsearch` service.
--

`model_id`:::
(Required, string)
The name of the model to use for the {infer} task.
It can be the ID of either a built-in model (for example, `.multilingual-e5-small` for E5) or a text embedding model already
{ml-docs}/ml-nlp-import-model.html#ml-nlp-import-script[uploaded through Eland].
`num_allocations`:::
(Required, integer)
The total number of allocations this model is assigned across machine learning nodes. Increasing this value generally increases the throughput.

`num_threads`:::
(Required, integer)
Sets the number of threads used by each model allocation during inference. Increasing this value generally increases the speed per inference request. The inference process is a compute-bound process; `num_threads` must not exceed the number of available allocated processors per node.
Must be a power of 2. Max allowed value is 32.
`task_settings`::
(Optional, object)
include::inference-shared.asciidoc[tag=task-settings]
+
.`task_settings` for the `rerank` task type
[%collapsible%closed]
=====
`return_documents`:::
(Optional, Boolean)
Returns the document instead of only the index. Defaults to `true`.
=====
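For illustration, a `rerank` endpoint that sets `return_documents` might be created as follows. This is a sketch: the endpoint name `my-rerank-endpoint` is hypothetical, and the `model_id` assumes the built-in Elastic Rerank model `.rerank-v1` is available in your deployment.

[source,console]
------------------------------------------------------------
PUT _inference/rerank/my-rerank-endpoint
{
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1,
    "model_id": ".rerank-v1"
  },
  "task_settings": {
    "return_documents": true
  }
}
------------------------------------------------------------
// TEST[skip:TBD]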
[discrete]
[[inference-example-elasticsearch]]
==== E5 via the `elasticsearch` service

The following example shows how to create an {infer} endpoint called
`my-e5-model` to perform a `text_embedding` task type.

[source,console]
------------------------------------------------------------
PUT _inference/text_embedding/my-e5-model
{
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1,
    "model_id": ".multilingual-e5-small" <1>
  }
}
------------------------------------------------------------
// TEST[skip:TBD]
<1> The `model_id` must be the ID of one of the built-in E5 models.
Valid values are `.multilingual-e5-small` and `.multilingual-e5-small_linux-x86_64`.
For further details, refer to the {ml-docs}/ml-nlp-e5.html[E5 model documentation].
[NOTE]
====
You might see a 502 bad gateway error in the response when using the {kib} Console.
This error usually just reflects a timeout, while the model downloads in the background.
You can check the download progress in the {ml-app} UI.
If using the Python client, you can set the `timeout` parameter to a higher value.
====
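Once the endpoint is created and the model is deployed, you can perform {infer} against it. For example (the input text here is purely illustrative):

[source,console]
------------------------------------------------------------
POST _inference/text_embedding/my-e5-model
{
  "input": "The quick brown fox jumps over the lazy dog"
}
------------------------------------------------------------
// TEST[skip:TBD]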
[discrete]
[[inference-example-eland]]
==== Models uploaded by Eland via the `elasticsearch` service

The following example shows how to create an {infer} endpoint called
`my-msmarco-minilm-model` to perform a `text_embedding` task type.

[source,console]
------------------------------------------------------------
PUT _inference/text_embedding/my-msmarco-minilm-model
{
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1,
    "model_id": "msmarco-MiniLM-L12-cos-v5" <1>
  }
}
------------------------------------------------------------
// TEST[skip:TBD]
<1> The `model_id` must be the ID of a text embedding model which has already been
{ml-docs}/ml-nlp-import-model.html#ml-nlp-import-script[uploaded through Eland].
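To confirm the endpoint was created with the expected configuration, you can retrieve it with the get {infer} API:

[source,console]
------------------------------------------------------------
GET _inference/text_embedding/my-msmarco-minilm-model
------------------------------------------------------------
// TEST[skip:TBD]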