- [[infer-service-elasticsearch]]
- === Elasticsearch {infer} service
- Creates an {infer} endpoint to perform an {infer} task with the `elasticsearch` service.
- [discrete]
- [[infer-service-elasticsearch-api-request]]
- ==== {api-request-title}
- `PUT /_inference/<task_type>/<inference_id>`
- [discrete]
- [[infer-service-elasticsearch-api-path-params]]
- ==== {api-path-parms-title}
- `<inference_id>`::
- (Required, string)
- include::inference-shared.asciidoc[tag=inference-id]
- `<task_type>`::
- (Required, string)
- include::inference-shared.asciidoc[tag=task-type]
- +
- --
- Available task types:
- * `rerank`,
- * `text_embedding`.
- --
- [discrete]
- [[infer-service-elasticsearch-api-request-body]]
- ==== {api-request-body-title}
- `service`::
- (Required, string)
- The type of service supported for the specified task type. In this case,
- `elasticsearch`.
- `service_settings`::
- (Required, object)
- include::inference-shared.asciidoc[tag=service-settings]
- +
- --
- These settings are specific to the `elasticsearch` service.
- --
- `model_id`:::
- (Required, string)
- The name of the model to use for the {infer} task.
- It can be the ID of either a built-in model (for example, `.multilingual-e5-small` for E5) or a text embedding model already
- {ml-docs}/ml-nlp-import-model.html#ml-nlp-import-script[uploaded through Eland].
- `num_allocations`:::
- (Required, integer)
- The total number of allocations this model is assigned across machine learning nodes. Increasing this value generally increases the throughput.
- `num_threads`:::
- (Required, integer)
- Sets the number of threads used by each model allocation during inference. Increasing this value generally increases the speed per inference request. Because inference is a compute-bound process, `num_threads` must not exceed the number of available allocated processors per node.
- Must be a power of 2. Max allowed value is 32.
- `task_settings`::
- (Optional, object)
- include::inference-shared.asciidoc[tag=task-settings]
- +
- .`task_settings` for the `rerank` task type
- [%collapsible%closed]
- =====
- `return_documents`:::
- (Optional, Boolean)
- Returns the document instead of only the index. Defaults to `true`.
- =====
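The constraints above can be checked on the client side before the request is sent. The following is a minimal Python sketch, not part of the {infer} API: the helper names `is_power_of_two` and `build_request_body` are hypothetical, and the only rules it enforces are the ones stated for `num_threads` (a power of 2, at most 32).

```python
import json

# Upper bound stated in the `num_threads` description.
MAX_NUM_THREADS = 32


def is_power_of_two(n: int) -> bool:
    """Return True for 1, 2, 4, 8, ... (hypothetical helper)."""
    return n > 0 and (n & (n - 1)) == 0


def build_request_body(model_id, num_allocations, num_threads, return_documents=None):
    """Assemble a request body for the `elasticsearch` service (hypothetical helper).

    Raises ValueError when `num_threads` breaks the documented limits.
    """
    if not is_power_of_two(num_threads) or num_threads > MAX_NUM_THREADS:
        raise ValueError(
            f"num_threads must be a power of 2 and at most {MAX_NUM_THREADS}, "
            f"got {num_threads}"
        )
    body = {
        "service": "elasticsearch",
        "service_settings": {
            "model_id": model_id,
            "num_allocations": num_allocations,
            "num_threads": num_threads,
        },
    }
    # `task_settings` is optional; include it only when the caller sets it.
    if return_documents is not None:
        body["task_settings"] = {"return_documents": return_documents}
    return body


if __name__ == "__main__":
    print(json.dumps(build_request_body(".multilingual-e5-small", 1, 1), indent=2))
```

Validating locally only catches the documented limits early. The cluster remains the authority on whether a given setting is accepted, for example whether enough processors are available per node.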
- [discrete]
- [[inference-example-elasticsearch]]
- ==== E5 via the `elasticsearch` service
- The following example shows how to create an {infer} endpoint called
- `my-e5-model` to perform a `text_embedding` task type.
- [source,console]
- ------------------------------------------------------------
- PUT _inference/text_embedding/my-e5-model
- {
- "service": "elasticsearch",
- "service_settings": {
- "num_allocations": 1,
- "num_threads": 1,
- "model_id": ".multilingual-e5-small" <1>
- }
- }
- ------------------------------------------------------------
- // TEST[skip:TBD]
- <1> The `model_id` must be the ID of one of the built-in E5 models.
- Valid values are `.multilingual-e5-small` and `.multilingual-e5-small_linux-x86_64`.
- For further details, refer to the {ml-docs}/ml-nlp-e5.html[E5 model documentation].
- [NOTE]
- ====
- You might see a 502 bad gateway error in the response when using the {kib} Console.
- This error usually just reflects a timeout while the model downloads in the background.
- You can check the download progress in the {ml-app} UI.
- If using the Python client, you can set the `timeout` parameter to a higher value.
- ====
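To illustrate the idea of a longer client-side timeout, here is a minimal sketch that uses only the Python standard library rather than the official Python client. The base URL `http://localhost:9200`, the absence of authentication, the 300-second value, and the helper names `build_put_request` and `create_endpoint` are all assumptions made for illustration.

```python
import json
import urllib.request


def build_put_request(base_url, task_type, inference_id, body):
    """Build (but do not send) the PUT request for an inference endpoint."""
    url = f"{base_url.rstrip('/')}/_inference/{task_type}/{inference_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def create_endpoint(request, timeout_seconds=300):
    """Send the request, waiting up to `timeout_seconds` for a response.

    The default of 300 seconds is an arbitrary illustrative value, chosen
    to leave room for a model download to finish.
    """
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    request = build_put_request(
        "http://localhost:9200",  # assumed local, unsecured cluster
        "text_embedding",
        "my-e5-model",
        {
            "service": "elasticsearch",
            "service_settings": {
                "num_allocations": 1,
                "num_threads": 1,
                "model_id": ".multilingual-e5-small",
            },
        },
    )
    print(request.get_method(), request.full_url)
    # create_endpoint(request)  # uncomment to send to a running cluster
```

A secured cluster would also need credentials on the request, which this sketch leaves out.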
- [discrete]
- [[inference-example-eland]]
- ==== Models uploaded by Eland via the `elasticsearch` service
- The following example shows how to create an {infer} endpoint called
- `my-msmarco-minilm-model` to perform a `text_embedding` task type.
- [source,console]
- ------------------------------------------------------------
- PUT _inference/text_embedding/my-msmarco-minilm-model
- {
- "service": "elasticsearch",
- "service_settings": {
- "num_allocations": 1,
- "num_threads": 1,
- "model_id": "msmarco-MiniLM-L12-cos-v5" <1>
- }
- }
- ------------------------------------------------------------
- // TEST[skip:TBD]
- <1> The `model_id` must be the ID of a text embedding model that has already been
- {ml-docs}/ml-nlp-import-model.html#ml-nlp-import-script[uploaded through Eland].