[role="xpack"]
[[post-inference-api]]
=== Perform inference API

experimental[]

Performs an inference task on an input text by using an {infer} endpoint.

IMPORTANT: The {infer} APIs enable you to use certain services, such as built-in
{ml} models (ELSER, E5), models uploaded through Eland, Cohere, OpenAI, Azure,
Google AI Studio, Google Vertex AI, or Hugging Face. For built-in models and
models uploaded through Eland, the {infer} APIs offer an alternative way to use
and manage trained models. However, if you do not plan to use the {infer} APIs
with these models, or if you want to use non-NLP models, use the
<<ml-df-trained-models-apis>>.

[discrete]
[[post-inference-api-request]]
==== {api-request-title}

`POST /_inference/<inference_id>`

`POST /_inference/<task_type>/<inference_id>`

[discrete]
[[post-inference-api-prereqs]]
==== {api-prereq-title}

* Requires the `monitor_inference` <<privileges-list-cluster,cluster privilege>>
(the built-in `inference_admin` and `inference_user` roles grant this privilege)

[discrete]
[[post-inference-api-desc]]
==== {api-description-title}

The perform {infer} API enables you to use {ml} models to perform specific tasks
on data that you provide as input. The API returns a response with the results
of the task. The {infer} endpoint you use performs the specific task that was
defined when the endpoint was created with the <<put-inference-api>>.

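Because the endpoint's task is fixed at creation time, specifying the task type
in the path is optional. For example, assuming a hypothetical `sparse_embedding`
endpoint with the ID `my-elser-model` (any endpoint created with the
<<put-inference-api>> behaves the same way), the following two requests are
equivalent:

[source,console]
------------------------------------------------------------
POST _inference/my-elser-model
{
  "input": "What is Elastic?"
}

POST _inference/sparse_embedding/my-elser-model
{
  "input": "What is Elastic?"
}
------------------------------------------------------------
// TEST[skip:TBD]
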
[discrete]
[[post-inference-api-path-params]]
==== {api-path-parms-title}

`<inference_id>`::
(Required, string)
The unique identifier of the {infer} endpoint.

`<task_type>`::
(Optional, string)
The type of {infer} task that the model performs.

[discrete]
[[post-inference-api-query-params]]
==== {api-query-parms-title}

`timeout`::
(Optional, timeout)
Controls the amount of time to wait for the inference to complete. Defaults to
30 seconds.

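For example, a sketch of raising the timeout for a single slow request; the
endpoint ID `my-elser-model` is the same hypothetical endpoint as above:

[source,console]
------------------------------------------------------------
POST _inference/sparse_embedding/my-elser-model?timeout=60s
{
  "input": "The sky above the port was the color of television tuned to a dead channel."
}
------------------------------------------------------------
// TEST[skip:TBD]
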
[discrete]
[[post-inference-api-request-body]]
==== {api-request-body-title}

`input`::
(Required, string or array of strings)
The text on which you want to perform the {infer} task.
`input` can be a single string or an array; a sketch that passes an array of
inputs follows this list.
+
--
[NOTE]
====
Inference endpoints for the `completion` task type currently only support a
single string as input.
====
--

`query`::
(Required, string)
Only for `rerank` {infer} endpoints. The search query text.

`task_settings`::
(Optional, object)
Task settings for the individual {infer} request.
These settings are specific to the `<task_type>` you specified and override the
task settings specified when initializing the service. The sketch after this
list shows a per-request override.

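The following is a minimal sketch that combines both: it sends an array of
inputs and overrides a task setting for this request only. It assumes a Cohere
text embedding endpoint such as the `my-cohere-endpoint` used in the examples
below; `input_type` is a Cohere-specific task setting and the `search` value is
used here purely for illustration:

[source,console]
------------------------------------------------------------
POST _inference/text_embedding/my-cohere-endpoint
{
  "input": [
    "The first text to embed",
    "The second text to embed"
  ],
  "task_settings": {
    "input_type": "search"
  }
}
------------------------------------------------------------
// TEST[skip:TBD]
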
[discrete]
[[post-inference-api-example]]
==== {api-examples-title}

[discrete]
[[inference-example-completion]]
===== Completion example

The following example performs a completion on the example question.

[source,console]
------------------------------------------------------------
POST _inference/completion/openai_chat_completions
{
  "input": "What is Elastic?"
}
------------------------------------------------------------
// TEST[skip:TBD]

The API returns the following response:

[source,console-result]
------------------------------------------------------------
{
  "completion": [
    {
      "result": "Elastic is a company that provides a range of software solutions for search, logging, security, and analytics. Their flagship product is Elasticsearch, an open-source, distributed search engine that allows users to search, analyze, and visualize large volumes of data in real-time. Elastic also offers products such as Kibana, a data visualization tool, and Logstash, a log management and pipeline tool, as well as various other tools and solutions for data analysis and management."
    }
  ]
}
------------------------------------------------------------
// NOTCONSOLE

[discrete]
[[inference-example-rerank]]
===== Rerank example

The following example performs reranking on the example input.

[source,console]
------------------------------------------------------------
POST _inference/rerank/cohere_rerank
{
  "input": ["luke", "like", "leia", "chewy", "r2d2", "star", "wars"],
  "query": "star wars main character"
}
------------------------------------------------------------
// TEST[skip:TBD]

The API returns the following response:

[source,console-result]
------------------------------------------------------------
{
  "rerank": [
    {
      "index": "2",
      "relevance_score": "0.011597361",
      "text": "leia"
    },
    {
      "index": "0",
      "relevance_score": "0.006338922",
      "text": "luke"
    },
    {
      "index": "5",
      "relevance_score": "0.0016166499",
      "text": "star"
    },
    {
      "index": "4",
      "relevance_score": "0.0011695103",
      "text": "r2d2"
    },
    {
      "index": "1",
      "relevance_score": "5.614787E-4",
      "text": "like"
    },
    {
      "index": "6",
      "relevance_score": "3.7850367E-4",
      "text": "wars"
    },
    {
      "index": "3",
      "relevance_score": "1.2508839E-5",
      "text": "chewy"
    }
  ]
}
------------------------------------------------------------

[discrete]
[[inference-example-sparse]]
===== Sparse embedding example

The following example performs sparse embedding on the example sentence.

[source,console]
------------------------------------------------------------
POST _inference/sparse_embedding/my-elser-model
{
  "input": "The sky above the port was the color of television tuned to a dead channel."
}
------------------------------------------------------------
// TEST[skip:TBD]

The API returns the following response:

[source,console-result]
------------------------------------------------------------
{
  "sparse_embedding": [
    {
      "port": 2.1259406,
      "sky": 1.7073475,
      "color": 1.6922266,
      "dead": 1.6247464,
      "television": 1.3525393,
      "above": 1.2425821,
      "tuned": 1.1440028,
      "colors": 1.1218185,
      "tv": 1.0111054,
      "ports": 1.0067928,
      "poem": 1.0042328,
      "channel": 0.99471164,
      "tune": 0.96235967,
      "scene": 0.9020516,
      (...)
    },
    (...)
  ]
}
------------------------------------------------------------
// NOTCONSOLE

[discrete]
[[inference-example-text-embedding]]
===== Text embedding example

The following example performs text embedding on the example sentence using the
Cohere integration.

[source,console]
------------------------------------------------------------
POST _inference/text_embedding/my-cohere-endpoint
{
  "input": "The sky above the port was the color of television tuned to a dead channel.",
  "task_settings": {
    "input_type": "ingest"
  }
}
------------------------------------------------------------
// TEST[skip:TBD]

The API returns the following response:

[source,console-result]
------------------------------------------------------------
{
  "text_embedding": [
    {
      "embedding": [
        0.018569946,
        -0.036895752,
        0.01486969,
        -0.0045204163,
        -0.04385376,
        0.0075950623,
        0.04260254,
        -0.004005432,
        0.007865906,
        0.030792236,
        -0.050476074,
        0.011795044,
        -0.011642456,
        -0.010070801,
        (...)
      ]
    },
    (...)
  ]
}
------------------------------------------------------------
// NOTCONSOLE