[role="xpack"]
[[post-inference-api]]
=== Perform inference API

experimental[]

Performs an inference task on an input text by using an {infer} endpoint.

IMPORTANT: The {infer} APIs enable you to use certain services, such as built-in
{ml} models (ELSER, E5), models uploaded through Eland, Cohere, OpenAI, Azure,
Google AI Studio, Google Vertex AI, or Hugging Face. For built-in models and
models uploaded through Eland, the {infer} APIs offer an alternative way to use
and manage trained models. However, if you do not plan to use the {infer} APIs
to use these models or if you want to use non-NLP models, use the
<<ml-df-trained-models-apis>>.
[discrete]
[[post-inference-api-request]]
==== {api-request-title}

`POST /_inference/<inference_id>`

`POST /_inference/<task_type>/<inference_id>`

[discrete]
[[post-inference-api-prereqs]]
==== {api-prereq-title}

* Requires the `monitor_inference` <<privileges-list-cluster,cluster privilege>>
(the built-in `inference_admin` and `inference_user` roles grant this privilege)
[discrete]
[[post-inference-api-desc]]
==== {api-description-title}

The perform {infer} API enables you to use {ml} models to perform specific tasks
on data that you provide as an input. The API returns a response with the
results of the task. The {infer} endpoint you use can perform only the specific
task that was defined when the endpoint was created with the
<<put-inference-api>>.
[discrete]
[[post-inference-api-path-params]]
==== {api-path-parms-title}

`<inference_id>`::
(Required, string)
The unique identifier of the {infer} endpoint.

`<task_type>`::
(Optional, string)
The type of {infer} task that the model performs.

[discrete]
[[post-inference-api-query-params]]
==== {api-query-parms-title}

`timeout`::
(Optional, time value)
Controls the amount of time to wait for the inference to complete. Defaults to
30 seconds.
[discrete]
[[post-inference-api-request-body]]
==== {api-request-body-title}

`input`::
(Required, string or array of strings)
The text on which you want to perform the {infer} task. `input` can be a single
string or an array.
+
--
[NOTE]
====
Inference endpoints for the `completion` task type currently only support a
single string as input.
====
--

`query`::
(Required, string)
Only for `rerank` {infer} endpoints. The search query text.

`task_settings`::
(Optional, object)
Task settings for the individual {infer} request. These settings are specific to
the `<task_type>` you specified and override the task settings specified when
initializing the service.
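As a sketch of how these body fields combine per task type, the following Python fragment assembles request payloads for a `rerank` and a `completion` call. The helper function is hypothetical and nothing is sent to a cluster; only the `input`, `query`, and `task_settings` fields come from this API.

```python
import json

def build_inference_body(task_type, input_data, query=None, task_settings=None):
    """Assemble a request body for POST /_inference/<task_type>/<inference_id>.

    `input` is required for every task type; `query` only applies to rerank
    endpoints; `task_settings` is optional and task-specific.
    """
    if task_type == "completion" and not isinstance(input_data, str):
        # Completion endpoints currently accept only a single string.
        raise ValueError("completion input must be a single string")
    body = {"input": input_data}
    if task_type == "rerank":
        if query is None:
            raise ValueError("rerank requires a query")
        body["query"] = query
    if task_settings is not None:
        body["task_settings"] = task_settings
    return json.dumps(body)

rerank_body = build_inference_body(
    "rerank", ["luke", "leia"], query="star wars main character")
completion_body = build_inference_body("completion", "What is Elastic?")
```

The resulting JSON strings match the example request bodies shown below.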
[discrete]
[[post-inference-api-example]]
==== {api-examples-title}

[discrete]
[[inference-example-completion]]
===== Completion example

The following example performs a completion on the example question.

[source,console]
------------------------------------------------------------
POST _inference/completion/openai_chat_completions
{
  "input": "What is Elastic?"
}
------------------------------------------------------------
// TEST[skip:TBD]
The API returns the following response:

[source,console-result]
------------------------------------------------------------
{
  "completion": [
    {
      "result": "Elastic is a company that provides a range of software solutions for search, logging, security, and analytics. Their flagship product is Elasticsearch, an open-source, distributed search engine that allows users to search, analyze, and visualize large volumes of data in real-time. Elastic also offers products such as Kibana, a data visualization tool, and Logstash, a log management and pipeline tool, as well as various other tools and solutions for data analysis and management."
    }
  ]
}
------------------------------------------------------------
// NOTCONSOLE
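The generated text comes back as a list of objects under the `completion` key. A minimal client-side sketch of extracting it from a parsed response (the dict below abbreviates the sample above):

```python
# Parsed JSON response from a completion endpoint, abbreviated.
response = {
    "completion": [
        {"result": "Elastic is a company that provides ..."}
    ]
}

# Each entry in "completion" holds one generated result; with a single
# string input there is exactly one entry.
texts = [entry["result"] for entry in response["completion"]]
answer = texts[0]
```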
[discrete]
[[inference-example-rerank]]
===== Rerank example

The following example performs reranking on the example input.

[source,console]
------------------------------------------------------------
POST _inference/rerank/cohere_rerank
{
  "input": ["luke", "like", "leia", "chewy", "r2d2", "star", "wars"],
  "query": "star wars main character"
}
------------------------------------------------------------
// TEST[skip:TBD]
The API returns the following response:

[source,console-result]
------------------------------------------------------------
{
  "rerank": [
    {
      "index": "2",
      "relevance_score": "0.011597361",
      "text": "leia"
    },
    {
      "index": "0",
      "relevance_score": "0.006338922",
      "text": "luke"
    },
    {
      "index": "5",
      "relevance_score": "0.0016166499",
      "text": "star"
    },
    {
      "index": "4",
      "relevance_score": "0.0011695103",
      "text": "r2d2"
    },
    {
      "index": "1",
      "relevance_score": "5.614787E-4",
      "text": "like"
    },
    {
      "index": "6",
      "relevance_score": "3.7850367E-4",
      "text": "wars"
    },
    {
      "index": "3",
      "relevance_score": "1.2508839E-5",
      "text": "chewy"
    }
  ]
}
------------------------------------------------------------
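The results arrive ordered by `relevance_score`, highest first, and each `index` points back into the original `input` array. A small client-side sketch of mapping hits back to documents and keeping the top two (the `response` dict abbreviates the sample above to three hits; no endpoint is called):

```python
# Abbreviated parsed response from a rerank endpoint.
response = {
    "rerank": [
        {"index": "2", "relevance_score": "0.011597361", "text": "leia"},
        {"index": "0", "relevance_score": "0.006338922", "text": "luke"},
        {"index": "5", "relevance_score": "0.0016166499", "text": "star"},
    ]
}
documents = ["luke", "like", "leia", "chewy", "r2d2", "star", "wars"]

# Sort defensively by score (values arrive as strings, so convert to float),
# then map each hit's index back into the original input array.
ranked = sorted(
    response["rerank"],
    key=lambda hit: float(hit["relevance_score"]),
    reverse=True,
)
top_two = [documents[int(hit["index"])] for hit in ranked[:2]]
```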
[discrete]
[[inference-example-sparse]]
===== Sparse embedding example

The following example performs sparse embedding on the example sentence.

[source,console]
------------------------------------------------------------
POST _inference/sparse_embedding/my-elser-model
{
  "input": "The sky above the port was the color of television tuned to a dead channel."
}
------------------------------------------------------------
// TEST[skip:TBD]
The API returns the following response:

[source,console-result]
------------------------------------------------------------
{
  "sparse_embedding": [
    {
      "port": 2.1259406,
      "sky": 1.7073475,
      "color": 1.6922266,
      "dead": 1.6247464,
      "television": 1.3525393,
      "above": 1.2425821,
      "tuned": 1.1440028,
      "colors": 1.1218185,
      "tv": 1.0111054,
      "ports": 1.0067928,
      "poem": 1.0042328,
      "channel": 0.99471164,
      "tune": 0.96235967,
      "scene": 0.9020516,
      (...)
    },
    (...)
  ]
}
------------------------------------------------------------
// NOTCONSOLE
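Each sparse embedding is a map from expanded tokens to weights, so a relevance score between two embedded texts can be approximated client-side as a dot product over their shared tokens. A self-contained sketch with made-up weights (not the values from the response above):

```python
def sparse_dot(a, b):
    """Dot product of two token->weight maps over their shared tokens."""
    return sum(weight * b[token] for token, weight in a.items() if token in b)

doc_embedding = {"port": 2.12, "sky": 1.70, "television": 1.35}
query_embedding = {"sky": 1.10, "ocean": 0.80}

# Only the shared token "sky" contributes: 1.70 * 1.10.
score = sparse_dot(doc_embedding, query_embedding)
```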
[discrete]
[[inference-example-text-embedding]]
===== Text embedding example

The following example performs text embedding on the example sentence using the
Cohere integration.

[source,console]
------------------------------------------------------------
POST _inference/text_embedding/my-cohere-endpoint
{
  "input": "The sky above the port was the color of television tuned to a dead channel.",
  "task_settings": {
    "input_type": "ingest"
  }
}
------------------------------------------------------------
// TEST[skip:TBD]
The API returns the following response:

[source,console-result]
------------------------------------------------------------
{
  "text_embedding": [
    {
      "embedding": [
        0.018569946,
        -0.036895752,
        0.01486969,
        -0.0045204163,
        -0.04385376,
        0.0075950623,
        0.04260254,
        -0.004005432,
        0.007865906,
        0.030792236,
        -0.050476074,
        0.011795044,
        -0.011642456,
        -0.010070801,
        (...)
      ]
    },
    (...)
  ]
}
------------------------------------------------------------
// NOTCONSOLE
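Dense embeddings like the one above are typically compared with cosine similarity. A self-contained sketch using two short made-up vectors (real Cohere embeddings have many more dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

v1 = [0.0185, -0.0369, 0.0149]
v2 = [0.0185, -0.0369, 0.0149]

# Identical vectors score ~1.0; exactly opposite vectors score ~-1.0.
identical = cosine_similarity(v1, v2)
opposite = cosine_similarity(v1, [-x for x in v2])
```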