[[infer-service-watsonx-ai]]
=== Watsonx {infer} integration

.New API reference
[sidebar]
--
For the most up-to-date API details, refer to {api-es}/group/endpoint-inference[{infer-cap} APIs].
--

Creates an {infer} endpoint to perform an {infer} task with the `watsonxai` service.

You need an https://cloud.ibm.com/docs/databases-for-elasticsearch?topic=databases-for-elasticsearch-provisioning&interface=api[IBM Cloud® Databases for Elasticsearch deployment] to use the `watsonxai` {infer} service.
You can provision one through the https://cloud.ibm.com/databases/databases-for-elasticsearch/create[IBM catalog], the https://cloud.ibm.com/docs/databases-cli-plugin?topic=databases-cli-plugin-cdb-reference[Cloud Databases CLI plug-in], the https://cloud.ibm.com/apidocs/cloud-databases-api[Cloud Databases API], or https://registry.terraform.io/providers/IBM-Cloud/ibm/latest/docs/resources/database[Terraform].

[discrete]
[[infer-service-watsonx-ai-api-request]]
==== {api-request-title}

`PUT /_inference/<task_type>/<inference_id>`

[discrete]
[[infer-service-watsonx-ai-api-path-params]]
==== {api-path-parms-title}

`<inference_id>`::
(Required, string)
include::inference-shared.asciidoc[tag=inference-id]

`<task_type>`::
(Required, string)
include::inference-shared.asciidoc[tag=task-type]
+
--
Available task types:

* `text_embedding`
* `rerank`
--

[discrete]
[[infer-service-watsonx-ai-api-request-body]]
==== {api-request-body-title}

`service`::
(Required, string)
The type of service supported for the specified task type. In this case,
`watsonxai`.

`service_settings`::
(Required, object)
include::inference-shared.asciidoc[tag=service-settings]
+
--
These settings are specific to the `watsonxai` service.
--
`api_key`:::
(Required, string)
A valid API key for your Watsonx account.
You can find your Watsonx API keys or create a new one on the https://cloud.ibm.com/iam/apikeys[API keys page].
+
--
include::inference-shared.asciidoc[tag=api-key-admonition]
--

`api_version`:::
(Required, string)
A version parameter that takes a version date in the format of `YYYY-MM-DD`.
For the active version dates, refer to the https://cloud.ibm.com/apidocs/watsonx-ai#active-version-dates[Watsonx documentation].
`model_id`:::
(Required, string)
The name of the model to use for the {infer} task.
Refer to the IBM Embedding Models section in the https://www.ibm.com/products/watsonx-ai/foundation-models[Watsonx documentation] for the list of available text embedding models.

`url`:::
(Required, string)
The URL endpoint to use for the requests.
`project_id`:::
(Required, string)
The ID of the IBM Cloud project to use for the {infer} task.
`rate_limit`:::
(Optional, object)
By default, the `watsonxai` service sets the number of requests allowed per minute to `120`.
This helps to minimize the number of rate limit errors returned from Watsonx.
To modify this, set the `requests_per_minute` setting of this object in your service settings:
+
--
include::inference-shared.asciidoc[tag=request-per-minute-example]
--
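+
--
For example, the following request sketches one way to lower the limit to 20 requests per minute for a `text_embedding` endpoint (the endpoint name `watsonx-embeddings-limited` and the `<...>` values are placeholders):

[source,console]
------------------------------------------------------------
PUT _inference/text_embedding/watsonx-embeddings-limited
{
  "service": "watsonxai",
  "service_settings": {
    "api_key": "<api_key>",
    "url": "<url>",
    "model_id": "ibm/slate-30m-english-rtrvr",
    "project_id": "<project_id>",
    "api_version": "2024-03-14",
    "rate_limit": {
      "requests_per_minute": 20
    }
  }
}
------------------------------------------------------------
// TEST[skip:TBD]
--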
`task_settings`::
(Optional, object)
include::inference-shared.asciidoc[tag=task-settings]
+
.`task_settings` for the `rerank` task type
[%collapsible%closed]
=====
`truncate_input_tokens`:::
(Optional, integer)
Specifies the maximum number of tokens per input document before truncation.

`return_documents`:::
(Optional, boolean)
Specifies whether to return the document text in the results.

`top_n`:::
(Optional, integer)
The number of most relevant documents to return.
Defaults to the number of input documents.
=====
[discrete]
[[inference-example-watsonx-ai]]
==== Watsonx AI service example

The following example shows how to create an {infer} endpoint called `watsonx-embeddings` to perform a `text_embedding` task type.

[source,console]
------------------------------------------------------------
PUT _inference/text_embedding/watsonx-embeddings
{
  "service": "watsonxai",
  "service_settings": {
    "api_key": "<api_key>", <1>
    "url": "<url>", <2>
    "model_id": "ibm/slate-30m-english-rtrvr",
    "project_id": "<project_id>", <3>
    "api_version": "2024-03-14" <4>
  }
}
------------------------------------------------------------
// TEST[skip:TBD]
<1> A valid Watsonx API key.
You can find it on the https://cloud.ibm.com/iam/apikeys[API keys page of your account].
<2> The {infer} endpoint URL you created on Watsonx.
<3> The ID of your IBM Cloud project.
<4> A valid API version parameter. You can find the active version dates in the https://cloud.ibm.com/apidocs/watsonx-ai#active-version-dates[Watsonx documentation].
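
Once the endpoint is created, you can request embeddings from it with the perform {infer} API. The following sketch sends a single input string to the `watsonx-embeddings` endpoint defined above (the input text is only an illustration):

[source,console]
------------------------------------------------------------
POST _inference/text_embedding/watsonx-embeddings
{
  "input": "The quick brown fox jumps over the lazy dog"
}
------------------------------------------------------------
// TEST[skip:TBD]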
The following example shows how to create an {infer} endpoint called `watsonx-rerank` to perform a `rerank` task type.

[source,console]
------------------------------------------------------------
PUT _inference/rerank/watsonx-rerank
{
  "service": "watsonxai",
  "service_settings": {
    "api_key": "<api_key>", <1>
    "url": "<url>", <2>
    "model_id": "cross-encoder/ms-marco-minilm-l-12-v2",
    "project_id": "<project_id>", <3>
    "api_version": "2024-05-02" <4>
  },
  "task_settings": {
    "truncate_input_tokens": 50, <5>
    "return_documents": true, <6>
    "top_n": 3 <7>
  }
}
------------------------------------------------------------
// TEST[skip:TBD]
<1> A valid Watsonx API key.
You can find it on the https://cloud.ibm.com/iam/apikeys[API keys page of your account].
<2> The {infer} endpoint URL you created on Watsonx.
<3> The ID of your IBM Cloud project.
<4> A valid API version parameter. You can find the active version dates in the https://cloud.ibm.com/apidocs/watsonx-ai#active-version-dates[Watsonx documentation].
<5> The maximum number of tokens per document before truncation.
<6> Whether to return the document text in the results.
<7> The number of top relevant documents to return.
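
After the `watsonx-rerank` endpoint is created, you can rerank documents against a query with the perform {infer} API. The following sketch uses placeholder query and document strings purely for illustration:

[source,console]
------------------------------------------------------------
POST _inference/rerank/watsonx-rerank
{
  "query": "What is Elasticsearch?",
  "input": [
    "Elasticsearch is a distributed search and analytics engine.",
    "IBM watsonx.ai provides access to foundation models.",
    "Lucene is a search library written in Java."
  ]
}
------------------------------------------------------------
// TEST[skip:TBD]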