@@ -8,7 +8,7 @@ Creates an {infer} endpoint to perform an {infer} task.

IMPORTANT: The {infer} APIs enable you to use certain services, such as built-in
{ml} models (ELSER, E5), models uploaded through Eland, Cohere, OpenAI, Azure
-OpenAI or Hugging Face. For built-in models and models uploaded though
+OpenAI, Google AI Studio or Hugging Face. For built-in models and models uploaded through
Eland, the {infer} APIs offer an alternative way to use and manage trained
models. However, if you do not plan to use the {infer} APIs to use these models
or if you want to use non-NLP models, use the <<ml-df-trained-models-apis>>.

@@ -45,6 +45,7 @@ The following services are available through the {infer} API:
* Azure OpenAI
* Azure AI Studio
* Elasticsearch (for built-in models and models uploaded through Eland)
+* Google AI Studio


[discrete]

@@ -84,6 +85,7 @@ OpenAI service.
* `azureaistudio`: specify the `completion` or `text_embedding` task type to use the Azure AI Studio service.
* `elasticsearch`: specify the `text_embedding` task type to use the E5
built-in model or text embedding models uploaded by Eland.
+* `googleaistudio`: specify the `completion` task type to use the Google AI Studio service.

`service_settings`::
(Required, object)

@@ -282,6 +284,33 @@ To modify this, set the `requests_per_minute` setting of this object in your ser
```
=====
+
+.`service_settings` for the `googleaistudio` service
+[%collapsible%closed]
+=====
+`api_key`:::
+(Required, string)
+A valid API key for the Google Gemini API.
+
+`model_id`:::
+(Required, string)
+The name of the model to use for the {infer} task.
+You can find the supported models at https://ai.google.dev/gemini-api/docs/models/gemini[Gemini API models].
+
+`rate_limit`:::
+(Optional, object)
+By default, the `googleaistudio` service sets the number of requests allowed per minute to `360`.
+This helps to minimize the number of rate limit errors returned from Google AI Studio.
+To modify this, set the `requests_per_minute` setting of this object in your service settings:
++
+--
+```
+"rate_limit": {
+    "requests_per_minute": <<number_of_requests>>
+}
+```
+--
+=====
++
.`service_settings` for the `elasticsearch` service
[%collapsible%closed]
=====

@@ -304,7 +333,6 @@ exceed the number of available processors per node divided by the number of
allocations. Must be a power of 2. Max allowed value is 32.
=====

-
`task_settings`::
(Optional, object)
Settings to configure the {infer} task. These settings are specific to the

@@ -701,3 +729,23 @@ PUT _inference/completion/azure_ai_studio_completion
// TEST[skip:TBD]

The list of chat completion models that you can choose from in your deployment can be found in the https://ai.azure.com/explore/models?selectedTask=chat-completion[Azure AI Studio model explorer].
+
+[discrete]
+[[inference-example-googleaistudio]]
+===== Google AI Studio service
+
+The following example shows how to create an {infer} endpoint called
+`google_ai_studio_completion` to perform a `completion` task type.
+
+[source,console]
+------------------------------------------------------------
+PUT _inference/completion/google_ai_studio_completion
+{
+    "service": "googleaistudio",
+    "service_settings": {
+        "api_key": "<api_key>",
+        "model_id": "<model_id>"
+    }
+}
+------------------------------------------------------------
+// TEST[skip:TBD]
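
Once an endpoint like the one above exists, it is invoked with the perform {infer} API. A minimal sketch, assuming the `google_ai_studio_completion` endpoint has been created; the input string is illustrative:

[source,console]
------------------------------------------------------------
POST _inference/completion/google_ai_studio_completion
{
    "input": "What is an inference endpoint?"
}
------------------------------------------------------------
// TEST[skip:TBD]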