lqb
/
elasticsearch
mirror of https://gitee.com/mirrors/elasticsearch.git


			
							123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133
							[[semantic-search-api]]
=== Semantic search API
++++
<titleabbrev>Semantic search</titleabbrev>
++++

experimental::[]
Semantic search uses a text embedding NLP model to generate a dense vector from the input query string.
The resulting dense vector is then used in a <<knn-search,k-nearest neighbor (knn) search>> against an index containing dense vectors
created with the same text embedding model. The search results are semantically similar as learned
by the model.

////
[source,console]
----
PUT my-index
{
  "mappings": {
    "properties": {
      "text_embedding": {
        "type": "dense_vector",
        "dims": 512,
        "index": true,
        "similarity": "cosine"
      }
    }
  }
}
----
////

[source,console]
----
GET my-index/_semantic_search
{
  "query_string": "A picture of a snow capped mountain",
  "model_id": "my-text-embedding-model",
  "knn": {
    "field": "text_embedding",
    "k": 10,
    "num_candidates": 100
  }
}
----
// TEST[skip:TBD]


[[semantic-search-api-request]]
==== {api-request-title}

`GET <target>/_semantic_search`

`POST <target>/_semantic_search`

[[semantic-search-api-prereqs]]
==== {api-prereq-title}

* If the {es} {security-features} are enabled, you must have the `read`
<<privileges-list-indices,index privilege>> for the target data stream, index,
or alias.

[[semantic-search-api-desc]]
==== {api-description-title}

The semantic search API uses a text embedding model to create a dense vector
representation of the query string.


[[semantic-search-api-path-params]]
==== {api-path-parms-title}

`<target>`::
(Optional, string) Comma-separated list of data streams, indices, and aliases
to search. Supports wildcards (`*`). To search all data streams and indices,
use `*` or `_all`.

[role="child_attributes"]
[[semantic-search-api-query-params]]
==== {api-query-parms-title}

include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=routing]

[role="child_attributes"]
[[semantic-search-api-request-body]]
==== {api-request-body-title}

`model_id`::
(Required, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=model-id]

`query_string`::
(Required, string) The input text to embed.

`knn`::
(Required, object) Defines the kNN query to run.
+
.Properties of `knn` object
[%collapsible%open]
====
`field`::
(Required, string) The name of the vector field to search against. Must be a
<<index-vectors-knn-search, `dense_vector` field with indexing enabled>>.

`k`::
(Required, integer) Number of nearest neighbors to return as top hits. This
value must be less than `num_candidates`.

`num_candidates`::
(Required, integer) The number of nearest neighbor candidates to consider per
shard. Cannot exceed 10,000. {es} collects `num_candidates` results from each
shard, then merges them to find the top `k` results. Increasing
`num_candidates` tends to improve the accuracy of the final `k` results.
====

`filter`::
(Optional, <<query-dsl,Query DSL object>>) Query to filter the documents that
can match. The kNN search will return the top `k` documents that also match
this filter. The value can be a single query or a list of queries. If `filter`
is not provided, all documents are allowed to match.


include::{es-repo-dir}/search/search.asciidoc[tag=docvalue-fields-def]
include::{es-repo-dir}/search/search.asciidoc[tag=fields-param-def]
include::{es-repo-dir}/search/search.asciidoc[tag=source-filtering-def]
include::{es-repo-dir}/search/search.asciidoc[tag=stored-fields-def]

[role="child_attributes"]
[[semantic-search-api-response-body]]
==== {api-response-body-title}

A sementic search response has the same structure as a kNN search response.