
Add docs for kNN search endpoint (#80378)

This commit adds docs for the new `_knn_search` endpoint.

It focuses on being an API reference and is light on details in terms of how
exactly the kNN search works, and how the endpoint contrasts with
`script_score` queries. We plan to add a high-level guide on kNN search that
will explain this in depth.

Relates to #78473.
Julie Tibshirani, 3 years ago
Parent
Commit
8ca693b271

+ 1 - 12
docs/reference/eql/eql-search-api.asciidoc

@@ -206,18 +206,7 @@ returned.
 +
 A greater `fetch_size` value often increases search speed but uses more memory.
 
-`fields`::
-(Optional, array of strings and objects)
-Array of wildcard (`*`) patterns. The response returns values for field names
-matching these patterns in the `fields` property of each hit.
-+
-You can specify items in the array as a string or object.
-+
-.Properties of `fields` objects
-[%collapsible%open]
-====
-include::{es-repo-dir}/search/search.asciidoc[tag=fields-api-props]
-====
+include::{es-repo-dir}/search/search.asciidoc[tag=fields-param-def]
 
 `filter`::
 (Optional, <<query-dsl,Query DSL object>>)

+ 21 - 16
docs/reference/mapping/types/dense-vector.asciidoc

@@ -5,18 +5,21 @@
 <titleabbrev>Dense vector</titleabbrev>
 ++++
 
-The `dense_vector` field type stores dense vectors of float values.
+The `dense_vector` field type stores dense vectors of float values. Dense
+vector fields can be used in the following ways:
 
-You can use `dense_vector` fields in
-<<query-dsl-script-score-query,`script_score`>> queries to score documents.
-They can also be indexed to support efficient k-nearest neighbor search. Dense
-vector fields do not support aggregations, sorting, or other query types.
+* In <<query-dsl-script-score-query,`script_score`>> queries, to score
+documents matching a filter
+* In the <<knn-search, kNN search API>>, to find the _k_ most similar vectors
+to a query vector
+
+The `dense_vector` type does not support aggregations or sorting.
 
 You add a `dense_vector` field as an array of floats:
 
 [source,console]
 --------------------------------------------------
-PUT my-index-000001
+PUT my-index
 {
   "mappings": {
     "properties": {
@@ -31,13 +34,13 @@ PUT my-index-000001
   }
 }
 
-PUT my-index-000001/_doc/1
+PUT my-index/_doc/1
 {
   "my_text" : "text1",
   "my_vector" : [0.5, 10, 6]
 }
 
-PUT my-index-000001/_doc/2
+PUT my-index/_doc/2
 {
   "my_text" : "text2",
   "my_vector" : [-0.5, 10, 10]
@@ -63,12 +66,13 @@ similarity.
 
 In many cases, a brute-force kNN search is not efficient enough. For this
 reason, the `dense_vector` type supports indexing vectors into a specialized
-data structure to support fast kNN search. You can enable indexing through the
-`index` parameter:
+data structure to support fast kNN retrieval through the
+<<knn-search, kNN search API>>. You can enable indexing by setting the `index`
+parameter:
 
 [source,console]
 --------------------------------------------------
-PUT my-index-000002
+PUT my-index-2
 {
   "mappings": {
     "properties": {
@@ -84,9 +88,9 @@ PUT my-index-000002
 --------------------------------------------------
 <1> When `index` is enabled, you must define the vector similarity to use in kNN search
 
-{es} uses the https://arxiv.org/abs/1603.09320[HNSW algorithm] to
-support efficient kNN search. Like most kNN algorithms, HNSW is an approximate
-method that sacrifices result accuracy for improved speed.
+{es} uses the https://arxiv.org/abs/1603.09320[HNSW algorithm] to support
+efficient kNN search. Like most kNN algorithms, HNSW is an approximate method
+that sacrifices result accuracy for improved speed.
 
 NOTE: Indexing vectors for approximate kNN search is an expensive process. It can take
 substantial time to ingest documents that contain vector fields with `index`
@@ -107,9 +111,10 @@ Number of vector dimensions. Can't exceed `2048`.
 
 `index`::
 (Optional, Boolean)
-If `true`, you can search this field using the kNN search API. Defaults to
-`false`.
+If `true`, you can search this field using the <<knn-search, kNN search API>>.
+Defaults to `false`.
 
+[[dense-vector-similarity]]
 `similarity`::
 (Required^*^, string)
 The vector similarity metric to use in kNN search. Documents are ranked by
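
Note (illustrative, not part of the commit): the dense-vector text above says that a brute-force kNN search is often not efficient enough. The following is a minimal Python sketch of what such an exact scan does, using the two example documents from this diff. The names `l2_distance` and `brute_force_knn` are hypothetical, and the ranking uses plain Euclidean distance rather than the `_score` that Elasticsearch reports.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_knn(docs, query_vector, k):
    """Return the ids of the k documents closest to query_vector.

    Every stored vector is compared with the query, so the cost grows
    linearly with the number of documents. That is the cost an HNSW
    index is meant to avoid.
    """
    ranked = sorted(docs, key=lambda doc_id: l2_distance(docs[doc_id], query_vector))
    return ranked[:k]


# The two documents indexed in the dense_vector example.
docs = {
    "1": [0.5, 10, 6],
    "2": [-0.5, 10, 10],
}

# The query vector is an assumption chosen for illustration.
nearest = brute_force_knn(docs, [0.5, 10, 7], k=1)
```

An exact scan like this always returns the true nearest neighbors, which is the accuracy that the approximate HNSW method trades for speed.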

+ 5 - 2
docs/reference/search.asciidoc

@@ -15,10 +15,11 @@ exception of the <<search-explain,explain API>>.
 * <<search-multi-search>>
 * <<async-search>>
 * <<point-in-time-api>>
-* <<scroll-api>>
-* <<clear-scroll-api>>
+* <<knn-search>>
 * <<search-suggesters>>
 * <<search-terms-enum>>
+* <<scroll-api>>
+* <<clear-scroll-api>>
 
 [discrete]
 [[search-testing-apis]]
@@ -50,6 +51,8 @@ include::search/async-search.asciidoc[]
 
 include::search/point-in-time-api.asciidoc[]
 
+include::search/knn-search.asciidoc[]
+
 include::search/scroll-api.asciidoc[]
 
 include::search/clear-scroll-api.asciidoc[]

+ 141 - 0
docs/reference/search/knn-search.asciidoc

@@ -0,0 +1,141 @@
+[[knn-search]]
+=== kNN search API
+++++
+<titleabbrev>kNN search</titleabbrev>
+++++
+
+experimental::[]
+
+Performs a k-nearest neighbor (kNN) search and returns the matching documents.
+
+////
+[source,console]
+----
+PUT my-index
+{
+  "mappings": {
+    "properties": {
+      "image_vector": {
+        "type": "dense_vector",
+        "dims": 3,
+        "index": true,
+        "similarity": "l2_norm"
+      }
+    }
+  }
+}
+
+PUT my-index/_doc/1?refresh
+{
+  "image_vector" : [0.5, 10, 6]
+}
+----
+////
+
+[source,console]
+----
+GET my-index/_knn_search
+{
+  "knn": {
+    "field": "image_vector",
+    "query_vector": [0.3, 0.1, 1.2],
+    "k": 10,
+    "num_candidates": 100
+  },
+  "_source": ["name", "date"]
+}
+----
+// TEST[continued]
+
+[[knn-search-api-request]]
+==== {api-request-title}
+
+`GET <target>/_knn_search`
+
+`POST <target>/_knn_search`
+
+[[knn-search-api-prereqs]]
+==== {api-prereq-title}
+
+* If the {es} {security-features} are enabled, you must have the `read`
+<<privileges-list-indices,index privilege>> for the target data stream, index,
+or alias.
+
+[[knn-search-api-desc]]
+==== {api-description-title}
+
+The kNN search API performs a k-nearest neighbor (kNN) search on a
+<<dense-vector,`dense_vector`>> field. Given a query vector, it finds the _k_
+closest vectors and returns those documents as search hits.
+
+{es} uses the https://arxiv.org/abs/1603.09320[HNSW algorithm] to support
+efficient kNN search. Like most kNN algorithms, HNSW is an approximate method
+that sacrifices result accuracy for improved search speed. This means the
+results returned are not always the true _k_ closest neighbors.
+
+[[knn-search-api-path-params]]
+==== {api-path-parms-title}
+
+`<target>`::
+(Optional, string) Comma-separated list of data streams, indices, and aliases
+to search. Supports wildcards (`*`). To search all data streams and indices,
+use `*` or `_all`.
+
+WARNING: kNN search does not yet work with <<filter-alias,filtered aliases>>.
+Running a kNN search against a filtered alias may incorrectly result in fewer
+than _k_ hits.
+
+[role="child_attributes"]
+[[knn-search-api-query-params]]
+==== {api-query-parms-title}
+
+include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=routing]
+
+[role="child_attributes"]
+[[knn-search-api-request-body]]
+==== {api-request-body-title}
+
+`knn`::
+(Required, object) Defines the kNN query to run.
++
+.Properties of `knn` object
+[%collapsible%open]
+====
+`field`::
+(Required, string) The name of the vector field to search against. Must be a
+<<index-vectors-knn-search, `dense_vector` field with indexing enabled>>.
+
+`query_vector`::
+(Required, array of floats) Query vector. Must have the same number of
+dimensions as the vector field you are searching against.
+
+`k`::
+(Required, integer) Number of nearest neighbors to return as top hits. This
+value must be less than `num_candidates`.
+
+`num_candidates`::
+(Required, integer) The number of nearest neighbor candidates to consider per
+shard. Cannot exceed 10,000. {es} collects `num_candidates` results from each
+shard, then merges them to find the top `k` results. Increasing
+`num_candidates` tends to improve the accuracy of the final `k` results.
+====
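
Note (illustrative, not part of the commit): the constraints listed for the `knn` object can be checked on the client side before a request is sent. The sketch below is an assumption about how a caller might do that. `validate_knn` and `MAX_NUM_CANDIDATES` are hypothetical names, and the checks mirror only what the property descriptions above state.

```python
MAX_NUM_CANDIDATES = 10_000  # limit stated in the `num_candidates` description

REQUIRED_PROPERTIES = ("field", "query_vector", "k", "num_candidates")


def validate_knn(knn, field_dims):
    """Return a list of problems found in a `knn` object.

    An empty list means the object satisfies the documented constraints.
    `field_dims` is the `dims` value from the target field's mapping.
    """
    problems = [
        f"missing required property: {name}"
        for name in REQUIRED_PROPERTIES
        if name not in knn
    ]
    if problems:
        return problems

    if len(knn["query_vector"]) != field_dims:
        problems.append("query_vector dimensions do not match the field")
    if knn["num_candidates"] > MAX_NUM_CANDIDATES:
        problems.append("num_candidates exceeds 10,000")
    if not knn["k"] < knn["num_candidates"]:
        problems.append("k must be less than num_candidates")
    return problems


# The request body from the example at the top of this file.
example_knn = {
    "field": "image_vector",
    "query_vector": [0.3, 0.1, 1.2],
    "k": 10,
    "num_candidates": 100,
}
example_problems = validate_knn(example_knn, field_dims=3)
```

The server remains the authority on what it accepts. A check like this only surfaces obvious mistakes earlier.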
+
+include::{es-repo-dir}/search/search.asciidoc[tag=docvalue-fields-def]
+include::{es-repo-dir}/search/search.asciidoc[tag=fields-param-def]
+include::{es-repo-dir}/search/search.asciidoc[tag=source-filtering-def]
+include::{es-repo-dir}/search/search.asciidoc[tag=stored-fields-def]
+
+[role="child_attributes"]
+[[knn-search-api-response-body]]
+==== {api-response-body-title}
+
+A kNN search response has the exact same structure as a
+<<search-api-response-body, search API response>>. However, certain sections
+have a meaning specific to kNN search:
+
+* The <<search-api-response-body-score,document `_score`>> is determined by
+the similarity between the query and document vector. See
+<<dense-vector-similarity, `similarity`>>.
+* The `hits.total` object contains the total number of nearest neighbor
+candidates considered, which is `num_candidates * num_shards`. The
+`hits.total.relation` will always be `eq`, indicating an exact value.
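
Note (illustrative, not part of the commit): the `num_candidates` description says each shard contributes candidates that are then merged into the top `k` results. The Python sketch below shows that merge step under a loud assumption: each shard is scanned exactly with `math.dist`, as a stand-in for the approximate HNSW search that Elasticsearch actually runs. `shard_candidates` and `knn_search` are hypothetical names.

```python
import heapq
import math


def shard_candidates(shard_docs, query_vector, num_candidates):
    """Collect up to num_candidates (distance, doc_id) pairs from one shard.

    Assumption: an exact scan replaces the per-shard HNSW search.
    """
    scored = [
        (math.dist(vector, query_vector), doc_id)
        for doc_id, vector in shard_docs.items()
    ]
    return heapq.nsmallest(num_candidates, scored)


def knn_search(shards, query_vector, k, num_candidates):
    """Merge the per-shard candidates and keep the k closest document ids."""
    merged = []
    for shard_docs in shards:
        merged.extend(shard_candidates(shard_docs, query_vector, num_candidates))
    return [doc_id for _, doc_id in heapq.nsmallest(k, merged)]


# Two hypothetical shards holding two-dimensional vectors.
shards = [
    {"a": [0.0, 0.0], "b": [5.0, 5.0]},
    {"c": [1.0, 0.0], "d": [9.0, 9.0]},
]
top_hits = knn_search(shards, [0.0, 0.0], k=2, num_candidates=3)
```

Because each shard returns its own candidate list, the coordinating step only has to sort a small merged list rather than every vector in the index.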

+ 1 - 1
docs/reference/search/search-vector-tile-api.asciidoc

@@ -290,7 +290,7 @@ You can specify fields in the array as a string or object.
 .Properties of `fields` objects
 [%collapsible%open]
 ====
-include::search.asciidoc[tag=fields-api-props]
+include::search.asciidoc[tag=fields-param-props]
 ====
 
 include::search-vector-tile-api.asciidoc[tag=grid-precision]

+ 38 - 17
docs/reference/search/search.asciidoc

@@ -92,7 +92,7 @@ include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=df]
 
 `docvalue_fields`::
 (Optional, string) A comma-separated list of fields to return as the docvalue
-representation of a field for each hit.
+representation of a field for each hit. See <<docvalue-fields>>.
 
 include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=expand-wildcards]
 +
@@ -241,11 +241,12 @@ By default, you cannot page through more than 10,000 hits using the `from` and
 `sort`::
 (Optional, string) A comma-separated list of <field>:<direction> pairs.
 
+[[search-source-param]]
 `_source`::
 (Optional)
 Indicates which <<mapping-source-field,source fields>> are returned for matching
 documents. These fields are returned in the `hits._source` property of
-the search response. Defaults to `true`.
+the search response. Defaults to `true`. See <<source-filtering>>.
 +
 .Valid values for `_source`
 [%collapsible%open]
@@ -275,7 +276,7 @@ purposes.
 `stored_fields`::
 (Optional, string) A comma-separated list of stored fields to return as part
 of a hit. If no fields are specified, no stored fields are included in the
-response.
+response. See <<stored-fields>>.
 +
 If this field is specified, the `_source` parameter defaults to `false`. You can
 pass `_source: true` to return both source fields and
@@ -344,13 +345,14 @@ If `true`, returns document version as part of a hit. Defaults to `false`.
 ==== {api-request-body-title}
 
 [[search-docvalue-fields-param]]
+// tag::docvalue-fields-def[]
 `docvalue_fields`::
 (Optional, array of strings and objects)
-Array of wildcard (`*`) patterns. The request returns doc values for field names
-matching these patterns in the `hits.fields` property of the response.
+Array of field patterns. The request returns values for field names matching
+these patterns in the `hits.fields` property of the response.
 +
-You can specify items in the array as a string or object.
-See <<docvalue-fields>>.
+You can specify items in the array as a string or object. See
+<<docvalue-fields>>.
 +
 .Properties of `docvalue_fields` objects
 [%collapsible%open]
@@ -371,19 +373,22 @@ pattern].
 +
 For other field data types, this parameter is not supported.
 ====
+// end::docvalue-fields-def[]
 
 [[search-api-fields]]
+// tag::fields-param-def[]
 `fields`::
 (Optional, array of strings and objects)
-Array of wildcard (`*`) patterns. The request returns values for field names
-matching these patterns in the `hits.fields` property of the response.
+Array of field patterns. The request returns values for field names matching
+these patterns in the `hits.fields` property of the response.
 +
-You can specify items in the array as a string or object.
+You can specify items in the array as a string or object. See
+<<search-fields-param>>.
 +
 .Properties of `fields` objects
 [%collapsible%open]
 ====
-// tag::fields-api-props[]
+// tag::fields-param-props[]
 `field`::
 (Required, string) Field to return. Supports wildcards (`*`).
 
@@ -425,8 +430,21 @@ returns the tile as a base64-encoded string.
 square with equal sides. Defaults to `4096`.
 ========
 --
-// end::fields-api-props[]
+// end::fields-param-props[]
 ====
+// end::fields-param-def[]
+
+[[search-stored-fields-param]]
+// tag::stored-fields-def[]
+`stored_fields`::
+(Optional, string) A comma-separated list of stored fields to return as part
+of a hit. If no fields are specified, no stored fields are included in the
+response. See <<stored-fields>>.
++
+If this option is specified, the `_source` parameter defaults to `false`. You
+can pass `_source: true` to return both source fields and stored fields in the
+search response.
+// end::stored-fields-def[]
 
 [[request-body-search-explain]]
 `explain`::
@@ -579,11 +597,12 @@ By default, you cannot page through more than 10,000 hits using the `from` and
 `size` parameters. To page through more hits, use the
 <<search-after,`search_after`>> parameter.
 
+// tag::source-filtering-def[]
 `_source`::
 (Optional)
 Indicates which <<mapping-source-field,source fields>> are returned for matching
 documents. These fields are returned in the `hits._source` property of
-the search response. Defaults to `true`.
+the search response. Defaults to `true`. See <<source-filtering>>.
 +
 .Valid values for `_source`
 [%collapsible%open]
@@ -623,6 +642,7 @@ If this property is specified, only these source fields are returned. You can
 exclude fields from this subset using the `excludes` property.
 =====
 ====
+// end::source-filtering-def[]
 
 [[stats-groups]]
 `stats`::
@@ -733,25 +753,25 @@ Contains returned documents and metadata.
 ====
 `total`::
 (object)
-Metadata about the number of returned documents.
+Metadata about the number of matching documents.
 +
 .Properties of `total`
 [%collapsible%open]
 =====
 `value`::
 (integer)
-Total number of returned documents.
+Total number of matching documents.
 
 `relation`::
 (string)
-Indicates whether the number of returned documents in the `value`
+Indicates whether the number of matching documents in the `value`
 parameter is accurate or a lower bound.
 +
 .Values of `relation`:
 [%collapsible%open]
 ======
 `eq`:: Accurate
-`gte`:: Lower bound, including returned documents
+`gte`:: Lower bound
 ======
 =====
 
@@ -799,6 +819,7 @@ or specify which source fields to return.
 Contains field values for the documents. These fields must be specified in the
 request using one or more of the following request parameters:
 
+* <<search-fields-param,`fields`>>
 * <<search-docvalue-fields-param,`docvalue_fields`>>
 * <<script-fields,`script_fields`>>
 * <<stored-fields,`stored_fields`>>