- [[query-dsl-knn-query]]
- === Knn query
- ++++
- <titleabbrev>Knn</titleabbrev>
- ++++
- Finds the _k_ nearest vectors to a query vector, as measured by a similarity
- metric. The _knn_ query finds the nearest vectors through approximate search on
- indexed `dense_vector` fields. The preferred way to do approximate kNN search is
- through the <<knn-search,top level knn section>> of a search request. The _knn_
- query is reserved for expert cases that need to combine it with other queries.
- [[knn-query-ex-request]]
- ==== Example request
- [source,console]
- ----
- PUT my-image-index
- {
-   "mappings": {
-     "properties": {
-       "image-vector": {
-         "type": "dense_vector",
-         "dims": 3,
-         "index": true,
-         "similarity": "l2_norm"
-       },
-       "file-type": {
-         "type": "keyword"
-       },
-       "title": {
-         "type": "text"
-       }
-     }
-   }
- }
- ----
- . Index your data.
- +
- [source,console]
- ----
- POST my-image-index/_bulk?refresh=true
- { "index": { "_id": "1" } }
- { "image-vector": [1, 5, -20], "file-type": "jpg", "title": "mountain lake" }
- { "index": { "_id": "2" } }
- { "image-vector": [42, 8, -15], "file-type": "png", "title": "frozen lake" }
- { "index": { "_id": "3" } }
- { "image-vector": [15, 11, 23], "file-type": "jpg", "title": "mountain lake lodge" }
- ----
- //TEST[continued]
- . Run the search using the `knn` query, asking for the top 10 nearest vectors
- from each shard, and then combine shard results to get the top 3 global results.
- +
- [source,console]
- ----
- POST my-image-index/_search
- {
-   "size" : 3,
-   "query" : {
-     "knn": {
-       "field": "image-vector",
-       "query_vector": [-5, 9, -12],
-       "k": 10
-     }
-   }
- }
- ----
- //TEST[continued]
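- As a rough illustration of what the request above ranks, the following Python
- sketch computes the `l2_norm` (Euclidean) distance between the query vector and
- each of the three indexed vectors by brute force. It is only a sketch: the real
- query runs an approximate search, the helper names are invented for this example,
- and the score formula `1 / (1 + distance^2)` is the one documented for `l2_norm`
- under <<dense-vector-similarity, `similarity`>>.

```python
import math

# Vectors indexed by the bulk request above, keyed by document _id.
DOCS = {
    "1": [1, 5, -20],
    "2": [42, 8, -15],
    "3": [15, 11, 23],
}
QUERY_VECTOR = [-5, 9, -12]


def l2_distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def l2_score(distance):
    """Score documented for l2_norm similarity: 1 / (1 + distance^2)."""
    return 1.0 / (1.0 + distance ** 2)


def exact_knn(docs, query_vector, k):
    """Brute-force stand-in for the approximate search: ids of the k nearest docs."""
    ranked = sorted(docs, key=lambda doc_id: l2_distance(docs[doc_id], query_vector))
    return ranked[:k]


if __name__ == "__main__":
    for doc_id in exact_knn(DOCS, QUERY_VECTOR, k=10):
        d = l2_distance(DOCS[doc_id], QUERY_VECTOR)
        print(doc_id, round(d, 2), round(l2_score(d), 5))
```

- With this data, document 1 is nearest, followed by 3 and then 2. Because the index
- holds only three documents, `"k": 10` and `"size": 3` return all of them.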
- [[knn-query-top-level-parameters]]
- ==== Top-level parameters for `knn`
- `field`::
- +
- --
- (Required, string) The name of the vector field to search against. Must be a
- <<index-vectors-knn-search, `dense_vector` field with indexing enabled>>.
- --
- `query_vector`::
- +
- --
- (Optional, array of floats or string) Query vector. Must have the same number of dimensions
- as the vector field you are searching against. Must be either an array of floats or a hex-encoded byte vector.
- Either this or `query_vector_builder` must be provided.
- --
- `query_vector_builder`::
- +
- --
- (Optional, object) Query vector builder.
- include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=knn-query-vector-builder]
- --
- `k`::
- +
- --
- (Optional, integer) The number of nearest neighbors to return from each shard.
- {es} collects `k` results from each shard, then merges them to find the global top results.
- This value must be less than or equal to `num_candidates`. Defaults to `num_candidates`.
- --
- `num_candidates`::
- +
- --
- (Optional, integer) The number of nearest neighbor candidates to consider per shard
- while doing knn search. Cannot exceed 10,000. Increasing `num_candidates` tends to
- improve the accuracy of the final results.
- Defaults to `1.5 * k` if `k` is set, or `1.5 * size` if `k` is not set.
- --
- `filter`::
- +
- --
- (Optional, query object) Query to filter the documents that can match.
- The kNN search will return the top documents that also match this filter.
- The value can be a single query or a list of queries. If `filter` is not provided,
- all documents are allowed to match.
- The filter is a pre-filter, meaning that it is applied **during** the approximate
- kNN search to ensure that `num_candidates` matching documents are returned.
- --
- `similarity`::
- +
- --
- (Optional, float) The minimum similarity required for a document to be considered
- a match. The value is compared against the raw
- <<dense-vector-similarity, `similarity`>> of the field, not against the document
- score. Matching documents are then scored according to
- <<dense-vector-similarity, `similarity`>>, and the provided `boost` is applied.
- --
- `boost`::
- +
- --
- (Optional, float) Floating point number used to multiply the
- scores of matched documents. This value cannot be negative. Defaults to `1.0`.
- --
- `_name`::
- +
- --
- (Optional, string) Name used to identify the query.
- --
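- The defaults and limits described for `k` and `num_candidates` can be sketched
- in Python as follows. This is a minimal sketch, not {es} code: the function name
- `resolve_knn_params` is hypothetical, and rounding `1.5 * k` up to an integer and
- capping the default at 10,000 are assumptions of this example.

```python
import math

MAX_NUM_CANDIDATES = 10_000  # documented upper bound for num_candidates


def resolve_knn_params(size, k=None, num_candidates=None):
    """Apply the documented defaults and limits to k and num_candidates.

    Assumption of this sketch: 1.5 * k (or 1.5 * size) is rounded up and
    the resulting default is capped at the 10,000 limit.
    """
    if num_candidates is None:
        # Defaults to 1.5 * k if k is set, or 1.5 * size otherwise.
        base = k if k is not None else size
        num_candidates = min(math.ceil(1.5 * base), MAX_NUM_CANDIDATES)
    if k is None:
        # k defaults to num_candidates.
        k = num_candidates
    if num_candidates > MAX_NUM_CANDIDATES:
        raise ValueError("num_candidates cannot exceed 10,000")
    if k > num_candidates:
        raise ValueError("k must be less than or equal to num_candidates")
    return k, num_candidates


if __name__ == "__main__":
    print(resolve_knn_params(size=3, k=10))               # k set, num_candidates derived
    print(resolve_knn_params(size=10))                    # neither set
    print(resolve_knn_params(size=10, num_candidates=50)) # only num_candidates set
```

- For the example request above (`"size": 3`, `"k": 10`), this sketch resolves
- `num_candidates` to 15.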
- [[knn-query-filtering]]
- ==== Pre-filters and post-filters in knn query
- There are two ways to filter documents that match a kNN query:
- . **pre-filtering** – the filter is applied during the approximate kNN search
- to ensure that `k` matching documents are returned.
- . **post-filtering** – the filter is applied after the approximate kNN search
- completes, which can result in fewer than `k` results, even when there are enough
- matching documents.
- Pre-filtering is supported through the `filter` parameter of the `knn` query.
- Filters from <<filter-alias,aliases>> are also applied as pre-filters.
- All other filters found in the Query DSL tree are applied as post-filters.
- For example, the following `knn` query finds the top 3 documents with the nearest
- vectors (k=3) and combines them with a `term` filter, which is applied as a
- post-filter. The final result contains only the single document that passes
- the post-filter.
- [source,console]
- ----
- POST my-image-index/_search
- {
-   "size" : 10,
-   "query" : {
-     "bool" : {
-       "must" : {
-         "knn": {
-           "field": "image-vector",
-           "query_vector": [-5, 9, -12],
-           "k": 3
-         }
-       },
-       "filter" : {
-         "term" : { "file-type" : "png" }
-       }
-     }
-   }
- }
- ----
- //TEST[continued]
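- The difference between the two filtering modes can be illustrated with a small
- Python sketch over the three example documents. It uses an exact brute-force
- search instead of the approximate one, and the function names are invented for
- this example, so treat it as an illustration of the ordering of steps rather than
- of how {es} executes the query.

```python
import math

# The three documents indexed earlier in this section.
DOCS = {
    "1": {"vector": [1, 5, -20], "file-type": "jpg"},
    "2": {"vector": [42, 8, -15], "file-type": "png"},
    "3": {"vector": [15, 11, 23], "file-type": "jpg"},
}
QUERY_VECTOR = [-5, 9, -12]


def nearest(doc_ids, k):
    """Return up to k ids from doc_ids, ordered by l2 distance to the query."""
    def distance(doc_id):
        return math.dist(DOCS[doc_id]["vector"], QUERY_VECTOR)
    return sorted(doc_ids, key=distance)[:k]


def is_png(doc_id):
    """The term filter from the example: file-type == png."""
    return DOCS[doc_id]["file-type"] == "png"


def pre_filtered(k):
    """Filter first, then take the k nearest among the matching documents."""
    return nearest([d for d in DOCS if is_png(d)], k)


def post_filtered(k):
    """Take the k nearest overall, then drop the ones that fail the filter."""
    return [d for d in nearest(DOCS, k) if is_png(d)]


if __name__ == "__main__":
    print(post_filtered(3), post_filtered(1), pre_filtered(1))
```

- With `k=3`, post-filtering keeps only document 2, as in the request above. With
- `k=1`, the nearest vector belongs to document 1 (a `jpg`), so post-filtering
- returns nothing, while pre-filtering still returns document 2.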
- [[knn-query-in-hybrid-search]]
- ==== Hybrid search with knn query
- The knn query can be used as part of a hybrid search, where it is combined
- with lexical queries. For example, the query below finds documents with
- `title` matching `mountain lake`, and combines them with the top 10 documents
- that have the closest image vectors to the `query_vector`. The combined documents
- are then scored and the 3 top-scoring documents are returned.
- [source,console]
- ----
- POST my-image-index/_search
- {
-   "size" : 3,
-   "query": {
-     "bool": {
-       "should": [
-         {
-           "match": {
-             "title": {
-               "query": "mountain lake",
-               "boost": 1
-             }
-           }
-         },
-         {
-           "knn": {
-             "field": "image-vector",
-             "query_vector": [-5, 9, -12],
-             "k": 10,
-             "boost": 2
-           }
-         }
-       ]
-     }
-   }
- }
- ----
- //TEST[continued]
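- To make the scoring step concrete, the Python sketch below adds up, for each
- document, the scores of the `should` clauses it matches, which is how a `bool`
- query combines them. The lexical scores are hypothetical placeholders (real
- values come from the `match` query and depend on index statistics), and the
- sketch assumes every document falls within the top `k` of the `knn` clause,
- which holds here because the index contains only three documents.

```python
import math

QUERY_VECTOR = [-5, 9, -12]
VECTORS = {"1": [1, 5, -20], "2": [42, 8, -15], "3": [15, 11, 23]}

# Hypothetical lexical scores for the `match` clause; not real BM25 output.
LEXICAL_SCORES = {"1": 0.9, "2": 0.3, "3": 0.8}


def knn_score(vector, boost):
    """l2_norm score, 1 / (1 + distance^2), multiplied by the clause boost."""
    return boost / (1.0 + math.dist(vector, QUERY_VECTOR) ** 2)


def hybrid_scores(match_boost=1.0, knn_boost=2.0):
    """Sum the scores of the `should` clauses that each document matches."""
    totals = {}
    for doc_id, vector in VECTORS.items():
        lexical = match_boost * LEXICAL_SCORES.get(doc_id, 0.0)
        totals[doc_id] = lexical + knn_score(vector, knn_boost)
    return totals


def top_hits(size):
    """Ids of the `size` highest-scoring documents."""
    totals = hybrid_scores()
    return sorted(totals, key=totals.get, reverse=True)[:size]


if __name__ == "__main__":
    print(top_hits(3))
```

- In this sketch the `l2_norm` scores are much smaller than the placeholder lexical
- scores, so the `boost` values largely decide how much each clause contributes.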
- [[knn-query-with-nested-query]]
- ==== Knn query inside a nested query
- The `knn` query can be used inside a nested query. The behaviour is similar
- to <<nested-knn-search, top level nested kNN search>>:
- * kNN search over nested dense_vectors diversifies the top results over
- the top-level document
- * `filter` over the top-level document metadata is supported and acts as a
- post-filter
- * `filter` over `nested` field metadata is not supported
- A sample query looks like this:
- [source,js]
- ----
- {
-   "query" : {
-     "nested" : {
-       "path" : "paragraph",
-       "query" : {
-         "knn": {
-           "query_vector": [
-             0.45,
-             45
-           ],
-           "field": "paragraph.vector",
-           "num_candidates": 2
-         }
-       }
-     }
-   }
- }
- ----
- // NOTCONSOLE
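- The diversification described above can be pictured with the following Python
- sketch. It assumes that "diversifies over the top-level document" means each
- top-level document is represented by its single closest nested vector. The
- paragraph data and function names are hypothetical, and the search is exact
- rather than approximate.

```python
import math

QUERY_VECTOR = [0.45, 45]

# Hypothetical nested `paragraph.vector` values, tagged with their parent document.
PARAGRAPHS = [
    {"doc": "A", "vector": [0.5, 44]},
    {"doc": "A", "vector": [0.4, 46]},
    {"doc": "A", "vector": [0.45, 43]},
    {"doc": "B", "vector": [1.0, 40]},
    {"doc": "C", "vector": [2.0, 30]},
]


def undiversified_top_docs(paragraphs, k):
    """Parents of the k closest paragraphs; the same parent can repeat."""
    ranked = sorted(paragraphs, key=lambda p: math.dist(p["vector"], QUERY_VECTOR))
    return [p["doc"] for p in ranked[:k]]


def diversified_top_docs(paragraphs, k):
    """Keep each document's closest paragraph, then return the k closest docs."""
    best = {}
    for p in paragraphs:
        d = math.dist(p["vector"], QUERY_VECTOR)
        if p["doc"] not in best or d < best[p["doc"]]:
            best[p["doc"]] = d
    return sorted(best, key=best.get)[:k]


if __name__ == "__main__":
    print(undiversified_top_docs(PARAGRAPHS, 2), diversified_top_docs(PARAGRAPHS, 2))
```

- Without diversification, the two closest paragraphs both belong to document `A`.
- With it, the top two results are the distinct documents `A` and `B`.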
- [[knn-query-aggregations]]
- ==== Knn query with aggregations
- The `knn` query calculates aggregations on the top `k` documents from each shard.
- Thus, the final aggregation results cover
- `k * number_of_shards` documents. This is different from
- the <<knn-search,top level knn section>>, where aggregations are
- calculated on the global top `k` nearest documents.
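- The following Python sketch contrasts the two aggregation scopes on made-up
- data. The shard contents, scores, and function names are hypothetical. The sketch
- only shows which documents feed the aggregation in each case: the top `k` hits of
- every shard versus the global top `k`.

```python
from collections import Counter

# Hypothetical hits per shard, ordered best-first: (score, file type).
SHARDS = [
    [(0.9, "jpg"), (0.8, "png"), (0.1, "jpg")],
    [(0.7, "png"), (0.6, "png"), (0.2, "jpg")],
]


def knn_query_agg(shards, k):
    """Count file types over the top k hits of every shard (k * shards docs)."""
    docs = [hit for shard in shards for hit in shard[:k]]
    return Counter(file_type for _, file_type in docs)


def top_level_knn_agg(shards, k):
    """Count file types over the global top k hits only."""
    merged = sorted((hit for shard in shards for hit in shard), reverse=True)
    return Counter(file_type for _, file_type in merged[:k])


if __name__ == "__main__":
    print(knn_query_agg(SHARDS, k=2))
    print(top_level_knn_agg(SHARDS, k=2))
```

- With `k=2` and two shards, the first function aggregates over four documents
- and the second over two.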