[[query-dsl-knn-query]]
=== Knn query
++++
<titleabbrev>Knn</titleabbrev>
++++

Finds the _k_ nearest vectors to a query vector, as measured by a similarity
metric. The _knn_ query finds the nearest vectors through approximate search on
indexed `dense_vector` fields. The preferred way to do approximate kNN search is
through the <<knn-search,top level knn section>> of a search request. The _knn_
query is reserved for expert cases, where there is a need to combine this query
with other queries, or to perform a kNN search against a
<<semantic-text, semantic_text>> field.

[[knn-query-ex-request]]
==== Example request

[source,console]
----
PUT my-image-index
{
  "mappings": {
    "properties": {
      "image-vector": {
        "type": "dense_vector",
        "dims": 3,
        "index": true,
        "similarity": "l2_norm"
      },
      "file-type": {
        "type": "keyword"
      },
      "title": {
        "type": "text"
      }
    }
  }
}
----

. Index your data.
+
[source,console]
----
POST my-image-index/_bulk?refresh=true
{ "index": { "_id": "1" } }
{ "image-vector": [1, 5, -20], "file-type": "jpg", "title": "mountain lake" }
{ "index": { "_id": "2" } }
{ "image-vector": [42, 8, -15], "file-type": "png", "title": "frozen lake"}
{ "index": { "_id": "3" } }
{ "image-vector": [15, 11, 23], "file-type": "jpg", "title": "mountain lake lodge" }
----
//TEST[continued]

. Run the search using the `knn` query, asking for the top 10 nearest vectors
from each shard, and then combine shard results to get the top 3 global results.
+
[source,console]
----
POST my-image-index/_search
{
  "size" : 3,
  "query" : {
    "knn": {
      "field": "image-vector",
      "query_vector": [-5, 9, -12],
      "k": 10
    }
  }
}
----
//TEST[continued]

[[knn-query-top-level-parameters]]
==== Top-level parameters for `knn`

`field`::
+
--
(Required, string) The name of the vector field to search against. Must be a
<<index-vectors-knn-search, `dense_vector` field with indexing enabled>>, or a
<<semantic-text, `semantic_text` field>> with a compatible dense vector inference model.
--

`query_vector`::
+
--
(Optional, array of floats or string) Query vector. Must have the same number of dimensions
as the vector field you are searching against. Must be either an array of floats or a hex-encoded byte vector.
Either this or `query_vector_builder` must be provided.
--

`query_vector_builder`::
+
--
(Optional, object) Query vector builder.
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=knn-query-vector-builder]

If all queried fields are of type <<semantic-text, semantic_text>>, the inference ID associated with the `semantic_text` field may be inferred.
--

`k`::
+
--
(Optional, integer) The number of nearest neighbors to return from each shard.
{es} collects `k` results from each shard, then merges them to find the global top results.
This value must be less than or equal to `num_candidates`. Defaults to the search request size.
--

`num_candidates`::
+
--
(Optional, integer) The number of nearest neighbor candidates to consider per shard
while doing kNN search. Cannot exceed 10,000. Increasing `num_candidates` tends to
improve the accuracy of the final results.
Defaults to `1.5 * k` if `k` is set, or `1.5 * size` if `k` is not set.
--

`filter`::
+
--
(Optional, query object) Query to filter the documents that can match.
The kNN search will return the top documents that also match this filter.
The value can be a single query or a list of queries. If `filter` is not provided,
all documents are allowed to match.

The filter is a pre-filter, meaning that it is applied **during** the approximate
kNN search to ensure that `num_candidates` matching documents are returned.
--

`similarity`::
+
--
(Optional, float) The minimum similarity required for a document to be considered
a match. The similarity value is calculated with the raw
<<dense-vector-similarity, `similarity`>> used, not the document score. The matched
documents are then scored according to <<dense-vector-similarity, `similarity`>>
and the provided `boost` is applied.
--

`boost`::
+
--
(Optional, float) Floating point number used to multiply the
scores of matched documents. This value cannot be negative. Defaults to `1.0`.
--

`_name`::
+
--
(Optional, string) Name field to identify the query.
--

[[knn-query-filtering]]
==== Pre-filters and post-filters in knn query

There are two ways to filter documents that match a kNN query:

. **pre-filtering** – the filter is applied during the approximate kNN search
to ensure that `k` matching documents are returned.
. **post-filtering** – the filter is applied after the approximate kNN search
completes, which can result in fewer than `k` results, even when there are enough
matching documents.

Pre-filtering is supported through the `filter` parameter of the `knn` query.
Filters from <<filter-alias,aliases>> are also applied as pre-filters.

All other filters found in the Query DSL tree are applied as post-filters.
For example, the `knn` query below finds the top 3 documents with the nearest
vectors (k=3), which are then combined with a `term` filter that is applied as a
post-filter. The final set of documents will contain only a single document
that passes the post-filter.

[source,console]
----
POST my-image-index/_search
{
  "size" : 10,
  "query" : {
    "bool" : {
      "must" : {
        "knn": {
          "field": "image-vector",
          "query_vector": [-5, 9, -12],
          "k": 3
        }
      },
      "filter" : {
        "term" : { "file-type" : "png" }
      }
    }
  }
}
----
//TEST[continued]

[[knn-query-in-hybrid-search]]
==== Hybrid search with knn query

The `knn` query can be used as part of a hybrid search, where it is combined
with other lexical queries. For example, the query below finds documents with
`title` matching `mountain lake`, and combines them with the top 10 documents
that have the closest image vectors to the `query_vector`. The combined documents
are then scored and the top 3 scored documents are returned.

[source,console]
----
POST my-image-index/_search
{
  "size" : 3,
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": {
              "query": "mountain lake",
              "boost": 1
            }
          }
        },
        {
          "knn": {
            "field": "image-vector",
            "query_vector": [-5, 9, -12],
            "k": 10,
            "boost": 2
          }
        }
      ]
    }
  }
}
----
//TEST[continued]

[[knn-query-with-nested-query]]
==== Knn query inside a nested query

The `knn` query can be used inside a nested query. The behaviour here is similar
to <<nested-knn-search, top level nested kNN search>>:

* kNN search over nested dense_vectors diversifies the top results over
the top-level document
* `filter` over the top-level document metadata is supported and acts as a
pre-filter
* `filter` over `nested` field metadata is not supported

A sample query can look like the one below:

[source,js]
----
{
  "query" : {
    "nested" : {
      "path" : "paragraph",
      "query" : {
        "knn": {
          "query_vector": [
            0.45,
            45
          ],
          "field": "paragraph.vector",
          "num_candidates": 2
        }
      }
    }
  }
}
----
// NOTCONSOLE

[[knn-query-aggregations]]
==== Knn query with aggregations

The `knn` query calculates aggregations on the top `k` documents from each shard.
Thus, the final results from aggregations contain
`k * number_of_shards` documents. This is different from
the <<knn-search,top level knn section>>, where aggregations are
calculated on the global top `k` nearest documents.
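
The difference in aggregation scope can be illustrated with a small simulation.
The following is a minimal Python sketch, not {es} code. It assumes the three
sample documents from the example request are spread over two hypothetical
shards, ranks them by exact L2 distance instead of an approximate search, and
uses `k = 1`. The helper names (`l2_distance`, `top_k`, `count_by_file_type`)
and the shard layout are assumptions made only for illustration.

```python
import math
from collections import Counter


def l2_distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(docs, query_vector, k):
    """Return the k documents whose vectors are closest to query_vector."""
    ranked = sorted(docs, key=lambda d: l2_distance(d["image-vector"], query_vector))
    return ranked[:k]


def count_by_file_type(docs):
    """Stand-in for a terms aggregation on the file-type field."""
    return dict(Counter(d["file-type"] for d in docs))


# Assumed shard layout: documents 1 and 2 on one shard, document 3 on another.
shards = [
    [
        {"_id": "1", "image-vector": [1, 5, -20], "file-type": "jpg"},
        {"_id": "2", "image-vector": [42, 8, -15], "file-type": "png"},
    ],
    [
        {"_id": "3", "image-vector": [15, 11, 23], "file-type": "jpg"},
    ],
]
query_vector = [-5, 9, -12]
k = 1

# knn query: aggregations run over the top k documents of every shard,
# so up to k * number_of_shards documents are counted.
knn_query_docs = [doc for shard in shards for doc in top_k(shard, query_vector, k)]

# Top-level knn section: aggregations run over the global top k documents.
all_docs = [doc for shard in shards for doc in shard]
top_level_docs = top_k(all_docs, query_vector, k)

print(count_by_file_type(knn_query_docs))  # counts 2 documents (ids 1 and 3)
print(count_by_file_type(top_level_docs))  # counts 1 document (id 1)
```

With two shards and `k = 1`, the simulated `knn` query feeds two documents into
the aggregation, while the simulated top-level knn section feeds only one.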