[[query-dsl-knn-query]]
=== Knn query
++++
<titleabbrev>Knn</titleabbrev>
++++

Finds the _k_ nearest vectors to a query vector, as measured by a similarity
metric. The _knn_ query finds nearest vectors through approximate search on indexed
dense_vectors. The preferred way to do approximate kNN search is through the
<<knn-search,top level knn section>> of a search request. The _knn_ query is reserved for
expert cases, where there is a need to combine this query with other queries, or
perform a kNN search against a <<semantic-text, semantic_text>> field.

[[knn-query-ex-request]]
==== Example request

[source,console]
----
PUT my-image-index
{
  "mappings": {
    "properties": {
      "image-vector": {
        "type": "dense_vector",
        "dims": 3,
        "index": true,
        "similarity": "l2_norm"
      },
      "file-type": {
        "type": "keyword"
      },
      "title": {
        "type": "text"
      }
    }
  }
}
----

. Index your data.
+
[source,console]
----
POST my-image-index/_bulk?refresh=true
{ "index": { "_id": "1" } }
{ "image-vector": [1, 5, -20], "file-type": "jpg", "title": "mountain lake" }
{ "index": { "_id": "2" } }
{ "image-vector": [42, 8, -15], "file-type": "png", "title": "frozen lake"}
{ "index": { "_id": "3" } }
{ "image-vector": [15, 11, 23], "file-type": "jpg", "title": "mountain lake lodge" }
----
//TEST[continued]

. Run the search using the `knn` query, asking for the top 10 nearest vectors
from each shard, and then combine shard results to get the top 3 global results.
+
[source,console]
----
POST my-image-index/_search
{
  "size" : 3,
  "query" : {
    "knn": {
      "field": "image-vector",
      "query_vector": [-5, 9, -12],
      "k": 10
    }
  }
}
----
//TEST[continued]

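
The request above can be read as a two-stage selection: each shard contributes up to `k` hits, and the coordinating step keeps the best `size` of them. The following is a plain-Python illustration of that idea, not {es} code. It uses exact brute-force distance in place of approximate search, and the two-shard layout and the helper names `shard_top_k` and `merge_shards` are hypothetical.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def shard_top_k(shard_docs, query_vector, k):
    """Return up to k (distance, doc_id) pairs nearest to the query on one shard."""
    scored = [(l2_distance(vec, query_vector), doc_id)
              for doc_id, vec in shard_docs.items()]
    return heapq.nsmallest(k, scored)


def merge_shards(per_shard_hits, size):
    """Merge per-shard hits and keep the ids of the `size` nearest overall."""
    merged = [hit for hits in per_shard_hits for hit in hits]
    return [doc_id for _, doc_id in heapq.nsmallest(size, merged)]


# Hypothetical layout: the three example documents spread over two shards.
shards = [
    {"1": [1, 5, -20], "2": [42, 8, -15]},
    {"3": [15, 11, 23]},
]
query_vector = [-5, 9, -12]

per_shard = [shard_top_k(docs, query_vector, k=10) for docs in shards]
top_hits = merge_shards(per_shard, size=3)
print(top_hits)
```

With only three documents, `k` of 10 simply returns everything each shard holds; on a larger index, `k` bounds how many hits each shard may contribute before the merge.
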
[[knn-query-top-level-parameters]]
==== Top-level parameters for `knn`

`field`::
+
--
(Required, string) The name of the vector field to search against. Must be a
<<index-vectors-knn-search, `dense_vector` field with indexing enabled>>, or a
<<semantic-text, `semantic_text` field>> with a compatible dense vector inference model.
--

`query_vector`::
+
--
(Optional, array of floats or string) Query vector. Must have the same number of dimensions
as the vector field you are searching against. Must be either an array of floats or a hex-encoded byte vector.
Either this or `query_vector_builder` must be provided.
--

`query_vector_builder`::
+
--
(Optional, object) Query vector builder.
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=knn-query-vector-builder]
If all queried fields are of type <<semantic-text, semantic_text>>, the inference ID associated with the `semantic_text` field may be inferred.
--

`k`::
+
--
(Optional, integer) The number of nearest neighbors to return from each shard.
{es} collects `k` results from each shard, then merges them to find the global top results.
This value must be less than or equal to `num_candidates`. Defaults to search request size.
--

`num_candidates`::
+
--
(Optional, integer) The number of nearest neighbor candidates to consider per shard
while doing knn search. Cannot exceed 10,000. Increasing `num_candidates` tends to
improve the accuracy of the final results.
Defaults to `1.5 * k` if `k` is set, or `1.5 * size` if `k` is not set.
--

`filter`::
+
--
(Optional, query object) Query to filter the documents that can match.
The kNN search will return the top documents that also match this filter.
The value can be a single query or a list of queries. If `filter` is not provided,
all documents are allowed to match.

The filter is a pre-filter, meaning that it is applied **during** the approximate
kNN search to ensure that `num_candidates` matching documents are returned.
--

`similarity`::
+
--
(Optional, float) The minimum similarity required for a document to be considered
a match. The similarity value calculated relates to the raw
<<dense-vector-similarity, `similarity`>> used, not the document score. The matched
documents are then scored according to <<dense-vector-similarity, `similarity`>>
and the provided `boost` is applied.
--

`boost`::
+
--
(Optional, float) Floating point number used to multiply the
scores of matched documents. This value cannot be negative. Defaults to `1.0`.
--

`_name`::
+
--
(Optional, string) Name field to identify the query.
--

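
The defaults and limits for `k` and `num_candidates` described above interact, so it can help to see them resolved in one place. The sketch below is only an illustration of the documented rules. The function name `resolve_knn_params` is hypothetical, and rounding the `1.5 *` default up to a whole number and capping it at the limit are assumptions, since the text does not say how fractional values are handled.

```python
import math

MAX_NUM_CANDIDATES = 10_000  # upper bound stated for `num_candidates`


def resolve_knn_params(size, k=None, num_candidates=None):
    """Resolve the effective (k, num_candidates) from the documented defaults."""
    if k is None:
        # `k` defaults to the search request size.
        k = size
    if num_candidates is None:
        # Documented default is 1.5 * k (or 1.5 * size when k is unset).
        # Assumption: round up and cap at the documented maximum.
        num_candidates = min(math.ceil(1.5 * k), MAX_NUM_CANDIDATES)
    if num_candidates > MAX_NUM_CANDIDATES:
        raise ValueError("num_candidates cannot exceed 10,000")
    if k > num_candidates:
        raise ValueError("k must be less than or equal to num_candidates")
    return k, num_candidates


print(resolve_knn_params(size=3, k=10))   # k given, num_candidates derived from k
print(resolve_knn_params(size=10))        # k falls back to size
```

Passing an explicit `num_candidates` smaller than `k` raises an error in this sketch, mirroring the rule that `k` must be less than or equal to `num_candidates`.
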
[[knn-query-filtering]]
==== Pre-filters and post-filters in knn query

There are two ways to filter documents that match a kNN query:

. **pre-filtering** – the filter is applied during the approximate kNN search
to ensure that `k` matching documents are returned.
. **post-filtering** – the filter is applied after the approximate kNN search
completes, which can result in fewer than `k` results, even when there are enough
matching documents.

Pre-filtering is supported through the `filter` parameter of the `knn` query.
Filters from <<filter-alias,aliases>> are also applied as pre-filters.

All other filters found in the Query DSL tree are applied as post-filters.
For example, the `knn` query below finds the top 3 documents with the nearest vectors
(k=3) and is combined with a `term` filter that is applied as a post-filter.
The final set of documents contains only the single document
that passes the post-filter.

[source,console]
----
POST my-image-index/_search
{
  "size" : 10,
  "query" : {
    "bool" : {
      "must" : {
        "knn": {
          "field": "image-vector",
          "query_vector": [-5, 9, -12],
          "k": 3
        }
      },
      "filter" : {
        "term" : { "file-type" : "png" }
      }
    }
  }
}
----
//TEST[continued]

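
The difference between the two filtering modes is easiest to see when `k` is small. The sketch below is a toy model in plain Python, not {es} code. It reuses the three example documents, substitutes exact brute-force search for the approximate search, and uses the hypothetical helpers `pre_filtered` and `post_filtered`.

```python
import math

# The three documents indexed in the example request.
DOCS = {
    "1": {"vector": [1, 5, -20], "file-type": "jpg"},
    "2": {"vector": [42, 8, -15], "file-type": "png"},
    "3": {"vector": [15, 11, 23], "file-type": "jpg"},
}


def nearest(doc_ids, query_vector, k):
    """Exact top-k by Euclidean distance among the given document ids."""
    ranked = sorted(doc_ids, key=lambda d: math.dist(DOCS[d]["vector"], query_vector))
    return ranked[:k]


def pre_filtered(query_vector, k, file_type):
    """Restrict the candidate set first, then pick the k nearest."""
    allowed = [d for d, doc in DOCS.items() if doc["file-type"] == file_type]
    return nearest(allowed, query_vector, k)


def post_filtered(query_vector, k, file_type):
    """Pick the k nearest first, then drop hits that fail the filter."""
    hits = nearest(list(DOCS), query_vector, k)
    return [d for d in hits if DOCS[d]["file-type"] == file_type]


query_vector = [-5, 9, -12]
print(pre_filtered(query_vector, k=1, file_type="png"))
print(post_filtered(query_vector, k=1, file_type="png"))
print(post_filtered(query_vector, k=3, file_type="png"))
```

With `k=1`, the single nearest document is a `jpg`, so post-filtering on `png` leaves nothing even though a matching document exists, while pre-filtering still finds it. With `k=3`, post-filtering keeps the one `png` document, which matches the request above.
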
[[knn-query-in-hybrid-search]]
==== Hybrid search with knn query

The knn query can be used as part of a hybrid search, where it is combined
with other lexical queries. For example, the query below finds documents with
`title` matching `mountain lake`, and combines them with the top 10 documents
that have the closest image vectors to the `query_vector`. The combined documents
are then scored and the 3 highest-scoring documents are returned.

[source,console]
----
POST my-image-index/_search
{
  "size" : 3,
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": {
              "query": "mountain lake",
              "boost": 1
            }
          }
        },
        {
          "knn": {
            "field": "image-vector",
            "query_vector": [-5, 9, -12],
            "k": 10,
            "boost": 2
          }
        }
      ]
    }
  }
}
----
//TEST[continued]

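
To make the combination step concrete, the sketch below adds boosted clause scores per document, which is how a `bool` query with `should` clauses combines matching clauses. It is an illustration under stated assumptions, not {es} code: the lexical scores are invented placeholder numbers rather than real relevance scores, the vector score assumes the `l2_norm` scoring formula `1 / (1 + distance^2)` described for <<dense-vector-similarity, `similarity`>>, and the helper names are hypothetical.

```python
def knn_score_l2(vector, query_vector):
    """Score for l2_norm similarity, assumed to be 1 / (1 + squared distance)."""
    squared = sum((x - y) ** 2 for x, y in zip(vector, query_vector))
    return 1.0 / (1.0 + squared)


def hybrid_scores(lexical_scores, vectors, query_vector, k,
                  match_boost=1.0, knn_boost=2.0):
    """Sum boosted lexical and kNN scores for every document that matches either clause."""
    knn_scores = {d: knn_score_l2(v, query_vector) for d, v in vectors.items()}
    knn_hits = sorted(knn_scores, key=knn_scores.get, reverse=True)[:k]

    combined = {}
    for doc_id, score in lexical_scores.items():
        combined[doc_id] = combined.get(doc_id, 0.0) + match_boost * score
    for doc_id in knn_hits:
        combined[doc_id] = combined.get(doc_id, 0.0) + knn_boost * knn_scores[doc_id]
    return combined


vectors = {"1": [1, 5, -20], "2": [42, 8, -15], "3": [15, 11, 23]}
# Placeholder lexical scores for the `match` clause (made up for illustration).
lexical_scores = {"1": 0.6, "2": 0.1, "3": 0.5}

combined = hybrid_scores(lexical_scores, vectors, [-5, 9, -12], k=10)
ranking = sorted(combined, key=combined.get, reverse=True)[:3]  # "size": 3
print(ranking)
```

Because the two clauses produce scores on different scales, the `boost` values control how much each side contributes to the final ranking.
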
[[knn-query-with-nested-query]]
==== Knn query inside a nested query

The `knn` query can be used inside a nested query. The behaviour here is similar
to <<nested-knn-search, top level nested kNN search>>:

* kNN search over nested dense_vectors diversifies the top results over
the top-level document
* `filter` over the top-level document metadata is supported and acts as a
pre-filter
* `filter` over `nested` field metadata is not supported

A sample query might look like this:

[source,js]
----
{
  "query" : {
    "nested" : {
      "path" : "paragraph",
      "query" : {
        "knn": {
          "query_vector": [
            0.45,
            45
          ],
          "field": "paragraph.vector",
          "num_candidates": 2
        }
      }
    }
  }
}
----
// NOTCONSOLE

[[knn-query-aggregations]]
==== Knn query with aggregations

The `knn` query calculates aggregations on the top `k` documents from each shard.
Thus, the final results from aggregations can cover up to
`k * number_of_shards` documents. This is different from
the <<knn-search,top level knn section>>, where aggregations are
calculated on the global top `k` nearest documents.
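
As a rough way to reason about this difference, the sketch below counts how many documents aggregations would see in each mode. It is a back-of-the-envelope model, not {es} code. The function name and the per-shard counts are hypothetical, and it assumes a shard holding fewer than `k` matching documents contributes only what it has.

```python
def docs_seen_by_aggregations(shard_doc_counts, k, top_level_knn):
    """Count the documents that aggregations run over.

    shard_doc_counts: number of matching documents available on each shard.
    top_level_knn: True for the top-level knn section, False for the knn query.
    """
    if top_level_knn:
        # Aggregations run on the global top k nearest documents.
        return min(k, sum(shard_doc_counts))
    # Aggregations run on the top k documents from every shard.
    return sum(min(k, count) for count in shard_doc_counts)


shard_doc_counts = [100, 100, 100]  # hypothetical three-shard index
print(docs_seen_by_aggregations(shard_doc_counts, k=10, top_level_knn=False))
print(docs_seen_by_aggregations(shard_doc_counts, k=10, top_level_knn=True))
```

For three shards and `k=10`, the knn query feeds 30 documents into the aggregations, while the top-level knn section feeds 10.
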