[[tune-knn-search]]
== Tune approximate kNN search

{es} supports <<approximate-knn, approximate k-nearest neighbor search>> for
efficiently finding the _k_ nearest vectors to a query vector. Since
approximate kNN search works differently from other queries, there are special
considerations around its performance.

Many of these recommendations help improve search speed. With approximate kNN,
the indexing algorithm runs searches under the hood to create the vector index
structures. So these same recommendations also help with indexing speed.

[discrete]
=== Prefer `dot_product` over `cosine`

When indexing vectors for approximate kNN search, you need to specify the
<<dense-vector-similarity, `similarity` function>> for comparing the vectors.
If you'd like to compare vectors through cosine similarity, there are two
options.

The `cosine` option accepts any float vector and computes the cosine
similarity. While this is convenient for testing, it's not the most efficient
approach. Instead, we recommend using the `dot_product` option to compute the
similarity. To use `dot_product`, all vectors need to be normalized in advance
to have length 1. The `dot_product` option is significantly faster, since it
avoids performing extra vector length computations during the search.
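
For example, a mapping along the following lines indexes vectors with
`dot_product`. This is a minimal sketch: the index name, field name, and
dimension count are placeholders, and the documents you index must contain
vectors already normalized to length 1.

[source,console]
----
PUT my-index
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 384, <1>
        "index": true,
        "similarity": "dot_product"
      }
    }
  }
}
----
<1> Replace with the dimensionality of your embedding model.
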
[discrete]
=== Ensure data nodes have enough memory

{es} uses the https://arxiv.org/abs/1603.09320[HNSW] algorithm for approximate
kNN search. HNSW is a graph-based algorithm which only works efficiently when
most vector data is held in memory. You should ensure that data nodes have at
least enough RAM to hold the vector data and index structures. To check the
size of the vector data, you can use the <<indices-disk-usage>> API. As a
loose rule of thumb, and assuming the default HNSW options, the required
number of bytes is roughly `num_vectors * 4 * (num_dimensions + 32)`. Note
that the required RAM is for the filesystem cache, which is separate from the
Java heap.
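
As a worked example under this rule of thumb, an index with 1,000,000 vectors
of 384 dimensions would need roughly `1,000,000 * 4 * (384 + 32)` bytes, or
about 1.7GB, for the vector data and index structures alone. To measure the
actual size, you could run the disk usage API against your index (the index
name here is a placeholder):

[source,console]
----
POST /my-index/_disk_usage?run_expensive_tasks=true
----
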
The data nodes should also leave a "buffer" for other ways that RAM is
needed. For example, your index might also include text fields and numerics,
which also benefit from using the filesystem cache. It's recommended to run
benchmarks with your specific dataset to ensure there's a sufficient amount of
memory to give good search performance.

[discrete]
include::search-speed.asciidoc[tag=warm-fs-cache]

[discrete]
=== Reduce vector dimensionality

The speed of kNN search scales linearly with the number of vector dimensions,
because each similarity computation considers each element in the two vectors.
Whenever possible, it's better to use vectors with a lower dimension. Some
embedding models come in different "sizes", with both lower- and
higher-dimensional options available. You could also experiment with
dimensionality reduction techniques like PCA. When experimenting with
different approaches, it's important to measure the impact on relevance to
ensure the search quality is still acceptable.

[discrete]
=== Exclude vector fields from `_source`

{es} stores the original JSON document that was passed at index time in the
<<mapping-source-field, `_source` field>>. By default, each hit in the search
results contains the full document `_source`. When the documents contain
high-dimensional `dense_vector` fields, the `_source` can be quite large and
expensive to load. This could significantly slow down the speed of kNN search.

You can disable storing `dense_vector` fields in the `_source` through the
<<include-exclude, `excludes`>> mapping parameter. This prevents loading and
returning large vectors during search, and also cuts down on the index size.
Vectors that have been omitted from `_source` can still be used in kNN search,
since it relies on separate data structures to perform the search. Before
using the <<include-exclude, `excludes`>> parameter, make sure to review the
downsides of omitting fields from `_source`.
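
As a minimal sketch with placeholder index and field names, such a mapping
might look like this:

[source,console]
----
PUT my-index
{
  "mappings": {
    "_source": {
      "excludes": [
        "my_vector"
      ]
    },
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 384,
        "index": true,
        "similarity": "dot_product"
      }
    }
  }
}
----
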
[discrete]
=== Reduce the number of index segments

{es} shards are composed of segments, which are internal storage elements in
the index. For approximate kNN search, {es} stores the dense vector values of
each segment as an HNSW graph. kNN search must check each segment, searching
through one HNSW graph after another. This means kNN search can be
significantly faster if there are fewer segments. By default, {es} periodically
merges smaller segments into larger ones through a background
<<index-modules-merge, merge process>>. If this isn't sufficient, you can take
explicit steps to reduce the number of index segments.

[discrete]
==== Force merge to one segment

The <<indices-forcemerge,force merge>> operation forces an index merge. If you
force merge to one segment, the kNN search only needs to check a single,
all-inclusive HNSW graph. Force merging `dense_vector` fields is an expensive
operation that can take significant time to complete.
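
For example, the following request (with a placeholder index name) merges the
index down to a single segment:

[source,console]
----
POST /my-index/_forcemerge?max_num_segments=1
----
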
include::{es-repo-dir}/indices/forcemerge.asciidoc[tag=force-merge-read-only-warn]

[discrete]
==== Create large segments during bulk indexing

A common pattern is to first perform an initial bulk upload, then make an
index available for searches. Instead of force merging, you can adjust the
index settings to encourage {es} to create larger initial segments:

* Ensure there are no searches during the bulk upload and disable
<<index-refresh-interval-setting,`index.refresh_interval`>> by setting it to
`-1`. This prevents refresh operations and avoids creating extra segments.
* Give {es} a large indexing buffer so it can accept more documents before
flushing. By default, the <<indexing-buffer,`indices.memory.index_buffer_size`>>
is set to 10% of the heap size. With a substantial heap size like 32GB, this
is often enough. To allow the full indexing buffer to be used, you should also
increase the <<index-modules-translog,`index.translog.flush_threshold_size`>>
limit, as shown in the sketch after this list.
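
A sketch of the index-level settings changes, with a placeholder index name
and an illustrative translog threshold, might look like this. Note that
`indices.memory.index_buffer_size` is a node-level setting configured in
`elasticsearch.yml`, so it cannot be changed through this API:

[source,console]
----
PUT /my-index/_settings
{
  "index": {
    "refresh_interval": "-1",
    "translog.flush_threshold_size": "1gb"
  }
}
----

Once the bulk upload finishes, remember to reset `index.refresh_interval`, for
example by setting it back to `null` to restore the default, so the index
starts refreshing again.
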
[discrete]
=== Avoid heavy indexing during searches

Actively indexing documents can have a negative impact on approximate kNN
search performance, since indexing threads steal compute resources from
search. When indexing and searching at the same time, {es} also refreshes
frequently, which creates several small segments. This also hurts search
performance, since approximate kNN search is slower when there are more
segments.

When possible, it's best to avoid heavy indexing during approximate kNN
search. If you need to reindex all the data, perhaps because the vector
embedding model changed, then it's better to reindex the new documents into a
separate index rather than update them in-place. This helps avoid the slowdown
mentioned above, and prevents expensive merge operations due to frequent
document updates.
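
As a reference point, reindexing into a fresh index can be done with the
<<docs-reindex,reindex API>>; the index names here are placeholders:

[source,console]
----
POST _reindex
{
  "source": {
    "index": "my-old-index"
  },
  "dest": {
    "index": "my-new-index"
  }
}
----
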
[discrete]
include::search-speed.asciidoc[tag=readahead]