[[tune-knn-search]]
== Tune approximate kNN search

{es} supports <<approximate-knn, approximate k-nearest neighbor search>> for
efficiently finding the _k_ nearest vectors to a query vector. Since
approximate kNN search works differently from other queries, there are special
considerations around its performance.

Many of these recommendations help improve search speed. With approximate kNN,
the indexing algorithm runs searches under the hood to create the vector index
structures. So these same recommendations also help with indexing speed.
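For reference, a minimal approximate kNN search request looks like the sketch
below. The index name `my-index`, the field name `my_vector`, and the
three-dimensional query vector are hypothetical placeholders; in practice the
query vector must have the same number of dimensions as the mapped field.

[source,console]
----
POST my-index/_search
{
  "knn": {
    "field": "my_vector",
    "query_vector": [0.3, 0.1, 1.2],
    "k": 10,
    "num_candidates": 100
  }
}
----

Higher `num_candidates` values trade search speed for better recall.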
[discrete]
=== Reduce vector memory footprint

The default <<dense-vector-element-type,`element_type`>> is `float`, but vectors
can be automatically quantized at index time through
<<dense-vector-quantization,`quantization`>>. Quantization reduces the required
memory by 4x, but it also reduces the precision of the vectors and increases
disk usage for the field (by up to 25%). The increased disk usage is a result
of {es} storing both the quantized and the unquantized vectors. For example,
when quantizing 40GB of floating point vectors, an extra 10GB of data is stored
for the quantized vectors. The total disk usage amounts to 50GB, but the memory
usage for fast search is reduced to 10GB.

For `float` vectors with `dim` greater than or equal to `384`, using a
<<dense-vector-quantization,`quantized`>> index is highly recommended.
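As a minimal sketch, a quantized `dense_vector` mapping might look like the
following. The index name, field name, and dimension count are hypothetical
placeholders; `int8_hnsw` enables automatic scalar quantization at index time.

[source,console]
----
PUT my-index
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 384,
        "index": true,
        "similarity": "cosine",
        "index_options": {
          "type": "int8_hnsw"
        }
      }
    }
  }
}
----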
[discrete]
=== Reduce vector dimensionality

The speed of kNN search scales linearly with the number of vector dimensions,
because each similarity computation considers each element in the two vectors.
Whenever possible, it's better to use vectors with a lower dimension. Some
embedding models come in different "sizes", with both lower and higher
dimensional options available. You could also experiment with dimensionality
reduction techniques like PCA. When experimenting with different approaches,
it's important to measure the impact on relevance to ensure the search
quality is still acceptable.
[discrete]
=== Exclude vector fields from `_source`

{es} stores the original JSON document that was passed at index time in the
<<mapping-source-field, `_source` field>>. By default, each hit in the search
results contains the full document `_source`. When the documents contain
high-dimensional `dense_vector` fields, the `_source` can be quite large and
expensive to load. This could significantly slow down the speed of kNN search.

You can disable storing `dense_vector` fields in the `_source` through the
<<include-exclude, `excludes`>> mapping parameter. This prevents loading and
returning large vectors during search, and also cuts down on the index size.
Vectors that have been omitted from `_source` can still be used in kNN search,
since it relies on separate data structures to perform the search. Before
using the <<include-exclude, `excludes`>> parameter, make sure to review the
downsides of omitting fields from `_source`.
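A minimal mapping sketch with a hypothetical `my_vector` field excluded from
`_source` (the index name and dimension count are placeholders):

[source,console]
----
PUT my-index
{
  "mappings": {
    "_source": {
      "excludes": [
        "my_vector"
      ]
    },
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 384,
        "index": true
      }
    }
  }
}
----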
Another option is to use <<synthetic-source,synthetic `_source`>> if all
your index fields support it.
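If the whole index qualifies, a minimal sketch of enabling synthetic `_source`
looks like the following. The field and index names are placeholders, and the
exact syntax depends on your {es} version:

[source,console]
----
PUT my-index
{
  "mappings": {
    "_source": {
      "mode": "synthetic"
    },
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 384,
        "index": true
      }
    }
  }
}
----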
[discrete]
=== Ensure data nodes have enough memory

{es} uses the https://arxiv.org/abs/1603.09320[HNSW] algorithm for approximate
kNN search. HNSW is a graph-based algorithm which only works efficiently when
most vector data is held in memory. You should ensure that data nodes have at
least enough RAM to hold the vector data and index structures. To check the
size of the vector data, you can use the <<indices-disk-usage>> API. As a
loose rule of thumb, and assuming the default HNSW options, the bytes used will
be `num_vectors * 4 * (num_dimensions + 12)`. When using the `byte`
<<dense-vector-element-type,`element_type`>> the space required will be closer
to `num_vectors * (num_dimensions + 12)`. Note that the required RAM is for the
filesystem cache, which is separate from the Java heap.
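As a rough worked example of that rule of thumb, one million 384-dimensional
`float` vectors need about `1,000,000 * 4 * (384 + 12)` bytes, or roughly
1.6GB, of off-heap memory. To measure the actual space used by a specific
index, you can run the disk usage API against it (the index name below is a
placeholder):

[source,console]
----
POST my-index/_disk_usage?run_expensive_tasks=true
----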
The data nodes should also leave a buffer for other ways that RAM is needed.
For example, your index might also include text fields and numerics, which also
benefit from using the filesystem cache. It's recommended to run benchmarks with
your specific dataset to ensure there's a sufficient amount of memory to give
good search performance.

You can find examples of datasets and configurations that we use for our
nightly benchmarks
https://elasticsearch-benchmarks.elastic.co/#tracks/so_vector[here] and
https://elasticsearch-benchmarks.elastic.co/#tracks/dense_vector[here].
[discrete]
include::search-speed.asciidoc[tag=warm-fs-cache]

The following file extensions are used for approximate kNN search:

* `vec` and `veq` for vector values
* `vex` for HNSW graph
* `vem`, `vemf`, and `vemq` for metadata
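If you rely on warming the filesystem cache at startup, a sketch of preloading
those extensions with the `index.store.preload` setting might look like the
following. The index name is a placeholder, and preloading trades slower
startup for faster first searches:

[source,console]
----
PUT my-index
{
  "settings": {
    "index.store.preload": ["vec", "veq", "vex", "vem"]
  }
}
----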
[discrete]
=== Reduce the number of index segments

{es} shards are composed of segments, which are internal storage elements in
the index. For approximate kNN search, {es} stores the vector values of
each segment as a separate HNSW graph, so kNN search must check each segment.
The recent parallelization of kNN search made it much faster to search across
multiple segments, but kNN search can still be up to several times faster if
there are fewer segments. By default, {es} periodically merges smaller segments
into larger ones through a background <<index-modules-merge, merge process>>.
If this isn't sufficient, you can take explicit steps to reduce the number of
index segments.
[discrete]
==== Force merge to one segment

The <<indices-forcemerge,force merge>> operation forces an index merge. If you
force merge to one segment, the kNN search only needs to check a single,
all-inclusive HNSW graph. Force merging `dense_vector` fields is an expensive
operation that can take significant time to complete.

include::{es-repo-dir}/indices/forcemerge.asciidoc[tag=force-merge-read-only-warn]
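A force merge down to a single segment looks like the sketch below, where the
index name is a placeholder:

[source,console]
----
POST my-index/_forcemerge?max_num_segments=1
----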
[discrete]
==== Create large segments during bulk indexing

A common pattern is to first perform an initial bulk upload, then make an
index available for searches. Instead of force merging, you can adjust the
index settings to encourage {es} to create larger initial segments:

* Ensure there are no searches during the bulk upload and disable
<<index-refresh-interval-setting,`index.refresh_interval`>> by setting it to
`-1`. This prevents refresh operations and avoids creating extra segments.
* Give {es} a large indexing buffer so it can accept more documents before
flushing. By default, the <<indexing-buffer,`indices.memory.index_buffer_size`>>
is set to 10% of the heap size. With a substantial heap size like 32GB, this
is often enough. To allow the full indexing buffer to be used, you should also
increase the <<index-modules-translog,`index.translog.flush_threshold_size`>>
limit, as shown in the sketch after this list.
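A minimal sketch of these index-level settings, using a hypothetical index name
and an illustrative flush threshold value (the indexing buffer itself is a
node-level setting configured in `elasticsearch.yml`):

[source,console]
----
PUT my-index/_settings
{
  "index.refresh_interval": "-1",
  "index.translog.flush_threshold_size": "1gb"
}
----

Remember to restore `index.refresh_interval` to its default once the bulk
upload is complete.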
[discrete]
=== Avoid heavy indexing during searches

Actively indexing documents can have a negative impact on approximate kNN
search performance, since indexing threads steal compute resources from
search. When indexing and searching at the same time, {es} also refreshes
frequently, which creates several small segments. This also hurts search
performance, since approximate kNN search is slower when there are more
segments.

When possible, it's best to avoid heavy indexing during approximate kNN
search. If you need to reindex all the data, perhaps because the vector
embedding model changed, then it's better to reindex the new documents into a
separate index rather than update them in-place. This helps avoid the slowdown
mentioned above, and prevents expensive merge operations due to frequent
document updates.
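A reindex into a separate index might look like the sketch below. The index
names are placeholders, and in practice you would typically pair this with an
ingest pipeline or client-side process that produces the new embeddings:

[source,console]
----
POST _reindex
{
  "source": {
    "index": "my-index"
  },
  "dest": {
    "index": "my-index-v2"
  }
}
----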
[discrete]
include::search-speed.asciidoc[tag=readahead]