@@ -6,14 +6,14 @@
<titleabbrev>Dense vector</titleabbrev>
++++

-A `dense_vector` field stores dense vectors of float values.
-The maximum number of dimensions that can be in a vector should
-not exceed 2048. A `dense_vector` field is a single-valued field.
+The `dense_vector` field type stores dense vectors of float values.

-`dense_vector` fields do not support querying, sorting or aggregating. They can
-only be accessed in scripts through the dedicated <<vector-functions,vector functions>>.
+You can use `dense_vector` fields in
+<<query-dsl-script-score-query,`script_score`>> queries to score documents.
+They can also be indexed to support efficient k-nearest neighbor search. Dense
+vector fields do not support aggregations, sorting, or other query types.

-You index a dense vector as an array of floats.
+You add a `dense_vector` field as an array of floats:

[source,console]
--------------------------------------------------
@@ -23,7 +23,7 @@ PUT my-index-000001
    "properties": {
      "my_vector": {
        "type": "dense_vector",
-        "dims": 3 <1>
+        "dims": 3
      },
      "my_text" : {
        "type" : "keyword"
@@ -46,4 +46,125 @@ PUT my-index-000001/_doc/2

--------------------------------------------------

-<1> dims – the number of dimensions in the vector, required parameter.
+NOTE: Unlike most other data types, dense vectors are always single-valued.
+It is not possible to store multiple values in one `dense_vector` field.
+
+[[index-vectors-knn-search]]
+==== Index vectors for kNN search
+
+experimental::[]
+
+A _k-nearest neighbor_ (kNN) search finds the _k_ nearest
+vectors to a query vector, as measured by a similarity metric.
+
+Dense vector fields can be used to rank documents in
+<<query-dsl-script-score-query,`script_score` queries>>. This lets you perform
+a brute-force kNN search by scanning all documents and ranking them by
+similarity.
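To make the brute-force approach concrete, here is a minimal Python sketch of the idea, not of how {es} implements it. The helper names (`cosine_similarity`, `brute_force_knn`) and the sample vectors are illustrative assumptions; cosine similarity is used only as one possible metric.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_knn(query, docs, k):
    """Score every document against the query and keep the k best.

    `docs` maps a document ID to its vector. Every vector is visited, so
    the cost grows linearly with the number of documents.
    """
    scored = ((cosine_similarity(query, vec), doc_id) for doc_id, vec in docs.items())
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]


# Illustrative data only: three 3-dimensional vectors keyed by document ID.
docs = {
    "1": [0.5, 10.0, 6.0],
    "2": [-0.5, 10.0, 10.0],
    "3": [10.0, 0.0, 0.0],
}
print(brute_force_knn([0.5, 10.0, 6.0], docs, k=2))
```

Because the loop touches every stored vector, query time scales with index size. An indexed structure for approximate search avoids that full scan.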
+
+In many cases, a brute-force kNN search is not efficient enough. For this
+reason, the `dense_vector` type supports indexing vectors into a specialized
+data structure to support fast kNN search. You can enable indexing through the
+`index` parameter:
+
+[source,console]
+--------------------------------------------------
+PUT my-index-000002
+{
+  "mappings": {
+    "properties": {
+      "my_vector": {
+        "type": "dense_vector",
+        "dims": 3,
+        "index": true,
+        "similarity": "dot_product" <1>
+      }
+    }
+  }
+}
+--------------------------------------------------
+<1> When `index` is enabled, you must define the vector similarity to use in kNN search.
+
+{es} uses the https://arxiv.org/abs/1603.09320[HNSW algorithm] to
+support efficient kNN search. Like most kNN algorithms, HNSW is an approximate
+method that sacrifices result accuracy for improved speed.
+
+NOTE: Indexing vectors for approximate kNN search is an expensive process. It can take
+substantial time to ingest documents that contain vector fields with `index`
+enabled.
+
+[role="child_attributes"]
+[[dense-vector-params]]
+==== Parameters for dense vector fields
+
+The following mapping parameters are accepted:
+
+`dims`::
+(Required, integer)
+Number of vector dimensions. Can't exceed `2048`.
+
+`index`::
+(Optional, Boolean)
+If `true`, you can search this field using the kNN search API. Defaults to
+`false`.
+
+`similarity`::
+(Required^*^, string)
+The vector similarity metric to use in kNN search. Documents are ranked by
+their vector field's similarity to the query vector. The `_score` of each
+document will be derived from the similarity, in a way that ensures scores are
+positive and that a larger score corresponds to a higher ranking.
++
+^*^ If `index` is `true`, this parameter is required.
++
+.Valid values for `similarity`
+[%collapsible%open]
+====
+`l2_norm`:::
+Computes similarity based on the L^2^ distance (also known as Euclidean
+distance) between the vectors. The document `_score` is computed as
+`1 / (1 + l2_norm(query, vector)^2)`.
+
+`dot_product`:::
+Computes the dot product of two vectors. This option provides an optimized way
+to perform cosine similarity. In order to use it, all vectors must be of unit
+length, including both document and query vectors. The document `_score` is
+computed as `(1 + dot_product(query, vector)) / 2`.
+
+`cosine`:::
+Computes the cosine similarity. Note that the most efficient way to perform
+cosine similarity is to normalize all vectors to unit length, and instead use
+`dot_product`. You should only use `cosine` if you need to preserve the
+original vectors and cannot normalize them in advance. The document `_score`
+is computed as `(1 + cosine(query, vector)) / 2`.
+====
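The three `_score` formulas above can be checked by hand with a short Python sketch. The function names here are illustrative assumptions, not part of any {es} API; the arithmetic simply follows the formulas as written, and `normalize` shows the unit-length preprocessing that `dot_product` expects.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a non-zero vector to unit length."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]


def l2_norm_score(query, vector):
    # 1 / (1 + l2_norm(query, vector)^2)
    squared_distance = sum((q - x) ** 2 for q, x in zip(query, vector))
    return 1.0 / (1.0 + squared_distance)


def dot_product_score(query, vector):
    # (1 + dot_product(query, vector)) / 2; both inputs must be unit length
    return (1.0 + dot(query, vector)) / 2.0


def cosine_score(query, vector):
    # (1 + cosine(query, vector)) / 2; inputs need not be normalized
    cosine = dot(query, vector) / math.sqrt(dot(query, query) * dot(vector, vector))
    return (1.0 + cosine) / 2.0


# Illustrative 2-dimensional vectors.
query, vector = [3.0, 4.0], [4.0, 3.0]
print(l2_norm_score(query, vector))
print(cosine_score(query, vector))
print(dot_product_score(normalize(query), normalize(vector)))
```

On normalized inputs, `dot_product_score` and `cosine_score` agree up to floating-point rounding, which is why normalizing in advance and using `dot_product` gives the same ranking without the per-comparison length computation.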
+
+NOTE: Although they are conceptually related, the `similarity` parameter is
+different from <<text,`text`>> field <<similarity,`similarity`>> and accepts
+a distinct set of options.
+
+`index_options`::
+(Optional, object)
+An optional section that configures the kNN indexing algorithm. The HNSW
+algorithm has two internal parameters that influence how the data structure is
+built. These can be adjusted to improve the accuracy of results, at the
+expense of slower indexing speed. When `index_options` is provided, all of its
+properties must be defined.
++
+.Properties of `index_options`
+[%collapsible%open]
+====
+`type`:::
+(Required, string)
+The type of kNN algorithm to use. Currently only `hnsw` is supported.
+
+`m`:::
+(Required, integer)
+The number of neighbors each node will be connected to in the HNSW graph.
+Defaults to `16`.
+
+`ef_construction`:::
+(Required, integer)
+The number of candidates to track while assembling the list of nearest
+neighbors for each new node. Defaults to `100`.
+====
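As a rough illustration of how these parameters fit together, the following Python sketch assembles a mapping body that sets every `index_options` property. The `knn_mapping` helper and the `my_vector` field name are hypothetical, the lower bound of 1 on `dims` is an assumption, and the code only builds and prints JSON; it does not call {es}.

```python
import json


def knn_mapping(dims, similarity, m=16, ef_construction=100):
    """Build a mapping body for an indexed `dense_vector` field.

    Hypothetical helper: it mirrors the documented parameters but is not
    part of any client library.
    """
    # Upper bound comes from the `dims` description; lower bound is assumed.
    if not 1 <= dims <= 2048:
        raise ValueError("dims must be between 1 and 2048")
    if similarity not in {"l2_norm", "dot_product", "cosine"}:
        raise ValueError(f"unsupported similarity: {similarity}")
    return {
        "mappings": {
            "properties": {
                "my_vector": {
                    "type": "dense_vector",
                    "dims": dims,
                    "index": True,
                    "similarity": similarity,
                    # All three properties are set together, as required
                    # whenever `index_options` is provided.
                    "index_options": {
                        "type": "hnsw",
                        "m": m,
                        "ef_construction": ef_construction,
                    },
                }
            }
        }
    }


print(json.dumps(knn_mapping(dims=3, similarity="dot_product"), indent=2))
```

The printed JSON could serve as the body of a `PUT` request that creates an index. Raising `m` or `ef_construction` trades slower indexing for potentially more accurate results, as described above.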