123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371 |
- [role="xpack"]
- [testenv="basic"]
- [[vector-functions]]
- ===== Functions for vector fields
- experimental[]
- NOTE: During vector functions' calculation, all matched documents are
- linearly scanned. Thus, expect the query time grow linearly
- with the number of matched documents. For this reason, we recommend
- to limit the number of matched documents with a `query` parameter.
- ====== `dense_vector` functions
- Let's create an index with a `dense_vector` mapping and index a couple
- of documents into it.
- [source,console]
- --------------------------------------------------
- PUT my_index
- {
- "mappings": {
- "properties": {
- "my_dense_vector": {
- "type": "dense_vector",
- "dims": 3
- },
- "status" : {
- "type" : "keyword"
- }
- }
- }
- }
- PUT my_index/_doc/1
- {
- "my_dense_vector": [0.5, 10, 6],
- "status" : "published"
- }
- PUT my_index/_doc/2
- {
- "my_dense_vector": [-0.5, 10, 10],
- "status" : "published"
- }
- POST my_index/_refresh
- --------------------------------------------------
- // TESTSETUP
- The `cosineSimilarity` function calculates the measure of
- cosine similarity between a given query vector and document vectors.
- [source,console]
- --------------------------------------------------
- GET my_index/_search
- {
- "query": {
- "script_score": {
- "query" : {
- "bool" : {
- "filter" : {
- "term" : {
- "status" : "published" <1>
- }
- }
- }
- },
- "script": {
- "source": "cosineSimilarity(params.query_vector, 'my_dense_vector') + 1.0", <2>
- "params": {
- "query_vector": [4, 3.4, -0.2] <3>
- }
- }
- }
- }
- }
- --------------------------------------------------
- <1> To restrict the number of documents on which script score calculation is applied, provide a filter.
- <2> The script adds 1.0 to the cosine similarity to prevent the score from being negative.
- <3> To take advantage of the script optimizations, provide a query vector as a script parameter.
- NOTE: If a document's dense vector field has a number of dimensions
- different from the query's vector, an error will be thrown.
- The `dotProduct` function calculates the measure of
- dot product between a given query vector and document vectors.
- [source,console]
- --------------------------------------------------
- GET my_index/_search
- {
- "query": {
- "script_score": {
- "query" : {
- "bool" : {
- "filter" : {
- "term" : {
- "status" : "published"
- }
- }
- }
- },
- "script": {
- "source": """
- double value = dotProduct(params.query_vector, 'my_dense_vector');
- return sigmoid(1, Math.E, -value); <1>
- """,
- "params": {
- "query_vector": [4, 3.4, -0.2]
- }
- }
- }
- }
- }
- --------------------------------------------------
- <1> Using the standard sigmoid function prevents scores from being negative.
- The `l1norm` function calculates L^1^ distance
- (Manhattan distance) between a given query vector and
- document vectors.
- [source,console]
- --------------------------------------------------
- GET my_index/_search
- {
- "query": {
- "script_score": {
- "query" : {
- "bool" : {
- "filter" : {
- "term" : {
- "status" : "published"
- }
- }
- }
- },
- "script": {
- "source": "1 / (1 + l1norm(params.queryVector, 'my_dense_vector'))", <1>
- "params": {
- "queryVector": [4, 3.4, -0.2]
- }
- }
- }
- }
- }
- --------------------------------------------------
- <1> Unlike `cosineSimilarity` that represent similarity, `l1norm` and
- `l2norm` shown below represent distances or differences. This means, that
- the more similar the vectors are, the lower the scores will be that are
- produced by the `l1norm` and `l2norm` functions.
- Thus, as we need more similar vectors to score higher,
- we reversed the output from `l1norm` and `l2norm`. Also, to avoid
- division by 0 when a document vector matches the query exactly,
- we added `1` in the denominator.
- The `l2norm` function calculates L^2^ distance
- (Euclidean distance) between a given query vector and
- document vectors.
- [source,console]
- --------------------------------------------------
- GET my_index/_search
- {
- "query": {
- "script_score": {
- "query" : {
- "bool" : {
- "filter" : {
- "term" : {
- "status" : "published"
- }
- }
- }
- },
- "script": {
- "source": "1 / (1 + l2norm(params.queryVector, 'my_dense_vector'))",
- "params": {
- "queryVector": [4, 3.4, -0.2]
- }
- }
- }
- }
- }
- --------------------------------------------------
- NOTE: If a document doesn't have a value for a vector field on which
- a vector function is executed, an error will be thrown.
- You can check if a document has a value for the field `my_vector` by
- `doc['my_vector'].size() == 0`. Your overall script can look like this:
- [source,js]
- --------------------------------------------------
- "source": "doc['my_vector'].size() == 0 ? 0 : cosineSimilarity(params.queryVector, 'my_vector')"
- --------------------------------------------------
- // NOTCONSOLE
- ====== `sparse_vector` functions
- deprecated[7.6, The `sparse_vector` type is deprecated and will be removed in 8.0.]
- Let's create an index with a `sparse_vector` mapping and index a couple
- of documents into it.
- [source,console]
- --------------------------------------------------
- PUT my_sparse_index
- {
- "mappings": {
- "properties": {
- "my_sparse_vector": {
- "type": "sparse_vector"
- },
- "status" : {
- "type" : "keyword"
- }
- }
- }
- }
- --------------------------------------------------
- // TEST[warning:The [sparse_vector] field type is deprecated and will be removed in 8.0.]
- [source,console]
- --------------------------------------------------
- PUT my_sparse_index/_doc/1
- {
- "my_sparse_vector": {"2": 1.5, "15" : 2, "50": -1.1, "4545": 1.1},
- "status" : "published"
- }
- PUT my_sparse_index/_doc/2
- {
- "my_sparse_vector": {"2": 2.5, "10" : 1.3, "55": -2.3, "113": 1.6},
- "status" : "published"
- }
- POST my_sparse_index/_refresh
- --------------------------------------------------
- // TEST[continued]
- The `cosineSimilaritySparse` function calculates cosine similarity
- between a given query vector and document vectors.
- [source,console]
- --------------------------------------------------
- GET my_sparse_index/_search
- {
- "query": {
- "script_score": {
- "query" : {
- "bool" : {
- "filter" : {
- "term" : {
- "status" : "published"
- }
- }
- }
- },
- "script": {
- "source": "cosineSimilaritySparse(params.query_vector, 'my_sparse_vector') + 1.0",
- "params": {
- "query_vector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
- }
- }
- }
- }
- }
- --------------------------------------------------
- // TEST[continued]
- // TEST[warning:The [sparse_vector] field type is deprecated and will be removed in 8.0.]
- The `dotProductSparse` function calculates dot product
- between a given query vector and document vectors.
- [source,console]
- --------------------------------------------------
- GET my_sparse_index/_search
- {
- "query": {
- "script_score": {
- "query" : {
- "bool" : {
- "filter" : {
- "term" : {
- "status" : "published"
- }
- }
- }
- },
- "script": {
- "source": """
- double value = dotProductSparse(params.query_vector, 'my_sparse_vector');
- return sigmoid(1, Math.E, -value);
- """,
- "params": {
- "query_vector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
- }
- }
- }
- }
- }
- --------------------------------------------------
- // TEST[continued]
- // TEST[warning:The [sparse_vector] field type is deprecated and will be removed in 8.0.]
- The `l1normSparse` function calculates L^1^ distance
- between a given query vector and document vectors.
- [source,console]
- --------------------------------------------------
- GET my_sparse_index/_search
- {
- "query": {
- "script_score": {
- "query" : {
- "bool" : {
- "filter" : {
- "term" : {
- "status" : "published"
- }
- }
- }
- },
- "script": {
- "source": "1 / (1 + l1normSparse(params.queryVector, 'my_sparse_vector'))",
- "params": {
- "queryVector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
- }
- }
- }
- }
- }
- --------------------------------------------------
- // TEST[continued]
- // TEST[warning:The [sparse_vector] field type is deprecated and will be removed in 8.0.]
- The `l2normSparse` function calculates L^2^ distance
- between a given query vector and document vectors.
- [source,console]
- --------------------------------------------------
- GET my_sparse_index/_search
- {
- "query": {
- "script_score": {
- "query" : {
- "bool" : {
- "filter" : {
- "term" : {
- "status" : "published"
- }
- }
- }
- },
- "script": {
- "source": "1 / (1 + l2normSparse(params.queryVector, 'my_sparse_vector'))",
- "params": {
- "queryVector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
- }
- }
- }
- }
- }
- --------------------------------------------------
- // TEST[continued]
- // TEST[warning:The [sparse_vector] field type is deprecated and will be removed in 8.0.]
|