[role="xpack"]
[testenv="basic"]
[[vector-functions]]
===== Functions for vector fields

NOTE: Vector functions linearly scan all matched documents, so expect
query time to grow linearly with the number of matched documents.
For this reason, we recommend limiting the number of matched documents
with a `query` parameter.

This is the list of available vector functions and vector access methods:

1. `cosineSimilarity` – calculates cosine similarity
2. `dotProduct` – calculates dot product
3. `l1norm` – calculates L^1^ distance
4. `l2norm` – calculates L^2^ distance
5. `doc[<field>].vectorValue` – returns a vector's value as an array of floats
6. `doc[<field>].magnitude` – returns a vector's magnitude

Let's create an index with a `dense_vector` mapping and index a couple
of documents into it.

[source,console]
--------------------------------------------------
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "my_dense_vector": {
        "type": "dense_vector",
        "dims": 3
      },
      "status" : {
        "type" : "keyword"
      }
    }
  }
}

PUT my-index-000001/_doc/1
{
  "my_dense_vector": [0.5, 10, 6],
  "status" : "published"
}

PUT my-index-000001/_doc/2
{
  "my_dense_vector": [-0.5, 10, 10],
  "status" : "published"
}

POST my-index-000001/_refresh

--------------------------------------------------
// TESTSETUP

The `cosineSimilarity` function calculates the cosine similarity
between a given query vector and document vectors.

[source,console]
--------------------------------------------------
GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published" <1>
            }
          }
        }
      },
      "script": {
        "source": "cosineSimilarity(params.query_vector, 'my_dense_vector') + 1.0", <2>
        "params": {
          "query_vector": [4, 3.4, -0.2]  <3>
        }
      }
    }
  }
}
--------------------------------------------------

<1> To restrict the number of documents on which script score calculation is applied, provide a filter.
<2> The script adds 1.0 to the cosine similarity to prevent the score from being negative.
<3> To take advantage of the script optimizations, provide a query vector as a script parameter.

NOTE: If a document's dense vector field has a number of dimensions
different from that of the query vector, an error will be thrown.

The `dotProduct` function calculates the dot product
between a given query vector and document vectors.

[source,console]
--------------------------------------------------
GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": """
          double value = dotProduct(params.query_vector, 'my_dense_vector');
          return sigmoid(1, Math.E, -value); <1>
        """,
        "params": {
          "query_vector": [4, 3.4, -0.2]
        }
      }
    }
  }
}
--------------------------------------------------

<1> Using the standard sigmoid function prevents scores from being negative.

The `l1norm` function calculates the L^1^ distance
(Manhattan distance) between a given query vector and
document vectors.

[source,console]
--------------------------------------------------
GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l1norm(params.queryVector, 'my_dense_vector'))", <1>
        "params": {
          "queryVector": [4, 3.4, -0.2]
        }
      }
    }
  }
}
--------------------------------------------------

<1> Unlike `cosineSimilarity`, which represents similarity, `l1norm` and
`l2norm` (shown below) represent distances or differences. This means that
the more similar the vectors are, the lower the values produced by
`l1norm` and `l2norm`. Because we need more similar vectors to score higher,
we reverse the output from `l1norm` and `l2norm`. Also, to avoid
division by 0 when a document vector matches the query exactly,
we add `1` to the denominator.

The `l2norm` function calculates the L^2^ distance
(Euclidean distance) between a given query vector and
document vectors.

[source,console]
--------------------------------------------------
GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l2norm(params.queryVector, 'my_dense_vector'))",
        "params": {
          "queryVector": [4, 3.4, -0.2]
        }
      }
    }
  }
}
--------------------------------------------------

NOTE: If a document doesn't have a value for a vector field on which
a vector function is executed, an error will be thrown.

You can check whether a document is missing a value for the field `my_vector` with
`doc['my_vector'].size() == 0`. Your overall script can look like this:

[source,js]
--------------------------------------------------
"source": "doc['my_vector'].size() == 0 ? 0 : cosineSimilarity(params.queryVector, 'my_vector')"
--------------------------------------------------
// NOTCONSOLE

The recommended way to access dense vectors is through the `cosineSimilarity`,
`dotProduct`, `l1norm` or `l2norm` functions. For custom use cases, however,
you can access dense vectors' values directly through the following functions:

- `doc[<field>].vectorValue` – returns a vector's value as an array of floats

- `doc[<field>].magnitude` – returns a vector's magnitude as a float
(for vectors created prior to version 7.5, the magnitude is not stored,
so this function calculates it anew every time it is called).

For example, the script below implements cosine similarity using these
two functions:

[source,console]
--------------------------------------------------
GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": """
          float[] v = doc['my_dense_vector'].vectorValue;
          float vm = doc['my_dense_vector'].magnitude;
          float dotProduct = 0;
          for (int i = 0; i < v.length; i++) {
            dotProduct += v[i] * params.queryVector[i];
          }
          return dotProduct / (vm * (float) params.queryVectorMag);
        """,
        "params": {
          "queryVector": [4, 3.4, -0.2],
          "queryVectorMag": 5.25357
        }
      }
    }
  }
}
--------------------------------------------------
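To make the arithmetic behind the four scoring scripts concrete, the following is a minimal standalone sketch in plain Python. It is not part of any Elasticsearch API. It recomputes the same quantities for the two sample documents and the query vector `[4, 3.4, -0.2]` used above. The helper names (`dot_product`, `magnitude`, `cosine_similarity`, `l1norm`, `l2norm`, `scores`) are hypothetical, and the sketch assumes that `sigmoid(1, Math.E, -value)` evaluates to the standard logistic function `1 / (1 + e^-value)`. It uses double precision throughout, so results may differ slightly from the float arithmetic in the scripts.

```python
import math

def dot_product(q, d):
    # Sum of element-wise products of the query and document vectors.
    return sum(x * y for x, y in zip(q, d))

def magnitude(v):
    # Euclidean length of a vector.
    return math.sqrt(sum(x * x for x in v))

def cosine_similarity(q, d):
    # Dot product normalized by both magnitudes; lies in [-1, 1].
    return dot_product(q, d) / (magnitude(q) * magnitude(d))

def l1norm(q, d):
    # Manhattan distance: sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(q, d))

def l2norm(q, d):
    # Euclidean distance: square root of summed squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(q, d)))

def scores(q, d):
    # Mirrors the score expressions of the four example scripts.
    return {
        "cosine": cosine_similarity(q, d) + 1.0,             # shifted to be non-negative
        "dot": 1.0 / (1.0 + math.exp(-dot_product(q, d))),   # assumed logistic form of sigmoid
        "l1": 1.0 / (1.0 + l1norm(q, d)),                    # distance reversed into a score
        "l2": 1.0 / (1.0 + l2norm(q, d)),
    }

query_vector = [4, 3.4, -0.2]
docs = {"1": [0.5, 10, 6], "2": [-0.5, 10, 10]}

for doc_id, vector in docs.items():
    print(doc_id, scores(query_vector, vector))
```

In this sketch, document 1 lies closer to the query vector than document 2 under both distance functions, so the reversed `l1norm` and `l2norm` scores rank it higher. The magnitude of the query vector comes out at roughly 5.25357, which matches the `queryVectorMag` parameter passed to the custom cosine similarity script.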