|
[role="xpack"]
[testenv="basic"]
[[vector-functions]]
===== Functions for vector fields

experimental[]

These functions are used for
<<dense-vector,`dense_vector`>> and
<<sparse-vector,`sparse_vector`>> fields.

NOTE: During vector functions' calculation, all matched documents are
linearly scanned. Thus, expect the query time to grow linearly
with the number of matched documents.
For this reason, we recommend
limiting the number of matched documents with a `query` parameter.

Let's create an index with the following mapping and index a couple
of documents into it.

[source,console]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "properties": {
      "my_dense_vector": {
        "type": "dense_vector",
        "dims": 3
      },
      "my_sparse_vector" : {
        "type" : "sparse_vector"
      },
      "status" : {
        "type" : "keyword"
      }
    }
  }
}

PUT my_index/_doc/1
{
  "my_dense_vector": [0.5, 10, 6],
  "my_sparse_vector": {"2": 1.5, "15" : 2, "50": -1.1, "4545": 1.1},
  "status" : "published"
}

PUT my_index/_doc/2
{
  "my_dense_vector": [-0.5, 10, 10],
  "my_sparse_vector": {"2": 2.5, "10" : 1.3, "55": -2.3, "113": 1.6},
  "status" : "published"
}

--------------------------------------------------
// TESTSETUP

For dense_vector fields, `cosineSimilarity` calculates the measure of
cosine similarity between a given query vector and document vectors.

[source,console]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published" <1>
            }
          }
        }
      },
      "script": {
        "source": "cosineSimilarity(params.query_vector, doc['my_dense_vector']) + 1.0", <2>
        "params": {
          "query_vector": [4, 3.4, -0.2]  <3>
        }
      }
    }
  }
}
--------------------------------------------------
<1> To restrict the number of documents on which script score calculation is applied, provide a filter.
<2> The script adds 1.0 to the cosine similarity to prevent the score from being negative.
<3> To take advantage of the script optimizations, provide a query vector as a script parameter.

NOTE: If a document's dense vector field has a number of dimensions
different from that of the query vector, an error will be thrown.

Similarly, for sparse_vector fields,
`cosineSimilaritySparse` calculates cosine similarity
between a given query vector and document vectors.

[source,console]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "cosineSimilaritySparse(params.query_vector, doc['my_sparse_vector']) + 1.0",
        "params": {
          "query_vector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
        }
      }
    }
  }
}
--------------------------------------------------

For dense_vector fields, `dotProduct` calculates the measure of
dot product between a given query vector and document vectors.

[source,console]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": """
          double value = dotProduct(params.query_vector, doc['my_dense_vector']);
          return sigmoid(1, Math.E, -value); <1>
        """,
        "params": {
          "query_vector": [4, 3.4, -0.2]
        }
      }
    }
  }
}
--------------------------------------------------

<1> Using the standard sigmoid function prevents scores from being negative.

Similarly, for sparse_vector fields, `dotProductSparse` calculates dot product
between a given query vector and document vectors.

[source,console]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": """
          double value = dotProductSparse(params.query_vector, doc['my_sparse_vector']);
          return sigmoid(1, Math.E, -value);
        """,
        "params": {
          "query_vector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
        }
      }
    }
  }
}
--------------------------------------------------

For dense_vector fields, `l1norm` calculates L^1^ distance
(Manhattan distance) between a given query vector and
document vectors.

[source,console]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l1norm(params.queryVector, doc['my_dense_vector']))", <1>
        "params": {
          "queryVector": [4, 3.4, -0.2]
        }
      }
    }
  }
}
--------------------------------------------------

<1> Unlike `cosineSimilarity`, which represents similarity, `l1norm` and
`l2norm` shown below represent distances or differences. This means that
the more similar the vectors are, the lower the scores produced by the
`l1norm` and `l2norm` functions will be.
Thus, because we need more similar vectors to score higher,
we inverted the output from `l1norm` and `l2norm`.
Also, to avoid
division by 0 when a document vector matches the query exactly,
we added `1` in the denominator.

For sparse_vector fields, `l1normSparse` calculates L^1^ distance
between a given query vector and document vectors.

[source,console]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l1normSparse(params.queryVector, doc['my_sparse_vector']))",
        "params": {
          "queryVector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
        }
      }
    }
  }
}
--------------------------------------------------

For dense_vector fields, `l2norm` calculates L^2^ distance
(Euclidean distance) between a given query vector and
document vectors.

[source,console]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l2norm(params.queryVector, doc['my_dense_vector']))",
        "params": {
          "queryVector": [4, 3.4, -0.2]
        }
      }
    }
  }
}
--------------------------------------------------

Similarly, for sparse_vector fields, `l2normSparse` calculates L^2^ distance
between a given query vector and document vectors.

[source,console]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l2normSparse(params.queryVector, doc['my_sparse_vector']))",
        "params": {
          "queryVector": {"2": 0.5, "10" : 111.3,
                          "50": -1.3, "113": 14.8, "4545": 156.0}
        }
      }
    }
  }
}
--------------------------------------------------

NOTE: If a document doesn't have a value for a vector field on which
a vector function is executed, an error will be thrown.

You can check whether a document is missing a value for the field `my_vector` with
`doc['my_vector'].size() == 0`. Your overall script can look like this:

[source,js]
--------------------------------------------------
"source": "doc['my_vector'].size() == 0 ? 0 : cosineSimilarity(params.queryVector, doc['my_vector'])"
--------------------------------------------------
// NOTCONSOLE
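
To sanity-check the scores returned by the dense-vector examples, the arithmetic can be reproduced outside Elasticsearch. The following is a minimal Python sketch, not part of any Elasticsearch API: the helper names are hypothetical, the `sigmoid` helper assumes the definition `value^a / (k^a + value^a)`, and results may differ slightly from Elasticsearch because of floating-point precision. It uses the query vector and document `1` from the examples, and `guarded_cosine_score` is an illustrative combination of the `+ 1.0` shift with the missing-value guard.

```python
import math


def dot_product(query, doc):
    """Sum of element-wise products of two equal-length vectors."""
    if len(query) != len(doc):
        # Mirrors the documented error on mismatched dimensions.
        raise ValueError("query and document vectors differ in dimensions")
    return sum(q * d for q, d in zip(query, doc))


def cosine_similarity(query, doc):
    """Dot product divided by the product of the vector magnitudes."""
    norms = math.sqrt(dot_product(query, query)) * math.sqrt(dot_product(doc, doc))
    return dot_product(query, doc) / norms


def l1norm(query, doc):
    """Manhattan distance: sum of absolute element-wise differences."""
    return sum(abs(q - d) for q, d in zip(query, doc))


def l2norm(query, doc):
    """Euclidean distance: square root of summed squared differences."""
    return math.sqrt(sum((q - d) ** 2 for q, d in zip(query, doc)))


def sigmoid(value, k, a):
    # Assumed definition: value^a / (k^a + value^a).
    # With value=1, k=e, a=-x this reduces to 1 / (1 + e^-x).
    return value ** a / (k ** a + value ** a)


def guarded_cosine_score(query, doc):
    # Hypothetical helper: score 0 when the document has no vector,
    # otherwise shift cosine similarity by 1.0 so it is never negative.
    if not doc:
        return 0.0
    return cosine_similarity(query, doc) + 1.0


query_vector = [4, 3.4, -0.2]
doc_1 = [0.5, 10, 6]  # my_dense_vector of document 1

dot = dot_product(query_vector, doc_1)
print("cosineSimilarity + 1.0:", guarded_cosine_score(query_vector, doc_1))
print("sigmoid of dotProduct: ", sigmoid(1, math.e, -dot))
print("1 / (1 + l1norm):      ", 1 / (1 + l1norm(query_vector, doc_1)))
print("1 / (1 + l2norm):      ", 1 / (1 + l2norm(query_vector, doc_1)))
```

For this pair of vectors the dot product is roughly 34.8 and the L^1^ distance roughly 16.3, so every transformed score falls in a non-negative range, which is the property the `+ 1.0`, `sigmoid`, and `1 / (1 + ...)` wrappers are there to provide.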
 |