[role="xpack"]
[[vector-functions]]
===== Functions for vector fields

NOTE: When vector functions are calculated, all matched documents are
scanned linearly, so expect the query time to grow linearly
with the number of matched documents. For this reason, we recommend
limiting the number of matched documents with a `query` parameter.

This is the list of available vector functions and vector access methods:

1. <<vector-functions-cosine,`cosineSimilarity`>> – calculates cosine similarity
2. <<vector-functions-dot-product,`dotProduct`>> – calculates dot product
3. <<vector-functions-l1,`l1norm`>> – calculates L^1^ distance
4. <<vector-functions-l2,`l2norm`>> – calculates L^2^ distance
5. <<vector-functions-accessing-vectors,`doc[<field>].vectorValue`>> – returns a vector's value as an array of floats
6. <<vector-functions-accessing-vectors,`doc[<field>].magnitude`>> – returns a vector's magnitude

NOTE: The recommended way to access dense vectors is through the
`cosineSimilarity`, `dotProduct`, `l1norm` or `l2norm` functions. Note,
however, that you should call these functions only once per script. For example,
don't use these functions in a loop to calculate the similarity between a
document vector and multiple other vectors. If you need that functionality,
reimplement these functions yourself by
<<vector-functions-accessing-vectors,accessing vector values directly>>.

Let's create an index with a `dense_vector` mapping and index a couple
of documents into it.

[source,console]
--------------------------------------------------
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "my_dense_vector": {
        "type": "dense_vector",
        "dims": 3
      },
      "status" : {
        "type" : "keyword"
      }
    }
  }
}

PUT my-index-000001/_doc/1
{
  "my_dense_vector": [0.5, 10, 6],
  "status" : "published"
}

PUT my-index-000001/_doc/2
{
  "my_dense_vector": [-0.5, 10, 10],
  "status" : "published"
}

POST my-index-000001/_refresh

--------------------------------------------------
// TESTSETUP

[[vector-functions-cosine]]
====== Cosine similarity

The `cosineSimilarity` function calculates the cosine similarity
between a given query vector and document vectors.

[source,console]
--------------------------------------------------
GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published" <1>
            }
          }
        }
      },
      "script": {
        "source": "cosineSimilarity(params.query_vector, 'my_dense_vector') + 1.0", <2>
        "params": {
          "query_vector": [4, 3.4, -0.2]  <3>
        }
      }
    }
  }
}
--------------------------------------------------
<1> To restrict the number of documents on which the script score is calculated, provide a filter.
<2> The script adds 1.0 to the cosine similarity to prevent the score from being negative.
<3> To take advantage of the script optimizations, provide the query vector as a script parameter.

NOTE: If a document's dense vector field has a different number of dimensions
than the query vector, an error is thrown.

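
To see what the score in the example above amounts to, here is a minimal Python sketch that recomputes `cosineSimilarity(...) + 1.0` outside Elasticsearch for the two sample documents. The helper name `cosine_similarity` is hypothetical and used only for illustration; Elasticsearch computes the value internally, and its floating-point results may differ slightly.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (illustrative helper)."""
    if len(a) != len(b):
        # Mirrors the dimension-mismatch error described in the note above.
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


query_vector = [4, 3.4, -0.2]
docs = {"1": [0.5, 10, 6], "2": [-0.5, 10, 10]}

# Same shift as the script: cosine similarity lies in [-1, 1],
# so adding 1.0 keeps every score in [0, 2].
scores = {doc_id: cosine_similarity(query_vector, vec) + 1.0
          for doc_id, vec in docs.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

With these inputs, document 1 ranks above document 2 because its vector points in a direction closer to that of the query vector.
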
[[vector-functions-dot-product]]
====== Dot product

The `dotProduct` function calculates the dot product
between a given query vector and document vectors.

[source,console]
--------------------------------------------------
GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": """
          double value = dotProduct(params.query_vector, 'my_dense_vector');
          return sigmoid(1, Math.E, -value); <1>
        """,
        "params": {
          "query_vector": [4, 3.4, -0.2]
        }
      }
    }
  }
}
--------------------------------------------------
<1> Using the standard sigmoid function prevents scores from being negative.

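
The following Python sketch illustrates why `sigmoid(1, Math.E, -value)` keeps scores non-negative. It assumes the script helper has the form `sigmoid(value, k, a) = value^a / (k^a + value^a)`. Under that assumption the call reduces to the logistic function `1 / (1 + e^-value)`, which always lies between 0 and 1. The Python names are hypothetical and only mirror the script.

```python
import math


def sigmoid(value, k, a):
    """Assumed form of the script helper: value^a / (k^a + value^a)."""
    return value ** a / (k ** a + value ** a)


def dot_product(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


query_vector = [4, 3.4, -0.2]
doc_vector = [0.5, 10, 6]  # document 1 from the setup above

value = dot_product(query_vector, doc_vector)  # 2 + 34 - 1.2, about 34.8
score = sigmoid(1, math.e, -value)

# With value=1 and k=e the helper is the logistic function of the dot product.
logistic = 1.0 / (1.0 + math.exp(-value))
```

Under this assumption, a large positive dot product such as this one pushes the score very close to 1. For data with large dot products, scaling the value before applying the sigmoid may help keep scores distinguishable.
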
[[vector-functions-l1]]
====== L^1^ distance (Manhattan distance)

The `l1norm` function calculates L^1^ distance
(Manhattan distance) between a given query vector and
document vectors.

[source,console]
--------------------------------------------------
GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l1norm(params.queryVector, 'my_dense_vector'))", <1>
        "params": {
          "queryVector": [4, 3.4, -0.2]
        }
      }
    }
  }
}
--------------------------------------------------
<1> Unlike `cosineSimilarity`, which represents similarity, `l1norm` and
`l2norm` (shown below) represent distances or differences. This means that
the more similar the vectors are, the lower the values produced by the
`l1norm` and `l2norm` functions. Because more similar vectors should score
higher, the script inverts the output of `l1norm` and `l2norm`. Adding `1`
to the denominator avoids division by 0 when a document vector matches the
query exactly.

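
As an illustration of the inversion described above, this Python sketch computes the L^1^ distance and the `1 / (1 + distance)` score for the two sample documents outside Elasticsearch. The helper `l1_distance` is a hypothetical stand-in for `l1norm`, written only to show the arithmetic.

```python
def l1_distance(a, b):
    """Manhattan distance: sum of absolute per-dimension differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


query_vector = [4, 3.4, -0.2]
docs = {"1": [0.5, 10, 6], "2": [-0.5, 10, 10]}

distances = {doc_id: l1_distance(query_vector, vec)
             for doc_id, vec in docs.items()}

# Invert so that a smaller distance gives a higher score.
# The extra 1 keeps the denominator non-zero.
scores = {doc_id: 1 / (1 + d) for doc_id, d in distances.items()}

# An exact match has distance 0 and therefore the maximum score of 1.
exact_match_score = 1 / (1 + l1_distance(query_vector, query_vector))
```

Document 1 is closer to the query (distance of about 16.3 versus 21.3), so it receives the higher score.
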
[[vector-functions-l2]]
====== L^2^ distance (Euclidean distance)

The `l2norm` function calculates L^2^ distance
(Euclidean distance) between a given query vector and
document vectors.

[source,console]
--------------------------------------------------
GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l2norm(params.queryVector, 'my_dense_vector'))",
        "params": {
          "queryVector": [4, 3.4, -0.2]
        }
      }
    }
  }
}
--------------------------------------------------
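
For comparison, a short Python sketch of the same scoring with Euclidean distance. It uses the standard library function `math.dist` (Python 3.8 or later) as a stand-in for `l2norm`; this is an illustration of the arithmetic, not of how Elasticsearch implements the function.

```python
import math

query_vector = [4, 3.4, -0.2]
docs = {"1": [0.5, 10, 6], "2": [-0.5, 10, 10]}

# math.dist returns the Euclidean (L2) distance between two points.
distances = {doc_id: math.dist(query_vector, vec)
             for doc_id, vec in docs.items()}

# Same inversion as in the script: smaller distance, higher score.
scores = {doc_id: 1 / (1 + d) for doc_id, d in distances.items()}
```

Because the per-dimension differences are squared before they are summed, a single large difference contributes more to the L^2^ distance than several small ones.
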
[[vector-functions-missing-values]]
====== Checking for missing values

If a document doesn't have a value for a vector field on which a vector function
is executed, an error is thrown.

You can check whether a document has a value for the field `my_vector` with
`doc['my_vector'].size() == 0`. Your overall script can look like this:

[source,js]
--------------------------------------------------
"source": "doc['my_vector'].size() == 0 ? 0 : cosineSimilarity(params.queryVector, 'my_vector')"
--------------------------------------------------
// NOTCONSOLE

[[vector-functions-accessing-vectors]]
====== Accessing vectors directly

You can access vector values directly through the following functions:

- `doc[<field>].vectorValue` – returns a vector's value as an array of floats

- `doc[<field>].magnitude` – returns a vector's magnitude as a float
(for vectors created prior to version 7.5, the magnitude is not stored,
so this function calculates it anew every time it is called).

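
As a quick check of what a magnitude is, the Python sketch below computes the Euclidean length of a vector outside Elasticsearch. It assumes the magnitude is the L^2^ norm (the square root of the sum of squared components). The function name `magnitude` is hypothetical. Applied to the query vector `[4, 3.4, -0.2]`, it yields approximately 5.25357, which is the value passed as `queryVectorMag` in the script in this section.

```python
import math


def magnitude(vector):
    """Euclidean length of a vector: sqrt of the sum of squared components."""
    return math.sqrt(sum(x * x for x in vector))


query_vector = [4, 3.4, -0.2]

# 16 + 11.56 + 0.04 = 27.6, and sqrt(27.6) is about 5.25357
query_vector_mag = magnitude(query_vector)
```

Precomputing the query vector's magnitude once on the client and passing it as a parameter avoids recalculating it for every scored document.
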
For example, the script below implements a cosine similarity using these
two functions:

[source,console]
--------------------------------------------------
GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": """
          float[] v = doc['my_dense_vector'].vectorValue;
          float vm = doc['my_dense_vector'].magnitude;
          float dotProduct = 0;
          for (int i = 0; i < v.length; i++) {
            dotProduct += v[i] * params.queryVector[i];
          }
          return dotProduct / (vm * (float) params.queryVectorMag);
        """,
        "params": {
          "queryVector": [4, 3.4, -0.2],
          "queryVectorMag": 5.25357
        }
      }
    }
  }
}
  241. --------------------------------------------------