[role="xpack"]
[testenv="basic"]
[[vector-functions]]
===== Functions for vector fields

NOTE: During vector functions' calculation, all matched documents are
linearly scanned. Thus, expect the query time to grow linearly
with the number of matched documents. For this reason, we recommend
limiting the number of matched documents with a `query` parameter.

The following vector functions and vector access methods are available:

1. `cosineSimilarity` – calculates cosine similarity
2. `dotProduct` – calculates dot product
3. `l1norm` – calculates L^1^ distance
4. `l2norm` – calculates L^2^ distance
5. `doc[<field>].vectorValue` – returns a vector's value as an array of floats
6. `doc[<field>].magnitude` – returns a vector's magnitude

Let's create an index with a `dense_vector` mapping and index a couple
of documents into it.

[source,console]
--------------------------------------------------
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "my_dense_vector": {
        "type": "dense_vector",
        "dims": 3
      },
      "status" : {
        "type" : "keyword"
      }
    }
  }
}

PUT my-index-000001/_doc/1
{
  "my_dense_vector": [0.5, 10, 6],
  "status" : "published"
}

PUT my-index-000001/_doc/2
{
  "my_dense_vector": [-0.5, 10, 10],
  "status" : "published"
}

POST my-index-000001/_refresh
--------------------------------------------------
// TESTSETUP

The `cosineSimilarity` function calculates the measure of
cosine similarity between a given query vector and document vectors.

[source,console]
--------------------------------------------------
GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published" <1>
            }
          }
        }
      },
      "script": {
        "source": "cosineSimilarity(params.query_vector, 'my_dense_vector') + 1.0", <2>
        "params": {
          "query_vector": [4, 3.4, -0.2] <3>
        }
      }
    }
  }
}
--------------------------------------------------
<1> To restrict the number of documents on which the script score calculation is applied, provide a filter.
<2> Cosine similarity ranges from -1 to 1, so the script adds 1.0 to prevent the score from being negative.
<3> To take advantage of the script optimizations, provide the query vector as a script parameter.

NOTE: If a document's dense vector field has a number of dimensions
different from the query's vector, an error is thrown.

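
To make the scoring concrete, here is a minimal Python sketch of the same arithmetic, computed locally rather than in Elasticsearch. It applies the standard cosine formula (the dot product divided by the product of the two vector norms), adds 1.0 as the script above does, and raises an error on a dimension mismatch to mirror the note above. The helper name `cosine_score` is hypothetical and not part of any Elasticsearch client; Elasticsearch stores vectors as 32-bit floats, so its scores may differ from these in the last digits.

[source,python]
--------------------------------------------------
import math


# Hypothetical helper: reproduces "cosineSimilarity(...) + 1.0" locally.
def cosine_score(query, doc):
    if len(query) != len(doc):
        # Mirrors the documented behavior: mismatched dimensions are an error.
        raise ValueError("query and document vectors have different dimensions")
    dot = sum(q * d for q, d in zip(query, doc))
    norm_q = math.sqrt(sum(q * q for q in query))
    norm_d = math.sqrt(sum(d * d for d in doc))
    # Cosine similarity lies in [-1, 1]; adding 1.0 shifts it into [0, 2].
    return dot / (norm_q * norm_d) + 1.0


query_vector = [4, 3.4, -0.2]
docs = {"1": [0.5, 10, 6], "2": [-0.5, 10, 10]}  # the two indexed documents

for doc_id, vector in docs.items():
    print(doc_id, round(cosine_score(query_vector, vector), 4))
--------------------------------------------------

With the sample data, document 1 scores higher than document 2, and a document identical to the query would reach the maximum score of 2.0.
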
The `dotProduct` function calculates the measure of
dot product between a given query vector and document vectors.

[source,console]
--------------------------------------------------
GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": """
          double value = dotProduct(params.query_vector, 'my_dense_vector');
          return sigmoid(1, Math.E, -value); <1>
        """,
        "params": {
          "query_vector": [4, 3.4, -0.2]
        }
      }
    }
  }
}
--------------------------------------------------
<1> Using the standard sigmoid function prevents scores from being negative.

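
Why `sigmoid(1, Math.E, -value)` is the standard sigmoid can be checked numerically. Assuming the Painless helper `sigmoid(value, k, a)` computes `value^a / (k^a + value^a)`, as described in the `script_score` documentation, substituting value = 1, k = e and a = -x leaves `1 / (e^-x + 1)`, which is the logistic function. The Python sketch below re-implements both forms locally; `painless_sigmoid` and `logistic` are hypothetical helper names, not Elasticsearch APIs.

[source,python]
--------------------------------------------------
import math


# Hypothetical re-implementation of the Painless helper, assuming
# sigmoid(value, k, a) = value^a / (k^a + value^a).
def painless_sigmoid(value, k, a):
    return value ** a / (k ** a + value ** a)


# Standard logistic sigmoid: 1 / (1 + e^-x).
def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


query_vector = [4, 3.4, -0.2]
doc_vector = [0.5, 10, 6]  # document 1 from the setup above
value = sum(q * d for q, d in zip(query_vector, doc_vector))  # dot product

# Same call shape as the script: sigmoid(1, Math.E, -value).
score = painless_sigmoid(1, math.e, -value)
print(value, score, logistic(value))
--------------------------------------------------

The output always stays between 0 and 1. Note that large dot products, such as the one for document 1 here, push the score very close to 1, so documents with large dot products become hard to tell apart.
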
The `l1norm` function calculates L^1^ distance
(Manhattan distance) between a given query vector and
document vectors.

[source,console]
--------------------------------------------------
GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l1norm(params.queryVector, 'my_dense_vector'))", <1>
        "params": {
          "queryVector": [4, 3.4, -0.2]
        }
      }
    }
  }
}
--------------------------------------------------
<1> Unlike `cosineSimilarity`, which represents similarity, `l1norm` and
`l2norm` (shown below) represent distances or differences. This means that
the more similar the vectors are, the lower the scores produced by the
`l1norm` and `l2norm` functions will be.
Because more similar vectors should score higher,
we inverted the output of `l1norm` and `l2norm`. Also, to avoid
division by 0 when a document vector matches the query exactly,
we added `1` to the denominator.

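
The inversion described above can be checked with a small Python sketch that runs locally. `l1_distance` is a hypothetical stand-in for `l1norm` that sums the absolute coordinate differences; `l1_score` applies the same `1 / (1 + distance)` transform as the script, so an exact match scores 1 and growing distances push the score toward 0.

[source,python]
--------------------------------------------------
# Hypothetical local stand-in for l1norm: Manhattan (L1) distance.
def l1_distance(query, doc):
    return sum(abs(q - d) for q, d in zip(query, doc))


# Same transform as the script: invert the distance, adding 1 to avoid division by 0.
def l1_score(query, doc):
    return 1 / (1 + l1_distance(query, doc))


query_vector = [4, 3.4, -0.2]
docs = {"1": [0.5, 10, 6], "2": [-0.5, 10, 10]}  # the two indexed documents

for doc_id, vector in docs.items():
    print(doc_id, round(l1_distance(query_vector, vector), 4), round(l1_score(query_vector, vector), 4))
--------------------------------------------------

Document 1 lies at an L^1^ distance of 16.3 from the query and document 2 at 21.3, so after inversion document 1 receives the higher score.
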
The `l2norm` function calculates L^2^ distance
(Euclidean distance) between a given query vector and
document vectors.

[source,console]
--------------------------------------------------
GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l2norm(params.queryVector, 'my_dense_vector'))",
        "params": {
          "queryVector": [4, 3.4, -0.2]
        }
      }
    }
  }
}
--------------------------------------------------

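
The L^2^ variant works the same way. In this local Python sketch, `l2_distance` is a hypothetical stand-in for `l2norm` that takes the square root of the summed squared coordinate differences, and `l2_score` applies the same `1 / (1 + distance)` inversion as the request above.

[source,python]
--------------------------------------------------
import math


# Hypothetical local stand-in for l2norm: Euclidean (L2) distance.
def l2_distance(query, doc):
    return math.sqrt(sum((q - d) ** 2 for q, d in zip(query, doc)))


def l2_score(query, doc):
    # Invert the distance so closer vectors score higher; +1 avoids division by 0.
    return 1 / (1 + l2_distance(query, doc))


query_vector = [4, 3.4, -0.2]
docs = {"1": [0.5, 10, 6], "2": [-0.5, 10, 10]}  # the two indexed documents

for doc_id, vector in docs.items():
    print(doc_id, round(l2_distance(query_vector, vector), 3), round(l2_score(query_vector, vector), 4))
--------------------------------------------------

Because squaring amplifies large differences, the third coordinate (6 versus 10 in the two documents) dominates the L^2^ distance; document 1 is again the closer match.
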
NOTE: If a document doesn't have a value for a vector field on which
a vector function is executed, an error is thrown.

You can check whether a document is missing a value for the field `my_vector` with
`doc['my_vector'].size() == 0`. Your overall script can look like this:

[source,js]
--------------------------------------------------
"source": "doc['my_vector'].size() == 0 ? 0 : cosineSimilarity(params.queryVector, 'my_vector')"
--------------------------------------------------
// NOTCONSOLE

The recommended way to access dense vectors is through the `cosineSimilarity`,
`dotProduct`, `l1norm` or `l2norm` functions. However, for custom use cases,
you can access a dense vector's values directly through the following functions:

- `doc[<field>].vectorValue` – returns a vector's value as an array of floats
- `doc[<field>].magnitude` – returns a vector's magnitude as a float
(for vectors created prior to version 7.5, the magnitude is not stored,
so this function recalculates it every time it is called).

For example, the script below implements cosine similarity using these
two functions:

[source,console]
--------------------------------------------------
GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": """
          float[] v = doc['my_dense_vector'].vectorValue;
          float vm = doc['my_dense_vector'].magnitude;
          float dotProduct = 0;
          for (int i = 0; i < v.length; i++) {
            dotProduct += v[i] * params.queryVector[i];
          }
          return dotProduct / (vm * (float) params.queryVectorMag);
        """,
        "params": {
          "queryVector": [4, 3.4, -0.2],
          "queryVectorMag": 5.25357
        }
      }
    }
  }
}
--------------------------------------------------
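
The `queryVectorMag` parameter is the Euclidean norm of the query vector: the square root of 27.6 (that is, 4 × 4 + 3.4 × 3.4 + 0.2 × 0.2), approximately 5.25357. It is presumably passed as a parameter so that it is computed once on the client rather than once per document. The Python sketch below recomputes it and mirrors the script's loop locally; `vector_value` and `magnitude` are hypothetical stand-ins for `doc[<field>].vectorValue` and `doc[<field>].magnitude`, not Elasticsearch APIs.

[source,python]
--------------------------------------------------
import math


# Hypothetical stand-ins for doc[<field>].vectorValue and doc[<field>].magnitude.
def vector_value(doc):
    return doc["my_dense_vector"]


def magnitude(doc):
    # Recomputed on every call, as the docs describe for vectors created before 7.5.
    return math.sqrt(sum(x * x for x in vector_value(doc)))


query_vector = [4, 3.4, -0.2]
query_vector_mag = math.sqrt(sum(x * x for x in query_vector))  # sqrt(27.6)

doc = {"my_dense_vector": [0.5, 10, 6]}  # document 1 from the setup

# Same loop as the Painless script: accumulate the dot product element by element.
v = vector_value(doc)
dot_product = 0.0
for i in range(len(v)):
    dot_product += v[i] * query_vector[i]

cosine = dot_product / (magnitude(doc) * query_vector_mag)
print(round(query_vector_mag, 5), cosine)
--------------------------------------------------

Unlike the `cosineSimilarity` example earlier, this script does not add 1.0, so for documents pointing away from the query it can return negative values.
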