[role="xpack"]
[testenv="basic"]
[[vector-functions]]
===== Functions for vector fields

experimental[]

These functions are used for
<<dense-vector,`dense_vector`>> and
<<sparse-vector,`sparse_vector`>> fields.

NOTE: When a vector function is calculated, all matched documents are
scanned linearly. Expect the query time to grow linearly
with the number of matched documents. For this reason, we recommend
limiting the number of matched documents with a `query` parameter.

Let's create an index with the following mapping and index a couple
of documents into it.

[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "properties": {
      "my_dense_vector": {
        "type": "dense_vector",
        "dims": 3
      },
      "my_sparse_vector" : {
        "type" : "sparse_vector"
      },
      "status" : {
        "type" : "keyword"
      }
    }
  }
}

PUT my_index/_doc/1
{
  "my_dense_vector": [0.5, 10, 6],
  "my_sparse_vector": {"2": 1.5, "15" : 2, "50": -1.1, "4545": 1.1},
  "status" : "published"
}

PUT my_index/_doc/2
{
  "my_dense_vector": [-0.5, 10, 10],
  "my_sparse_vector": {"2": 2.5, "10" : 1.3, "55": -2.3, "113": 1.6},
  "status" : "published"
}

--------------------------------------------------
// CONSOLE
// TESTSETUP

For `dense_vector` fields, `cosineSimilarity` calculates the
cosine similarity between a given query vector and document vectors.

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published" <1>
            }
          }
        }
      },
      "script": {
        "source": "cosineSimilarity(params.query_vector, doc['my_dense_vector']) + 1.0", <2>
        "params": {
          "query_vector": [4, 3.4, -0.2]  <3>
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
<1> To restrict the number of documents on which the script score is calculated, provide a filter.
<2> The script adds 1.0 to the cosine similarity to prevent the score from being negative.
<3> To take advantage of the script optimizations, provide the query vector as a script parameter.

NOTE: If a document's dense vector field has a different number of dimensions
than the query vector, an error is thrown.

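To make the scoring above concrete, here is a minimal sketch in plain Python that reproduces the arithmetic for the two example documents. It is not the Elasticsearch implementation; the helper name `cosine_similarity` and the `docs` dictionary are hypothetical and exist only for illustration.

```python
import math

def cosine_similarity(query, doc):
    """Cosine similarity between two dense vectors of equal length."""
    if len(query) != len(doc):
        # Assumption: mirror the documented behaviour by rejecting
        # vectors whose dimensions do not match.
        raise ValueError("query and document vectors must have the same number of dimensions")
    dot = sum(q * d for q, d in zip(query, doc))
    norm_q = math.sqrt(sum(q * q for q in query))
    norm_d = math.sqrt(sum(d * d for d in doc))
    return dot / (norm_q * norm_d)

query_vector = [4, 3.4, -0.2]
docs = {"1": [0.5, 10, 6], "2": [-0.5, 10, 10]}

# Same shift as in the script: cosine similarity lies in [-1, 1],
# so adding 1.0 keeps every score non-negative.
scores = {doc_id: cosine_similarity(query_vector, vec) + 1.0
          for doc_id, vec in docs.items()}

for doc_id, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(doc_id, round(score, 4))
```

Under these assumptions document `1` ranks above document `2`, because its vector points in a direction closer to the query vector.
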
Similarly, for `sparse_vector` fields, `cosineSimilaritySparse` calculates the cosine similarity
between a given query vector and document vectors.

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "cosineSimilaritySparse(params.query_vector, doc['my_sparse_vector']) + 1.0",
        "params": {
          "query_vector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

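For sparse vectors only the dimensions present in both vectors contribute to the dot product, while each norm is taken over that vector's own entries. The following Python sketch illustrates this under the assumption that absent dimensions are treated as zero; `cosine_similarity_sparse` is a hypothetical helper, not an Elasticsearch API.

```python
import math

def cosine_similarity_sparse(query, doc):
    """Cosine similarity between two sparse vectors stored as {dimension: value}."""
    # Only dimensions present in both vectors have a non-zero product.
    shared = query.keys() & doc.keys()
    dot = sum(query[dim] * doc[dim] for dim in shared)
    norm_q = math.sqrt(sum(v * v for v in query.values()))
    norm_d = math.sqrt(sum(v * v for v in doc.values()))
    return dot / (norm_q * norm_d)

sparse_query = {"2": 0.5, "10": 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
sparse_docs = {
    "1": {"2": 1.5, "15": 2, "50": -1.1, "4545": 1.1},
    "2": {"2": 2.5, "10": 1.3, "55": -2.3, "113": 1.6},
}

sparse_scores = {doc_id: cosine_similarity_sparse(sparse_query, vec) + 1.0
                 for doc_id, vec in sparse_docs.items()}
print(sparse_scores)
```
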
For `dense_vector` fields, `dotProduct` calculates the
dot product between a given query vector and document vectors.

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": """
          double value = dotProduct(params.query_vector, doc['my_dense_vector']);
          return sigmoid(1, Math.E, -value); <1>
        """,
        "params": {
          "query_vector": [4, 3.4, -0.2]
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
<1> Using the standard sigmoid function prevents scores from being negative.

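The call `sigmoid(1, Math.E, -value)` may look opaque at first. Assuming the script `sigmoid(value, k, a)` function is defined as `value^a / (k^a + value^a)`, substituting `value = 1`, `k = e` and `a = -x` gives `1 / (1 + e^-x)`, the standard logistic function. The Python sketch below checks that identity numerically; both function names are hypothetical stand-ins.

```python
import math

def sigmoid(value, k, a):
    """Assumed form of the script sigmoid: value^a / (k^a + value^a)."""
    return value ** a / (k ** a + value ** a)

def logistic(x):
    """Standard logistic function, always strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def dot_product(query, doc):
    return sum(q * d for q, d in zip(query, doc))

# Negative, zero, and positive dot products all map into (0, 1).
samples = [-5.0, 0.0, 2.5]
squashed = [sigmoid(1, math.e, -x) for x in samples]
print(squashed)

# Dot product for document 1 of the example index, then squashed.
value = dot_product([4, 3.4, -0.2], [0.5, 10, 6])
print(value, sigmoid(1, math.e, -value))
```

Note that a large positive dot product pushes the result very close to 1, so documents with large dot products receive nearly identical scores.
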
Similarly, for `sparse_vector` fields, `dotProductSparse` calculates the dot product
between a given query vector and document vectors.

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": """
          double value = dotProductSparse(params.query_vector, doc['my_sparse_vector']);
          return sigmoid(1, Math.E, -value);
        """,
        "params": {
          "query_vector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

For `dense_vector` fields, `l1norm` calculates the L^1^ distance
(Manhattan distance) between a given query vector and
document vectors.

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l1norm(params.queryVector, doc['my_dense_vector']))", <1>
        "params": {
          "queryVector": [4, 3.4, -0.2]
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
<1> Unlike `cosineSimilarity`, which represents similarity, `l1norm` and
`l2norm` represent distances or differences. This means that
the more similar the vectors are, the lower the values
produced by the `l1norm` and `l2norm` functions.
Because more similar vectors should score higher,
we invert the output of `l1norm` and `l2norm`. To avoid
division by 0 when a document vector matches the query exactly,
we also add `1` to the denominator.

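The inversion described above can be worked through by hand. This plain-Python sketch computes the Manhattan distance for the two example documents and converts it to a score with `1 / (1 + distance)`; the helper names `l1_distance` and `distance_to_score` are hypothetical.

```python
def l1_distance(query, doc):
    """Manhattan (L1) distance: the sum of absolute per-dimension differences."""
    return sum(abs(q - d) for q, d in zip(query, doc))

def distance_to_score(distance):
    """Invert a distance so that closer vectors score higher.

    Adding 1 to the denominator avoids division by zero for identical vectors.
    """
    return 1 / (1 + distance)

l1_query = [4, 3.4, -0.2]
l1_docs = {"1": [0.5, 10, 6], "2": [-0.5, 10, 10]}

l1_distances = {doc_id: l1_distance(l1_query, vec) for doc_id, vec in l1_docs.items()}
l1_scores = {doc_id: distance_to_score(dist) for doc_id, dist in l1_distances.items()}
print(l1_distances)
print(l1_scores)

# An exact match has distance 0 and therefore the maximum score of 1.
print(distance_to_score(l1_distance(l1_query, l1_query)))
```

Document `1` is closer to the query (distance of about 16.3 versus about 21.3), so after inversion it receives the higher score.
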
For `sparse_vector` fields, `l1normSparse` calculates the L^1^ distance
between a given query vector and document vectors.

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l1normSparse(params.queryVector, doc['my_sparse_vector']))",
        "params": {
          "queryVector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

For `dense_vector` fields, `l2norm` calculates the L^2^ distance
(Euclidean distance) between a given query vector and
document vectors.

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l2norm(params.queryVector, doc['my_dense_vector']))",
        "params": {
          "queryVector": [4, 3.4, -0.2]
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

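As a rough check of what the `l2norm` script above computes, the following Python sketch evaluates the Euclidean distance and the inverted score for the example documents. The helper name `l2_distance` is hypothetical and the code only illustrates the arithmetic.

```python
import math

def l2_distance(query, doc):
    """Euclidean (L2) distance: square root of the summed squared differences."""
    return math.sqrt(sum((q - d) ** 2 for q, d in zip(query, doc)))

l2_query = [4, 3.4, -0.2]
l2_docs = {"1": [0.5, 10, 6], "2": [-0.5, 10, 10]}

l2_distances = {doc_id: l2_distance(l2_query, vec) for doc_id, vec in l2_docs.items()}

# Same inversion as in the script: a smaller distance gives a higher score.
l2_scores = {doc_id: 1 / (1 + dist) for doc_id, dist in l2_distances.items()}
print(l2_distances)
print(l2_scores)
```
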
Similarly, for `sparse_vector` fields, `l2normSparse` calculates the L^2^ distance
between a given query vector and document vectors.

[source,js]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l2normSparse(params.queryVector, doc['my_sparse_vector']))",
        "params": {
          "queryVector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

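For sparse vectors, a distance has to account for dimensions that appear in only one of the two vectors. The sketch below assumes that a missing dimension counts as 0 and iterates over the union of dimensions; this is an illustration of the arithmetic under that assumption, and the helper names are hypothetical.

```python
import math

def sparse_differences(query, doc):
    """Per-dimension differences over the union of dimensions.

    Assumption: a dimension absent from a vector has the value 0.
    """
    dims = query.keys() | doc.keys()
    return [query.get(dim, 0.0) - doc.get(dim, 0.0) for dim in dims]

def l1_distance_sparse(query, doc):
    return sum(abs(diff) for diff in sparse_differences(query, doc))

def l2_distance_sparse(query, doc):
    return math.sqrt(sum(diff * diff for diff in sparse_differences(query, doc)))

norm_query = {"2": 0.5, "10": 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
norm_doc = {"2": 1.5, "15": 2, "50": -1.1, "4545": 1.1}  # document 1

l1_sparse = l1_distance_sparse(norm_query, norm_doc)
l2_sparse = l2_distance_sparse(norm_query, norm_doc)
print(l1_sparse, 1 / (1 + l1_sparse))
print(l2_sparse, 1 / (1 + l2_sparse))
```
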
NOTE: If a document doesn't have a value for a vector field on which
a vector function is executed, an error is thrown.

You can check whether a document is missing a value for the field `my_vector` with
`doc['my_vector'].size() == 0`. Your overall script can look like this:

[source,js]
--------------------------------------------------
"source": "doc['my_vector'].size() == 0 ? 0 : cosineSimilarity(params.queryVector, doc['my_vector'])"
--------------------------------------------------
// NOTCONSOLE