[role="xpack"]
[testenv="basic"]
[[vector-functions]]
===== Functions for vector fields

experimental[]

NOTE: During vector functions' calculation, all matched documents are
linearly scanned. Thus, expect the query time to grow linearly
with the number of matched documents. For this reason, we recommend
limiting the number of matched documents with a `query` parameter.

====== `dense_vector` functions

Let's create an index with a `dense_vector` mapping and index a couple
of documents into it.

[source,console]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "properties": {
      "my_dense_vector": {
        "type": "dense_vector",
        "dims": 3
      },
      "status" : {
        "type" : "keyword"
      }
    }
  }
}

PUT my_index/_doc/1
{
  "my_dense_vector": [0.5, 10, 6],
  "status" : "published"
}

PUT my_index/_doc/2
{
  "my_dense_vector": [-0.5, 10, 10],
  "status" : "published"
}

POST my_index/_refresh

--------------------------------------------------
// TESTSETUP

The `cosineSimilarity` function calculates the measure of
cosine similarity between a given query vector and document vectors.

[source,console]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published" <1>
            }
          }
        }
      },
      "script": {
        "source": "cosineSimilarity(params.query_vector, 'my_dense_vector') + 1.0", <2>
        "params": {
          "query_vector": [4, 3.4, -0.2]  <3>
        }
      }
    }
  }
}
--------------------------------------------------

<1> To restrict the number of documents on which script score calculation is applied, provide a filter.
<2> The script adds 1.0 to the cosine similarity to prevent the score from being negative.
<3> To take advantage of the script optimizations, provide a query vector as a script parameter.

NOTE: If a document's dense vector field has a number of dimensions
different from the query's vector, an error will be thrown.

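
To see what the script above computes for the two example documents, the
arithmetic can be reproduced outside Elasticsearch. The following Python sketch
is only an illustration of the formula, not the implementation Elasticsearch
uses; the helper name `cosine_similarity` is hypothetical, and the explicit
length check merely mimics the dimension-mismatch error described in the note.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        # Mirrors the documented behaviour: mismatched dimensions are an error.
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query_vector = [4, 3.4, -0.2]
docs = {"1": [0.5, 10, 6], "2": [-0.5, 10, 10]}

# Same shape as the script: the similarity in [-1, 1] is shifted into [0, 2].
scores = {doc_id: cosine_similarity(query_vector, vec) + 1.0
          for doc_id, vec in docs.items()}
```

Under these assumptions document `1` scores higher than document `2`, because
its vector points in a direction closer to the query vector.
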
The `dotProduct` function calculates the measure of
dot product between a given query vector and document vectors.

[source,console]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": """
          double value = dotProduct(params.query_vector, 'my_dense_vector');
          return sigmoid(1, Math.E, -value); <1>
        """,
        "params": {
          "query_vector": [4, 3.4, -0.2]
        }
      }
    }
  }
}
--------------------------------------------------

<1> Using the standard sigmoid function prevents scores from being negative.

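
The following Python sketch shows why `sigmoid(1, Math.E, -value)` behaves like
the standard logistic function. It assumes the definition
`sigmoid(value, k, a) = value^a / (k^a + value^a)` given in the `script_score`
documentation; the Python helpers are hypothetical stand-ins, not the Painless
functions themselves.

```python
import math

def dot_product(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def sigmoid(value, k, a):
    # Assumed definition: value^a / (k^a + value^a).
    # With value=1, k=e and a=-x this reduces to 1 / (1 + e^-x).
    # No overflow handling: a sketch for moderate inputs only.
    return value ** a / (k ** a + value ** a)

query_vector = [4, 3.4, -0.2]
docs = {"1": [0.5, 10, 6], "2": [-0.5, 10, 10]}

scores = {doc_id: sigmoid(1, math.e, -dot_product(query_vector, vec))
          for doc_id, vec in docs.items()}
```

The result always lies between 0 and 1. For the example documents the dot
products are large and positive, so both scores end up very close to 1; in
practice you may want to scale the dot product before applying the sigmoid
if you need the scores to stay distinguishable.
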
The `l1norm` function calculates L^1^ distance
(Manhattan distance) between a given query vector and
document vectors.

[source,console]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l1norm(params.queryVector, 'my_dense_vector'))", <1>
        "params": {
          "queryVector": [4, 3.4, -0.2]
        }
      }
    }
  }
}
--------------------------------------------------

<1> Unlike `cosineSimilarity`, which represents similarity, `l1norm` and
`l2norm` (shown below) represent distances or differences. This means that
the more similar the vectors are, the lower the values produced by
the `l1norm` and `l2norm` functions.
Because we need more similar vectors to score higher,
we invert the output of `l1norm` and `l2norm`. Also, to avoid
division by 0 when a document vector matches the query exactly,
we add `1` to the denominator.

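
As a worked example of the inversion described above, this Python sketch
computes the Manhattan distance for the two example documents and turns it into
a score with `1 / (1 + distance)`. The function name `l1_distance` is
hypothetical and the code only illustrates the arithmetic.

```python
def l1_distance(a, b):
    """Manhattan distance: sum of absolute per-dimension differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

query_vector = [4, 3.4, -0.2]
docs = {"1": [0.5, 10, 6], "2": [-0.5, 10, 10]}

distances = {doc_id: l1_distance(query_vector, vec)
             for doc_id, vec in docs.items()}

# Invert so that a smaller distance gives a higher score; the +1 keeps the
# denominator non-zero when a document vector equals the query vector.
scores = {doc_id: 1 / (1 + d) for doc_id, d in distances.items()}
```

Document `1` is closer to the query vector (distance 16.3 against 21.3), so it
receives the higher score.
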
The `l2norm` function calculates L^2^ distance
(Euclidean distance) between a given query vector and
document vectors.

[source,console]
--------------------------------------------------
GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l2norm(params.queryVector, 'my_dense_vector'))",
        "params": {
          "queryVector": [4, 3.4, -0.2]
        }
      }
    }
  }
}
--------------------------------------------------

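
The Euclidean variant of the score can be checked the same way. This sketch
uses Python's standard `math.dist` (available since Python 3.8) as a stand-in
for `l2norm`; it illustrates the formula only.

```python
import math

query_vector = [4, 3.4, -0.2]
docs = {"1": [0.5, 10, 6], "2": [-0.5, 10, 10]}

# math.dist returns the Euclidean distance between two points.
distances = {doc_id: math.dist(query_vector, vec)
             for doc_id, vec in docs.items()}

# Same inversion as in the script: closer vectors get scores nearer to 1.
scores = {doc_id: 1 / (1 + d) for doc_id, d in distances.items()}
```
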
NOTE: If a document doesn't have a value for a vector field on which
a vector function is executed, an error will be thrown.

You can check whether a document is missing a value for the field `my_vector`
with `doc['my_vector'].size() == 0`. Your overall script can look like this:

[source,js]
--------------------------------------------------
"source": "doc['my_vector'].size() == 0 ? 0 : cosineSimilarity(params.queryVector, 'my_vector')"
--------------------------------------------------
// NOTCONSOLE

====== `sparse_vector` functions

deprecated[7.6, The `sparse_vector` type is deprecated and will be removed in 8.0.]

Let's create an index with a `sparse_vector` mapping and index a couple
of documents into it.

[source,console]
--------------------------------------------------
PUT my_sparse_index
{
  "mappings": {
    "properties": {
      "my_sparse_vector": {
        "type": "sparse_vector"
      },
      "status" : {
        "type" : "keyword"
      }
    }
  }
}
--------------------------------------------------
// TEST[warning:The [sparse_vector] field type is deprecated and will be removed in 8.0.]

[source,console]
--------------------------------------------------
PUT my_sparse_index/_doc/1
{
  "my_sparse_vector": {"2": 1.5, "15" : 2, "50": -1.1, "4545": 1.1},
  "status" : "published"
}

PUT my_sparse_index/_doc/2
{
  "my_sparse_vector": {"2": 2.5, "10" : 1.3, "55": -2.3, "113": 1.6},
  "status" : "published"
}

POST my_sparse_index/_refresh
--------------------------------------------------
// TEST[continued]

The `cosineSimilaritySparse` function calculates cosine similarity
between a given query vector and document vectors.

[source,console]
--------------------------------------------------
GET my_sparse_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "cosineSimilaritySparse(params.query_vector, 'my_sparse_vector') + 1.0",
        "params": {
          "query_vector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[continued]
// TEST[warning:The [sparse_vector] field type is deprecated and will be removed in 8.0.]

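
For sparse vectors, only the dimensions present in both vectors contribute to
the dot product. The Python sketch below illustrates this with plain
dictionaries, assuming that a dimension missing from a sparse vector counts
as 0. The helper names are hypothetical and this is not how Elasticsearch
implements the functions.

```python
import math

def dot_sparse(a, b):
    """Dot product of two {dimension: value} maps; absent dimensions are 0."""
    return sum(value * b[dim] for dim, value in a.items() if dim in b)

def cosine_similarity_sparse(a, b):
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot_sparse(a, b) / (norm_a * norm_b)

query_vector = {"2": 0.5, "10": 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
docs = {
    "1": {"2": 1.5, "15": 2, "50": -1.1, "4545": 1.1},
    "2": {"2": 2.5, "10": 1.3, "55": -2.3, "113": 1.6},
}

# Same shape as the script: similarity shifted by 1.0 to keep it non-negative.
scores = {doc_id: cosine_similarity_sparse(query_vector, vec) + 1.0
          for doc_id, vec in docs.items()}
```

Document `1` shares dimensions `2`, `50` and `4545` with the query, while
document `2` shares `2`, `10` and `113`; every other dimension drops out of
the dot product.
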
The `dotProductSparse` function calculates dot product
between a given query vector and document vectors.

[source,console]
--------------------------------------------------
GET my_sparse_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": """
          double value = dotProductSparse(params.query_vector, 'my_sparse_vector');
          return sigmoid(1, Math.E, -value);
        """,
        "params": {
          "query_vector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[continued]
// TEST[warning:The [sparse_vector] field type is deprecated and will be removed in 8.0.]

The `l1normSparse` function calculates L^1^ distance
between a given query vector and document vectors.

[source,console]
--------------------------------------------------
GET my_sparse_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l1normSparse(params.queryVector, 'my_sparse_vector'))",
        "params": {
          "queryVector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[continued]
// TEST[warning:The [sparse_vector] field type is deprecated and will be removed in 8.0.]

The `l2normSparse` function calculates L^2^ distance
between a given query vector and document vectors.

[source,console]
--------------------------------------------------
GET my_sparse_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l2normSparse(params.queryVector, 'my_sparse_vector'))",
        "params": {
          "queryVector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[continued]
// TEST[warning:The [sparse_vector] field type is deprecated and will be removed in 8.0.]
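
Unlike the dot product, the sparse distance functions have to consider every
dimension that appears in either vector, because a dimension present on only
one side still contributes its full magnitude. The following Python sketch
illustrates this under the assumption that missing dimensions count as 0; the
names `l1_sparse` and `l2_sparse` are hypothetical.

```python
import math

def l1_sparse(a, b):
    """Manhattan distance over the union of dimensions of two sparse maps."""
    dims = a.keys() | b.keys()
    return sum(abs(a.get(d, 0.0) - b.get(d, 0.0)) for d in dims)

def l2_sparse(a, b):
    """Euclidean distance over the union of dimensions of two sparse maps."""
    dims = a.keys() | b.keys()
    return math.sqrt(sum((a.get(d, 0.0) - b.get(d, 0.0)) ** 2 for d in dims))

query_vector = {"2": 0.5, "10": 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
doc_1 = {"2": 1.5, "15": 2, "50": -1.1, "4545": 1.1}

# Scores in the same form as the scripts above.
l1_score = 1 / (1 + l1_sparse(query_vector, doc_1))
l2_score = 1 / (1 + l2_sparse(query_vector, doc_1))
```

Because the query vector contains large values in dimensions that document `1`
lacks, both distances are large and the resulting scores are close to 0.
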