[role="xpack"]
[testenv="basic"]
[[vector-functions]]
===== Functions for vector fields

NOTE: Vector functions linearly scan all matched documents, so expect
query time to grow linearly with the number of matched documents. For this
reason, we recommend limiting the number of matched documents with a
`query` parameter.

Let's create an index with a `dense_vector` mapping and index a couple
of documents into it.

[source,console]
--------------------------------------------------
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "my_dense_vector": {
        "type": "dense_vector",
        "dims": 3
      },
      "status" : {
        "type" : "keyword"
      }
    }
  }
}

PUT my-index-000001/_doc/1
{
  "my_dense_vector": [0.5, 10, 6],
  "status" : "published"
}

PUT my-index-000001/_doc/2
{
  "my_dense_vector": [-0.5, 10, 10],
  "status" : "published"
}

POST my-index-000001/_refresh

--------------------------------------------------
// TESTSETUP

The `cosineSimilarity` function calculates the cosine similarity between a
given query vector and document vectors.

[source,console]
--------------------------------------------------
GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published" <1>
            }
          }
        }
      },
      "script": {
        "source": "cosineSimilarity(params.query_vector, 'my_dense_vector') + 1.0", <2>
        "params": {
          "query_vector": [4, 3.4, -0.2] <3>
        }
      }
    }
  }
}
--------------------------------------------------
<1> To restrict the number of documents on which the script score is calculated, provide a filter.
<2> The script adds 1.0 to the cosine similarity to prevent the score from being negative.
<3> To take advantage of the script optimizations, provide the query vector as a script parameter.

NOTE: If the number of dimensions in a document's dense vector field differs
from that of the query vector, an error is thrown.

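
To make the scoring concrete, the following Python sketch recomputes, outside
Elasticsearch, the expression used in the `cosineSimilarity` script for the
two sample documents. It is only an illustration and says nothing about how
Elasticsearch implements the function. The helper name `cosine_similarity` is
an assumption made for this example, and its dimension check merely mimics the
error described in the note above.

```python
import math

def cosine_similarity(query_vector, doc_vector):
    """Cosine of the angle between two equal-length, non-zero vectors (illustrative helper)."""
    if len(query_vector) != len(doc_vector):
        # Mimics the documented behavior: mismatched dimensions are an error.
        raise ValueError("query vector and document vector differ in dimensions")
    dot = sum(q * d for q, d in zip(query_vector, doc_vector))
    query_norm = math.sqrt(sum(q * q for q in query_vector))
    doc_norm = math.sqrt(sum(d * d for d in doc_vector))
    return dot / (query_norm * doc_norm)

query_vector = [4, 3.4, -0.2]
docs = {"1": [0.5, 10, 6], "2": [-0.5, 10, 10]}

# Same expression as the script: shift the [-1, 1] range to [0, 2].
scores = {doc_id: cosine_similarity(query_vector, vec) + 1.0
          for doc_id, vec in docs.items()}

# Print the documents from highest to lowest score.
for doc_id, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(doc_id, round(score, 4))
```

With these sample vectors, document `1` scores higher than document `2`
because its direction is closer to that of the query vector.
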
The `dotProduct` function calculates the dot product between a given query
vector and document vectors.

[source,console]
--------------------------------------------------
GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": """
          double value = dotProduct(params.query_vector, 'my_dense_vector');
          return sigmoid(1, Math.E, -value); <1>
        """,
        "params": {
          "query_vector": [4, 3.4, -0.2]
        }
      }
    }
  }
}
--------------------------------------------------
<1> Using the standard sigmoid function prevents scores from being negative.

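
The following Python sketch shows why `sigmoid(1, Math.E, -value)` behaves
like the standard sigmoid. It assumes the script function has the form
`sigmoid(value, k, a) = value^a / (k^a + value^a)`; the helper names
`dot_product` and `saturation_sigmoid` are hypothetical and exist only in this
example.

```python
import math

def dot_product(query_vector, doc_vector):
    """Sum of element-wise products (illustrative helper)."""
    return sum(q * d for q, d in zip(query_vector, doc_vector))

def saturation_sigmoid(value, k, a):
    """Assumed form of the script function: value^a / (k^a + value^a)."""
    return value ** a / (k ** a + value ** a)

query_vector = [4, 3.4, -0.2]
docs = {"1": [0.5, 10, 6], "2": [-0.5, 10, 10]}

for doc_id, vec in docs.items():
    value = dot_product(query_vector, vec)
    score = saturation_sigmoid(1, math.e, -value)
    # With value=1 and k=e, the expression reduces to 1 / (1 + e^-value).
    assert math.isclose(score, 1 / (1 + math.exp(-value)))
    print(doc_id, value, score)
```

Because the result always lies between 0 and 1, a negative dot product still
yields a non-negative score. For the sample documents both dot products are
large, so both scores come out very close to 1.
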
The `l1norm` function calculates the L^1^ distance
(Manhattan distance) between a given query vector and
document vectors.

[source,console]
--------------------------------------------------
GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l1norm(params.queryVector, 'my_dense_vector'))", <1>
        "params": {
          "queryVector": [4, 3.4, -0.2]
        }
      }
    }
  }
}
--------------------------------------------------
<1> Unlike `cosineSimilarity`, which represents similarity, `l1norm` and
`l2norm` (shown below) represent distances or differences. This means that
the more similar the vectors are, the lower the values produced by the
`l1norm` and `l2norm` functions. Because we need more similar vectors to
score higher, we invert the output of `l1norm` and `l2norm`. Also, to avoid
division by 0 when a document vector matches the query exactly,
we add `1` to the denominator.

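
The inversion described in the callout can be checked by hand with a short
Python sketch. The helper name `l1_distance` is an assumption made for this
example; it computes the Manhattan distance directly and is not the
Elasticsearch implementation.

```python
def l1_distance(query_vector, doc_vector):
    """Manhattan distance: sum of absolute element-wise differences."""
    return sum(abs(q - d) for q, d in zip(query_vector, doc_vector))

query_vector = [4, 3.4, -0.2]
docs = {"1": [0.5, 10, 6], "2": [-0.5, 10, 10]}

for doc_id, vec in docs.items():
    distance = l1_distance(query_vector, vec)
    score = 1 / (1 + distance)  # same inversion as in the script
    print(doc_id, distance, score)

# An exact match has distance 0, so the score is 1 / (1 + 0) = 1
# instead of a division by zero.
exact_match_score = 1 / (1 + l1_distance(query_vector, query_vector))
print(exact_match_score)
```

Document `1` is closer to the query vector (a distance of about 16.3 versus
about 21.3), so after the inversion it receives the higher score.
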
The `l2norm` function calculates the L^2^ distance
(Euclidean distance) between a given query vector and
document vectors.

[source,console]
--------------------------------------------------
GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l2norm(params.queryVector, 'my_dense_vector'))",
        "params": {
          "queryVector": [4, 3.4, -0.2]
        }
      }
    }
  }
}
--------------------------------------------------

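
For comparison with the Manhattan distance, here is a minimal Python sketch of
the Euclidean distance and the same `1 / (1 + distance)` inversion. The helper
name `l2_distance` is hypothetical and only illustrates the arithmetic.

```python
import math

def l2_distance(query_vector, doc_vector):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((q - d) ** 2 for q, d in zip(query_vector, doc_vector)))

query_vector = [4, 3.4, -0.2]
docs = {"1": [0.5, 10, 6], "2": [-0.5, 10, 10]}

# Invert the distance so that closer vectors score higher.
l2_scores = {doc_id: 1 / (1 + l2_distance(query_vector, vec))
             for doc_id, vec in docs.items()}
print(l2_scores)
```
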
NOTE: If a document doesn't have a value for a vector field on which
a vector function is executed, an error is thrown.

You can check whether a document is missing a value for the field
`my_vector` with `doc['my_vector'].size() == 0`. Your overall script can
look like this:

[source,js]
--------------------------------------------------
"source": "doc['my_vector'].size() == 0 ? 0 : cosineSimilarity(params.queryVector, 'my_vector')"
--------------------------------------------------
// NOTCONSOLE

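
The same guard can be written as a plain Python analogue to show the control
flow: return a default score when the vector is missing, and call the
similarity function only otherwise. The names `guarded_score` and
`cosine_similarity` and the dictionary-based documents are assumptions made
for this illustration.

```python
import math

def cosine_similarity(query_vector, doc_vector):
    """Cosine similarity of two equal-length, non-zero vectors (illustrative helper)."""
    dot = sum(q * d for q, d in zip(query_vector, doc_vector))
    norms = math.sqrt(sum(q * q for q in query_vector)) * math.sqrt(sum(d * d for d in doc_vector))
    return dot / norms

def guarded_score(doc, field, query_vector, default=0.0):
    """Return `default` when `field` is absent or empty, else the cosine similarity."""
    vector = doc.get(field)
    if not vector:  # plays the role of doc['my_vector'].size() == 0
        return default
    return cosine_similarity(query_vector, vector)

query_vector = [4, 3.4, -0.2]
docs = [
    {"my_vector": [0.5, 10, 6]},
    {"status": "published"},  # no value for my_vector
]

guarded_scores = [guarded_score(doc, "my_vector", query_vector) for doc in docs]
print(guarded_scores)
```
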