[[query-dsl-rank-feature-query]]
=== Rank feature query
++++
<titleabbrev>Rank feature</titleabbrev>
++++

Boosts the <<relevance-scores,relevance score>> of documents based on the
numeric value of a <<rank-feature,`rank_feature`>> or
<<rank-features,`rank_features`>> field.

The `rank_feature` query is typically used in the `should` clause of a
<<query-dsl-bool-query,`bool`>> query so its relevance scores are added to other
scores from the `bool` query.

Unlike the <<query-dsl-function-score-query,`function_score`>> query or other
ways to change <<relevance-scores,relevance scores>>, the
`rank_feature` query efficiently skips non-competitive hits when the
<<search-uri-request,`track_total_hits`>> parameter is **not** `true`. This can
dramatically improve query speed.

[[rank-feature-query-functions]]
==== Rank feature functions

To calculate relevance scores based on rank feature fields, the `rank_feature`
query supports the following mathematical functions:

* <<rank-feature-query-saturation,Saturation>>
* <<rank-feature-query-logarithm,Logarithm>>
* <<rank-feature-query-sigmoid,Sigmoid>>

If you don't know where to start, we recommend using the `saturation` function.
If no function is provided, the `rank_feature` query uses the `saturation`
function by default.

[[rank-feature-query-ex-request]]
==== Example request

[[rank-feature-query-index-setup]]
===== Index setup

To use the `rank_feature` query, your index must include a
<<rank-feature,`rank_feature`>> or <<rank-features,`rank_features`>> field
mapping. To see how you can set up an index for the `rank_feature` query, try
the following example.

Create a `test` index with the following field mappings:

- `pagerank`, a <<rank-feature,`rank_feature`>> field that measures the
importance of a website
- `url_length`, a <<rank-feature,`rank_feature`>> field that contains the
length of the website's URL. For this example, a long URL correlates negatively
with relevance, indicated by a `positive_score_impact` value of `false`.
- `topics`, a <<rank-features,`rank_features`>> field that contains a list of
topics and a measure of how well each document is connected to each topic

[source,js]
----
PUT /test
{
  "mappings": {
    "properties": {
      "pagerank": {
        "type": "rank_feature"
      },
      "url_length": {
        "type": "rank_feature",
        "positive_score_impact": false
      },
      "topics": {
        "type": "rank_features"
      }
    }
  }
}
----
// CONSOLE
// TESTSETUP

Index several documents to the `test` index.

[source,js]
----
PUT /test/_doc/1?refresh
{
  "url": "http://en.wikipedia.org/wiki/2016_Summer_Olympics",
  "content": "Rio 2016",
  "pagerank": 50.3,
  "url_length": 42,
  "topics": {
    "sports": 50,
    "brazil": 30
  }
}

PUT /test/_doc/2?refresh
{
  "url": "http://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix",
  "content": "Formula One motor race held on 13 November 2016",
  "pagerank": 50.3,
  "url_length": 47,
  "topics": {
    "sports": 35,
    "formula one": 65,
    "brazil": 20
  }
}

PUT /test/_doc/3?refresh
{
  "url": "http://en.wikipedia.org/wiki/Deadpool_(film)",
  "content": "Deadpool is a 2016 American superhero film",
  "pagerank": 50.3,
  "url_length": 37,
  "topics": {
    "movies": 60,
    "super hero": 65
  }
}
----
// CONSOLE

[[rank-feature-query-ex-query]]
===== Example query

The following query searches for `2016` and boosts relevance scores based on
`pagerank`, `url_length`, and the `sports` topic.

[source,js]
----
GET /test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "content": "2016"
          }
        }
      ],
      "should": [
        {
          "rank_feature": {
            "field": "pagerank"
          }
        },
        {
          "rank_feature": {
            "field": "url_length",
            "boost": 0.1
          }
        },
        {
          "rank_feature": {
            "field": "topics.sports",
            "boost": 0.4
          }
        }
      ]
    }
  }
}
----
// CONSOLE

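
To make the effect of the `should` clauses more concrete, the following Python
sketch approximates how the scores in the example query might combine for
document `1`. It assumes that each `rank_feature` clause contributes its `boost`
multiplied by a `saturation` value and that the `bool` query adds the clause
scores together. The `text_score` and every `pivot` value are made-up numbers
chosen for illustration; they are not values that {es} would compute.

```python
def saturation(value, pivot, positive_score_impact=True):
    """Saturation score: S / (S + pivot), or pivot / (S + pivot)
    for fields with a negative score impact."""
    if positive_score_impact:
        return value / (value + pivot)
    return pivot / (value + pivot)


def combined_score(text_score, clauses):
    """Add the text score and each boosted rank feature score.

    Assumption: boost multiplies the function output, and the bool
    query sums the scores of the must and should clauses.
    """
    return text_score + sum(
        boost * saturation(value, pivot, positive)
        for value, pivot, positive, boost in clauses
    )


# Document 1 from the example. text_score and the pivots are hypothetical.
# Each tuple is (field value, pivot, positive_score_impact, boost).
doc1_clauses = [
    (50.3, 50.3, True, 1.0),   # pagerank, default boost
    (42.0, 42.0, False, 0.1),  # url_length, negative score impact
    (50.0, 50.0, True, 0.4),   # topics.sports
]
score = combined_score(text_score=1.0, clauses=doc1_clauses)
print(round(score, 3))
```

Because each hypothetical pivot equals the field value, every `saturation` term
is `0.5`, so the boosts alone determine how much each feature adds.
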
[[rank-feature-top-level-params]]
==== Top-level parameters for `rank_feature`

`field`::
(Required, string) <<rank-feature,`rank_feature`>> or
<<rank-features,`rank_features`>> field used to boost
<<relevance-scores,relevance scores>>.

`boost`::
+
--
(Optional, float) Floating point number used to decrease or increase
<<relevance-scores,relevance scores>>. Defaults to `1.0`.

Boost values are relative to the default value of `1.0`. A boost value between
`0` and `1.0` decreases the relevance score. A value greater than `1.0`
increases the relevance score.
--

`saturation`::
+
--
(Optional, <<rank-feature-query-saturation,function object>>) Saturation
function used to boost <<relevance-scores,relevance scores>> based on the
value of the rank feature `field`. If no function is provided, the `rank_feature`
query defaults to the `saturation` function. See
<<rank-feature-query-saturation,Saturation>> for more information.

Only one function (`saturation`, `log`, or `sigmoid`) can be provided.
--

`log`::
+
--
(Optional, <<rank-feature-query-logarithm,function object>>) Logarithmic
function used to boost <<relevance-scores,relevance scores>> based on the
value of the rank feature `field`. See
<<rank-feature-query-logarithm,Logarithm>> for more information.

Only one function (`saturation`, `log`, or `sigmoid`) can be provided.
--

`sigmoid`::
+
--
(Optional, <<rank-feature-query-sigmoid,function object>>) Sigmoid function used
to boost <<relevance-scores,relevance scores>> based on the value of the
rank feature `field`. See <<rank-feature-query-sigmoid,Sigmoid>> for more
information.

Only one function (`saturation`, `log`, or `sigmoid`) can be provided.
--

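
The parameter rules above can be sketched as a small Python helper that builds
the query body as a plain dictionary. The `rank_feature_query` function is
hypothetical and not part of any {es} client; it only mirrors the documented
rule that at most one function can be provided. {es} performs its own
validation when it receives the request.

```python
def rank_feature_query(field, boost=None, **function):
    """Build a rank_feature query body as a plain dict.

    `function` may hold at most one of saturation, log, or sigmoid.
    This helper is illustrative only, not an Elasticsearch API.
    """
    allowed = {"saturation", "log", "sigmoid"}
    unknown = set(function) - allowed
    if unknown:
        raise ValueError(f"unsupported function(s): {sorted(unknown)}")
    if len(function) > 1:
        raise ValueError("only one of saturation, log, or sigmoid can be provided")

    body = {"field": field}
    if boost is not None:
        body["boost"] = boost
    # With no function given, the body is left without one; per the text
    # above, the query then defaults to saturation.
    body.update(function)
    return {"rank_feature": body}


query = rank_feature_query("pagerank", boost=0.5, log={"scaling_factor": 4})
print(query)
```
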
[[rank-feature-query-notes]]
==== Notes

[[rank-feature-query-saturation]]
===== Saturation

The `saturation` function gives a score equal to `S / (S + pivot)`, where `S` is
the value of the rank feature field and `pivot` is a configurable pivot value.
The result is less than `0.5` if `S` is less than `pivot` and greater than
`0.5` if `S` is greater than `pivot`. Scores are always in the open interval
`(0,1)`.

If the rank feature has a negative score impact, the function is computed as
`pivot / (S + pivot)`, which decreases when `S` increases.

[source,js]
--------------------------------------------------
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "saturation": {
        "pivot": 8
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

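
As a rough illustration of the formula, the following Python sketch evaluates
`saturation` for both score impacts. It is a direct transcription of the
expressions above, not the code {es} runs, and it assumes a boost of `1.0`.
The `saturation` helper and its parameter names are chosen for this example.

```python
def saturation(value, pivot, positive_score_impact=True):
    """Return S / (S + pivot), or pivot / (S + pivot) when the
    rank feature has a negative score impact."""
    if value <= 0 or pivot <= 0:
        raise ValueError("value and pivot must be strictly positive")
    if positive_score_impact:
        return value / (value + pivot)
    return pivot / (value + pivot)


at_pivot = saturation(8, 8)            # S equals pivot
pagerank_score = saturation(50.3, 8)   # S well above pivot
negative_score = saturation(50.3, 8, positive_score_impact=False)

print(at_pivot, round(pagerank_score, 3), round(negative_score, 3))
```

With `pivot` set to `8`, a `pagerank` of `50.3` scores well above `0.5`, while
the same value on a field with a negative score impact scores well below it.
The two results add up to `1`.
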
If a `pivot` value is not provided, {es} computes a default value equal to the
approximate geometric mean of all rank feature values in the index. We recommend
using this default value if you haven't had the opportunity to train a good
pivot value.

[source,js]
--------------------------------------------------
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "saturation": {}
    }
  }
}
--------------------------------------------------
// CONSOLE

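
For intuition about what a geometric-mean pivot looks like, the sketch below
computes the exact geometric mean of the `url_length` values from the example
documents. This is only an illustration: {es} uses an approximation computed
from the index, so the default pivot it picks may differ from this number. The
`geometric_mean` helper is defined here for the example.

```python
import math


def geometric_mean(values):
    """Exact geometric mean, computed in log space to avoid overflow."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and strictly positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# url_length values of the three example documents.
url_lengths = [42, 47, 37]
pivot = geometric_mean(url_lengths)
print(round(pivot, 2))
```

A pivot near the typical field value places typical documents close to a score
of `0.5`, which leaves room for both smaller and larger values to move the
score.
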
[[rank-feature-query-logarithm]]
===== Logarithm

The `log` function gives a score equal to `log(scaling_factor + S)`, where `S`
is the value of the rank feature field and `scaling_factor` is a configurable
scaling factor. Scores are unbounded.

This function only supports rank features that have a positive score impact.

[source,js]
--------------------------------------------------
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "log": {
        "scaling_factor": 4
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

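
The formula can be sketched in Python as follows. The sketch assumes that `log`
means the natural logarithm and that the boost is `1.0`; the `log_score` helper
is a name chosen for this example, not an {es} API.

```python
import math


def log_score(value, scaling_factor):
    """Return log(scaling_factor + S), assuming a natural logarithm."""
    if value <= 0:
        raise ValueError("rank feature values must be strictly positive")
    return math.log(scaling_factor + value)


pagerank_score = log_score(50.3, 4)
large_value_score = log_score(1_000_000, 4)
print(round(pagerank_score, 3), round(large_value_score, 3))
```

Unlike `saturation`, the result keeps growing as `S` grows, which is why the
scores are unbounded.
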
[[rank-feature-query-sigmoid]]
===== Sigmoid

The `sigmoid` function is an extension of `saturation` which adds a configurable
exponent. Scores are computed as `S^exp^ / (S^exp^ + pivot^exp^)`. As with the
`saturation` function, `pivot` is the value of `S` that gives a score of `0.5`,
and scores are in the open interval `(0,1)`.

The `exponent` must be positive and is typically in `[0.5, 1]`. A
good value should be computed via training. If you don't have the opportunity to
do so, we recommend you use the `saturation` function instead.

[source,js]
--------------------------------------------------
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "sigmoid": {
        "pivot": 7,
        "exponent": 0.6
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
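
As an illustration, the following Python sketch evaluates the `sigmoid` formula
for the values used in the request above. It transcribes the expression from
this section, assumes a boost of `1.0`, and is not the code {es} runs. The
`sigmoid` helper is a name chosen for this example.

```python
def sigmoid(value, pivot, exponent):
    """Return S^exp / (S^exp + pivot^exp)."""
    if value <= 0 or pivot <= 0 or exponent <= 0:
        raise ValueError("value, pivot, and exponent must be strictly positive")
    powered_value = value ** exponent
    powered_pivot = pivot ** exponent
    return powered_value / (powered_value + powered_pivot)


at_pivot = sigmoid(7, 7, 0.6)             # S equals pivot
pagerank_score = sigmoid(50.3, 7, 0.6)    # values from the request above
as_saturation = sigmoid(50.3, 7, 1.0)     # exponent 1 reduces to saturation

print(at_pivot, round(pagerank_score, 3), round(as_saturation, 3))
```

With an exponent of `1`, the expression is identical to `saturation`. For
values above the pivot, an exponent below `1` pulls the score closer to `0.5`
than `saturation` would, so the curve rises more gently.
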