[[query-dsl-rank-feature-query]]
=== Rank feature query
++++
<titleabbrev>Rank feature</titleabbrev>
++++

Boosts the <<relevance-scores,relevance score>> of documents based on the
numeric value of a <<rank-feature,`rank_feature`>> or
<<rank-features,`rank_features`>> field.

The `rank_feature` query is typically used in the `should` clause of a
<<query-dsl-bool-query,`bool`>> query so its relevance scores are added to other
scores from the `bool` query.

Unlike the <<query-dsl-function-score-query,`function_score`>> query or other
ways to change <<relevance-scores,relevance scores>>, the
`rank_feature` query efficiently skips non-competitive hits when the
<<search-uri-request,`track_total_hits`>> parameter is **not** `true`. This can
dramatically improve query speed.

[[rank-feature-query-functions]]
==== Rank feature functions

To calculate relevance scores based on rank feature fields, the `rank_feature`
query supports the following mathematical functions:

* <<rank-feature-query-saturation,Saturation>>
* <<rank-feature-query-logarithm,Logarithm>>
* <<rank-feature-query-sigmoid,Sigmoid>>

If you don't know where to start, we recommend using the `saturation` function.
If no function is provided, the `rank_feature` query uses the `saturation`
function by default.

[[rank-feature-query-ex-request]]
==== Example request

[[rank-feature-query-index-setup]]
===== Index setup

To use the `rank_feature` query, your index must include a
<<rank-feature,`rank_feature`>> or <<rank-features,`rank_features`>> field
mapping. To see how you can set up an index for the `rank_feature` query, try
the following example.

Create a `test` index with the following field mappings:

- `pagerank`, a <<rank-feature,`rank_feature`>> field that measures the
importance of a website.
- `url_length`, a <<rank-feature,`rank_feature`>> field that contains the
length of the website's URL. For this example, a long URL correlates negatively
with relevance, indicated by a `positive_score_impact` value of `false`.
- `topics`, a <<rank-features,`rank_features`>> field that contains a list of
topics and a measure of how well each document is connected to each topic.

[source,console]
----
PUT /test
{
  "mappings": {
    "properties": {
      "pagerank": {
        "type": "rank_feature"
      },
      "url_length": {
        "type": "rank_feature",
        "positive_score_impact": false
      },
      "topics": {
        "type": "rank_features"
      }
    }
  }
}
----
// TESTSETUP

Index several documents to the `test` index.

[source,console]
----
PUT /test/_doc/1?refresh
{
  "url": "http://en.wikipedia.org/wiki/2016_Summer_Olympics",
  "content": "Rio 2016",
  "pagerank": 50.3,
  "url_length": 42,
  "topics": {
    "sports": 50,
    "brazil": 30
  }
}

PUT /test/_doc/2?refresh
{
  "url": "http://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix",
  "content": "Formula One motor race held on 13 November 2016",
  "pagerank": 50.3,
  "url_length": 47,
  "topics": {
    "sports": 35,
    "formula one": 65,
    "brazil": 20
  }
}

PUT /test/_doc/3?refresh
{
  "url": "http://en.wikipedia.org/wiki/Deadpool_(film)",
  "content": "Deadpool is a 2016 American superhero film",
  "pagerank": 50.3,
  "url_length": 37,
  "topics": {
    "movies": 60,
    "super hero": 65
  }
}
----

[[rank-feature-query-ex-query]]
===== Example query

The following query searches for `2016` and boosts relevance scores based on
`pagerank`, `url_length`, and the `sports` topic.

[source,console]
----
GET /test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "content": "2016"
          }
        }
      ],
      "should": [
        {
          "rank_feature": {
            "field": "pagerank"
          }
        },
        {
          "rank_feature": {
            "field": "url_length",
            "boost": 0.1
          }
        },
        {
          "rank_feature": {
            "field": "topics.sports",
            "boost": 0.4
          }
        }
      ]
    }
  }
}
----
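
To make the score arithmetic of the query above concrete, the following Python
sketch adds up the contributions for document `1`. It is an illustration under
stated assumptions, not a description of {es} internals: the text score
(`match_score`) and the three pivot values are made-up numbers, each `should`
clause is assumed to use the default `saturation` function, and `boost` is
assumed to act as a multiplier on the function's output.

[source,python]
----
def saturation(value, pivot, positive_score_impact=True):
    """Saturation formula: S / (S + pivot), or pivot / (S + pivot)
    when the field has a negative score impact."""
    if positive_score_impact:
        return value / (value + pivot)
    return pivot / (value + pivot)


# Hypothetical inputs: the text score and the pivots are invented for
# illustration. Real pivots would come from the index or from training.
match_score = 0.2
pivots = {"pagerank": 50.0, "url_length": 42.0, "topics.sports": 40.0}

# Rank feature values of document 1 from the example above.
doc_1 = {"pagerank": 50.3, "url_length": 42, "topics.sports": 50}

# One entry per `should` clause; boost is assumed to multiply the score.
should_scores = [
    1.0 * saturation(doc_1["pagerank"], pivots["pagerank"]),
    0.1 * saturation(doc_1["url_length"], pivots["url_length"],
                     positive_score_impact=False),
    0.4 * saturation(doc_1["topics.sports"], pivots["topics.sports"]),
]

# A `bool` query adds the scores of its matching clauses.
total = match_score + sum(should_scores)
print(round(total, 4))
----

Under these assumptions, each rank feature clause can add at most its boost
value to the total, because the saturation score stays below `1`.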

[[rank-feature-top-level-params]]
==== Top-level parameters for `rank_feature`

`field`::
(Required, string) <<rank-feature,`rank_feature`>> or
<<rank-features,`rank_features`>> field used to boost
<<relevance-scores,relevance scores>>.

`boost`::
+
--
(Optional, float) Floating point number used to decrease or increase
<<relevance-scores,relevance scores>>. Defaults to `1.0`.

Boost values are relative to the default value of `1.0`. A boost value between
`0` and `1.0` decreases the relevance score. A value greater than `1.0`
increases the relevance score.
--

`saturation`::
+
--
(Optional, <<rank-feature-query-saturation,function object>>) Saturation
function used to boost <<relevance-scores,relevance scores>> based on the
value of the rank feature `field`. If no function is provided, the `rank_feature`
query defaults to the `saturation` function. See
<<rank-feature-query-saturation,Saturation>> for more information.

You can provide only one of the `saturation`, `log`, or `sigmoid` functions.
--

`log`::
+
--
(Optional, <<rank-feature-query-logarithm,function object>>) Logarithmic
function used to boost <<relevance-scores,relevance scores>> based on the
value of the rank feature `field`. See
<<rank-feature-query-logarithm,Logarithm>> for more information.

You can provide only one of the `saturation`, `log`, or `sigmoid` functions.
--

`sigmoid`::
+
--
(Optional, <<rank-feature-query-sigmoid,function object>>) Sigmoid function used
to boost <<relevance-scores,relevance scores>> based on the value of the
rank feature `field`. See <<rank-feature-query-sigmoid,Sigmoid>> for more
information.

You can provide only one of the `saturation`, `log`, or `sigmoid` functions.
--

[[rank-feature-query-notes]]
==== Notes

[[rank-feature-query-saturation]]
===== Saturation

The `saturation` function gives a score equal to `S / (S + pivot)`, where `S` is
the value of the rank feature field and `pivot` is a configurable pivot value.
The result is less than `0.5` if `S` is less than `pivot`, exactly `0.5` if `S`
equals `pivot`, and greater than `0.5` otherwise. Scores are always in the range
`(0, 1)`.

If the rank feature has a negative score impact, the function is computed as
`pivot / (S + pivot)`, which decreases as `S` increases.

[source,console]
--------------------------------------------------
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "saturation": {
        "pivot": 8
      }
    }
  }
}
--------------------------------------------------
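
The two saturation formulas can be checked by hand with a short Python sketch.
The function name `saturation_score` is hypothetical and not part of any {es}
client, and the sketch leaves out `boost`. It only evaluates the formulas given
above, using the `pivot` of `8` from the request and values from the example
documents.

[source,python]
----
def saturation_score(value, pivot, positive_score_impact=True):
    """Evaluate the saturation formula for one rank feature value."""
    if value <= 0 or pivot <= 0:
        # This sketch only handles positive inputs.
        raise ValueError("value and pivot must be positive")
    if positive_score_impact:
        return value / (value + pivot)
    # Negative score impact: the score shrinks as the value grows.
    return pivot / (value + pivot)


print(saturation_score(8, 8))     # value equal to the pivot gives 0.5
print(saturation_score(50.3, 8))  # pagerank above the pivot: above 0.5
print(saturation_score(42, 8, positive_score_impact=False))  # url_length
----

For the same `value` and `pivot`, the positive and negative variants always add
up to `1`, which is one way to sanity-check a hand calculation.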

If a `pivot` value is not provided, {es} computes a default value equal to the
approximate geometric mean of all rank feature values in the index. We recommend
using this default value if you haven't had the opportunity to train a good
pivot value.

[source,console]
--------------------------------------------------
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "saturation": {}
    }
  }
}
--------------------------------------------------
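
As a rough illustration of what a geometric-mean pivot looks like, the Python
sketch below computes the exact geometric mean of the three `url_length` values
from the example documents. This is an assumption-laden stand-in: {es} computes
an approximation, so the value it actually uses may differ. The helper
`geometric_mean` is hypothetical.

[source,python]
----
import math


def geometric_mean(values):
    """Exact geometric mean of positive numbers, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# url_length values of the three example documents.
url_lengths = [42, 47, 37]
pivot = geometric_mean(url_lengths)
print(round(pivot, 2))
----

A pivot near the typical field value puts typical documents close to a
saturation score of `0.5`, which is why it is a sensible default.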

[[rank-feature-query-logarithm]]
===== Logarithm

The `log` function gives a score equal to `log(scaling_factor + S)`, where `S`
is the value of the rank feature field and `scaling_factor` is a configurable
scaling factor. Scores are unbounded.

This function only supports rank features that have a positive score impact.

[source,console]
--------------------------------------------------
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "log": {
        "scaling_factor": 4
      }
    }
  }
}
--------------------------------------------------
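
The logarithm formula can be evaluated the same way. The Python sketch below
assumes `log` means the natural logarithm, which the text above does not state.
The name `log_score` is hypothetical.

[source,python]
----
import math


def log_score(value, scaling_factor):
    """Evaluate log(scaling_factor + S), assuming a natural logarithm."""
    return math.log(scaling_factor + value)


# pagerank of the example documents with the scaling factor from the request.
print(round(log_score(50.3, 4), 4))

# Unlike saturation, the score keeps growing as the value grows.
print(log_score(5_000, 4) > log_score(50.3, 4))
----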

[[rank-feature-query-sigmoid]]
===== Sigmoid

The `sigmoid` function is an extension of `saturation` which adds a configurable
exponent. Scores are computed as `S^exp^ / (S^exp^ + pivot^exp^)`. As with the
`saturation` function, `pivot` is the value of `S` that gives a score of `0.5`,
and scores are in the range `(0, 1)`.

The `exponent` must be positive and is typically in `[0.5, 1]`. A
good value should be computed via training. If you don't have the opportunity to
do so, we recommend you use the `saturation` function instead.

[source,console]
--------------------------------------------------
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "sigmoid": {
        "pivot": 7,
        "exponent": 0.6
      }
    }
  }
}
--------------------------------------------------
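
A short Python sketch of the sigmoid formula shows how the exponent changes the
curve. The name `sigmoid_score` is hypothetical, and the code only evaluates the
formula given above with the `pivot` and `exponent` from the request.

[source,python]
----
def sigmoid_score(value, pivot, exponent):
    """Evaluate S^exp / (S^exp + pivot^exp) for one rank feature value."""
    powered_value = value ** exponent
    return powered_value / (powered_value + pivot ** exponent)


print(sigmoid_score(7, 7, 0.6))     # value equal to the pivot gives 0.5
print(sigmoid_score(50.3, 7, 0.6))  # pagerank from the example documents
print(sigmoid_score(50.3, 7, 1.0))  # exponent 1 matches the saturation formula
----

With an exponent below `1`, scores for values above the pivot sit closer to
`0.5` than they would under `saturation`, so the curve is flatter.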