[[query-dsl-rank-feature-query]]
=== Rank feature query
++++
<titleabbrev>Rank feature</titleabbrev>
++++

Boosts the <<relevance-scores,relevance score>> of documents based on the
numeric value of a <<rank-feature,`rank_feature`>> or
<<rank-features,`rank_features`>> field.

The `rank_feature` query is typically used in the `should` clause of a
<<query-dsl-bool-query,`bool`>> query so its relevance scores are added to other
scores from the `bool` query.

Unlike the <<query-dsl-function-score-query,`function_score`>> query or other
ways to change <<relevance-scores,relevance scores>>, the
`rank_feature` query efficiently skips non-competitive hits when the
<<search-uri-request,`track_total_hits`>> parameter is **not** `true`. This can
dramatically improve query speed.

[[rank-feature-query-functions]]
==== Rank feature functions

To calculate relevance scores based on rank feature fields, the `rank_feature`
query supports the following mathematical functions:

* <<rank-feature-query-saturation,Saturation>>
* <<rank-feature-query-logarithm,Logarithm>>
* <<rank-feature-query-sigmoid,Sigmoid>>
* <<rank-feature-query-linear,Linear>>

If you don't know where to start, we recommend using the `saturation` function.
If no function is provided, the `rank_feature` query uses the `saturation`
function by default.

[[rank-feature-query-ex-request]]
==== Example request

[[rank-feature-query-index-setup]]
===== Index setup

To use the `rank_feature` query, your index must include a
<<rank-feature,`rank_feature`>> or <<rank-features,`rank_features`>> field
mapping. To see how you can set up an index for the `rank_feature` query, try
the following example.

Create a `test` index with the following field mappings:

- `pagerank`, a <<rank-feature,`rank_feature`>> field which measures the
importance of a website
- `url_length`, a <<rank-feature,`rank_feature`>> field which contains the
length of the website's URL. For this example, a long URL correlates negatively
with relevance, indicated by a `positive_score_impact` value of `false`.
- `topics`, a <<rank-features,`rank_features`>> field which contains a list of
topics and a measure of how well each document is connected to each topic

[source,console]
----
PUT /test
{
  "mappings": {
    "properties": {
      "pagerank": {
        "type": "rank_feature"
      },
      "url_length": {
        "type": "rank_feature",
        "positive_score_impact": false
      },
      "topics": {
        "type": "rank_features"
      }
    }
  }
}
----
// TESTSETUP

Index several documents to the `test` index.

[source,console]
----
PUT /test/_doc/1?refresh
{
  "url": "https://en.wikipedia.org/wiki/2016_Summer_Olympics",
  "content": "Rio 2016",
  "pagerank": 50.3,
  "url_length": 42,
  "topics": {
    "sports": 50,
    "brazil": 30
  }
}

PUT /test/_doc/2?refresh
{
  "url": "https://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix",
  "content": "Formula One motor race held on 13 November 2016",
  "pagerank": 50.3,
  "url_length": 47,
  "topics": {
    "sports": 35,
    "formula one": 65,
    "brazil": 20
  }
}

PUT /test/_doc/3?refresh
{
  "url": "https://en.wikipedia.org/wiki/Deadpool_(film)",
  "content": "Deadpool is a 2016 American superhero film",
  "pagerank": 50.3,
  "url_length": 37,
  "topics": {
    "movies": 60,
    "super hero": 65
  }
}
----

[[rank-feature-query-ex-query]]
===== Example query

The following query searches for `2016` and boosts relevance scores based on
`pagerank`, `url_length`, and the `sports` topic.

[source,console]
----
GET /test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "content": "2016"
          }
        }
      ],
      "should": [
        {
          "rank_feature": {
            "field": "pagerank"
          }
        },
        {
          "rank_feature": {
            "field": "url_length",
            "boost": 0.1
          }
        },
        {
          "rank_feature": {
            "field": "topics.sports",
            "boost": 0.4
          }
        }
      ]
    }
  }
}
----
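
As a rough illustration of how the three `should` clauses above could add to a
document's score, the following Python sketch sums three boosted saturation
scores for document `1`. It assumes that each clause's `boost` multiplies the
output of the default `saturation` function. The pivot values and the helper
names (`saturation`, `rank_feature_bonus`) are made up for this illustration;
{es} derives its own default pivots from index statistics, and the score of the
`match` clause is not modeled here.

[source,python]
----
def saturation(value, pivot, positive_score_impact=True):
    """Saturation score as described in the Notes section of this page."""
    if positive_score_impact:
        return value / (value + pivot)
    return pivot / (value + pivot)


# Feature values of document 1 from the example index.
doc = {"pagerank": 50.3, "url_length": 42, "topics.sports": 50}

# (field, boost, assumed pivot, positive_score_impact).
# The pivots are illustrative only; they are not values computed by Elasticsearch.
clauses = [
    ("pagerank", 1.0, 50.0, True),
    ("url_length", 0.1, 40.0, False),
    ("topics.sports", 0.4, 40.0, True),
]


def rank_feature_bonus(doc, clauses):
    """Sum of boost * saturation over the features present in the document."""
    total = 0.0
    for field, boost, pivot, positive in clauses:
        value = doc.get(field)
        if value is None:
            continue  # assumption: a missing feature contributes nothing
        total += boost * saturation(value, pivot, positive)
    return total


print(round(rank_feature_bonus(doc, clauses), 3))
----

Under these assumptions the `pagerank` clause dominates the sum because it keeps
the default boost of `1.0`, while `url_length` and `topics.sports` are scaled
down by their boosts of `0.1` and `0.4`.
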

[[rank-feature-top-level-params]]
==== Top-level parameters for `rank_feature`

`field`::
(Required, string) <<rank-feature,`rank_feature`>> or
<<rank-features,`rank_features`>> field used to boost
<<relevance-scores,relevance scores>>.

`boost`::
+
--
(Optional, float) Floating point number used to decrease or increase
<<relevance-scores,relevance scores>>. Defaults to `1.0`.

Boost values are relative to the default value of `1.0`. A boost value between
`0` and `1.0` decreases the relevance score. A value greater than `1.0`
increases the relevance score.
--

`saturation`::
+
--
(Optional, <<rank-feature-query-saturation,function object>>) Saturation
function used to boost <<relevance-scores,relevance scores>> based on the
value of the rank feature `field`. If no function is provided, the `rank_feature`
query defaults to the `saturation` function. See
<<rank-feature-query-saturation,Saturation>> for more information.

Only one of the `saturation`, `log`, `sigmoid`, or `linear` functions can be
provided.
--

`log`::
+
--
(Optional, <<rank-feature-query-logarithm,function object>>) Logarithmic
function used to boost <<relevance-scores,relevance scores>> based on the
value of the rank feature `field`. See
<<rank-feature-query-logarithm,Logarithm>> for more information.

Only one of the `saturation`, `log`, `sigmoid`, or `linear` functions can be
provided.
--

`sigmoid`::
+
--
(Optional, <<rank-feature-query-sigmoid,function object>>) Sigmoid function used
to boost <<relevance-scores,relevance scores>> based on the value of the
rank feature `field`. See <<rank-feature-query-sigmoid,Sigmoid>> for more
information.

Only one of the `saturation`, `log`, `sigmoid`, or `linear` functions can be
provided.
--

`linear`::
+
--
(Optional, <<rank-feature-query-linear,function object>>) Linear function used
to boost <<relevance-scores,relevance scores>> based on the value of the
rank feature `field`. See <<rank-feature-query-linear,Linear>> for more
information.

Only one of the `saturation`, `log`, `sigmoid`, or `linear` functions can be
provided.
--

[[rank-feature-query-notes]]
==== Notes

[[rank-feature-query-saturation]]
===== Saturation

The `saturation` function gives a score equal to `S / (S + pivot)`, where `S` is
the value of the rank feature field and `pivot` is a configurable pivot value.
The score is `0.5` when `S` equals `pivot`, less than `0.5` when `S` is smaller
than `pivot`, and greater than `0.5` when `S` is larger. Scores always fall
within `(0,1)`.

If the rank feature has a negative score impact, then the function is
computed as `pivot / (S + pivot)`, which decreases when `S` increases.

[source,console]
--------------------------------------------------
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "saturation": {
        "pivot": 8
      }
    }
  }
}
--------------------------------------------------
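
To make the saturation formula above more concrete, here is a small Python
sketch that evaluates it outside of {es}. The function name `saturation` and
its `positive_score_impact` argument are illustrative choices, not part of any
client API, and any `boost` is left out.

[source,python]
----
def saturation(value: float, pivot: float, positive_score_impact: bool = True) -> float:
    """Evaluate S / (S + pivot), or pivot / (S + pivot) for a negative score impact."""
    if value <= 0 or pivot <= 0:
        raise ValueError("value and pivot must be positive")
    if positive_score_impact:
        return value / (value + pivot)
    return pivot / (value + pivot)


# A value equal to the pivot scores exactly 0.5.
print(saturation(8, pivot=8))

# A pagerank of 50.3 with the pivot of 8 used in the request above.
print(round(saturation(50.3, pivot=8), 3))

# With a negative score impact, larger values score lower.
print(saturation(37, pivot=8, positive_score_impact=False)
      > saturation(47, pivot=8, positive_score_impact=False))
----
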

If a `pivot` value is not provided, {es} computes a default value equal to the
approximate geometric mean of all rank feature values in the index. We recommend
using this default value if you haven't had the opportunity to train a good
pivot value.

[source,console]
--------------------------------------------------
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "saturation": {}
    }
  }
}
--------------------------------------------------
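
For intuition about what a geometric-mean pivot looks like, the following
Python sketch computes the exact geometric mean of the three `url_length`
values from the example documents. This is only a stand-in: {es} uses an
approximation based on index statistics, so the default pivot it picks may
differ from the number computed here. The helper name `geometric_mean` is an
illustrative choice.

[source,python]
----
import math


def geometric_mean(values):
    """Exact geometric mean, computed in log space to avoid overflow."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# url_length values of documents 1, 2, and 3 in the example index.
url_lengths = [42, 47, 37]
pivot = geometric_mean(url_lengths)
print(round(pivot, 1))
----

A value close to this pivot would score near `0.5` under the `saturation`
function, which is why a pivot near the typical feature value tends to be a
reasonable starting point.
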

[[rank-feature-query-logarithm]]
===== Logarithm

The `log` function gives a score equal to `log(scaling_factor + S)`, where `S`
is the value of the rank feature field and `scaling_factor` is a configurable
scaling factor. Scores are unbounded.

This function only supports rank features that have a positive score impact.

[source,console]
--------------------------------------------------
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "log": {
        "scaling_factor": 4
      }
    }
  }
}
--------------------------------------------------
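
The following Python sketch evaluates the `log` formula for a few feature
values to show how slowly the score grows. It assumes that `log` denotes the
natural logarithm, which this page does not state explicitly; the function
name `log_score` is illustrative.

[source,python]
----
import math


def log_score(value: float, scaling_factor: float) -> float:
    """Evaluate log(scaling_factor + S), assuming a natural logarithm."""
    return math.log(scaling_factor + value)


# The score keeps growing with the feature value, but ever more slowly.
for pagerank in (1, 10, 100, 1000):
    print(pagerank, round(log_score(pagerank, scaling_factor=4), 3))
----
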

[[rank-feature-query-sigmoid]]
===== Sigmoid

The `sigmoid` function is an extension of `saturation` which adds a configurable
exponent. Scores are computed as `S^exp^ / (S^exp^ + pivot^exp^)`. As with the
`saturation` function, `pivot` is the value of `S` that gives a score of `0.5`
and scores fall within `(0,1)`.

The `exponent` must be positive and is typically in `[0.5, 1]`. A
good value should be computed via training. If you don't have the opportunity to
do so, we recommend you use the `saturation` function instead.

[source,console]
--------------------------------------------------
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "sigmoid": {
        "pivot": 7,
        "exponent": 0.6
      }
    }
  }
}
--------------------------------------------------
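
As a quick check of the sigmoid formula above, this Python sketch evaluates it
directly. The function name `sigmoid_score` is an illustrative choice; the
sample `pivot` and `exponent` mirror the request above.

[source,python]
----
def sigmoid_score(value: float, pivot: float, exponent: float) -> float:
    """Evaluate S^exp / (S^exp + pivot^exp) for positive inputs."""
    if value <= 0 or pivot <= 0 or exponent <= 0:
        raise ValueError("value, pivot, and exponent must be positive")
    powered = value ** exponent
    return powered / (powered + pivot ** exponent)


# A value equal to the pivot scores 0.5, whatever the exponent.
print(sigmoid_score(7, pivot=7, exponent=0.6))

# With an exponent of 1, the formula reduces to the saturation function.
print(sigmoid_score(50.3, pivot=7, exponent=1.0) == 50.3 / (50.3 + 7))

# A smaller exponent flattens the curve around the pivot.
print(round(sigmoid_score(50.3, pivot=7, exponent=0.6), 3))
----
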

[[rank-feature-query-linear]]
===== Linear

The `linear` function is the simplest function, and gives a score equal
to the indexed value of `S`, where `S` is the value of the rank feature
field.

If a rank feature field is indexed with `"positive_score_impact": true`,
its indexed value is equal to `S`, rounded to preserve only
9 significant bits of precision.

If a rank feature field is indexed with `"positive_score_impact": false`,
its indexed value is equal to `1/S`, rounded to preserve only 9 significant
bits of precision.

[source,console]
--------------------------------------------------
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "linear": {}
    }
  }
}
--------------------------------------------------
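
To get a feel for what 9 significant bits of precision means for the indexed
value, the Python sketch below drops the low-order mantissa bits of a 32-bit
float. This is an assumption about how the rounding could work, not a
description of the actual encoding used by {es}, which may round differently.
The helper names `keep_significant_bits` and `linear_score` are illustrative.

[source,python]
----
import struct


def keep_significant_bits(value: float, bits: int = 9) -> float:
    """Drop low-order mantissa bits of a 32-bit float, keeping `bits` significant bits."""
    raw = struct.unpack(">I", struct.pack(">f", value))[0]
    # A float32 has 23 explicit mantissa bits plus one implicit leading bit.
    drop = 23 - (bits - 1)
    raw = (raw >> drop) << drop
    return struct.unpack(">f", struct.pack(">I", raw))[0]


def linear_score(value: float, positive_score_impact: bool = True) -> float:
    """Score equal to the (assumed) indexed value: S, or 1/S for a negative impact."""
    indexed = value if positive_score_impact else 1.0 / value
    return keep_significant_bits(indexed)


print(linear_score(50.3))  # 50.25
print(linear_score(42, positive_score_impact=False))
----

Under this assumption a `pagerank` of `50.3` would be scored as `50.25`, so
feature values that differ only beyond the ninth significant bit would receive
the same `linear` score.
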