
[[query-dsl-rank-feature-query]]
=== Rank feature query
++++
<titleabbrev>Rank feature</titleabbrev>
++++

The `rank_feature` query is a specialized query that only works on
<<rank-feature,`rank_feature`>> fields and <<rank-features,`rank_features`>> fields.
Its goal is to boost the score of documents based on the values of numeric
features. It is typically put in a `should` clause of a
<<query-dsl-bool-query,`bool`>> query so that its score is added to the score
of the rest of the query.

Compared to using <<query-dsl-function-score-query,`function_score`>> or other
ways to modify the score, this query has the benefit of being able to
efficiently skip non-competitive hits when
<<search-uri-request,`track_total_hits`>> is not set to `true`. The resulting
speedups can be substantial.

The following example indexes several features:

- https://en.wikipedia.org/wiki/PageRank[`pagerank`], a measure of the
  importance of a website,
- `url_length`, the length of the URL, which typically correlates negatively
  with relevance,
- `topics`, which associates a list of topics with every document alongside a
  measure of how well the document is connected to each topic.

The example then runs a query that searches for `"2016"` and boosts scores
based on `pagerank`, `url_length` and the `sports` topic.

[source,js]
--------------------------------------------------
PUT test
{
  "mappings": {
    "properties": {
      "pagerank": {
        "type": "rank_feature"
      },
      "url_length": {
        "type": "rank_feature",
        "positive_score_impact": false
      },
      "topics": {
        "type": "rank_features"
      }
    }
  }
}

PUT test/_doc/1
{
  "url": "http://en.wikipedia.org/wiki/2016_Summer_Olympics",
  "content": "Rio 2016",
  "pagerank": 50.3,
  "url_length": 42,
  "topics": {
    "sports": 50,
    "brazil": 30
  }
}

PUT test/_doc/2
{
  "url": "http://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix",
  "content": "Formula One motor race held on 13 November 2016 at the Autódromo José Carlos Pace in São Paulo, Brazil",
  "pagerank": 50.3,
  "url_length": 47,
  "topics": {
    "sports": 35,
    "formula one": 65,
    "brazil": 20
  }
}

PUT test/_doc/3
{
  "url": "http://en.wikipedia.org/wiki/Deadpool_(film)",
  "content": "Deadpool is a 2016 American superhero film",
  "pagerank": 50.3,
  "url_length": 37,
  "topics": {
    "movies": 60,
    "super hero": 65
  }
}

POST test/_refresh

GET test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "content": "2016"
          }
        }
      ],
      "should": [
        {
          "rank_feature": {
            "field": "pagerank"
          }
        },
        {
          "rank_feature": {
            "field": "url_length",
            "boost": 0.1
          }
        },
        {
          "rank_feature": {
            "field": "topics.sports",
            "boost": 0.4
          }
        }
      ]
    }
  }
}
--------------------------------------------------
// CONSOLE

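
To make the role of the `should` clauses more concrete, the following Python
sketch adds boosted feature scores to a text score for document 1. It is an
illustration only: the `match_score` value and the pivots are made-up numbers
(Elasticsearch derives the real ones from the index), the helper names
`saturation` and `combined_score` are hypothetical, and it assumes the default
`saturation` function described below.

```python
def saturation(s, pivot, positive_score_impact=True):
    """Default rank_feature function: S / (S + pivot), or pivot / (S + pivot)."""
    numerator = s if positive_score_impact else pivot
    return numerator / (s + pivot)


def combined_score(match_score, feature_clauses):
    """Add every boosted should-clause score to the must-clause score."""
    return match_score + sum(boost * score for boost, score in feature_clauses)


# Hypothetical inputs: a made-up text score and made-up pivots.
match_score = 0.5
clauses = [
    (1.0, saturation(50.3, pivot=50.3)),                         # pagerank, default boost
    (0.1, saturation(42, pivot=42, positive_score_impact=False)),  # url_length
    (0.4, saturation(50, pivot=50)),                             # topics.sports
]

total = combined_score(match_score, clauses)
print(total)
```

Because every feature value equals its pivot in this sketch, each clause
contributes half of its boost, which makes the effect of the `boost` values
easy to read off.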
[float]
=== Supported functions

The `rank_feature` query supports three functions for boosting scores using the
values of rank features. If you do not know where to start, we recommend that you
start with the `saturation` function, which is the default when no function is
provided.

[float]
==== Saturation

This function gives a score that is equal to `S / (S + pivot)` where `S` is the
value of the rank feature and `pivot` is a configurable pivot value. The
result is less than +0.5+ if `S` is less than `pivot` and greater than +0.5+
if `S` is greater than `pivot`. Scores are always in +(0, 1)+.

If the rank feature has a negative score impact then the function is computed as
`pivot / (S + pivot)`, which decreases when `S` increases.

[source,js]
--------------------------------------------------
GET test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "saturation": {
        "pivot": 8
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

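
The saturation formulas above can be checked by hand with a few lines of
Python. This is a sketch of the arithmetic only, not Elasticsearch's
implementation: the `saturation` helper is hypothetical, and it ignores any
`boost` and any loss of precision in how feature values are stored.

```python
def saturation(s, pivot, positive_score_impact=True):
    """Saturation score as described above, before any boost is applied."""
    if s <= 0 or pivot <= 0:
        raise ValueError("feature value and pivot must be positive")
    if positive_score_impact:
        return s / (s + pivot)
    # Negative score impact: the score shrinks as the feature value grows.
    return pivot / (s + pivot)


# pagerank of 50.3 with the pivot of 8 used in the request above.
pagerank_score = saturation(50.3, 8)

# url_length of 42 is mapped with positive_score_impact set to false;
# the pivot of 8 here is purely illustrative.
url_length_score = saturation(42, 8, positive_score_impact=False)

print(round(pagerank_score, 3), round(url_length_score, 3))  # prints 0.863 0.16
```

A value equal to the pivot always scores exactly +0.5+, which is why the pivot
is a convenient knob for deciding what counts as a "typical" feature value.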
If +pivot+ is not supplied, Elasticsearch computes a default value that is
approximately equal to the geometric mean of all values of the feature that
exist in the index. We recommend using this default if you haven't had the
opportunity to train a good pivot value.

[source,js]
--------------------------------------------------
GET test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "saturation": {}
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

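
As a rough illustration of what a geometric-mean pivot looks like, the sketch
below computes it for the three `url_length` values indexed earlier. The
`geometric_mean` helper is hypothetical, and Elasticsearch only approximates
this value from index statistics, so the pivot it actually picks may differ.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive feature values."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("feature values must be positive")
    # Averaging logarithms avoids overflowing the product on large inputs.
    return math.exp(sum(math.log(v) for v in values) / len(values))


url_lengths = [42, 47, 37]  # url_length of documents 1, 2 and 3
approx_pivot = geometric_mean(url_lengths)
print(round(approx_pivot, 1))
```

The geometric mean always lies between the smallest and largest value, so a
default pivot of this kind places a typical document near a score of +0.5+.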
[float]
==== Logarithm

This function gives a score that is equal to `log(scaling_factor + S)` where
`S` is the value of the rank feature and `scaling_factor` is a configurable scaling
factor. Scores are unbounded.

This function only supports rank features that have a positive score impact.

[source,js]
--------------------------------------------------
GET test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "log": {
        "scaling_factor": 4
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

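
The logarithm formula can be sketched the same way. This example assumes that
`log` denotes the natural logarithm, which the text above does not state
explicitly; the `log_score` helper is hypothetical and ignores any `boost`.

```python
import math


def log_score(s, scaling_factor):
    """log(scaling_factor + S), assuming a natural logarithm."""
    if s <= 0 or scaling_factor <= 0:
        raise ValueError("feature value and scaling factor must be positive")
    return math.log(scaling_factor + s)


# pagerank of 50.3 with the scaling_factor of 4 used in the request above.
print(log_score(50.3, 4))

# Unlike saturation, the score keeps growing with the feature value.
print(log_score(5000.0, 4) > log_score(50.3, 4))
```

A larger `scaling_factor` flattens the curve for small feature values, since
the constant term dominates the sum inside the logarithm.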
[float]
==== Sigmoid

This function is an extension of `saturation` which adds a configurable
exponent. Scores are computed as `S^exp^ / (S^exp^ + pivot^exp^)`. As with the
`saturation` function, `pivot` is the value of `S` that gives a score of +0.5+
and scores are in +(0, 1)+.

`exponent` must be positive, but is typically in +[0.5, 1]+. A good value should
be computed via training. If you don't have the opportunity to do so, we recommend
that you stick to the `saturation` function instead.

[source,js]
--------------------------------------------------
GET test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "sigmoid": {
        "pivot": 7,
        "exponent": 0.6
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
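
The sigmoid formula above can also be tried out in Python. As with the other
sketches, `sigmoid_score` is a hypothetical helper that only reproduces the
arithmetic and ignores any `boost`.

```python
def sigmoid_score(s, pivot, exponent):
    """S^exp / (S^exp + pivot^exp), as described above."""
    if s <= 0 or pivot <= 0 or exponent <= 0:
        raise ValueError("feature value, pivot and exponent must be positive")
    s_exp = s ** exponent
    return s_exp / (s_exp + pivot ** exponent)


# pagerank of 50.3 with the pivot and exponent used in the request above.
print(sigmoid_score(50.3, 7, 0.6))

# With an exponent of 1 the function reduces to saturation.
print(sigmoid_score(50.3, 7, 1) - 50.3 / (50.3 + 7))
```

Exponents below 1 make the curve less steep around the pivot, so documents
with feature values far above the pivot gain less over documents near it.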