[[query-dsl-rank-feature-query]]
=== Rank feature query
++++
<titleabbrev>Rank feature</titleabbrev>
++++

Boosts the <<relevance-scores,relevance score>> of documents based on the
numeric value of a <<rank-feature,`rank_feature`>> or
<<rank-features,`rank_features`>> field.

The `rank_feature` query is typically used in the `should` clause of a
<<query-dsl-bool-query,`bool`>> query so its relevance scores are added to other
scores from the `bool` query.

Unlike the <<query-dsl-function-score-query,`function_score`>> query or other
ways to change <<relevance-scores,relevance scores>>, the
`rank_feature` query efficiently skips non-competitive hits when the
<<search-uri-request,`track_total_hits`>> parameter is **not** `true`. This can
dramatically improve query speed.

[[rank-feature-query-functions]]
==== Rank feature functions

To calculate relevance scores based on rank feature fields, the `rank_feature`
query supports the following mathematical functions:

* <<rank-feature-query-saturation,Saturation>>
* <<rank-feature-query-logarithm,Logarithm>>
* <<rank-feature-query-sigmoid,Sigmoid>>

If you don't know where to start, we recommend using the `saturation` function.
If no function is provided, the `rank_feature` query uses the `saturation`
function by default.

[[rank-feature-query-ex-request]]
==== Example request

[[rank-feature-query-index-setup]]
===== Index setup

To use the `rank_feature` query, your index must include a
<<rank-feature,`rank_feature`>> or <<rank-features,`rank_features`>> field
mapping. To see how you can set up an index for the `rank_feature` query, try
the following example.

Create a `test` index with the following field mappings:

- `pagerank`, a <<rank-feature,`rank_feature`>> field which measures the
importance of a website
- `url_length`, a <<rank-feature,`rank_feature`>> field which contains the
length of the website's URL. For this example, a long URL correlates negatively
with relevance, indicated by a `positive_score_impact` value of `false`.
- `topics`, a <<rank-features,`rank_features`>> field which contains a list of
topics and a measure of how well each document is connected to each topic

[source,console]
----
PUT /test
{
  "mappings": {
    "properties": {
      "pagerank": {
        "type": "rank_feature"
      },
      "url_length": {
        "type": "rank_feature",
        "positive_score_impact": false
      },
      "topics": {
        "type": "rank_features"
      }
    }
  }
}
----
// TESTSETUP

Index several documents to the `test` index.

[source,console]
----
PUT /test/_doc/1?refresh
{
  "url": "http://en.wikipedia.org/wiki/2016_Summer_Olympics",
  "content": "Rio 2016",
  "pagerank": 50.3,
  "url_length": 42,
  "topics": {
    "sports": 50,
    "brazil": 30
  }
}

PUT /test/_doc/2?refresh
{
  "url": "http://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix",
  "content": "Formula One motor race held on 13 November 2016",
  "pagerank": 50.3,
  "url_length": 47,
  "topics": {
    "sports": 35,
    "formula one": 65,
    "brazil": 20
  }
}

PUT /test/_doc/3?refresh
{
  "url": "http://en.wikipedia.org/wiki/Deadpool_(film)",
  "content": "Deadpool is a 2016 American superhero film",
  "pagerank": 50.3,
  "url_length": 37,
  "topics": {
    "movies": 60,
    "super hero": 65
  }
}
----

[[rank-feature-query-ex-query]]
===== Example query

The following query searches for `2016` and boosts relevance scores based on
`pagerank`, `url_length`, and the `sports` topic.

[source,console]
----
GET /test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "content": "2016"
          }
        }
      ],
      "should": [
        {
          "rank_feature": {
            "field": "pagerank"
          }
        },
        {
          "rank_feature": {
            "field": "url_length",
            "boost": 0.1
          }
        },
        {
          "rank_feature": {
            "field": "topics.sports",
            "boost": 0.4
          }
        }
      ]
    }
  }
}
----
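
To make the effect of the `should` clauses and their `boost` values concrete, the following Python sketch adds up clause scores the way a `bool` query does: the score of the `must` clause plus the score of every matching `should` clause. It assumes the default `saturation` function for all three fields. The `match` score and the pivot values are hypothetical placeholders, not values {es} would compute for the `test` index, and the sketch ignores the reduced precision with which rank feature values are stored.

[source,python]
----
# Hypothetical sketch of how the bool query above combines clause scores.
# MATCH_SCORE and PIVOTS are made-up placeholders, not values computed by
# Elasticsearch for the test index.

MATCH_SCORE = 1.0  # hypothetical score of the `match` clause on `content`
PIVOTS = {"pagerank": 50.0, "url_length": 40.0, "topics.sports": 40.0}


def saturation(s, pivot, positive_impact=True):
    """S / (S + pivot), or pivot / (S + pivot) for a negative score impact."""
    if positive_impact:
        return s / (s + pivot)
    return pivot / (s + pivot)


def bool_score(doc, match_score=MATCH_SCORE, pivots=PIVOTS):
    """Score of the `must` clause plus each matching `should` clause."""
    score = match_score
    # `pagerank` clause, default boost 1.0
    score += 1.0 * saturation(doc["pagerank"], pivots["pagerank"])
    # `url_length` clause, boost 0.1, positive_score_impact: false
    score += 0.1 * saturation(doc["url_length"], pivots["url_length"],
                              positive_impact=False)
    # `topics.sports` clause, boost 0.4; it only matches documents
    # that actually have a `sports` feature
    sports = doc["topics"].get("sports")
    if sports is not None:
        score += 0.4 * saturation(sports, pivots["topics.sports"])
    return score


doc1 = {"pagerank": 50.3, "url_length": 42,
        "topics": {"sports": 50, "brazil": 30}}
doc3 = {"pagerank": 50.3, "url_length": 37,
        "topics": {"movies": 60, "super hero": 65}}

print(f"doc 1: {bool_score(doc1):.4f}")
print(f"doc 3: {bool_score(doc3):.4f}")
----

Document 3 has no `sports` topic, so the third `should` clause does not match it and adds nothing to its score.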

[[rank-feature-top-level-params]]
==== Top-level parameters for `rank_feature`

`field`::
(Required, string) <<rank-feature,`rank_feature`>> or
<<rank-features,`rank_features`>> field used to boost
<<relevance-scores,relevance scores>>.

`boost`::
+
--
(Optional, float) Floating point number used to decrease or increase
<<relevance-scores,relevance scores>>. Defaults to `1.0`.

Boost values are relative to the default value of `1.0`. A boost value between
`0` and `1.0` decreases the relevance score. A value greater than `1.0`
increases the relevance score.
--

`saturation`::
+
--
(Optional, <<rank-feature-query-saturation,function object>>) Saturation
function used to boost <<relevance-scores,relevance scores>> based on the
value of the rank feature `field`. If no function is provided, the `rank_feature`
query defaults to the `saturation` function. See
<<rank-feature-query-saturation,Saturation>> for more information.

Only one function (`saturation`, `log`, or `sigmoid`) can be provided.
--

`log`::
+
--
(Optional, <<rank-feature-query-logarithm,function object>>) Logarithmic
function used to boost <<relevance-scores,relevance scores>> based on the
value of the rank feature `field`. See
<<rank-feature-query-logarithm,Logarithm>> for more information.

Only one function (`saturation`, `log`, or `sigmoid`) can be provided.
--

`sigmoid`::
+
--
(Optional, <<rank-feature-query-sigmoid,function object>>) Sigmoid function used
to boost <<relevance-scores,relevance scores>> based on the value of the
rank feature `field`. See <<rank-feature-query-sigmoid,Sigmoid>> for more
information.

Only one function (`saturation`, `log`, or `sigmoid`) can be provided.
--

[[rank-feature-query-notes]]
==== Notes

[[rank-feature-query-saturation]]
===== Saturation

The `saturation` function gives a score equal to `S / (S + pivot)`, where `S` is
the value of the rank feature field and `pivot` is a configurable pivot value.
The result is less than `0.5` if `S` is less than `pivot`, equal to `0.5` if `S`
equals `pivot`, and greater than `0.5` otherwise. Scores are always in the range
`(0, 1)`.

If the rank feature has a negative score impact, the function is computed as
`pivot / (S + pivot)`, which decreases as `S` increases.

[source,console]
--------------------------------------------------
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "saturation": {
        "pivot": 8
      }
    }
  }
}
--------------------------------------------------
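
As a rough check of the formula, the following Python sketch evaluates the `saturation` function directly. It uses the `pagerank` value `50.3` from the example documents and the `pivot` of `8` from the request above. The function name and its input validation are illustrative assumptions, not part of the {es} API.

[source,python]
----
def saturation(s: float, pivot: float, positive_impact: bool = True) -> float:
    """Score of the `saturation` function for feature value s."""
    # This sketch only accepts positive inputs.
    if s <= 0 or pivot <= 0:
        raise ValueError("s and pivot must be positive")
    if positive_impact:
        return s / (s + pivot)  # increases with s; 0.5 when s == pivot
    return pivot / (s + pivot)  # decreases with s; 0.5 when s == pivot


print(saturation(50.3, 8))  # pagerank 50.3 with pivot 8
print(saturation(8, 8))     # s equal to pivot: 0.5
print(saturation(50.3, 8, positive_impact=False))  # as if the impact were negative
----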

If a `pivot` value is not provided, {es} computes a default value equal to the
approximate geometric mean of all rank feature values in the index. We recommend
using this default value if you haven't had the opportunity to train a good
pivot value.

[source,console]
--------------------------------------------------
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "saturation": {}
    }
  }
}
--------------------------------------------------
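
The following Python sketch computes an exact geometric mean in log space, which is the quantity the default `pivot` approximates. {es} derives its default from index statistics rather than from the raw values, so the real default can differ slightly. Because every example document has a `pagerank` of `50.3` (whose geometric mean is simply `50.3`), the sketch uses the more varied `url_length` values instead.

[source,python]
----
import math


def geometric_mean(values):
    """Exact geometric mean, computed in log space to avoid overflow."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be a non-empty list of positive numbers")
    return math.exp(sum(math.log(v) for v in values) / len(values))


url_lengths = [42, 47, 37]  # url_length of documents 1, 2 and 3
print(f"{geometric_mean(url_lengths):.2f}")
----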

[[rank-feature-query-logarithm]]
===== Logarithm

The `log` function gives a score equal to `log(scaling_factor + S)`, where `S`
is the value of the rank feature field and `scaling_factor` is a configurable
scaling factor. Scores are unbounded.

This function only supports rank features that have a positive score impact.

[source,console]
--------------------------------------------------
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "log": {
        "scaling_factor": 4
      }
    }
  }
}
--------------------------------------------------
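
A Python sketch of the `log` function, using the `pagerank` value `50.3` and the `scaling_factor` of `4` from the request above. It assumes the natural logarithm, which the text above does not specify.

[source,python]
----
import math


def log_score(s: float, scaling_factor: float) -> float:
    """Score of the `log` function: log(scaling_factor + s).

    Assumes the natural logarithm.
    """
    return math.log(scaling_factor + s)


print(log_score(50.3, 4))   # pagerank 50.3 with scaling_factor 4
print(log_score(5_000, 4))  # no upper bound: keeps growing with s
----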

[[rank-feature-query-sigmoid]]
===== Sigmoid

The `sigmoid` function is an extension of `saturation` which adds a configurable
exponent. Scores are computed as `S^exp^ / (S^exp^ + pivot^exp^)`. As with the
`saturation` function, `pivot` is the value of `S` that gives a score of `0.5`
and scores are in the range `(0, 1)`.

The `exponent` must be positive and is typically in `[0.5, 1]`. A
good value should be computed via training. If you don't have the opportunity to
do so, we recommend you use the `saturation` function instead.

[source,console]
--------------------------------------------------
GET /test/_search
{
  "query": {
    "rank_feature": {
      "field": "pagerank",
      "sigmoid": {
        "pivot": 7,
        "exponent": 0.6
      }
    }
  }
}
--------------------------------------------------
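
The following Python sketch evaluates the `sigmoid` formula with the `pivot` and `exponent` from the request above. It shows that `S = pivot` always scores `0.5` and that an `exponent` of `1` reduces the function to `saturation`. The function name is illustrative only.

[source,python]
----
def sigmoid(s: float, pivot: float, exponent: float) -> float:
    """Score of the `sigmoid` function: S^exp / (S^exp + pivot^exp)."""
    s_exp = s ** exponent
    return s_exp / (s_exp + pivot ** exponent)


print(sigmoid(7, 7, 0.6))     # S == pivot: 0.5 for any exponent
print(sigmoid(50.3, 7, 0.6))  # pagerank 50.3 with the request's parameters
print(sigmoid(50.3, 7, 1.0), 50.3 / (50.3 + 7))  # exponent 1 equals saturation
----

An exponent below `1` pulls scores toward `0.5` on both sides of the pivot, so values far above the pivot score lower than they would with `saturation`.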