[DOCS] Reformat rank feature query. Add relevance score section. (#44975)

James Rodewig 6 years ago
parent
commit
d9b2d8d9cf

+ 32 - 19
docs/reference/query-dsl/query_filter_context.asciidoc

@@ -1,27 +1,38 @@
 [[query-filter-context]]
 == Query and filter context
 
-The behaviour of a query clause depends on whether it is used in _query context_ or
-in _filter context_:
+[float]
+[[relevance-scores]]
+=== Relevance scores
 
-Query context::
-+
---
-A query clause used in query context answers the question ``__How well does this
+By default, Elasticsearch sorts matching search results by **relevance
+score**, which measures how well each document matches a query.
+
+The relevance score is a positive floating point number, returned in the
+`_score` meta-field of the <<search-request-body,search>> API. The higher the
+`_score`, the more relevant the document. While each query type can calculate
+relevance scores differently, score calculation also depends on whether the
+query clause is run in a **query** or **filter** context.
+
+[float]
+[[query-context]]
+=== Query context
+In the query context, a query clause answers the question ``__How well does this
 document match this query clause?__'' Besides deciding whether or not the
-document matches, the query clause also calculates a `_score` representing how
-well the document matches, relative to other documents.
+document matches, the query clause also calculates a relevance score in the
+`_score` meta-field.
 
-Query context is in effect whenever a query clause is passed to a `query` parameter,
-such as the `query` parameter in the <<request-body-search-query,`search`>> API.
---
+Query context is in effect whenever a query clause is passed to a `query`
+parameter, such as the `query` parameter in the
+<<request-body-search-query,search>> API.
 
-Filter context::
-+
---
-In _filter_ context, a query clause answers the question ``__Does this document
-match this query clause?__''  The answer is a simple Yes or No -- no scores are
-calculated.  Filter context is mostly used for filtering structured data, e.g.
+[float]
+[[filter-context]]
+=== Filter context
+In a filter context, a query clause answers the question ``__Does this
+document match this query clause?__''  The answer is a simple Yes or No -- no
+scores are calculated.  Filter context is mostly used for filtering structured
+data, e.g.
 
 *  __Does this +timestamp+ fall into the range 2015 to 2016?__
 *  __Is the +status+  field set to ++"published"++__?
@@ -34,8 +45,10 @@ parameter, such as the `filter` or `must_not` parameters in the
 <<query-dsl-bool-query,`bool`>> query, the `filter` parameter in the
 <<query-dsl-constant-score-query,`constant_score`>> query, or the
 <<search-aggregations-bucket-filter-aggregation,`filter`>> aggregation.
---
 
+[float]
+[[query-filter-context-ex]]
+=== Example of query and filter contexts
 Below is an example of query clauses being used in query and filter context
 in the `search` API.  This query will match documents where all of the following
 conditions are met:
@@ -80,4 +93,4 @@ significand's precision will be converted to floats with loss of precision.
 
 TIP: Use query clauses in query context for conditions which should affect the
 score of matching documents (i.e. how well does the document match), and use
-all other query clauses in filter context.
+all other query clauses in filter context.

+ 158 - 74
docs/reference/query-dsl/rank-feature-query.asciidoc

@@ -4,33 +4,58 @@
 <titleabbrev>Rank feature</titleabbrev>
 ++++
 
-The `rank_feature` query is a specialized query that only works on
-<<rank-feature,`rank_feature`>> fields and <<rank-features,`rank_features`>> fields.
-Its goal is to boost the score of documents based on the values of numeric
-features. It is typically put in a `should` clause of a
-<<query-dsl-bool-query,`bool`>> query so that its score is added to the score
-of the query.
-
-Compared to using <<query-dsl-function-score-query,`function_score`>> or other
-ways to modify the score, this query has the benefit of being able to
-efficiently skip non-competitive hits when
-<<search-uri-request,`track_total_hits`>> is not set to `true`. Speedups may be
-spectacular.
-
-Here is an example that indexes various features:
- - https://en.wikipedia.org/wiki/PageRank[`pagerank`], a measure of the
-   importance of a website,
- - `url_length`, the length of the url, which typically correlates negatively
-   with relevance,
- - `topics`, which associates a list of topics with every document alongside a
-   measure of how well the document is connected to this topic.
-
-Then the example includes an example query that searches for `"2016"` and boosts
-based or `pagerank`, `url_length` and the `sports` topic.
+Boosts the <<relevance-scores,relevance score>> of documents based on the
+numeric value of a <<rank-feature,`rank_feature`>> or
+<<rank-features,`rank_features`>> field.
+
+The `rank_feature` query is typically used in the `should` clause of a
+<<query-dsl-bool-query,`bool`>> query so its relevance scores are added to other
+scores from the `bool` query.
+
+Unlike the <<query-dsl-function-score-query,`function_score`>> query or other
+ways to change <<relevance-scores,relevance scores>>, the
+`rank_feature` query efficiently skips non-competitive hits when the
+<<search-uri-request,`track_total_hits`>> parameter is **not** `true`. This can
+dramatically improve query speed.
+
+[[rank-feature-query-functions]]
+==== Rank feature functions
+
+To calculate relevance scores based on rank feature fields, the `rank_feature`
+query supports the following mathematical functions:
+
+* <<rank-feature-query-saturation,Saturation>>
+* <<rank-feature-query-logarithm,Logarithm>>
+* <<rank-feature-query-sigmoid,Sigmoid>>
+
+If you don't know where to start, we recommend using the `saturation` function.
+If no function is provided, the `rank_feature` query uses the `saturation`
+function by default.
+
+[[rank-feature-query-ex-request]]
+==== Example request
+
+[[rank-feature-query-index-setup]]
+===== Index setup
+
+To use the `rank_feature` query, your index must include a
+<<rank-feature,`rank_feature`>> or <<rank-features,`rank_features`>> field
+mapping. To see how you can set up an index for the `rank_feature` query, try
+the following example.
+
+Create a `test` index with the following field mappings:
+
+- `pagerank`, a <<rank-feature,`rank_feature`>> field which measures the
+importance of a website
+- `url_length`, a <<rank-feature,`rank_feature`>> field which contains the
+length of the website's URL. For this example, a long URL correlates negatively
+with relevance, indicated by a `positive_score_impact` value of `false`.
+- `topics`, a <<rank-features,`rank_features`>> field which contains a list of
+topics and a measure of how well each document is connected to each topic
 
 [source,js]
---------------------------------------------------
-PUT test
+----
+PUT /test
 {
   "mappings": {
     "properties": {
@@ -47,8 +72,16 @@ PUT test
     }
   }
 }
+----
+// CONSOLE
+// TESTSETUP
 
-PUT test/_doc/1
+
+Index several documents to the `test` index.
+
+[source,js]
+----
+PUT /test/_doc/1?refresh
 {
   "url": "http://en.wikipedia.org/wiki/2016_Summer_Olympics",
   "content": "Rio 2016",
@@ -60,10 +93,10 @@ PUT test/_doc/1
   }
 }
 
-PUT test/_doc/2
+PUT /test/_doc/2?refresh
 {
   "url": "http://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix",
-  "content": "Formula One motor race held on 13 November 2016 at the Autódromo José Carlos Pace in São Paulo, Brazil",
+  "content": "Formula One motor race held on 13 November 2016",
   "pagerank": 50.3,
   "url_length": 47,
   "topics": {
@@ -73,7 +106,7 @@ PUT test/_doc/2
   }
 }
 
-PUT test/_doc/3
+PUT /test/_doc/3?refresh
 {
   "url": "http://en.wikipedia.org/wiki/Deadpool_(film)",
   "content": "Deadpool is a 2016 American superhero film",
@@ -84,10 +117,18 @@ PUT test/_doc/3
     "super hero": 65
   }
 }
+----
+// CONSOLE
 
-POST test/_refresh
+[[rank-feature-query-ex-query]]
+===== Example query
 
-GET test/_search 
+The following query searches for `2016` and boosts relevance scores based on
+`pagerank`, `url_length`, and the `sports` topic.
+
+[source,js]
+----
+GET /test/_search 
 {
   "query": {
     "bool": {
@@ -120,31 +161,80 @@ GET test/_search
     }
   }
 }
---------------------------------------------------
+----
 // CONSOLE
 
-[float]
-=== Supported functions
 
-The `rank_feature` query supports 3 functions in order to boost scores using the
-values of rank features. If you do not know where to start, we recommend that you
-start with the `saturation` function, which is the default when no function is
-provided.
+[[rank-feature-top-level-params]]
+==== Top-level parameters for `rank_feature`
+
+`field`::
+(Required, string) <<rank-feature,`rank_feature`>> or
+<<rank-features,`rank_features`>> field used to boost
+<<relevance-scores,relevance scores>>.
+
+`boost`::
++
+--
+(Optional, float) Floating point number used to decrease or increase
+<<relevance-scores,relevance scores>>. Defaults to `1.0`.
+
+Boost values are relative to the default value of `1.0`. A boost value between
+`0` and `1.0` decreases the relevance score. A value greater than `1.0`
+increases the relevance score.
+--
+
+`saturation`::
++
+--
+(Optional, <<rank-feature-query-saturation,function object>>) Saturation
+function used to boost <<relevance-scores,relevance scores>> based on the
+value of the rank feature `field`. If no function is provided, the `rank_feature`
+query defaults to the `saturation` function. See
+<<rank-feature-query-saturation,Saturation>> for more information.
+
+Only one of the `saturation`, `log`, or `sigmoid` functions can be provided.
+--
 
-[float]
-==== Saturation
+`log`::
++
+--
+(Optional, <<rank-feature-query-logarithm,function object>>) Logarithmic
+function used to boost <<relevance-scores,relevance scores>> based on the
+value of the rank feature `field`. See
+<<rank-feature-query-logarithm,Logarithm>> for more information.
 
-This function gives a score that is equal to `S / (S + pivot)` where `S` is the
-value of the rank feature and `pivot` is a configurable pivot value so that the
-result will be less than +0.5+ if `S` is less than pivot and greater than +0.5+
-otherwise. Scores are always is +(0, 1)+.
+Only one of the `saturation`, `log`, or `sigmoid` functions can be provided.
+--
 
-If the rank feature has a negative score impact then the function will be computed as
-`pivot / (S + pivot)`, which decreases when `S` increases.
+`sigmoid`::
++
+--
+(Optional, <<rank-feature-query-sigmoid,function object>>) Sigmoid function used
+to boost <<relevance-scores,relevance scores>> based on the value of the
+rank feature `field`. See <<rank-feature-query-sigmoid,Sigmoid>> for more
+information.
+
+Only one of the `saturation`, `log`, or `sigmoid` functions can be provided.
+--
+
+
+[[rank-feature-query-notes]]
+==== Notes
+
+[[rank-feature-query-saturation]]
+===== Saturation
+The `saturation` function gives a score equal to `S / (S + pivot)`, where `S` is
+the value of the rank feature field and `pivot` is a configurable pivot value.
+The score is less than `0.5` if `S` is less than `pivot`, exactly `0.5` if `S`
+equals `pivot`, and greater than `0.5` otherwise. Scores are always in `(0, 1)`.
+
+If the rank feature has a negative score impact then the function will be
+computed as `pivot / (S + pivot)`, which decreases when `S` increases.
 
 [source,js]
 --------------------------------------------------
-GET test/_search
+GET /test/_search
 {
   "query": {
     "rank_feature": {
@@ -157,16 +247,15 @@ GET test/_search
 }
 --------------------------------------------------
 // CONSOLE
-// TEST[continued]
 
-If +pivot+ is not supplied then Elasticsearch will compute a default value that
-will be approximately equal to the geometric mean of all feature values that
-exist in the index. We recommend this if you haven't had the opportunity to
-train a good pivot value.
+If a `pivot` value is not provided, {es} computes a default value equal to the
+approximate geometric mean of all rank feature values in the index. We recommend
+using this default value if you haven't had the opportunity to train a good
+pivot value.
 
 [source,js]
 --------------------------------------------------
-GET test/_search
+GET /test/_search
 {
   "query": {
     "rank_feature": {
@@ -177,20 +266,18 @@ GET test/_search
 }
 --------------------------------------------------
 // CONSOLE
-// TEST[continued]
-
-[float]
-==== Logarithm
 
-This function gives a score that is equal to `log(scaling_factor + S)` where
-`S` is the value of the rank feature and `scaling_factor` is a configurable scaling
-factor. Scores are unbounded.
+[[rank-feature-query-logarithm]]
+===== Logarithm
+The `log` function gives a score equal to `log(scaling_factor + S)`, where `S`
+is the value of the rank feature field and `scaling_factor` is a configurable
+scaling factor. Scores are unbounded.
 
 This function only supports rank features that have a positive score impact.
 
 [source,js]
 --------------------------------------------------
-GET test/_search
+GET /test/_search
 {
   "query": {
     "rank_feature": {
@@ -203,23 +290,21 @@ GET test/_search
 }
 --------------------------------------------------
 // CONSOLE
-// TEST[continued]
 
-[float]
-==== Sigmoid
-
-This function is an extension of `saturation` which adds a configurable
+[[rank-feature-query-sigmoid]]
+===== Sigmoid
+The `sigmoid` function is an extension of `saturation` which adds a configurable
 exponent. Scores are computed as `S^exp^ / (S^exp^ + pivot^exp^)`. Like for the
-`saturation` function, `pivot` is the value of `S` that gives a score of +0.5+
-and scores are in +(0, 1)+.
+`saturation` function, `pivot` is the value of `S` that gives a score of `0.5`
+and scores are in `(0,1)`.
 
-`exponent` must be positive, but is typically in +[0.5, 1]+. A good value should
-be computed via training. If you don't have the opportunity to do so, we recommend
-that you stick to the `saturation` function instead.
+The `exponent` must be positive and is typically in `[0.5, 1]`. A
+good value should be computed via training. If you don't have the opportunity to
+do so, we recommend you use the `saturation` function instead.
 
 [source,js]
 --------------------------------------------------
-GET test/_search
+GET /test/_search
 {
   "query": {
     "rank_feature": {
@@ -232,5 +317,4 @@ GET test/_search
   }
 }
 --------------------------------------------------
-// CONSOLE
-// TEST[continued]
+// CONSOLE
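
Reviewer note: the three formulas documented in the Notes section can be checked with a small standalone sketch. The Python below is not part of this commit and is not how Elasticsearch implements scoring; it only evaluates the formulas as written in the new docs. The function names, the `positive_score_impact` argument on `saturation`, the natural-log base for `log`, and the sample pivot of `8` are all assumptions made for illustration.

```python
import math


def saturation(s, pivot, positive_score_impact=True):
    """S / (S + pivot), or pivot / (S + pivot) for a negative score impact."""
    if positive_score_impact:
        return s / (s + pivot)
    return pivot / (s + pivot)


def log_score(s, scaling_factor):
    """log(scaling_factor + S). The natural log is an assumption here;
    the docs in this diff do not state the base."""
    return math.log(scaling_factor + s)


def sigmoid(s, pivot, exponent):
    """S^exp / (S^exp + pivot^exp); equals saturation when exponent is 1."""
    s_exp = s ** exponent
    return s_exp / (s_exp + pivot ** exponent)


if __name__ == "__main__":
    # Feature values taken from document 2 in the example above;
    # the pivot of 8 is a hypothetical value chosen for illustration.
    print(saturation(50.3, 8))                              # pagerank
    print(saturation(47, 8, positive_score_impact=False))   # url_length
    print(sigmoid(50.3, 8, 0.6))
```

With any positive `pivot`, `saturation(pivot, pivot)` and `sigmoid(pivot, pivot, exponent)` both return `0.5`, which matches the statement that `pivot` is the value of `S` that gives a score of `0.5`.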