
Add documentation about script rescorer (#133359)

Mayya Sharipova committed 1 month ago
Parent
Current commit
d8257b41a5

+ 13 - 1
docs/redirects.yml

@@ -85,4 +85,16 @@ redirects:
       - to: 'reference/query-languages/esql/commands/stats-by.md'
         anchors: {'esql-stats-by'}
       - to: 'reference/query-languages/esql/commands/where.md'
-        anchors: {'esql-where'}
+        anchors: {'esql-where'}
+
+  # https://github.com/elastic/elasticsearch/pull/133359
+  'reference/elasticsearch/rest-apis/filter-search-results.md':
+    to: 'reference/elasticsearch/rest-apis/filter-search-results.md'
+    anchors: {} # pass-through unlisted anchors in the `many` ruleset
+    many:
+      - to: 'reference/elasticsearch/rest-apis/rescore-search-results.md'
+        anchors: {'rescore'}
+      - to: 'reference/elasticsearch/rest-apis/rescore-search-results.md'
+        anchors: {'query-rescorer'}
+      - to: 'reference/elasticsearch/rest-apis/rescore-search-results.md'
+        anchors: {'multiple-rescores'}

+ 1 - 1
docs/reference/aggregations/search-aggregations-metrics-top-hits-aggregation.md

@@ -43,7 +43,7 @@ If you **only** need `docvalue_fields`, `size`, and `sort` then [Top metrics](/r
 ::::
 
 
-`top_hits` does not support the [`rescore`](/reference/elasticsearch/rest-apis/filter-search-results.md#rescore) parameter. Query rescoring applies only to search hits, not aggregation results. To change the scores used by aggregations, use a [`function_score`](/reference/query-languages/query-dsl/query-dsl-function-score-query.md) or [`script_score`](/reference/query-languages/query-dsl/query-dsl-script-score-query.md) query.
+`top_hits` does not support the [`rescore`](/reference/elasticsearch/rest-apis/rescore-search-results.md#rescore) parameter. Query rescoring applies only to search hits, not aggregation results. To change the scores used by aggregations, use a [`function_score`](/reference/query-languages/query-dsl/query-dsl-function-score-query.md) or [`script_score`](/reference/query-languages/query-dsl/query-dsl-script-score-query.md) query.
 
 
 ## Example [_example_6]

+ 2 - 2
docs/reference/elasticsearch/rest-apis/collapse-search-results.md

@@ -173,7 +173,7 @@ GET /my-index-000001/_search
 
 ## Rescore collapse results [rescore-collapse-results]
 
-You can use field collapsing alongside the [`rescore`](/reference/elasticsearch/rest-apis/filter-search-results.md#rescore) search parameter. Rescorers run on every shard for the top-ranked document per collapsed field. To maintain a reliable order, it is recommended to cluster documents sharing the same collapse field value on one shard. This is achieved by assigning the collapse field value as the [routing key](/reference/elasticsearch/rest-apis/search-shard-routing.md#search-routing) during indexing:
+You can use field collapsing alongside the [`rescore`](/reference/elasticsearch/rest-apis/rescore-search-results.md#rescore) search parameter. Rescorers run on every shard for the top-ranked document per collapsed field. To maintain a reliable order, it is recommended to cluster documents sharing the same collapse field value on one shard. This is achieved by assigning the collapse field value as the [routing key](/reference/elasticsearch/rest-apis/search-shard-routing.md#search-routing) during indexing:
 
 ```console
 POST /my-index-000001/_doc?routing=xyz      <1>
@@ -189,7 +189,7 @@ POST /my-index-000001/_doc?routing=xyz      <1>
 
 By doing this, you guarantee that only one top document per collapse key gets rescored globally.
 
-The following request utilizes field collapsing on the `user.id` field and then rescores the top groups with a [query rescorer](/reference/elasticsearch/rest-apis/filter-search-results.md#query-rescorer):
+The following request utilizes field collapsing on the `user.id` field and then rescores the top groups with a [query rescorer](/reference/elasticsearch/rest-apis/rescore-search-results.md#query-rescorer):
 
 ```console
 GET /my-index-000001/_search

+ 0 - 123
docs/reference/elasticsearch/rest-apis/filter-search-results.md

@@ -12,10 +12,6 @@ You can use two methods to filter search results:
 * Use a boolean query with a `filter` clause. Search requests apply [boolean filters](/reference/query-languages/query-dsl/query-dsl-bool-query.md) to both search hits and [aggregations](/reference/aggregations/index.md).
 * Use the search API’s `post_filter` parameter. Search requests apply [post filters](#post-filter) only to search hits, not aggregations. You can use a post filter to calculate aggregations based on a broader result set, and then further narrow the results.
 
-    You can also [rescore](#rescore) hits after the post filter to improve relevance and reorder results.
-
-
-
 ## Post filter [post-filter]
 
 When you use the `post_filter` parameter to filter search results, the search hits are filtered after the aggregations are calculated. A post filter has no impact on the aggregation results.
@@ -125,122 +121,3 @@ GET /shirts/_search
 2. The `colors` agg returns popular colors for shirts by Gucci.
 3. The `color_red` agg limits the `models` sub-aggregation to **red** Gucci shirts.
 4. Finally, the `post_filter` removes colors other than red from the search `hits`.
-
-
-
-## Rescore filtered search results [rescore]
-
-Rescoring can help to improve precision by reordering just the top (eg 100 - 500) documents returned by the [`query`](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search) and [`post_filter`](#post-filter) phases, using a secondary (usually more costly) algorithm, instead of applying the costly algorithm to all documents in the index.
-
-A `rescore` request is executed on each shard before it returns its results to be sorted by the node handling the overall search request.
-
-Currently the rescore API has only one implementation: the query rescorer, which uses a query to tweak the scoring. In the future, alternative rescorers may be made available, for example, a pair-wise rescorer.
-
-::::{note}
-An error will be thrown if an explicit [`sort`](/reference/elasticsearch/rest-apis/sort-search-results.md) (other than `_score` in descending order) is provided with a `rescore` query.
-::::
-
-
-::::{note}
-when exposing pagination to your users, you should not change `window_size` as you step through each page (by passing different `from` values) since that can alter the top hits causing results to confusingly shift as the user steps through pages.
-::::
-
-
-
-### Query rescorer [query-rescorer]
-
-The query rescorer executes a second query only on the Top-K results returned by the [`query`](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search) and [`post_filter`](#post-filter) phases. The number of docs which will be examined on each shard can be controlled by the `window_size` parameter, which defaults to 10.
-
-By default the scores from the original query and the rescore query are combined linearly to produce the final `_score` for each document. The relative importance of the original query and of the rescore query can be controlled with the `query_weight` and `rescore_query_weight` respectively. Both default to `1`.
-
-For example:
-
-```console
-POST /_search
-{
-   "query" : {
-      "match" : {
-         "message" : {
-            "operator" : "or",
-            "query" : "the quick brown"
-         }
-      }
-   },
-   "rescore" : {
-      "window_size" : 50,
-      "query" : {
-         "rescore_query" : {
-            "match_phrase" : {
-               "message" : {
-                  "query" : "the quick brown",
-                  "slop" : 2
-               }
-            }
-         },
-         "query_weight" : 0.7,
-         "rescore_query_weight" : 1.2
-      }
-   }
-}
-```
-
-The way the scores are combined can be controlled with the `score_mode`:
-
-| Score Mode | Description |
-| --- | --- |
-| `total` | Add the original score and the rescore query score. The default. |
-| `multiply` | Multiply the original score by the rescore query score. Usefulfor [`function query`](/reference/query-languages/query-dsl/query-dsl-function-score-query.md) rescores. |
-| `avg` | Average the original score and the rescore query score. |
-| `max` | Take the max of original score and the rescore query score. |
-| `min` | Take the min of the original score and the rescore query score. |
-
-
-### Multiple rescores [multiple-rescores]
-
-It is also possible to execute multiple rescores in sequence:
-
-```console
-POST /_search
-{
-   "query" : {
-      "match" : {
-         "message" : {
-            "operator" : "or",
-            "query" : "the quick brown"
-         }
-      }
-   },
-   "rescore" : [ {
-      "window_size" : 100,
-      "query" : {
-         "rescore_query" : {
-            "match_phrase" : {
-               "message" : {
-                  "query" : "the quick brown",
-                  "slop" : 2
-               }
-            }
-         },
-         "query_weight" : 0.7,
-         "rescore_query_weight" : 1.2
-      }
-   }, {
-      "window_size" : 10,
-      "query" : {
-         "score_mode": "multiply",
-         "rescore_query" : {
-            "function_score" : {
-               "script_score": {
-                  "script": {
-                    "source": "Math.log10(doc.count.value + 2)"
-                  }
-               }
-            }
-         }
-      }
-   } ]
-}
-```
-
-The first one gets the results of the query then the second one gets the results of the first, etc. The second rescore will "see" the sorting done by the first rescore so it is possible to use a large window on the first rescore to pull documents into a smaller window for the second rescore.
-

+ 1 - 1
docs/reference/elasticsearch/rest-apis/reciprocal-rank-fusion.md

@@ -114,7 +114,7 @@ The `rrf` retriever does not currently support:
 
 * [scroll](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search#search-api-scroll-query-param)
 * [sort](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search#search-sort-param)
-* [rescore](/reference/elasticsearch/rest-apis/filter-search-results.md#rescore)
+* [rescore](/reference/elasticsearch/rest-apis/rescore-search-results.md#rescore)
 
 Using unsupported features as part of a search with an `rrf` retriever results in an exception.
 

+ 211 - 0
docs/reference/elasticsearch/rest-apis/rescore-search-results.md

@@ -0,0 +1,211 @@
+---
+mapped_pages:
+  - https://www.elastic.co/guide/en/elasticsearch/reference/current/rescore-search-results.html
+applies_to:
+  stack: all
+  serverless: all
+---
+
+# Rescore search results
+
+Rescoring can help improve precision by reordering only the top documents
+(for example, the top 100 to 500) returned by the initial retrieval phase
+(such as a query or kNN search), using a secondary (usually more costly)
+algorithm instead of applying that algorithm to every document in the index.
+
+## How `rescore` works [rescore]
+
+A `rescore` request is executed on each shard before it returns its results
+to be sorted by the node handling the overall search request.
+
+The rescore API supports three rescorers:
+
+1. The `query` rescorer, which executes a provided `rescore_query` on the top documents.
+2. The `script` rescorer, which uses a script to modify the scores of the top documents.
+3. The `learning_to_rank` rescorer, which uses an LTR model to re-rank the top documents.
+
+All rescorers accept a `window_size` parameter that controls how many top
+documents on each shard are considered for rescoring. The default is 10.
+
+::::{note}
+When implementing pagination, keep the `window_size` consistent across pages.
+Changing it while advancing through results (by using different `from` values)
+can cause the top hits to shift, leading to a confusing user experience.
+::::
+
+## Query rescorer [query-rescorer]
+
+The query rescorer executes a second query only on the top documents returned
+from the previous phase. The number of documents examined on each shard is
+controlled by the `window_size` parameter.
+
+By default, the scores from the original query and the rescore query are combined
+linearly to produce the final `_score` for each document.
+The relative importance of the original query and of the rescore query can be
+controlled with the `query_weight` and `rescore_query_weight` respectively.
+Both default to `1`.
+
+For example:
+
+```console
+POST /_search
+{
+   "query" : {
+      "match" : {
+         "message" : {
+            "operator" : "or",
+            "query" : "the quick brown"
+         }
+      }
+   },
+   "rescore" : {
+      "window_size" : 10,
+      "query" : {
+         "rescore_query" : {
+            "match_phrase" : {
+               "message" : {
+                  "query" : "the quick brown",
+                  "slop" : 2
+               }
+            }
+         },
+         "query_weight" : 0.7,
+         "rescore_query_weight" : 1.2
+      }
+   }
+}
+```
+
+::::{note}
+An error will be thrown if an explicit [`sort`](/reference/elasticsearch/rest-apis/sort-search-results.md)
+(other than `_score` in descending order) is provided with a `rescore` query.
+::::
+
+
+Use the `score_mode` parameter inside the `query` rescorer object to control how the scores are combined:
+
+| Score mode | Description |
+| --- | --- |
+| `total` | Add the original score and the rescore query score. The default. |
+| `multiply` | Multiply the original score by the rescore query score. Useful for [`function_score`](/reference/query-languages/query-dsl/query-dsl-function-score-query.md) query rescores. |
+| `avg` | Average the original score and the rescore query score. |
+| `max` | Take the max of the original score and the rescore query score. |
+| `min` | Take the min of the original score and the rescore query score. |
+
+## Script rescorer [script-rescorer]
+```{applies_to}
+stack: ga 9.2
+```
+
+The `script` rescorer uses a script to rescore the top documents returned
+from the previous phase. The script has access to the original score
+(`_score`) as well as to document field values.
+
+For example, the following request rescores documents based on their
+original query score and the value of the `num_likes` field:
+
+```console
+POST /_search
+{
+   "query" : {
+      "match" : {
+         "message" : {
+            "operator" : "or",
+            "query" : "the quick brown"
+         }
+      }
+   },
+   "rescore" : {
+      "window_size" : 10,
+      "script" : {
+         "script" : {
+            "source": "doc['num_likes'].value * params.multiplier + _score",
+            "params": {
+               "multiplier": 0.1
+            }
+         }
+      }
+   }
+}
+```
+
+## Learning to rank rescorer [learning-to-rank-rescorer]
+
+The `learning_to_rank` rescorer uses a learning to rank (LTR) model to rescore
+the top documents. You must provide the `model_id` of a deployed model, as well
+as any named parameters required by the query templates for features used by the model.
+
+```console
+GET my-index/_search
+{
+  "query": {
+    "multi_match": {
+      "fields": ["title", "content"],
+      "query": "the quick brown fox"
+    }
+  },
+  "rescore": {
+    "learning_to_rank": {
+      "model_id": "ltr-model",
+      "params": {
+        "query_text": "the quick brown fox"
+      }
+    },
+    "window_size": 100
+  }
+}
+```
+
+## Multiple rescores [multiple-rescores]
+
+You can apply multiple rescoring operations in sequence. The first rescorer
+works on the top documents from the initial retrieval phase, while the second
+rescorer works on the output of the first rescorer, and so on. A common practice
+is to use a larger window for the first rescorer and smaller windows for more
+expensive subsequent rescorers.
+
+```console
+POST /_search
+{
+  "query": {
+    "match": {
+      "message": {
+        "operator": "or",
+        "query": "the quick brown"
+      }
+    }
+  },
+  "rescore": [
+    {
+      "window_size": 10,
+      "query": {
+        "rescore_query": {
+          "match_phrase": {
+            "message": {
+              "query": "the quick brown",
+              "slop": 2
+            }
+          }
+        },
+        "query_weight": 0.7,
+        "rescore_query_weight": 1.2
+      }
+    },
+    {
+      "window_size": 5,
+      "query": {
+        "score_mode": "multiply",
+        "rescore_query": {
+          "function_score": {
+            "script_score": {
+              "script": {
+                "source": "Math.log10(doc.count.value + 2)"
+              }
+            }
+          }
+        }
+      }
+    }
+  ]
+}
+```
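
The score arithmetic described in the query rescorer section can be sketched in Python as follows. This is a minimal, unofficial illustration, not Elasticsearch's implementation: the names `combine` and `rescore_window` are hypothetical, and the sketch assumes that `query_weight` and `rescore_query_weight` are applied to each score before `score_mode` combines them, that a document the rescore query does not match keeps only its weighted original score, and that hits beyond `window_size` are passed through unchanged.

```python
# Hypothetical sketch of query-rescorer score arithmetic (not Elasticsearch source).
from typing import Callable, Dict, List, Optional, Tuple

Hit = Tuple[str, float]  # (document id, score)

# One combining function per documented `score_mode` value.
SCORE_MODES: Dict[str, Callable[[float, float], float]] = {
    "total": lambda a, b: a + b,  # the default
    "multiply": lambda a, b: a * b,
    "avg": lambda a, b: (a + b) / 2,
    "max": max,
    "min": min,
}


def combine(original: float, rescore: Optional[float],
            query_weight: float = 1.0, rescore_query_weight: float = 1.0,
            score_mode: str = "total") -> float:
    """Combine a first-pass score with a rescore-query score.

    Assumption: each weight is applied to its score before `score_mode`,
    and a document the rescore query does not match (rescore=None)
    keeps only its weighted original score.
    """
    weighted_original = original * query_weight
    if rescore is None:
        return weighted_original
    return SCORE_MODES[score_mode](weighted_original, rescore * rescore_query_weight)


def rescore_window(hits: List[Hit], rescore_scores: Dict[str, float],
                   window_size: int = 10, **params) -> List[Hit]:
    """Rescore and re-sort the top `window_size` hits of one shard.

    `hits` must already be sorted by score, highest first. Hits beyond the
    window are returned unchanged (a simplifying assumption of this sketch).
    """
    window = [(doc_id, combine(score, rescore_scores.get(doc_id), **params))
              for doc_id, score in hits[:window_size]]
    window.sort(key=lambda hit: hit[1], reverse=True)
    return window + hits[window_size:]
```

Chaining calls mirrors multiple rescores, because each call receives the output of the previous one: for example, `rescore_window(rescore_window(hits, phrase_scores, window_size=10, query_weight=0.7, rescore_query_weight=1.2), log_scores, window_size=5, score_mode="multiply")`, where `phrase_scores` and `log_scores` are hypothetical per-document rescore scores. Only the top 5 hits after the first pass are seen by the second rescorer.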

+ 2 - 2
docs/reference/elasticsearch/rest-apis/retrievers.md

@@ -27,7 +27,7 @@ The following retrievers are available:
 :   The [pinned](retrievers/pinned-retriever.md) retriever always places specified documents at the top of the results, with the remaining hits provided by a secondary retriever.
 
 `rescorer`
-:   The [rescorer](retrievers/rescorer-retriever.md) retriever replaces the functionality of the [query rescorer](/reference/elasticsearch/rest-apis/filter-search-results.md#rescore).
+:   The [rescorer](retrievers/rescorer-retriever.md) retriever replaces the functionality of the [query rescorer](/reference/elasticsearch/rest-apis/rescore-search-results.md#rescore).
 
 `rrf`
 :   The [rrf](retrievers/rrf-retriever.md) retriever produces top documents from [reciprocal rank fusion (RRF)](/reference/elasticsearch/rest-apis/reciprocal-rank-fusion.md).
@@ -63,7 +63,7 @@ When a retriever is specified as part of a search, the following elements are no
 * [`search_after`](/reference/elasticsearch/rest-apis/paginate-search-results.md#search-after)
 * [`terminate_after`](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search#request-body-search-terminate-after)
 * [`sort`](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search#search-sort-param)
-* [`rescore`](/reference/elasticsearch/rest-apis/filter-search-results.md#rescore) use a [rescorer retriever](retrievers/rescorer-retriever.md) instead
+* [`rescore`](/reference/elasticsearch/rest-apis/rescore-search-results.md#rescore) use a [rescorer retriever](retrievers/rescorer-retriever.md) instead
 
 
 ## Multi-field query format [multi-field-query-format]

+ 2 - 2
docs/reference/elasticsearch/rest-apis/retrievers/rescorer-retriever.md

@@ -24,9 +24,9 @@ When using the `rescorer`, an error is returned if the following conditions are
 ## Parameters [rescorer-retriever-parameters]
 
 `rescore`
-:   (Required. [A rescorer definition or an array of rescorer definitions](/reference/elasticsearch/rest-apis/filter-search-results.md#rescore))
+:   (Required. [A rescorer definition or an array of rescorer definitions](/reference/elasticsearch/rest-apis/rescore-search-results.md#rescore))
 
-    Defines the [rescorers](/reference/elasticsearch/rest-apis/filter-search-results.md#rescore) applied sequentially to the top documents returned by the child retriever.
+    Defines the [rescorers](/reference/elasticsearch/rest-apis/rescore-search-results.md#rescore) applied sequentially to the top documents returned by the child retriever.
 
 
 `retriever`

+ 1 - 0
docs/reference/elasticsearch/toc.yml

@@ -86,6 +86,7 @@ toc:
             - file: rest-apis/collapse-search-results.md
             - file: rest-apis/create-index-from-source.md
             - file: rest-apis/filter-search-results.md
+            - file: rest-apis/rescore-search-results.md
             - file: rest-apis/find-text-structure-examples.md
             - file: rest-apis/highlighting.md
             - file: rest-apis/optimistic-concurrency-control.md

+ 1 - 1
docs/reference/query-languages/query-dsl/query-dsl-sparse-vector-query.md

@@ -208,7 +208,7 @@ GET my-index/_search
 
 The following is an extension to the above example that adds a pruning configuration to the `sparse_vector` query. The pruning configuration identifies non-significant tokens to prune from the query in order to improve query performance.
 
-Token pruning happens at the shard level. While this should result in the same tokens being labeled as insignificant across shards, this is not guaranteed based on the composition of each shard. Therefore, if you are running `sparse_vector` with a `pruning_config` on a multi-shard index, we strongly recommend adding a [Rescore filtered search results](/reference/elasticsearch/rest-apis/filter-search-results.md#rescore) function with the tokens that were originally pruned from the query. This will help mitigate any shard-level inconsistency with pruned tokens and provide better relevance overall.
+Token pruning happens at the shard level. While this should result in the same tokens being labeled as insignificant across shards, this is not guaranteed based on the composition of each shard. Therefore, if you are running `sparse_vector` with a `pruning_config` on a multi-shard index, we strongly recommend adding a [rescorer](/reference/elasticsearch/rest-apis/rescore-search-results.md#rescore) with the tokens that were originally pruned from the query. This will help mitigate any shard-level inconsistency with pruned tokens and provide better relevance overall.
 
 ```console
 GET my-index/_search

+ 1 - 1
docs/reference/query-languages/query-dsl/query-dsl-text-expansion-query.md

@@ -192,7 +192,7 @@ GET my-index/_search
 
 The following is an extension to the above example that adds a [preview] pruning configuration to the `text_expansion` query. The pruning configuration identifies non-significant tokens to prune from the query in order to improve query performance.
 
-Token pruning happens at the shard level. While this should result in the same tokens being labeled as insignificant across shards, this is not guaranteed based on the composition of each shard. Therefore, if you are running `text_expansion` with a `pruning_config` on a multi-shard index, we strongly recommend adding a [Rescore filtered search results](/reference/elasticsearch/rest-apis/filter-search-results.md#rescore) function with the tokens that were originally pruned from the query. This will help mitigate any shard-level inconsistency with pruned tokens and provide better relevance overall.
+Token pruning happens at the shard level. While this should result in the same tokens being labeled as insignificant across shards, this is not guaranteed based on the composition of each shard. Therefore, if you are running `text_expansion` with a `pruning_config` on a multi-shard index, we strongly recommend adding a [rescorer](/reference/elasticsearch/rest-apis/rescore-search-results.md#rescore) with the tokens that were originally pruned from the query. This will help mitigate any shard-level inconsistency with pruned tokens and provide better relevance overall.
 
 ```console
 GET my-index/_search

+ 1 - 1
docs/reference/query-languages/query-dsl/query-dsl-weighted-tokens-query.md

@@ -75,7 +75,7 @@ POST _search
 
 The following example adds a pruning configuration to the `text_expansion` query. The pruning configuration identifies non-significant tokens to prune from the query in order to improve query performance.
 
-Token pruning happens at the shard level. While this should result in the same tokens being labeled as insignificant across shards, this is not guaranteed based on the composition of each shard. Therefore, if you are running `text_expansion` with a `pruning_config` on a multi-shard index, we strongly recommend adding a [Rescore filtered search results](/reference/elasticsearch/rest-apis/filter-search-results.md#rescore) function with the tokens that were originally pruned from the query. This will help mitigate any shard-level inconsistency with pruned tokens and provide better relevance overall.
+Token pruning happens at the shard level. While this should result in the same tokens being labeled as insignificant across shards, this is not guaranteed based on the composition of each shard. Therefore, if you are running `text_expansion` with a `pruning_config` on a multi-shard index, we strongly recommend adding a [rescorer](/reference/elasticsearch/rest-apis/rescore-search-results.md#rescore) with the tokens that were originally pruned from the query. This will help mitigate any shard-level inconsistency with pruned tokens and provide better relevance overall.
 
 ```console
 GET my-index/_search