
Add initial documentation for RRF (#95687)

This is a follow up to (#93396) that adds documentation for RRF including an example with a 
breakdown of the RRF formula.
Jack Conradson, 2 years ago
commit 24c600748a
2 changed files with 432 additions and 4 deletions
  1. docs/reference/search/rrf.asciidoc (+423, -2)
  2. docs/reference/search/search.asciidoc (+9, -2)

docs/reference/search/rrf.asciidoc (+423, -2)

@@ -1,6 +1,427 @@
 [[rrf]]
 === Reciprocal rank fusion
 
-Reciprocal Rank Fusion (RRF) is a simple method to combine document result sets
-from multiple queries where the queries' document scores may be unrelated.
+preview::[]
+
+https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf[Reciprocal rank fusion (RRF)]
+is a method for combining multiple result sets with different relevance
+indicators into a single result set. RRF requires no tuning, and the different
+relevance indicators do not have to be related to each other to achieve high-quality
+results.
+
+RRF uses the following formula to determine the score for ranking each document:
+
+[source,python]
+----
+score = 0.0
+for q in queries:
+    if d in result(q):
+        score += 1.0 / ( k + rank( result(q), d ) )
+return score
+
+# where
+# k is a ranking constant
+# q is a query in the set of queries
+# d is a document in the result set of q
+# result(q) is the result set of q
+# rank( result(q), d ) is d's rank within result(q), starting from 1
+----
+// NOTCONSOLE
+
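+As a concrete illustration, here is a minimal, runnable sketch of the same
+calculation applied to two hypothetical ranked lists of document IDs. The
+document IDs, list contents, and the `rrf_scores` helper are invented for this
+example and are not part of any Elasticsearch API.
+
+[source,python]
+----
+# Illustrative only: combine two hypothetical ranked lists with the RRF formula.
+def rrf_scores(ranked_lists, k=60):
+    """Return a dict of document ID to RRF score for the given ranked lists."""
+    scores = {}
+    for ranked in ranked_lists:
+        for rank, doc_id in enumerate(ranked, start=1):
+            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
+    return scores
+
+# two hypothetical result sets, best match first
+query_hits = ["doc-a", "doc-b", "doc-c"]
+knn_hits = ["doc-b", "doc-d", "doc-a"]
+
+for doc_id, score in sorted(rrf_scores([query_hits, knn_hits]).items(),
+                            key=lambda item: item[1], reverse=True):
+    print(doc_id, round(score, 4))
+----
+// NOTCONSOLE
+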
+[[rrf-api]]
+==== Reciprocal rank fusion API
+
+You can use RRF as part of a <<search-search, search>> to combine and rank
+documents using multiple result sets from:
+
+* 1 query and 1 or more kNN searches
+* 2 or more kNN searches
+
+The `rrf` parameter is an optional object defined as part of a search request's
+<<request-body-rank, rank parameter>>. The `rrf` object contains the following
+parameters:
+
+`rank_constant`::
+(Optional, integer) This value determines how much influence documents in
+individual result sets per query have over the final ranked result set. A higher
+value gives lower-ranked documents more influence. This value must be greater
+than or equal to `1`. Defaults to `60`.
+
+`window_size`::
+(Optional, integer) This value determines the size of the individual result sets
+per query. A higher value will improve result relevance at the cost of performance.
+The final ranked result set is pruned down to the search request's
+<<search-size-param, size>>. `window_size` must be greater than or equal to `size`
+and greater than or equal to `1`. Defaults to `100`.
+
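+To get a feel for how `rank_constant` changes the weight of lower-ranked
+documents, the short sketch below (illustrative only, not part of the API)
+prints the per-document contribution `1.0 / (rank_constant + rank)` for a few
+ranks and constants.
+
+[source,python]
+----
+# Illustrative only: a larger rank_constant flattens the difference in
+# contribution between top-ranked and lower-ranked documents.
+for rank_constant in (1, 20, 60):
+    contributions = [round(1.0 / (rank_constant + rank), 4) for rank in (1, 2, 10, 50)]
+    print(rank_constant, contributions)
+----
+// NOTCONSOLE
+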
+An example request using RRF:
+
+[source,console]
+----
+GET example-index/_search
+{
+    "query": {
+        "term": {
+            "text": "shoes"
+        }
+    },
+    "knn": {
+        "field": "vector",
+        "query_vector": [1.25, 2, 3.5],
+        "k": 50,
+        "num_candidates": 100
+    },
+    "rank": {
+        "rrf": {
+            "window_size": 50,
+            "rank_constant": 20
+        }
+    }
+}
+----
+// TEST[skip:example fragment]
+
+In the above example, we first execute the kNN search to get its global top 50
+results. Then we execute the query to get its global top 50 results. Afterwards,
+on a coordinating node, we combine the kNN search results with the query results
+and rank them using the RRF method to get the final top 10 results (the default
+`size`).
+
+Note that if `k` from a kNN search is larger than `window_size`, the results are
+truncated to `window_size`. If `k` is smaller than `window_size`, the result set
+contains only `k` results.
+
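+The following sketch is a rough model of those sizes, not actual Elasticsearch
+code; the function name and return values are purely illustrative.
+
+[source,python]
+----
+# Rough model: how many documents each stage of an RRF search works with.
+def rrf_result_sizes(k, window_size, size):
+    knn_set_size = min(k, window_size)  # each kNN result set is capped at window_size
+    query_set_size = window_size        # the query contributes its top window_size hits
+    final_size = size                   # the combined ranking is pruned down to size
+    return knn_set_size, query_set_size, final_size
+
+print(rrf_result_sizes(k=50, window_size=50, size=10))  # (50, 50, 10)
+----
+// NOTCONSOLE
+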
+[[rrf-supported-features]]
+==== Reciprocal rank fusion supported features
+
+RRF does support:
+
+* <<search-from-param, from>>
+* <<search-aggregations, aggregations>>
+
+RRF does not currently support:
+
+* <<search-api-scroll-query-param, scroll>>
+* <<search-api-pit, point in time>>
+* <<search-sort-param, sort>>
+* <<rescore, rescore>>
+* <<search-suggesters, suggesters>>
+* <<highlighting, highlighting>>
+* <<collapse-search-results, collapse>>
+* <<request-body-search-explain, explain>>
+* <<profiling-queries, profiling>>
+
+Using unsupported features as part of a search with RRF results in an exception.
+
+[[rrf-full-example]]
+==== Reciprocal rank fusion full example
+
+We begin by creating a mapping for an index with a text field, a vector field,
+and an integer field, and then index several documents. For this example, we use
+a vector with only a single dimension to make the ranking easier to explain.
+
+[source,console]
+----
+PUT example-index
+{
+  "mappings": {
+        "properties": {
+            "text" : {
+                "type" : "text"
+            },
+            "vector": {
+                "type": "dense_vector",
+                "dims": 1,
+                "index": true,
+                "similarity": "l2_norm"
+            },
+            "integer" : {
+                "type" : "integer"
+            }
+        }
+    }
+}
+
+PUT example-index/_doc/1
+{
+    "text" : "rrf",
+    "vector" : [5],
+    "integer": 1
+}
+
+PUT example-index/_doc/2
+{
+    "text" : "rrf rrf",
+    "vector" : [4],
+    "integer": 2
+}
+
+PUT example-index/_doc/3
+{
+    "text" : "rrf rrf rrf",
+    "vector" : [3],
+    "integer": 1
+}
+
+PUT example-index/_doc/4
+{
+    "text" : "rrf rrf rrf rrf",
+    "integer": 2
+}
+
+PUT example-index/_doc/5
+{
+    "vector" : [0],
+    "integer": 1
+}
+
+POST example-index/_refresh
+----
+// TEST
+
+We now execute a search using RRF with a query, a kNN search, and
+a terms aggregation.
+
+[source,console]
+----
+GET example-index/_search
+{
+    "query": {
+        "term": {
+            "text": "rrf"
+        }
+    },
+    "knn": {
+        "field": "vector",
+        "query_vector": [3],
+        "k": 5,
+        "num_candidates": 5
+    },
+    "rank": {
+        "rrf": {
+            "window_size": 5,
+            "rank_constant": 1
+        }
+    },
+    "size": 3,
+    "aggs": {
+        "int_count": {
+            "terms": {
+                "field": "integer"
+            }
+        }
+    }
+}
+----
+// TEST[continued]
+
+We receive the response with ranked `hits` and the terms aggregation result.
+Note that `_score` is `null`, and we instead use `_rank` to show our top-ranked
+documents.
+
+[source,console-result]
+----
+{
+    "took": ...,
+    "timed_out" : false,
+    "_shards" : {
+        "total" : 1,
+        "successful" : 1,
+        "skipped" : 0,
+        "failed" : 0
+    },
+    "hits" : {
+        "total" : {
+            "value" : 5,
+            "relation" : "eq"
+        },
+        "max_score" : null,
+        "hits" : [
+            {
+                "_index" : "example-index",
+                "_id" : "3",
+                "_score" : null,
+                "_rank" : 1,
+                "_source" : {
+                    "integer" : 1,
+                    "vector" : [
+                        3
+                    ],
+                    "text" : "rrf rrf rrf"
+                }
+            },
+            {
+                "_index" : "example-index",
+                "_id" : "2",
+                "_score" : null,
+                "_rank" : 2,
+                "_source" : {
+                    "integer" : 2,
+                    "vector" : [
+                        4
+                    ],
+                    "text" : "rrf rrf"
+                }
+            },
+            {
+                "_index" : "example-index",
+                "_id" : "4",
+                "_score" : null,
+                "_rank" : 3,
+                "_source" : {
+                    "integer" : 2,
+                    "text" : "rrf rrf rrf rrf"
+                }
+            }
+        ]
+    },
+    "aggregations" : {
+        "int_count" : {
+            "doc_count_error_upper_bound" : 0,
+            "sum_other_doc_count" : 0,
+            "buckets" : [
+                {
+                    "key" : 1,
+                    "doc_count" : 3
+                },
+                {
+                    "key" : 2,
+                    "doc_count" : 2
+                }
+            ]
+        }
+    }
+}
+----
+// TESTRESPONSE[s/: \.\.\./: $body.$_path/]
+
+Let's break down how these hits were ranked. We start by running the query and
+the kNN search separately to collect their individual hits.
+
+First, we look at the hits for the query.
+
+[source,console-result]
+----
+"hits" : [
+    {
+        "_index" : "example-index",
+        "_id" : "4",
+        "_score" : 0.16152832,              <1>
+        "_source" : {
+            "integer" : 2,
+            "text" : "rrf rrf rrf rrf"
+        }
+    },
+    {
+        "_index" : "example-index",
+        "_id" : "3",                        <2>
+        "_score" : 0.15876243,
+        "_source" : {
+            "integer" : 1,
+            "vector" : [3],
+            "text" : "rrf rrf rrf"
+        }
+    },
+    {
+        "_index" : "example-index",
+        "_id" : "2",                        <3>
+        "_score" : 0.15350538,
+        "_source" : {
+            "integer" : 2,
+            "vector" : [4],
+            "text" : "rrf rrf"
+        }
+    },
+    {
+        "_index" : "example-index",
+        "_id" : "1",                        <4>
+        "_score" : 0.13963442,
+        "_source" : {
+            "integer" : 1,
+            "vector" : [5],
+            "text" : "rrf"
+        }
+    }
+]
+----
+// TEST[skip:example fragment]
+<1> rank 1, `_id` 4
+<2> rank 2, `_id` 3
+<3> rank 3, `_id` 2
+<4> rank 4, `_id` 1
+
+Note that our first hit doesn't have a value for the `vector` field. Now,
+we look at the results for the kNN search.
+
+[source,console-result]
+----
+"hits" : [
+    {
+        "_index" : "example-index",
+        "_id" : "3",                   <1>
+        "_score" : 1.0,
+        "_source" : {
+            "integer" : 1,
+            "vector" : [3],
+            "text" : "rrf rrf rrf"
+        }
+    },
+    {
+        "_index" : "example-index",
+        "_id" : "2",                   <2>
+        "_score" : 0.5,
+        "_source" : {
+            "integer" : 2,
+            "vector" : [4],
+            "text" : "rrf rrf"
+        }
+    },
+    {
+        "_index" : "example-index",
+        "_id" : "1",                   <3>
+        "_score" : 0.2,
+        "_source" : {
+            "integer" : 1,
+            "vector" : [5],
+            "text" : "rrf"
+        }
+    },
+    {
+        "_index" : "example-index",
+        "_id" : "5",                   <4>
+        "_score" : 0.1,
+        "_source" : {
+            "integer" : 1,
+            "vector" : [0]
+        }
+    }
+]
+----
+// TEST[skip:example fragment]
+<1> rank 1, `_id` 3
+<2> rank 2, `_id` 2
+<3> rank 3, `_id` 1
+<4> rank 4, `_id` 5
+
+We can now take the two individually ranked result sets and apply the
+RRF formula to them to get our final ranking.
+
+[source,python]
+----
+# doc  | query     | knn       | score
+_id: 1 = 1.0/(1+4) + 1.0/(1+3) = 0.4500
+_id: 2 = 1.0/(1+3) + 1.0/(1+2) = 0.5833
+_id: 3 = 1.0/(1+2) + 1.0/(1+1) = 0.8333
+_id: 4 = 1.0/(1+1)             = 0.5000
+_id: 5 =             1.0/(1+4) = 0.2000
+----
+// NOTCONSOLE
+
+We rank the documents based on the RRF formula. With a `window_size` of `5` and
+a `size` of `3`, the bottom `2` docs are truncated from the RRF result set. We
+end with `_id: 3` as `_rank: 1`, `_id: 2` as `_rank: 2`, and `_id: 4` as
+`_rank: 3`. This ranking matches the result set from the original RRF search as
+expected.
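+
+The same final ordering can be reproduced with a short, self-contained sketch.
+The ranked lists below are the `_id` values from the two result sets shown
+above; the rest of the code is illustrative and not part of Elasticsearch.
+
+[source,python]
+----
+# Reproduce the hand calculation above: combine the two ranked lists with
+# rank_constant=1, sort by RRF score, and keep the top size=3 documents.
+query_ranking = ["4", "3", "2", "1"]  # _id values from the query hits, best first
+knn_ranking = ["3", "2", "1", "5"]    # _id values from the kNN hits, best first
+rank_constant = 1
+size = 3
+
+scores = {}
+for ranking in (query_ranking, knn_ranking):
+    for rank, doc_id in enumerate(ranking, start=1):
+        scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
+
+final = sorted(scores, key=scores.get, reverse=True)[:size]
+print(final)   # ['3', '2', '4']
+print({doc_id: round(score, 4) for doc_id, score in scores.items()})
+----
+// NOTCONSOLE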
 

docs/reference/search/search.asciidoc (+9, -2)

@@ -102,6 +102,7 @@ Defaults to `open`.
 (Optional, Boolean) If `true`, returns detailed information about score
 computation as part of a hit. Defaults to `false`.
 
+[[search-from-param]]
 include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=from]
 +
 By default, you cannot page through more than 10,000 hits using the `from` and
@@ -231,6 +232,7 @@ searches.
 (Optional, Boolean) If `true`, returns sequence number and primary term of the
 last modification of each hit. See <<optimistic-concurrency-control>>.
 
+[[search-size-param]]
 `size`::
 (Optional, integer) Defines the number of hits to return. Defaults to `10`.
 +
@@ -238,6 +240,7 @@ By default, you cannot page through more than 10,000 hits using the `from` and
 `size` parameters. To page through more hits, use the
 <<search-after,`search_after`>> parameter.
 
+[[search-sort-param]]
 `sort`::
 (Optional, string) A comma-separated list of <field>:<direction> pairs.
 
@@ -527,6 +530,7 @@ include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=knn-similarity]
 Minimum <<relevance-scores,`_score`>> for matching documents. Documents with a
 lower `_score` are not included in the search results.
 
+[[search-api-pit]]
 `pit`::
 (Optional, object)
 Limits the search to a <<point-in-time-api,point in time (PIT)>>. If you provide
@@ -552,8 +556,11 @@ Period of time used to extend the life of the PIT.
 
 [[request-body-rank]]
 `rank`::
-(Optional, object) Defines a method for combining and ranking `1` standard query
-with `1..n` knn searches or no standard query with `2..n` knn searches.
+(Optional, object) Defines a method for combining and ranking result sets from
+either:
++
+* 1 query and 1 or more kNN searches
+* 2 or more kNN searches
 +
 .Ranking methods
 [%collapsible%open]