[[consistent-scoring]]
=== Getting consistent scoring

The fact that Elasticsearch operates with shards and replicas adds challenges
when it comes to having good scoring.

[float]
==== Scores are not reproducible

Say the same user runs the same request twice in a row and documents do not come
back in the same order both times. This is a pretty bad experience, isn't it?
Unfortunately this is something that can happen if you have replicas
(`index.number_of_replicas` is greater than 0). The reason is that Elasticsearch
selects the shards that the query should go to in a round-robin fashion, so it
is quite likely if you run the same query twice in a row that it will go to
different copies of the same shard.

Now why is it a problem? Index statistics are an important part of the score,
and these index statistics may be different across copies of the same shard
due to deleted documents. As you may know, when documents are deleted or updated,
the old document is not immediately removed from the index. It is just marked
as deleted and is only removed from disk the next time that the segment it
belongs to is merged. However for practical reasons, those deleted documents are
still taken into account for index statistics. So imagine that the primary shard
just finished a large merge that removed lots of deleted documents: it might
then have index statistics that are sufficiently different from the replica
(which still has plenty of deleted documents) that scores are different too.

The recommended way to work around this issue is to use a string that identifies
the user that is logged in (a user id or session id for instance) as a
<<search-request-preference,preference>>. This ensures that all queries of a
given user are always going to hit the same shards, so scores remain more
consistent across queries.

This workaround has another benefit: when two documents have the same score,
they are sorted by their internal Lucene doc id (which is unrelated to the
`_id`) by default. However these doc ids could be different across copies of
the same shard. So by always hitting the same shard, we get more consistent
ordering of documents that have the same score.

[float]
==== Relevancy looks wrong

If you notice that two documents with the same content get different scores or
that an exact match is not ranked first, then the issue might be related to
sharding. By default, Elasticsearch makes each shard responsible for producing
its own scores. However since index statistics are an important contributor to
the scores, this only works well if shards have similar index statistics. The
assumption is that since documents are routed evenly to shards by default,
index statistics should be very similar and scoring should work as expected.
However in the event that you either:

 - use routing at index time,
 - query multiple _indices_,
 - or have too little data in your index

then there are good chances that the shards involved in the search request do
not have similar index statistics, and relevancy could be bad.

If you have a small dataset, the easiest way to work around this issue is to
index everything into an index that has a single shard
(`index.number_of_shards: 1`), which is the default. Then index statistics
will be the same for all documents and scores will be consistent.

Otherwise the recommended way to work around this issue is to use the
<<dfs-query-then-fetch,`dfs_query_then_fetch`>> search type. This makes
Elasticsearch perform an initial round trip to all involved shards, asking
them for their index statistics relative to the query. The coordinating node
then merges those statistics and sends the merged statistics alongside the
request when asking shards to perform the `query` phase, so that shards can
use these global statistics rather than their own statistics to do the scoring.

In most cases, this additional round trip should be very cheap. However in the
event that your query contains a very large number of fields/terms or fuzzy
queries, beware that gathering statistics alone might not be cheap, since all
terms have to be looked up in the terms dictionaries in order to look up
statistics.

[[static-scoring-signals]]
=== Incorporating static relevance signals into the score

Many domains have static signals that are known to be correlated with relevance.
For instance https://en.wikipedia.org/wiki/PageRank[PageRank] and URL length are
two commonly used features for web search in order to tune the score of web
pages independently of the query.

There are two main queries that allow combining static score contributions with
textual relevance, e.g. as computed with BM25:

 - <<query-dsl-script-score-query,`script_score` query>>
 - <<query-dsl-rank-feature-query,`rank_feature` query>>

For instance imagine that you have a `pagerank` field that you wish to
combine with the BM25 score so that the final score is equal to
`score = bm25_score + pagerank / (10 + pagerank)`.

With the <<query-dsl-script-score-query,`script_score` query>> the query would
look like this:

//////////////////////////

[source,js]
--------------------------------------------------
PUT index
{
    "mappings": {
        "properties": {
            "body": {
                "type": "text"
            },
            "pagerank": {
                "type": "long"
            }
        }
    }
}
--------------------------------------------------
// CONSOLE
// TEST

//////////////////////////

[source,js]
--------------------------------------------------
GET index/_search
{
    "query" : {
        "script_score" : {
            "query" : {
                "match": { "body": "elasticsearch" }
            },
            "script" : {
                "source" : "_score + saturation(doc['pagerank'].value, 10)" <1>
            }
        }
    }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
<1> `pagerank` must be mapped as a <<number>>

while with the <<query-dsl-rank-feature-query,`rank_feature` query>> it would
look like below:

//////////////////////////

[source,js]
--------------------------------------------------
PUT index
{
    "mappings": {
        "properties": {
            "body": {
                "type": "text"
            },
            "pagerank": {
                "type": "rank_feature"
            }
        }
    }
}
--------------------------------------------------
// CONSOLE
// TEST

//////////////////////////

[source,js]
--------------------------------------------------
GET _search
{
    "query" : {
        "bool" : {
            "must": {
                "match": { "body": "elasticsearch" }
            },
            "should": {
                "rank_feature": {
                    "field": "pagerank", <1>
                    "saturation": {
                        "pivot": 10
                    }
                }
            }
        }
    }
}
--------------------------------------------------
// CONSOLE
<1> `pagerank` must be mapped as a <<rank-feature,`rank_feature`>> field

While both options would return similar scores, there are trade-offs:
<<query-dsl-script-score-query,script_score>> provides a lot of flexibility,
enabling you to combine the text relevance score with static signals as you
prefer. On the other hand, the <<query-dsl-rank-feature-query,`rank_feature` query>>
only exposes a couple of ways to incorporate static signals into the score.
However, it relies on the <<rank-feature,`rank_feature`>> and
<<rank-features,`rank_features`>> fields, which index values in a special way
that allows the <<query-dsl-rank-feature-query,`rank_feature` query>> to skip
over non-competitive documents and get the top matches of a query faster.
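To make the formula `score = bm25_score + pagerank / (10 + pagerank)` more
concrete, here is a minimal Python sketch that evaluates it outside of
Elasticsearch. The function names `saturation` and `combined_score` and the
sample BM25 and PageRank values are hypothetical and chosen only for
illustration; this shows the arithmetic, not how Elasticsearch computes scores
internally.

```python
def saturation(value: float, pivot: float) -> float:
    """Map a non-negative signal into [0, 1) as value / (pivot + value)."""
    if value < 0 or pivot <= 0:
        raise ValueError("value must be >= 0 and pivot must be > 0")
    return value / (pivot + value)


def combined_score(bm25_score: float, pagerank: float, pivot: float = 10) -> float:
    """Add the saturated static signal to the textual relevance score."""
    return bm25_score + saturation(pagerank, pivot)


if __name__ == "__main__":
    # Hypothetical (bm25_score, pagerank) pairs, for illustration only.
    for bm25, pagerank in [(2.0, 0), (2.0, 10), (2.0, 1000)]:
        print(bm25, pagerank, round(combined_score(bm25, pagerank), 3))
```

The static contribution is 0 when `pagerank` is 0, exactly 0.5 when `pagerank`
equals the pivot, and approaches but never reaches 1 as `pagerank` grows. Under
this formula the static signal can therefore nudge the ranking of documents with
close BM25 scores, but it never adds more than 1 to the final score.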