Document upcoming scoring changes. (#22806)

Adrien Grand 8 years ago
commit dc62255ddd
1 changed file with 27 additions and 0 deletions

+ 27 - 0
docs/reference/migration/migrate_6_0/search.asciidoc

@@ -45,3 +45,30 @@ have any effect in previous versions.
 * The `"time"` field showing human-readable timing output has been replaced by the `"time_in_nanos"`
   field which displays the elapsed time in nanoseconds. The `"time"` field can be turned on by adding
   `"?human=true"` to the request URL. It will display a rounded, human-readable time value.
+
+==== Scoring changes
+
+==== Query normalization
+
+Query normalization has been removed. This means that the TF-IDF similarity no
+longer tries to make scores comparable across queries and that boosts are now
+integrated into scores as simple multiplicative factors.
+
+Other similarities are not affected as they did not normalize scores and
+already integrated boosts into scores as multiplicative factors.
+
+See https://issues.apache.org/jira/browse/LUCENE-7347[`LUCENE-7347`] for more
+information.
+
+==== Coordination factors
+
+Coordination factors have been removed from the scoring formula. This means that
+boolean queries no longer scale scores by the fraction of clauses that matched.
+Instead, they always return the sum of the scores of the matching clauses.
+
+As a consequence, use of the TF-IDF similarity is now discouraged, as coordination
+factors were an important component of the quality of the scores that this
+similarity produces. BM25 is recommended instead.
+
+See https://issues.apache.org/jira/browse/LUCENE-7347[`LUCENE-7347`] for more
+information.
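
The effect of the two changes documented in this diff can be illustrated with a minimal sketch. It is not Lucene or Elasticsearch code: the function names and the clause scores are hypothetical, and the coordination factor is assumed to be the simple "matching clauses / total clauses" ratio used by the classic TF-IDF similarity. Query normalization is left out entirely.

```python
def score_with_coord(clause_scores, total_clauses):
    """Old-style sketch: sum of matching clause scores, scaled by a
    coordination factor (assumed here to be matching / total clauses)."""
    coord = len(clause_scores) / total_clauses
    return coord * sum(clause_scores)


def score_without_coord(clause_scores, boost=1.0):
    """New-style sketch: plain sum of the matching clause scores, with
    the boost applied as a simple multiplicative factor."""
    return boost * sum(clause_scores)


# Hypothetical boolean query with three clauses, of which two match.
matching_scores = [1.5, 0.5]

old_score = score_with_coord(matching_scores, total_clauses=3)
new_score = score_without_coord(matching_scores)
boosted_score = score_without_coord(matching_scores, boost=2.0)
```

Under these assumptions the same document scores lower with the coordination factor than without it, because only two of the three clauses matched. Without the factor, the score is just the sum of the clause scores, and a boost of 2 doubles it.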