|
@@ -112,7 +112,10 @@ Type name: `DFR`
|
|
|
==== IB similarity.
|
|
|
|
|
|
http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/IBSimilarity.html[Information
|
|
|
-based model] . This similarity has the following options:
|
|
|
+based model]. The algorithm is based on the concept that the information content in any symbolic 'distribution'
|
|
|
+sequence is primarily determined by the repetitive usage of its basic elements.
|
|
|
+For written texts this challenge would correspond to comparing the writing styles of different authors.
|
|
|
+This similarity has the following options:
|
|
|
|
|
|
[horizontal]
|
|
|
`distribution`:: Possible values: `ll` and `spl`.
|
|
@@ -138,11 +141,11 @@ Type name: `LMDirichlet`
|
|
|
==== LM Jelinek Mercer similarity.
|
|
|
|
|
|
http://lucene.apache.org/core/5_2_1/core/org/apache/lucene/search/similarities/LMJelinekMercerSimilarity.html[LM
|
|
|
-Jelinek Mercer similarity] . This similarity has the following options:
|
|
|
+Jelinek Mercer similarity]. The algorithm attempts to capture important patterns in the text, while leaving out noise. This similarity has the following options:
|
|
|
|
|
|
[horizontal]
|
|
|
`lambda`:: The optimal value depends on both the collection and the query. The optimal value is around `0.1`
|
|
|
-for title queries and `0.7` for long queries. Default to `0.1`.
|
|
|
+for title queries and `0.7` for long queries. Defaults to `0.1`. When the value approaches `0`, documents that match more query terms are ranked higher than those that match fewer terms.
|
|
|
|
|
|
Type name: `LMJelinekMercer`
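
The effect of `lambda` described above can be illustrated with a small sketch. It is not the Lucene implementation. It assumes the commonly cited rank-equivalent form of Jelinek-Mercer smoothing, in which each matching term contributes `log(1 + ((1 - lambda) * freq / docLen) / (lambda * collectionProbability))`. The function names, the toy documents, and the collection probabilities are all hypothetical and chosen only for illustration.

```python
import math


def jm_term_score(freq, doc_len, collection_prob, lam):
    """Score contribution of one query term (assumed rank-equivalent JM form)."""
    if freq == 0:
        # A term that does not occur in the document contributes nothing.
        return 0.0
    return math.log(1.0 + ((1.0 - lam) * freq / doc_len) / (lam * collection_prob))


def jm_score(query_terms, doc_tf, doc_len, collection_probs, lam):
    """Sum the per-term contributions over all query terms."""
    return sum(
        jm_term_score(doc_tf.get(term, 0), doc_len, collection_probs[term], lam)
        for term in query_terms
    )


# Hypothetical toy data: both query terms are equally rare in the collection.
query = ["quick", "fox"]
collection_probs = {"quick": 0.01, "fox": 0.01}
doc_len = 100

doc_both = {"quick": 1, "fox": 1}  # matches both query terms once
doc_one = {"quick": 8}             # matches one query term many times

for lam in (0.1, 0.7):
    both = jm_score(query, doc_both, doc_len, collection_probs, lam)
    one = jm_score(query, doc_one, doc_len, collection_probs, lam)
    winner = "doc_both" if both > one else "doc_one"
    print(f"lambda={lam}: doc_both={both:.3f} doc_one={one:.3f} -> {winner}")
```

With these made-up numbers, the document that matches both terms wins at `lambda = 0.1`, while the document that repeats a single term wins at `lambda = 0.7`, which is consistent with the behaviour described for small `lambda` values.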
|
|
|
|