[[index-modules-similarity]]
== Similarity module

A similarity (scoring / ranking model) defines how matching documents
are scored. Similarity is per field, meaning that a different similarity
can be defined for each field via the mapping.

Configuring a custom similarity is considered an expert feature, and the
built-in similarities are most likely sufficient, as described in the
<<mapping-core-types,mapping section>>.

[float]
[[configuration]]
=== Configuring a similarity

Most existing or custom similarities have configuration options that
can be set via the index settings, as shown below. These settings can be
provided when creating an index or when updating index settings.

[source,js]
--------------------------------------------------
"similarity" : {
  "my_similarity" : {
    "type" : "DFR",
    "basic_model" : "g",
    "after_effect" : "l",
    "normalization" : "h2",
    "normalization.h2.c" : "3.0"
  }
}
--------------------------------------------------

Here we configure the DFRSimilarity so that it can be referenced as
`my_similarity` in mappings, as illustrated in the example below:

[source,js]
--------------------------------------------------
{
  "book" : {
    "properties" : {
      "title" : { "type" : "string", "similarity" : "my_similarity" }
    }
  }
}
--------------------------------------------------
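
Because similarity settings can be provided when creating an index, the two fragments above can be combined into a single request body. The following is a minimal sketch, assuming the usual `settings` / `mappings` layout of a create index request body; `book`, `title` and `my_similarity` are the illustrative names already used above, not required names.

[source,js]
--------------------------------------------------
{
  "settings" : {
    "index" : {
      "similarity" : {
        "my_similarity" : {
          "type" : "DFR",
          "basic_model" : "g",
          "after_effect" : "l",
          "normalization" : "h2",
          "normalization.h2.c" : "3.0"
        }
      }
    }
  },
  "mappings" : {
    "book" : {
      "properties" : {
        "title" : { "type" : "string", "similarity" : "my_similarity" }
      }
    }
  }
}
--------------------------------------------------

Defining the similarity and the mapping together means that the name referenced by the field already exists when the mapping is applied.
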

[float]
=== Available similarities

[float]
[[default-similarity]]
==== Default similarity

The default similarity is based on the TF/IDF model. This
similarity has the following option:

`discount_overlaps`::
    Determines whether overlap tokens (tokens with
    0 position increment) are ignored when computing norms. By default this
    is true, meaning overlap tokens do not count when computing norms.

Type name: `default`
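
As an illustration of the option above, a custom similarity of type `default` that counts overlap tokens when computing norms might be configured as follows. This is only a sketch: the name `my_tfidf` is hypothetical, and the fragment belongs in the index settings in the same way as the DFR example earlier in this section.

[source,js]
--------------------------------------------------
"similarity" : {
  "my_tfidf" : {
    "type" : "default",
    "discount_overlaps" : false
  }
}
--------------------------------------------------
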

[float]
[[bm25]]
==== BM25 similarity

Another TF/IDF based similarity that has built-in tf normalization and
is supposed to work better for short fields (like names). See
http://en.wikipedia.org/wiki/Okapi_BM25[Okapi_BM25] for more details.
This similarity has the following options:

[horizontal]
`k1`::
    Controls non-linear term frequency normalization
    (saturation).

`b`::
    Controls to what degree document length normalizes tf values.

`discount_overlaps`::
    Determines whether overlap tokens (tokens with
    0 position increment) are ignored when computing norms. By default this
    is true, meaning overlap tokens do not count when computing norms.

Type name: `BM25`
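
A BM25 similarity using the options above could be configured along the following lines. This is a sketch rather than a recommended setup: the name `my_bm25` is hypothetical, and the values `1.2` and `0.75` are the defaults of Lucene's `BM25Similarity`, shown here only to make the option names concrete.

[source,js]
--------------------------------------------------
"similarity" : {
  "my_bm25" : {
    "type" : "BM25",
    "k1" : 1.2,
    "b" : 0.75,
    "discount_overlaps" : true
  }
}
--------------------------------------------------

In the BM25 model, a larger `k1` lets repeated terms keep adding to the score for longer before saturating, and a `b` closer to 0 reduces the influence of document length.
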

[float]
[[drf]]
==== DFR similarity

Similarity that implements the
http://lucene.apache.org/core/4_1_0/core/org/apache/lucene/search/similarities/DFRSimilarity.html[divergence from randomness]
framework. This similarity has the following options:

[horizontal]
`basic_model`::
    Possible values: `be`, `d`, `g`, `if`, `in`, `ine` and `p`.

`after_effect`::
    Possible values: `no`, `b` and `l`.

`normalization`::
    Possible values: `no`, `h1`, `h2`, `h3` and `z`.
    All options except `no` need a normalization value.

Type name: `DFR`

[float]
[[ib]]
==== IB similarity

Similarity that implements the
http://lucene.apache.org/core/4_1_0/core/org/apache/lucene/search/similarities/IBSimilarity.html[information based model].
This similarity has the following options:

[horizontal]
`distribution`:: Possible values: `ll` and `spl`.
`lambda`:: Possible values: `df` and `ttf`.
`normalization`:: Same as in `DFR` similarity.

Type name: `IB`
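
An IB similarity can be sketched in the same way as the DFR example earlier in this section. The name `my_ib` is hypothetical, the option values are simply picked from the lists above, and the `normalization.h2.c` key is assumed to carry over unchanged because `normalization` is described as the same as in `DFR`.

[source,js]
--------------------------------------------------
"similarity" : {
  "my_ib" : {
    "type" : "IB",
    "distribution" : "ll",
    "lambda" : "df",
    "normalization" : "h2",
    "normalization.h2.c" : "3.0"
  }
}
--------------------------------------------------
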

[float]
[[default-base]]
==== Default and Base Similarities

By default, Elasticsearch uses whichever similarity is configured as
`default`. However, the similarity functions `queryNorm()` and `coord()`
are not per-field. Consequently, expert users who want to change the
implementation used for these two methods without changing the
`default` can configure a similarity with the name `base`. This
similarity will then be used for the two methods.

You can change the default similarity for all fields like this:

[source,js]
--------------------------------------------------
index.similarity.default.type: BM25
--------------------------------------------------
  107. --------------------------------------------------