123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134 |
- [[index-modules-similarity]]
- == Similarity module
- A similarity (scoring / ranking model) defines how matching documents
- are scored. Similarity is per field, meaning that via the mapping one
- can define a different similarity per field.
- Configuring a custom similarity is considered a expert feature and the
- builtin similarities are most likely sufficient as is described in the
- <<mapping-core-types,mapping section>>
- [float]
- === Configuring a similarity
- Most existing or custom Similarities have configuration options which
- can be configured via the index settings as shown below. The index
- options can be provided when creating an index or updating index
- settings.
- [source,js]
- --------------------------------------------------
- "similarity" : {
- "my_similarity" : {
- "type" : "DFR",
- "basic_model" : "g",
- "after_effect" : "l",
- "normalization" : "h2",
- "normalization.h2.c" : "3.0"
- }
- }
- --------------------------------------------------
- Here we configure the DFRSimilarity so it can be referenced as
- `my_similarity` in mappings as is illustrate in the below example:
- [source,js]
- --------------------------------------------------
- {
- "book" : {
- "properties" : {
- "title" : { "type" : "string", "similarity" : "my_similarity" }
- }
- }
- --------------------------------------------------
- [float]
- === Available similarities
- [float]
- ==== Default similarity
- The default similarity that is based on the TF/IDF model. This
- similarity has the following option:
- `discount_overlaps`::
- Determines whether overlap tokens (Tokens with
- 0 position increment) are ignored when computing norm. By default this
- is true, meaning overlap tokens do not count when computing norms.
- Type name: `default`
- [float]
- ==== BM25 similarity
- Another TF/IDF based similarity that has built-in tf normalization and
- is supposed to work better for short fields (like names). See
- http://en.wikipedia.org/wiki/Okapi_BM25[Okapi_BM25] for more details.
- This similarity has the following options:
- [horizontal]
- `k1`::
- Controls non-linear term frequency normalization
- (saturation).
- `b`::
- Controls to what degree document length normalizes tf values.
- `discount_overlaps`::
- Determines whether overlap tokens (Tokens with
- 0 position increment) are ignored when computing norm. By default this
- is true, meaning overlap tokens do not count when computing norms.
- Type name: `BM25`
- [float]
- ==== DRF similarity
- Similarity that implements the
- http://lucene.apache.org/core/4_1_0/core/org/apache/lucene/search/similarities/DFRSimilarity.html[divergence
- from randomness] framework. This similarity has the following options:
- [horizontal]
- `basic_model`::
- Possible values: `be`, `d`, `g`, `if`, `in`, `ine` and `p`.
- `after_effect`::
- Possible values: `no`, `b` and `l`.
- `normalization`::
- Possible values: `no`, `h1`, `h2`, `h3` and `z`.
- All options but the first option need a normalization value.
- Type name: `DFR`
- [float]
- ==== IB similarity.
- http://lucene.apache.org/core/4_1_0/core/org/apache/lucene/search/similarities/IBSimilarity.html[Information
- based model] . This similarity has the following options:
- [horizontal]
- `distribution`:: Possible values: `ll` and `spl`.
- `lambda`:: Possible values: `df` and `ttf`.
- `normalization`:: Same as in `DFR` similarity.
- Type name: `IB`
- [float]
- ==== Default and Base Similarities
- By default, Elasticsearch will use whatever similarity is configured as
- `default`. However, the similarity functions `queryNorm()` and `coord()`
- are not per-field. Consequently, for expert users wanting to change the
- implementation used for these two methods, while not changing the
- `default`, it is possible to configure a similarity with the name
- `base`. This similarity will then be used for the two methods.
- You can change the default similarity for all fields like this:
- [source,js]
- --------------------------------------------------
- index.similarity.default.type: BM25
- --------------------------------------------------
|