@@ -9,70 +9,70 @@ suggest text is analyzed before terms are suggested. The suggested terms
are provided per analyzed suggest text token. The `term` suggester
doesn't take the query into account that is part of request.

-==== Common suggest options:
+==== Common suggest options:

[horizontal]
-`text`::
+`text`::
The suggest text. The suggest text is a required option that
needs to be set globally or per suggestion.

-`field`::
+`field`::
The field to fetch the candidate suggestions from. This is
a required option that either needs to be set globally or per
- suggestion.
+ suggestion.

-`analyzer`::
+`analyzer`::
The analyzer to analyze the suggest text with. Defaults
- to the search analyzer of the suggest field.
+ to the search analyzer of the suggest field.

-`size`::
+`size`::
The maximum number of corrections to be returned per suggest text
- token.
+ token.

-`sort`::
+`sort`::
Defines how suggestions should be sorted per suggest text
term. Two possible values:
+
- ** `score`: Sort by score first, then document frequency and
- then the term itself.
+ ** `score`: Sort by score first, then document frequency and
+ then the term itself.
** `frequency`: Sort by document frequency first, then similarity
- score and then the term itself.
+ score and then the term itself.
+
-`suggest_mode`::
+`suggest_mode`::
The suggest mode controls what suggestions are
included, or for which suggest text terms suggestions should be
- suggested. Three possible values can be specified:
-+
+ suggested. Three possible values can be specified:
++
** `missing`: Only provide suggestions for suggest text terms that are
- not in the index. This is the default.
+ not in the index. This is the default.
** `popular`: Only suggest suggestions that occur in more docs than
- the original suggest text term.
+ the original suggest text term.
** `always`: Suggest any matching suggestions based on terms in the
suggest text.

-==== Other term suggest options:
+==== Other term suggest options:

[horizontal]
-`lowercase_terms`::
- Lower cases the suggest text terms after text analysis.
+`lowercase_terms`::
+ Lower cases the suggest text terms after text analysis.

-`max_edits`::
+`max_edits`::
The maximum edit distance candidate suggestions can
have in order to be considered as a suggestion. Can only be a value
between 1 and 2. Any other value results in a bad request error being
- thrown. Defaults to 2.
+ thrown. Defaults to 2.

-`prefix_length`::
+`prefix_length`::
The minimum number of prefix characters that must
match in order to be a candidate suggestion. Defaults to 1. Increasing
this number improves spellcheck performance. Usually misspellings don't
- occur in the beginning of terms. (Old name "prefix_len" is deprecated)
+ occur in the beginning of terms. (Old name "prefix_len" is deprecated)

-`min_word_length`::
+`min_word_length`::
The minimum length a suggest text term must have in
order to be included. Defaults to 4. (Old name "min_word_len" is deprecated)

-`shard_size`::
+`shard_size`::
Sets the maximum number of suggestions to be retrieved
from each individual shard. During the reduce phase only the top N
suggestions are returned based on the `size` option. Defaults to the
@@ -81,24 +81,24 @@ doesn't take the query into account that is part of request.
corrections at the cost of performance. Due to the fact that terms are
partitioned amongst shards, the shard level document frequencies of
spelling corrections may not be precise. Increasing this will make these
- document frequencies more precise.
+ document frequencies more precise.

-`max_inspections`::
+`max_inspections`::
A factor that is used to multiply with the
`shard_size` in order to inspect more candidate spell corrections on
the shard level. Can improve accuracy at the cost of performance.
- Defaults to 5.
+ Defaults to 5.

-`min_doc_freq`::
+`min_doc_freq`::
The minimal threshold in number of documents a
suggestion should appear in. This can be specified as an absolute number
or as a relative percentage of number of documents. This can improve
quality by only suggesting high frequency terms. Defaults to 0f and is
not enabled. If a value higher than 1 is specified then the number
cannot be fractional. The shard level document frequencies are used for
- this option.
+ this option.

-`max_term_freq`::
+`max_term_freq`::
The maximum threshold in number of documents a
suggest text token can exist in order to be included. Can be a relative
percentage number (e.g. 0.4) or an absolute number to represent document
@@ -108,3 +108,15 @@ doesn't take the query into account that is part of request.
usually spelled correctly on top of this also improves the spellcheck
performance. The shard level document frequencies are used for this
option.
+
+`string_distance`::
+ Which string distance implementation to use for comparing how similar
+ suggested terms are. Five possible values can be specified:
+ `internal` - The default based on damerau_levenshtein but highly optimized
+ for comparing string distance for terms inside the index.
+ `damerau_levenshtein` - String distance algorithm based on
+ Damerau-Levenshtein algorithm.
+ `levenstein` - String distance algorithm based on Levenshtein edit distance
+ algorithm.
+ `jarowinkler` - String distance algorithm based on Jaro-Winkler algorithm.
+ `ngram` - String distance algorithm based on character n-grams.
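
Review note: to see how the options documented in this diff fit together, the following is a minimal sketch in Python that assembles a term suggester request body and checks two of the documented constraints (`max_edits` must be 1 or 2, and `string_distance` must be one of the five listed values). The helper `build_term_suggest`, the suggestion name `my-suggestion`, the field name `message`, and the top-level `suggest` wrapper are assumptions made for illustration; none of them appear in the diff, and the exact request envelope may differ between versions.

```python
import json

# The five values documented for `string_distance` in the diff above.
STRING_DISTANCES = {
    "internal",
    "damerau_levenshtein",
    "levenstein",
    "jarowinkler",
    "ngram",
}


def build_term_suggest(text, field, name="my-suggestion", **options):
    """Build a term suggester body (hypothetical helper, not a client API)."""
    # `max_edits` is documented as accepting only 1 or 2 (default 2).
    max_edits = options.get("max_edits", 2)
    if max_edits not in (1, 2):
        raise ValueError("max_edits must be 1 or 2")

    # `string_distance` is documented as one of five values (default internal).
    distance = options.get("string_distance", "internal")
    if distance not in STRING_DISTANCES:
        raise ValueError(f"unknown string_distance: {distance!r}")

    # `field` is required; every other option is passed through unchanged.
    term = {"field": field}
    term.update(options)

    # Assumed envelope: a named suggestion holding the text and `term` options.
    return {"suggest": {name: {"text": text, "term": term}}}


body = build_term_suggest(
    "tring out Elasticsearch",
    field="message",
    suggest_mode="popular",
    string_distance="jarowinkler",
)
print(json.dumps(body, indent=2))
```

Only options that are passed explicitly end up in the body, so the server-side defaults described in the diff still apply to anything left out.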