- [[query-dsl-flt-query]]
- === Fuzzy Like This Query
- The fuzzy like this query finds documents that are "like" the provided
- text by running it against one or more fields.
- [source,js]
- --------------------------------------------------
- {
-     "fuzzy_like_this" : {
-         "fields" : ["name.first", "name.last"],
-         "like_text" : "text like this one",
-         "max_query_terms" : 12
-     }
- }
- --------------------------------------------------
- `fuzzy_like_this` can be shortened to `flt`.
- The `fuzzy_like_this` top level parameters include:
- [cols="<,<",options="header",]
- |=======================================================================
- |Parameter |Description
- |`fields` |A list of the fields to run the fuzzy like this query against.
- Defaults to the `_all` field.
- |`like_text` |The text to find similar documents for. *Required*.
- |`ignore_tf` |Whether term frequency should be ignored. Defaults to `false`.
- |`max_query_terms` |The maximum number of query terms that will be
- included in any generated query. Defaults to `25`.
- |`fuzziness` |The minimum similarity of the term variants. Defaults
- to `0.5`. See <<fuzziness>>.
- |`prefix_length` |Length of required common prefix on variant terms.
- Defaults to `0`.
- |`boost` |Sets the boost value of the query. Defaults to `1.0`.
- |`analyzer` |The analyzer that will be used to analyze the text.
- Defaults to the analyzer associated with the field.
- |=======================================================================
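- As an illustration of the optional parameters above, the following sketch
- uses the `flt` shorthand and overrides several defaults. The field names and
- the chosen values are placeholders, not recommendations.
- [source,js]
- --------------------------------------------------
- {
-     "flt" : {
-         "fields" : ["name.first", "name.last"],
-         "like_text" : "text like this one",
-         "max_query_terms" : 12,
-         "prefix_length" : 2,
-         "ignore_tf" : true,
-         "boost" : 2.0
-     }
- }
- --------------------------------------------------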
- [float]
- ==== How it Works
- The query fuzzifies all terms provided as strings and then picks the best
- n differentiating terms. In effect, this mixes the behaviour of FuzzyQuery
- and MoreLikeThis, but with special consideration of fuzzy scoring
- factors. This generally produces good results when users provide details
- in a number of fields, have no knowledge of boolean query syntax, and want
- both a degree of fuzzy matching and a fast query.
- For each source term, the fuzzy variants are held in a BooleanQuery with
- no coord factor (because we are not looking for matches on multiple
- variants in any one doc). Additionally, a specialized TermQuery is used
- for the variants. It does not use the variant term's own IDF, because that
- would favor rarer terms, such as misspellings. Instead, all variants use
- the same IDF (the one for the source query term), and this is
- factored into the variant's boost. If the source query term does not
- exist in the index, the average IDF of the variants is used.
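- The IDF rule described above can be sketched in a few lines of JavaScript.
- This is only an illustration, not the actual implementation: the helper names
- `idf` and `sharedIdf`, the sample document frequencies, and the choice of the
- classic `1 + ln(numDocs / (docFreq + 1))` formula are all assumptions.

```javascript
// Hypothetical helper: classic TF-IDF style inverse document frequency.
// The exact formula used by the real query is an assumption here.
function idf(docFreq, numDocs) {
  return 1 + Math.log(numDocs / (docFreq + 1));
}

// Returns the single IDF value shared by every fuzzy variant of sourceTerm.
// docFreqs maps a term to the number of documents that contain it.
function sharedIdf(sourceTerm, variants, docFreqs, numDocs) {
  const sourceDf = docFreqs[sourceTerm] || 0;
  if (sourceDf > 0) {
    // The source term exists in the index: all variants reuse its IDF,
    // so a rare misspelling is not favored over the intended term.
    return idf(sourceDf, numDocs);
  }
  if (variants.length === 0) {
    return 0; // nothing to average over
  }
  // The source term is absent: fall back to the average IDF of the variants.
  const total = variants.reduce(
    (sum, v) => sum + idf(docFreqs[v] || 0, numDocs),
    0
  );
  return total / variants.length;
}

// Illustrative (made-up) index statistics.
const numDocs = 1000;
const docFreqs = { smith: 99, smyth: 9 };

// "smith" is indexed, so its variants share the IDF of "smith".
const knownSource = sharedIdf("smith", ["smyth", "smiht"], docFreqs, numDocs);

// "smithe" is not indexed, so the variants' IDFs are averaged.
const unknownSource = sharedIdf("smithe", ["smith", "smyth"], docFreqs, numDocs);

console.log(knownSource, unknownSource);
```

- In this sketch the misspelling `smiht` receives the same IDF as `smith`
- rather than a higher one, which is the effect the paragraph above describes.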