[[query-dsl-flt-query]]
=== Fuzzy Like This Query

The fuzzy like this query finds documents that are "like" the provided text
by running it against one or more fields.

[source,js]
--------------------------------------------------
{
    "fuzzy_like_this" : {
        "fields" : ["name.first", "name.last"],
        "like_text" : "text like this one",
        "max_query_terms" : 12
    }
}
--------------------------------------------------

`fuzzy_like_this` can be shortened to `flt`.

The `fuzzy_like_this` top level parameters include:

[cols="<,<",options="header",]
|=======================================================================
|Parameter |Description

|`fields` |A list of the fields to run the fuzzy like this query against.
Defaults to the `_all` field.

|`like_text` |The text to find similar documents for, *required*.

|`ignore_tf` |Whether term frequency should be ignored. Defaults to `false`.

|`max_query_terms` |The maximum number of query terms that will be
included in any generated query. Defaults to `25`.

|`fuzziness` |The minimum similarity of the term variants. Defaults
to `0.5`. See <<fuzziness>>.

|`prefix_length` |Length of required common prefix on variant terms.
Defaults to `0`.

|`boost` |Sets the boost value of the query. Defaults to `1.0`.

|`analyzer` |The analyzer that will be used to analyze the text.
Defaults to the analyzer associated with the field.
|=======================================================================
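
The parameters above can be combined in a single request body. The following
is a minimal sketch in Python of how a client might assemble one; the helper
name `build_flt_query` and the chosen parameter values are hypothetical and
only serve as an illustration. Parameters that are not passed are left out of
the body, so the defaults listed in the table apply.

[source,python]
--------------------------------------------------
import json


def build_flt_query(like_text, fields=None, **options):
    """Build a `fuzzy_like_this` request body as a plain dict.

    Hypothetical helper, not part of any client library. Only the
    parameters passed explicitly are emitted.
    """
    allowed = {"ignore_tf", "max_query_terms", "fuzziness",
               "prefix_length", "boost", "analyzer"}
    unknown = set(options) - allowed
    if unknown:
        raise ValueError("unknown parameters: %s" % sorted(unknown))

    body = {"like_text": like_text}  # the only required parameter
    if fields is not None:
        body["fields"] = list(fields)
    body.update(options)
    return {"fuzzy_like_this": body}


query = build_flt_query(
    "text like this one",
    fields=["name.first", "name.last"],
    max_query_terms=12,
    prefix_length=2,  # illustrative value, not a recommendation
)
print(json.dumps(query, indent=2))
--------------------------------------------------

Rejecting unknown keys early is a design choice of this sketch, so that a
misspelled parameter name fails on the client side.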

[float]
==== How it Works

The query fuzzifies all terms provided as strings and then picks the best n
differentiating terms. In effect, this mixes the behaviour of FuzzyQuery
and MoreLikeThis, but with special consideration of fuzzy scoring
factors. This generally produces good results for queries where users
may provide details in a number of fields, have no knowledge of
boolean query syntax, and want both a degree of fuzzy matching and a fast
query.
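
The idea of expanding every term into fuzzy variants and keeping only the
best n can be sketched as follows. This is a toy illustration in Python, not
the actual implementation: the similarity measure (edit distance normalized
by the longer term), ranking by similarity alone, whitespace tokenization,
and the names `fuzzy_variants` and `select_terms` are all assumptions made
for the example. `min_similarity` and `prefix_length` loosely mirror the
`fuzziness` and `prefix_length` parameters.

[source,python]
--------------------------------------------------
def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed similarity in [0, 1]; 1.0 means the terms are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def fuzzy_variants(term, vocabulary, min_similarity=0.5, prefix_length=0):
    """Return (variant, similarity) pairs, most similar first."""
    prefix = term[:prefix_length]
    found = []
    for candidate in vocabulary:
        if not candidate.startswith(prefix):
            continue  # variant must share the required common prefix
        sim = similarity(term, candidate)
        if sim >= min_similarity:
            found.append((candidate, sim))
    return sorted(found, key=lambda pair: (-pair[1], pair[0]))


def select_terms(like_text, vocabulary, max_query_terms=25, **kwargs):
    """Fuzzify every source term, then keep the best-scoring variants."""
    scored = []
    for term in like_text.lower().split():
        for variant, sim in fuzzy_variants(term, vocabulary, **kwargs):
            scored.append((sim, term, variant))
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return [(term, variant) for _, term, variant in scored[:max_query_terms]]


vocabulary = ["smith", "smyth", "jones", "john"]
print(select_terms("smith jon", vocabulary, max_query_terms=3))
--------------------------------------------------

Here `max_query_terms` caps the total number of variants across all source
terms, which is how the sketch interprets "the best n differentiating terms".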

For each source term, the fuzzy variants are held in a BooleanQuery with
no coord factor (because we are not looking for matches on multiple
variants in any one doc). Additionally, a specialized TermQuery is used
for variants; it does not use the variant term's own IDF because this would
favor rarer terms, such as misspellings. Instead, all variants use the
same IDF ranking (the one for the source query term), and this is
factored into the variant's boost. If the source query term does not
exist in the index, the average IDF of the variants is used.
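
The shared-IDF rule can be illustrated with a small Python sketch. It assumes
the classic TF-IDF formula `1 + ln(numDocs / (docFreq + 1))` and made-up
document frequencies; the function names `idf` and `shared_idf` are
hypothetical and the code is not taken from the actual implementation.

[source,python]
--------------------------------------------------
import math


def idf(doc_freq, num_docs):
    """Classic inverse document frequency (assumed formula)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def shared_idf(source_term, variants, doc_freqs, num_docs):
    """IDF applied to every variant of `source_term`.

    Uses the source term's own IDF when it exists in the index,
    otherwise the average IDF of the variants that do exist.
    """
    source_df = doc_freqs.get(source_term, 0)
    if source_df > 0:
        return idf(source_df, num_docs)
    variant_idfs = [idf(doc_freqs[v], num_docs)
                    for v in variants if doc_freqs.get(v, 0) > 0]
    if not variant_idfs:
        return 0.0  # nothing matched in the index
    return sum(variant_idfs) / len(variant_idfs)


# Made-up statistics: "smyth" is a rare spelling of the common "smith".
doc_freqs = {"smith": 99, "smyth": 1}
num_docs = 1000

# Per-term IDF would rank the rare spelling above the intended term.
print(idf(doc_freqs["smyth"], num_docs) > idf(doc_freqs["smith"], num_docs))

# With the shared rule, both variants are weighted by the IDF of "smith".
print(shared_idf("smith", ["smith", "smyth"], doc_freqs, num_docs))

# A source term missing from the index falls back to the variants' average.
print(shared_idf("smiht", ["smith", "smyth"], doc_freqs, num_docs))
--------------------------------------------------

The first comparison shows why the variant's own IDF is avoided: the rare
misspelling would otherwise outweigh the term the user most likely meant.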