
[[analysis-phonetic]]
=== Phonetic analysis plugin

The Phonetic Analysis plugin provides token filters which convert tokens to
their phonetic representation using Soundex, Metaphone, and a variety of other
algorithms.

:plugin_name: analysis-phonetic
include::install_remove.asciidoc[]
[[analysis-phonetic-token-filter]]
==== `phonetic` token filter

The `phonetic` token filter takes the following settings:

`encoder`::

    Which phonetic encoder to use. Accepts `metaphone` (default),
    `double_metaphone`, `soundex`, `refined_soundex`, `caverphone1`,
    `caverphone2`, `cologne`, `nysiis`, `koelnerphonetik`, `haasephonetik`,
    `beider_morse`, `daitch_mokotoff`.

`replace`::

    Whether or not the original token should be replaced by the phonetic
    token. Accepts `true` (default) and `false`. Not supported by
    `beider_morse` encoding.
[source,console]
--------------------------------------------------
PUT phonetic_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_metaphone"
            ]
          }
        },
        "filter": {
          "my_metaphone": {
            "type": "phonetic",
            "encoder": "metaphone",
            "replace": false
          }
        }
      }
    }
  }
}

GET phonetic_sample/_analyze
{
  "analyzer": "my_analyzer",
  "text": "Joe Bloggs" <1>
}
--------------------------------------------------
<1> Returns: `J`, `joe`, `BLKS`, `bloggs`

It is important to note that `"replace": false` can lead to unexpected
behavior since the original and the phonetically analyzed version are both
kept at the same token position. Some queries handle these stacked tokens in
special ways. For example, the fuzzy `match` query does not apply
{ref}/common-options.html#fuzziness[fuzziness] to stacked synonym tokens. This
can lead to issues that are difficult to diagnose and reason about. For this
reason, it is often beneficial to use separate fields for analysis with and
without phonetic filtering. That way searches can be run against both fields
with differing boosts and trade-offs (e.g. only run a fuzzy `match` query on
the original text field, but not on the phonetic version).
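One way to set up that two-field approach is with a
{ref}/multi-fields.html[multi-field] whose sub-field uses a phonetic analyzer.
The following is a minimal sketch (the index, field, and analyzer names are
illustrative, not prescribed):

[source,console]
--------------------------------------------------
PUT phonetic_multifield_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_phonetic_analyzer": {
            "tokenizer": "standard",
            "filter": [ "lowercase", "my_metaphone" ]
          }
        },
        "filter": {
          "my_metaphone": {
            "type": "phonetic",
            "encoder": "metaphone",
            "replace": true
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "phonetic": {
            "type": "text",
            "analyzer": "my_phonetic_analyzer"
          }
        }
      }
    }
  }
}

GET phonetic_multifield_sample/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "name":          { "query": "Joe", "fuzziness": "AUTO", "boost": 2 } } },
        { "match": { "name.phonetic": { "query": "Joe" } } }
      ]
    }
  }
}
--------------------------------------------------

Here the fuzzy `match` runs only against the plain `name` field, while the
phonetic sub-field contributes exact phonetic matches without fuzziness.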
[discrete]
===== Double metaphone settings

If the `double_metaphone` encoder is used, then this additional setting is
supported:

`max_code_len`::

    The maximum length of the emitted metaphone token. Defaults to `4`.
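For example, a filter definition raising the code length might look like the
following sketch (the filter name and the value `6` are illustrative):

[source,console]
--------------------------------------------------
PUT phonetic_dm_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_dm_analyzer": {
            "tokenizer": "standard",
            "filter": [ "lowercase", "my_double_metaphone" ]
          }
        },
        "filter": {
          "my_double_metaphone": {
            "type": "phonetic",
            "encoder": "double_metaphone",
            "max_code_len": 6
          }
        }
      }
    }
  }
}
--------------------------------------------------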
[discrete]
===== Beider Morse settings

If the `beider_morse` encoder is used, then these additional settings are
supported:

`rule_type`::

    Whether matching should be `exact` or `approx` (default).

`name_type`::

    Whether names are `ashkenazi`, `sephardic`, or `generic` (default).

`languageset`::

    An array of languages to check. If not specified, then the language will
    be guessed. Accepts: `any`, `common`, `cyrillic`, `english`, `french`,
    `german`, `hebrew`, `hungarian`, `polish`, `romanian`, `russian`,
    `spanish`.
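A filter combining these settings might look like the following sketch (the
filter name and the particular setting values are illustrative; note that
`replace` is not supported by this encoder, so it is omitted):

[source,console]
--------------------------------------------------
PUT phonetic_bm_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_bm_analyzer": {
            "tokenizer": "standard",
            "filter": [ "lowercase", "my_beider_morse" ]
          }
        },
        "filter": {
          "my_beider_morse": {
            "type": "phonetic",
            "encoder": "beider_morse",
            "rule_type": "approx",
            "name_type": "ashkenazi",
            "languageset": [ "polish", "russian" ]
          }
        }
      }
    }
  }
}
--------------------------------------------------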