[[analysis-phonetic]]
=== Phonetic Analysis Plugin

The Phonetic Analysis plugin provides token filters which convert tokens to
their phonetic representation using Soundex, Metaphone, and a variety of other
algorithms.

:plugin_name: analysis-phonetic
include::install_remove.asciidoc[]

[[analysis-phonetic-token-filter]]
==== `phonetic` token filter

The `phonetic` token filter takes the following settings:

`encoder`::
Which phonetic encoder to use. Accepts `metaphone` (default),
`doublemetaphone`, `soundex`, `refinedsoundex`, `caverphone1`,
`caverphone2`, `cologne`, `nysiis`, `koelnerphonetik`, `haasephonetik`,
`beidermorse`, `daitch_mokotoff`.

`replace`::
Whether the original token should be replaced by the phonetic
token. Accepts `true` (default) and `false`. Not supported by the
`beidermorse` encoder.

[source,js]
--------------------------------------------------
PUT phonetic_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "standard",
              "lowercase",
              "my_metaphone"
            ]
          }
        },
        "filter": {
          "my_metaphone": {
            "type": "phonetic",
            "encoder": "metaphone",
            "replace": false
          }
        }
      }
    }
  }
}

GET phonetic_sample/_analyze
{
  "analyzer": "my_analyzer",
  "text": "Joe Bloggs" <1>
}
--------------------------------------------------
// CONSOLE

<1> Returns: `J`, `joe`, `BLKS`, `bloggs`

[float]
===== Double metaphone settings

If the `doublemetaphone` encoder is used, then this additional setting is
supported:

`max_code_len`::
The maximum length of the emitted metaphone token. Defaults to `4`.

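
The request below is a minimal sketch of how `max_code_len` might be set on a
double metaphone filter. The index name `double_metaphone_sample`, the
analyzer `my_dm_analyzer`, the filter `my_double_metaphone`, and the value `6`
are illustrative assumptions, not recommendations. The emitted tokens are not
listed because they depend on the input text and the chosen code length.

[source,js]
--------------------------------------------------
PUT double_metaphone_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_dm_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_double_metaphone"
            ]
          }
        },
        "filter": {
          "my_double_metaphone": {
            "type": "phonetic",
            "encoder": "doublemetaphone",
            "max_code_len": 6 <1>
          }
        }
      }
    }
  }
}

GET double_metaphone_sample/_analyze
{
  "analyzer": "my_dm_analyzer",
  "text": "Joe Bloggs"
}
--------------------------------------------------
// CONSOLE

<1> Hypothetical value chosen for illustration; when omitted, `max_code_len` defaults to `4`.

A longer code generally keeps more of each word's phonetic information, so
fewer distinct words collapse to the same token; a shorter code groups more
words together.
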
[float]
===== Beider Morse settings

If the `beidermorse` encoder is used, then these additional settings are
supported:

`rule_type`::
Whether matching should be `exact` or `approx` (default).

`name_type`::
Whether names are `ashkenazi`, `sephardic`, or `generic` (default).

`languageset`::
An array of languages to check. If not specified, then the language will
be guessed. Accepts: `any`, `common`, `cyrillic`, `english`, `french`,
`german`, `hebrew`, `hungarian`, `polish`, `romanian`, `russian`,
`spanish`.
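
As a minimal sketch, the Beider Morse settings can be combined in a single
filter definition. The index name `beider_morse_sample`, the analyzer
`my_bm_analyzer`, the filter `my_beider_morse`, and the chosen values for
`rule_type` and `languageset` are hypothetical. `replace` is deliberately
omitted because the `beidermorse` encoder does not support it.

[source,js]
--------------------------------------------------
PUT beider_morse_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_bm_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_beider_morse"
            ]
          }
        },
        "filter": {
          "my_beider_morse": {
            "type": "phonetic",
            "encoder": "beidermorse",
            "rule_type": "exact", <1>
            "name_type": "generic",
            "languageset": ["english", "french"] <2>
          }
        }
      }
    }
  }
}

GET beider_morse_sample/_analyze
{
  "analyzer": "my_bm_analyzer",
  "text": "Joe Bloggs"
}
--------------------------------------------------
// CONSOLE

<1> Hypothetical choice that overrides the default `approx` matching.
<2> Hypothetical language list; without `languageset`, the language is guessed.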