[[analysis-phonetic]]
=== Phonetic Analysis Plugin

The Phonetic Analysis plugin provides token filters which convert tokens to
their phonetic representation using Soundex, Metaphone, and a variety of other
algorithms.

:plugin_name: analysis-phonetic
include::install_remove.asciidoc[]

[[analysis-phonetic-token-filter]]
==== `phonetic` token filter

The `phonetic` token filter takes the following settings:

`encoder`::
Which phonetic encoder to use. Accepts `metaphone` (default),
`double_metaphone`, `soundex`, `refined_soundex`, `caverphone1`,
`caverphone2`, `cologne`, `nysiis`, `koelnerphonetik`, `haasephonetik`,
`beider_morse`, `daitch_mokotoff`.

`replace`::
Whether or not the original token should be replaced by the phonetic
token. Accepts `true` (default) and `false`. Not supported by
`beider_morse` encoding.

[source,js]
--------------------------------------------------
PUT phonetic_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_metaphone"
            ]
          }
        },
        "filter": {
          "my_metaphone": {
            "type": "phonetic",
            "encoder": "metaphone",
            "replace": false
          }
        }
      }
    }
  }
}

GET phonetic_sample/_analyze
{
  "analyzer": "my_analyzer",
  "text": "Joe Bloggs" <1>
}
--------------------------------------------------
// CONSOLE

<1> Returns: `J`, `joe`, `BLKS`, `bloggs`

[float]
===== Double metaphone settings

If the `double_metaphone` encoder is used, then this additional setting is
supported:

`max_code_len`::
The maximum length of the emitted metaphone token. Defaults to `4`.

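
As a minimal sketch of how this setting might be applied, the request below
defines a `phonetic` filter that uses the `double_metaphone` encoder with a
longer code length. The index name `phonetic_dm_sample`, the filter name
`my_double_metaphone`, and the value `6` are illustrative assumptions, not
recommendations.

[source,js]
--------------------------------------------------
PUT phonetic_dm_sample <1>
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "my_double_metaphone": { <1>
            "type": "phonetic",
            "encoder": "double_metaphone",
            "max_code_len": 6 <2>
          }
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

<1> Hypothetical index and filter names, chosen only for this example.
<2> Illustrative value. If the setting is omitted, the default of `4` applies.
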
[float]
===== Beider Morse settings

If the `beider_morse` encoder is used, then these additional settings are
supported:

`rule_type`::
Whether matching should be `exact` or `approx` (default).

`name_type`::
Whether names are `ashkenazi`, `sephardic`, or `generic` (default).

`languageset`::
An array of languages to check. If not specified, then the language will
be guessed. Accepts: `any`, `common`, `cyrillic`, `english`, `french`,
`german`, `hebrew`, `hungarian`, `polish`, `romanian`, `russian`,
`spanish`.
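
The Beider Morse settings could be combined along the following lines. This is
a sketch only: the index name `phonetic_bm_sample`, the filter name
`my_beider_morse`, and the choice of languages are assumptions made for
illustration. The `replace` setting is left out because it is not supported by
this encoder.

[source,js]
--------------------------------------------------
PUT phonetic_bm_sample <1>
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "my_beider_morse": { <1>
            "type": "phonetic",
            "encoder": "beider_morse",
            "rule_type": "approx",
            "name_type": "generic",
            "languageset": [ "english", "german" ] <2>
          }
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

<1> Hypothetical index and filter names, chosen only for this example.
<2> Illustrative language selection. Omit `languageset` to let the language be
guessed.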