[[analysis-phonetic]]
=== Phonetic Analysis Plugin

The Phonetic Analysis plugin provides token filters which convert tokens to
their phonetic representation using Soundex, Metaphone, and a variety of other
algorithms.

[[analysis-phonetic-install]]
[float]
==== Installation

This plugin can be installed using the plugin manager:

[source,sh]
----------------------------------------------------------------
sudo bin/plugin install analysis-phonetic
----------------------------------------------------------------

The plugin must be installed on every node in the cluster, and each node must
be restarted after installation.

[[analysis-phonetic-remove]]
[float]
==== Removal

The plugin can be removed with the following command:

[source,sh]
----------------------------------------------------------------
sudo bin/plugin remove analysis-phonetic
----------------------------------------------------------------

The node must be stopped before removing the plugin.

[[analysis-phonetic-token-filter]]
==== `phonetic` token filter

The `phonetic` token filter takes the following settings:

`encoder`::
Which phonetic encoder to use. Accepts `metaphone` (default),
`double_metaphone`, `soundex`, `refined_soundex`, `caverphone1`,
`caverphone2`, `cologne`, `nysiis`, `koelnerphonetik`, `haasephonetik`,
`beider_morse`.

`replace`::
Whether or not the original token should be replaced by the phonetic
token. Accepts `true` (default) and `false`. Not supported by the
`beider_morse` encoder.

[source,json]
--------------------------------------------------
PUT phonetic_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "standard",
              "lowercase",
              "my_metaphone"
            ]
          }
        },
        "filter": {
          "my_metaphone": {
            "type": "phonetic",
            "encoder": "metaphone",
            "replace": false
          }
        }
      }
    }
  }
}

POST phonetic_sample/_analyze?analyzer=my_analyzer&text=Joe Bloggs <1>
--------------------------------------------------
// AUTOSENSE

<1> Returns `J`, `joe`, `BLKS`, `bloggs`. Because `replace` is `false`, each
original token is kept alongside its phonetic encoding.

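
For comparison, here is a minimal sketch of the same analyzer with `replace`
left at its default of `true`. The index name `phonetic_replace_sample` and the
filter name `my_metaphone_only` are hypothetical, chosen only for this
illustration. Because the original tokens are replaced rather than kept, only
the phonetic tokens should be emitted.

[source,json]
--------------------------------------------------
PUT phonetic_replace_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "standard",
              "lowercase",
              "my_metaphone_only"
            ]
          }
        },
        "filter": {
          "my_metaphone_only": {
            "type": "phonetic",
            "encoder": "metaphone"
          }
        }
      }
    }
  }
}

POST phonetic_replace_sample/_analyze?analyzer=my_analyzer&text=Joe Bloggs <1>
--------------------------------------------------
// AUTOSENSE

<1> Expected to return only `J` and `BLKS`.
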
[float]
===== Double metaphone settings

If the `double_metaphone` encoder is used, then this additional setting is
supported:

`max_code_len`::
The maximum length of the emitted metaphone token. Defaults to `4`.

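
As a sketch of how `max_code_len` might be combined with the
`double_metaphone` encoder, the request below raises the limit to `6`. The
index name `double_metaphone_sample` and the filter name `my_double_metaphone`
are hypothetical, and the value `6` is arbitrary; the exact tokens returned
depend on the encoder, so none are listed here.

[source,json]
--------------------------------------------------
PUT double_metaphone_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_double_metaphone"
            ]
          }
        },
        "filter": {
          "my_double_metaphone": {
            "type": "phonetic",
            "encoder": "double_metaphone",
            "max_code_len": 6
          }
        }
      }
    }
  }
}

POST double_metaphone_sample/_analyze?analyzer=my_analyzer&text=Joe Bloggs
--------------------------------------------------
// AUTOSENSE
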
[float]
===== Beider Morse settings

If the `beider_morse` encoder is used, then these additional settings are
supported:

`rule_type`::
Whether matching should be `exact` or `approx` (default).

`name_type`::
Whether names are `ashkenazi`, `sephardic`, or `generic` (default).

`languageset`::
An array of languages to check. If not specified, the language will be
guessed. Accepts: `any`, `common`, `cyrillic`, `english`, `french`,
`german`, `hebrew`, `hungarian`, `polish`, `romanian`, `russian`,
`spanish`.
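
The following is a minimal sketch of a `beider_morse` filter that sets all
three settings explicitly, restricting matching to exact rules and limiting
the language check to English and French. The index name
`beider_morse_sample` and the filter name `my_beider_morse` are hypothetical.
`replace` is deliberately omitted because it is not supported by this encoder,
and no output is listed since the emitted tokens depend on the chosen rules.

[source,json]
--------------------------------------------------
PUT beider_morse_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_beider_morse"
            ]
          }
        },
        "filter": {
          "my_beider_morse": {
            "type": "phonetic",
            "encoder": "beider_morse",
            "rule_type": "exact",
            "name_type": "generic",
            "languageset": ["english", "french"]
          }
        }
      }
    }
  }
}

POST beider_morse_sample/_analyze?analyzer=my_analyzer&text=Joe Bloggs
--------------------------------------------------
// AUTOSENSE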