[[analysis-phonetic]]
=== Phonetic Analysis Plugin

The Phonetic Analysis plugin provides token filters which convert tokens to
their phonetic representation using Soundex, Metaphone, and a variety of other
algorithms.

[[analysis-phonetic-install]]
[float]
==== Installation

This plugin can be installed using the plugin manager:

[source,sh]
----------------------------------------------------------------
sudo bin/elasticsearch-plugin install analysis-phonetic
----------------------------------------------------------------

The plugin must be installed on every node in the cluster, and each node must
be restarted after installation.

This plugin can be downloaded for <<plugin-management-custom-url,offline install>> from
{plugin_url}/analysis-phonetic/analysis-phonetic-{version}.zip.

[[analysis-phonetic-remove]]
[float]
==== Removal

The plugin can be removed with the following command:

[source,sh]
----------------------------------------------------------------
sudo bin/elasticsearch-plugin remove analysis-phonetic
----------------------------------------------------------------

The node must be stopped before removing the plugin.

[[analysis-phonetic-token-filter]]
==== `phonetic` token filter

The `phonetic` token filter takes the following settings:

`encoder`::
Which phonetic encoder to use. Accepts `metaphone` (default),
`double_metaphone`, `soundex`, `refined_soundex`, `caverphone1`,
`caverphone2`, `cologne`, `nysiis`, `koelnerphonetik`, `haasephonetik`,
`beider_morse`, `daitch_mokotoff`.

`replace`::
Whether or not the original token should be replaced by the phonetic
token. Accepts `true` (default) and `false`. Not supported by the
`beider_morse` encoder.

[source,js]
--------------------------------------------------
PUT phonetic_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "standard",
              "lowercase",
              "my_metaphone"
            ]
          }
        },
        "filter": {
          "my_metaphone": {
            "type": "phonetic",
            "encoder": "metaphone",
            "replace": false
          }
        }
      }
    }
  }
}

POST phonetic_sample/_analyze?analyzer=my_analyzer&text=Joe Bloggs <1>
--------------------------------------------------
// CONSOLE

<1> Returns: `J`, `joe`, `BLKS`, `bloggs`

[float]
===== Double metaphone settings

If the `double_metaphone` encoder is used, then this additional setting is
supported:

`max_code_len`::
The maximum length of the emitted metaphone token. Defaults to `4`.
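
As a minimal sketch of the setting above, a `double_metaphone` filter with a
longer code length might be configured as follows. The index, analyzer, and
filter names (`phonetic_dm_sample`, `my_dm_analyzer`, `my_double_metaphone`)
and the value `6` are hypothetical and chosen only for illustration:

[source,js]
--------------------------------------------------
PUT phonetic_dm_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_dm_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_double_metaphone"
            ]
          }
        },
        "filter": {
          "my_double_metaphone": {
            "type": "phonetic",
            "encoder": "double_metaphone",
            "max_code_len": 6 <1>
          }
        }
      }
    }
  }
}
--------------------------------------------------

<1> Illustrative value only. Emitted tokens are at most six characters long
instead of the default four.
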
[float]
===== Beider Morse settings

If the `beider_morse` encoder is used, then these additional settings are
supported:

`rule_type`::
Whether matching should be `exact` or `approx` (default).

`name_type`::
Whether names are `ashkenazi`, `sephardic`, or `generic` (default).

`languageset`::
An array of languages to check. If not specified, then the language will
be guessed. Accepts: `any`, `common`, `cyrillic`, `english`, `french`,
`german`, `hebrew`, `hungarian`, `polish`, `romanian`, `russian`,
`spanish`.
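
The settings above could be combined as in the following sketch. The index,
analyzer, and filter names (`phonetic_bm_sample`, `my_bm_analyzer`,
`my_beider_morse`) are hypothetical, and the chosen values are only one
possible combination:

[source,js]
--------------------------------------------------
PUT phonetic_bm_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_bm_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_beider_morse"
            ]
          }
        },
        "filter": {
          "my_beider_morse": {
            "type": "phonetic",
            "encoder": "beider_morse",
            "rule_type": "exact",
            "name_type": "generic",
            "languageset": ["english", "german"] <1>
          }
        }
      }
    }
  }
}
--------------------------------------------------

<1> Illustrative choice. Omit `languageset` to let the encoder guess the
language. The `replace` setting is left out because it is not supported by
this encoder.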