[[analysis-phonetic]]
=== Phonetic Analysis Plugin

The Phonetic Analysis plugin provides token filters which convert tokens to
their phonetic representation using Soundex, Metaphone, and a variety of other
algorithms.

[[analysis-phonetic-install]]
[float]
==== Installation

This plugin can be installed using the plugin manager:

[source,sh]
----------------------------------------------------------------
sudo bin/elasticsearch-plugin install analysis-phonetic
----------------------------------------------------------------

The plugin must be installed on every node in the cluster, and each node must
be restarted after installation.

This plugin can be downloaded for offline install from
{plugin_url}/analysis-phonetic/{version}/analysis-phonetic-{version}.zip[elastic download service].

[[analysis-phonetic-remove]]
[float]
==== Removal

The plugin can be removed with the following command:

[source,sh]
----------------------------------------------------------------
sudo bin/elasticsearch-plugin remove analysis-phonetic
----------------------------------------------------------------

The node must be stopped before removing the plugin.

[[analysis-phonetic-token-filter]]
==== `phonetic` token filter

The `phonetic` token filter takes the following settings:

`encoder`::

    Which phonetic encoder to use. Accepts `metaphone` (default),
    `double_metaphone`, `soundex`, `refinedsoundex`, `caverphone1`,
    `caverphone2`, `cologne`, `nysiis`, `koelnerphonetik`, `haasephonetik`,
    `beider_morse`, `daitch_mokotoff`.

`replace`::

    Whether the original token should be replaced by the phonetic token.
    Accepts `true` (default) and `false`. Not supported by the `beider_morse`
    encoder.

[source,js]
--------------------------------------------------
PUT phonetic_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "standard",
              "lowercase",
              "my_metaphone"
            ]
          }
        },
        "filter": {
          "my_metaphone": {
            "type": "phonetic",
            "encoder": "metaphone",
            "replace": false
          }
        }
      }
    }
  }
}

POST phonetic_sample/_analyze?analyzer=my_analyzer&text=Joe Bloggs <1>
--------------------------------------------------
// CONSOLE

<1> Returns: `J`, `joe`, `BLKS`, `bloggs`
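
The analyze request can also be sent with a JSON body instead of query-string
parameters, which avoids having to URL-encode the text. A minimal sketch,
assuming the `phonetic_sample` index and the `my_analyzer` analyzer from the
example above already exist:

[source,js]
--------------------------------------------------
POST phonetic_sample/_analyze
{
  "analyzer": "my_analyzer",
  "text": "Joe Bloggs"
}
--------------------------------------------------
// CONSOLE
// TEST[continued]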

[float]
===== Double metaphone settings

If the `double_metaphone` encoder is used, then this additional setting is
supported:

`max_code_len`::

    The maximum length of the emitted metaphone token. Defaults to `4`.
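
As an illustration, a `phonetic` filter using the `double_metaphone` encoder
with a longer code length might be configured as follows. This is a sketch
only: the index name `phonetic_dm_sample`, the analyzer name `my_dm_analyzer`,
the filter name `my_double_metaphone`, and the value `6` are hypothetical
choices made for this example.

[source,js]
--------------------------------------------------
PUT phonetic_dm_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_dm_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_double_metaphone"
            ]
          }
        },
        "filter": {
          "my_double_metaphone": {
            "type": "phonetic",
            "encoder": "double_metaphone",
            "max_code_len": 6 <1>
          }
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

<1> Illustrative value. Omit the setting to keep the default of `4`.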

[float]
===== Beider Morse settings

If the `beider_morse` encoder is used, then these additional settings are
supported:

`rule_type`::

    Whether matching should be `exact` or `approx` (default).

`name_type`::

    Whether names are `ashkenazi`, `sephardic`, or `generic` (default).

`languageset`::

    An array of languages to check. If not specified, then the language will
    be guessed. Accepts: `any`, `common`, `cyrillic`, `english`, `french`,
    `german`, `hebrew`, `hungarian`, `polish`, `romanian`, `russian`,
    `spanish`.
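
The settings above might be combined as in the following sketch. The index
name `phonetic_bm_sample`, the analyzer name `my_bm_analyzer`, the filter name
`my_beider_morse`, and the choice of languages are hypothetical and serve only
as an example. The `replace` setting is left out because it is not supported
by this encoder.

[source,js]
--------------------------------------------------
PUT phonetic_bm_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_bm_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_beider_morse"
            ]
          }
        },
        "filter": {
          "my_beider_morse": {
            "type": "phonetic",
            "encoder": "beider_morse",
            "rule_type": "approx",
            "name_type": "generic",
            "languageset": [ "english", "german" ] <1>
          }
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

<1> Illustrative choice. Omit `languageset` to let the encoder guess the
language.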