123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121 |
- [[analysis-phonetic]]
- === Phonetic Analysis Plugin
- The Phonetic Analysis plugin provides token filters which convert tokens to
- their phonetic representation using Soundex, Metaphone, and a variety of other
- algorithms.
- [[analysis-phonetic-install]]
- [float]
- ==== Installation
- This plugin can be installed using the plugin manager:
- [source,sh]
- ----------------------------------------------------------------
- sudo bin/elasticsearch-plugin install analysis-phonetic
- ----------------------------------------------------------------
- The plugin must be installed on every node in the cluster, and each node must
- be restarted after installation.
- This plugin can be downloaded for offline install from
- {plugin_url}/analysis-phonetic/{version}/analysis-phonetic-{version}.zip[elastic download service].
- [[analysis-phonetic-remove]]
- [float]
- ==== Removal
- The plugin can be removed with the following command:
- [source,sh]
- ----------------------------------------------------------------
- sudo bin/elasticsearch-plugin remove analysis-phonetic
- ----------------------------------------------------------------
- The node must be stopped before removing the plugin.
- [[analysis-phonetic-token-filter]]
- ==== `phonetic` token filter
- The `phonetic` token filter takes the following settings:
- `encoder`::
- Which phonetic encoder to use. Accepts `metaphone` (default),
- `doublemetaphone`, `soundex`, `refinedsoundex`, `caverphone1`,
- `caverphone2`, `cologne`, `nysiis`, `koelnerphonetik`, `haasephonetik`,
- `beidermorse`, `daitch_mokotoff`.
- `replace`::
- Whether or not the original token should be replaced by the phonetic
- token. Accepts `true` (default) and `false`. Not supported by
- `beidermorse` encoding.
- [source,js]
- --------------------------------------------------
- PUT phonetic_sample
- {
- "settings": {
- "index": {
- "analysis": {
- "analyzer": {
- "my_analyzer": {
- "tokenizer": "standard",
- "filter": [
- "standard",
- "lowercase",
- "my_metaphone"
- ]
- }
- },
- "filter": {
- "my_metaphone": {
- "type": "phonetic",
- "encoder": "metaphone",
- "replace": false
- }
- }
- }
- }
- }
- }
- POST phonetic_sample/_analyze?analyzer=my_analyzer&text=Joe Bloggs <1>
- --------------------------------------------------
- // CONSOLE
- <1> Returns: `J`, `joe`, `BLKS`, `bloggs`
- [float]
- ===== Double metaphone settings
- If the `double_metaphone` encoder is used, then this additional setting is
- supported:
- `max_code_len`::
- The maximum length of the emitted metaphone token. Defaults to `4`.
- [float]
- ===== Beider Morse settings
- If the `beider_morse` encoder is used, then these additional settings are
- supported:
- `rule_type`::
- Whether matching should be `exact` or `approx` (default).
- `name_type`::
- Whether names are `ashkenazi`, `sephardic`, or `generic` (default).
- `languageset`::
- An array of languages to check. If not specified, then the language will
- be guessed. Accepts: `any`, `comomon`, `cyrillic`, `english`, `french`,
- `german`, `hebrew`, `hungarian`, `polish`, `romanian`, `russian`,
- `spanish`.
|