- [[analysis-phonetic]]
- === Phonetic analysis plugin
- The Phonetic Analysis plugin provides token filters that convert tokens to
- their phonetic representation using Soundex, Metaphone, and a variety of other
- algorithms.
- :plugin_name: analysis-phonetic
- include::install_remove.asciidoc[]
- [[analysis-phonetic-token-filter]]
- ==== `phonetic` token filter
- The `phonetic` token filter takes the following settings:
- `encoder`::
- Which phonetic encoder to use. Accepts `metaphone` (default),
- `double_metaphone`, `soundex`, `refined_soundex`, `caverphone1`,
- `caverphone2`, `cologne`, `nysiis`, `koelnerphonetik`, `haasephonetik`,
- `beider_morse`, `daitch_mokotoff`.
- `replace`::
- Whether the phonetic token replaces the original token. Accepts `true`
- (default) and `false`. When set to `false`, both tokens are emitted.
- Not supported by the `beider_morse` encoder.
- [source,console]
- --------------------------------------------------
- PUT phonetic_sample
- {
-   "settings": {
-     "index": {
-       "analysis": {
-         "analyzer": {
-           "my_analyzer": {
-             "tokenizer": "standard",
-             "filter": [
-               "lowercase",
-               "my_metaphone"
-             ]
-           }
-         },
-         "filter": {
-           "my_metaphone": {
-             "type": "phonetic",
-             "encoder": "metaphone",
-             "replace": false
-           }
-         }
-       }
-     }
-   }
- }
- GET phonetic_sample/_analyze
- {
-   "analyzer": "my_analyzer",
-   "text": "Joe Bloggs" <1>
- }
- --------------------------------------------------
- <1> Returns: `J`, `joe`, `BLKS`, `bloggs`
- Setting `"replace": false` can lead to unexpected behavior because the original
- token and its phonetic version are both kept at the same token position.
- Some queries handle these stacked tokens in special ways. For example, the fuzzy `match`
- query does not apply {ref}/common-options.html#fuzziness[fuzziness] to stacked synonym tokens.
- This can lead to issues that are difficult to diagnose and reason about. For this reason, it
- is often beneficial to index the text into separate fields, one with and one without phonetic
- filtering. Searches can then run against both fields with differing boosts and trade-offs (for
- example, running a fuzzy `match` query on the original text field only, not on the phonetic version).
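
The separate-field approach described above could be sketched roughly as follows. This is a minimal illustration, not a recommended configuration: the index name `phonetic_fields_sample`, the `name` field, its `name.phonetic` sub-field, the analyzer name `my_phonetic_analyzer`, the query text, and the boost value are all assumptions chosen for the example.

```console
# Hypothetical index: "name" keeps the default analysis,
# "name.phonetic" holds only the phonetic tokens ("replace": true).
PUT phonetic_fields_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_phonetic_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_metaphone"
            ]
          }
        },
        "filter": {
          "my_metaphone": {
            "type": "phonetic",
            "encoder": "metaphone",
            "replace": true
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "phonetic": {
            "type": "text",
            "analyzer": "my_phonetic_analyzer"
          }
        }
      }
    }
  }
}

# Fuzziness is applied to the original text field only;
# the boost value is illustrative.
GET phonetic_fields_sample/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": {
              "query": "Jo Blogs",
              "fuzziness": "AUTO",
              "boost": 2
            }
          }
        },
        {
          "match": {
            "name.phonetic": {
              "query": "Jo Blogs"
            }
          }
        }
      ]
    }
  }
}
```

Because each field holds a single token per position, neither query has to deal with stacked tokens, and the relative weight of exact, fuzzy, and phonetic matches can be tuned through the boosts.
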
- [discrete]
- ===== Double metaphone settings
- If the `double_metaphone` encoder is used, then this additional setting is
- supported:
- `max_code_len`::
- The maximum length of the emitted metaphone token. Defaults to `4`.
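
As an illustration, a filter using the `double_metaphone` encoder with a longer code length might be defined as shown here. The index name, analyzer name, filter name, and the value `6` are assumptions made for this sketch.

```console
# Hypothetical index and filter names; max_code_len raised from the default of 4.
PUT double_metaphone_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_double_metaphone"
            ]
          }
        },
        "filter": {
          "my_double_metaphone": {
            "type": "phonetic",
            "encoder": "double_metaphone",
            "max_code_len": 6
          }
        }
      }
    }
  }
}
```
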
- [discrete]
- ===== Beider Morse settings
- If the `beider_morse` encoder is used, then these additional settings are
- supported:
- `rule_type`::
- Whether matching should be `exact` or `approx` (default).
- `name_type`::
- Whether names are `ashkenazi`, `sephardic`, or `generic` (default).
- `languageset`::
- An array of languages to check. If not specified, then the language will
- be guessed. Accepts: `any`, `common`, `cyrillic`, `english`, `french`,
- `german`, `hebrew`, `hungarian`, `polish`, `romanian`, `russian`,
- `spanish`.
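
A filter combining the Beider Morse settings above might look like the following sketch. The index, analyzer, and filter names and the chosen values are assumptions for illustration; `replace` is left out because the `beider_morse` encoder does not support it.

```console
# Hypothetical index and filter names; setting values are illustrative.
PUT beider_morse_sample
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_beider_morse"
            ]
          }
        },
        "filter": {
          "my_beider_morse": {
            "type": "phonetic",
            "encoder": "beider_morse",
            "rule_type": "exact",
            "name_type": "generic",
            "languageset": [
              "english",
              "german"
            ]
          }
        }
      }
    }
  }
}
```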