[[analysis-normalizers]]
== Normalizers

Normalizers are similar to analyzers except that they may only emit a single
token. As a consequence, they do not have a tokenizer and only accept a subset
of the available char filters and token filters: only the filters that work on
a per-character basis are allowed. For instance, a lowercasing filter would be
allowed, but not a stemming filter, which needs to look at the keyword as a
whole. The current list of filters that can be used in a normalizer is as
follows: `arabic_normalization`, `asciifolding`, `bengali_normalization`,
`cjk_width`, `decimal_digit`, `elision`, `german_normalization`,
`hindi_normalization`, `indic_normalization`, `lowercase`,
`persian_normalization`, `scandinavian_folding`, `serbian_normalization`,
`sorani_normalization`, `uppercase`.
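
In effect, a normalizer behaves like an analyzer whose tokenizer emits the
entire input as one token. As a rough sketch of that behaviour, not a
normalizer itself, the `_analyze` API can combine the `keyword` tokenizer
with per-character filters to preview what a normalizer built from the same
filters would produce:

[source,js]
--------------------------------
GET _analyze
{
  "tokenizer": "keyword",
  "filter": ["lowercase", "asciifolding"],
  "text": "Déjà Vu"
}
--------------------------------
// CONSOLE

The response should contain a single token, `deja vu`.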

[float]
=== Custom normalizers

Elasticsearch does not ship with built-in normalizers so far, so the only way
to get one is by building a custom one. Custom normalizers take a list of
<<analysis-charfilters,character filters>> and a list of
<<analysis-tokenfilters,token filters>>.

[source,js]
--------------------------------
PUT index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "quote": {
          "type": "mapping",
          "mappings": [
            "« => \"",
            "» => \""
          ]
        }
      },
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": ["quote"],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "foo": {
          "type": "keyword",
          "normalizer": "my_normalizer"
        }
      }
    }
  }
}
--------------------------------
// CONSOLE
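
Once the index above exists, and assuming a version of the `_analyze` API
that accepts a `normalizer` parameter, the custom normalizer can be checked
directly:

[source,js]
--------------------------------
GET index/_analyze
{
  "normalizer": "my_normalizer",
  "text": "«Déjà Vu»"
}
--------------------------------
// CONSOLE

This should return a single token in which the `quote` char filter has
rewritten the guillemets to plain quotes, the text has been lowercased, and
the accents folded: `"deja vu"`.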