
[[analysis-normalizers]]
== Normalizers

Normalizers are similar to analyzers except that they may only emit a single
token. As a consequence, they do not have a tokenizer and only accept a subset
of the available char filters and token filters. Only the filters that work on
a per-character basis are allowed. For instance a lowercasing filter would be
allowed, but not a stemming filter, which needs to look at the keyword as a
whole. The current list of filters that can be used in a normalizer is the
following: `arabic_normalization`, `asciifolding`, `bengali_normalization`,
`cjk_width`, `decimal_digit`, `elision`, `german_normalization`,
`hindi_normalization`, `indic_normalization`, `lowercase`,
`persian_normalization`, `scandinavian_folding`, `serbian_normalization`,
`sorani_normalization`, `uppercase`.

Elasticsearch ships with a `lowercase` built-in normalizer. For other forms of
normalization, a custom configuration is required.
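
The built-in `lowercase` normalizer can be referenced by name in a field
mapping without any custom analysis configuration. A minimal sketch (the index
name `index` and field name `foo` are placeholders):

[source,console]
--------------------------------
PUT index
{
  "mappings": {
    "properties": {
      "foo": {
        "type": "keyword",
        "normalizer": "lowercase"
      }
    }
  }
}
--------------------------------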

[discrete]
=== Custom normalizers

Custom normalizers take a list of
<<analysis-charfilters, character filters>> and a list of
<<analysis-tokenfilters,token filters>>.

[source,console]
--------------------------------
PUT index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "quote": {
          "type": "mapping",
          "mappings": [
            "« => \"",
            "» => \""
          ]
        }
      },
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": ["quote"],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "foo": {
        "type": "keyword",
        "normalizer": "my_normalizer"
      }
    }
  }
}
--------------------------------
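
Assuming the index above has been created, the effect of `my_normalizer` can
be checked with the `_analyze` API. The request below should emit a single
token, `"bàr"` lowercased and ASCII-folded to `"bar"`, with the guillemets
rewritten to double quotes by the `quote` char filter:

[source,console]
--------------------------------
POST index/_analyze
{
  "normalizer": "my_normalizer",
  "text": "«BÀR»"
}
--------------------------------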