
[[analysis-normalizers]]
== Normalizers

beta[]

Normalizers are similar to analyzers except that they may only emit a single
token. As a consequence, they do not have a tokenizer and only accept a subset
of the available char filters and token filters. Only the filters that work on
a per-character basis are allowed. For instance, a lowercasing filter would be
allowed, but not a stemming filter, which needs to look at the keyword as a
whole. The current list of filters that can be used in a normalizer is as
follows: `arabic_normalization`, `asciifolding`, `bengali_normalization`,
`cjk_width`, `decimal_digit`, `elision`, `german_normalization`,
`hindi_normalization`, `indic_normalization`, `lowercase`,
`persian_normalization`, `scandinavian_folding`, `serbian_normalization`,
`sorani_normalization`, `uppercase`.

[float]
=== Custom normalizers

Elasticsearch does not ship with built-in normalizers so far, so the only way
to get one is by building a custom one. Custom normalizers take a list of
<<analysis-charfilters,character filters>> and a list of
<<analysis-tokenfilters,token filters>>.

[source,js]
--------------------------------
PUT index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "quote": {
          "type": "mapping",
          "mappings": [
            "« => \"",
            "» => \""
          ]
        }
      },
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": ["quote"],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "type": {
      "properties": {
        "foo": {
          "type": "keyword",
          "normalizer": "my_normalizer"
        }
      }
    }
  }
}
--------------------------------
// CONSOLE
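
To check what a normalizer actually emits, it can be exercised through the
analyze API. A minimal sketch, assuming the index above has been created and
that your Elasticsearch version supports the `normalizer` parameter of the
analyze API:

[source,js]
--------------------------------
GET index/_analyze
{
  "normalizer": "my_normalizer",
  "text": "Le Père Noël"
}
--------------------------------
// CONSOLE

Since a normalizer has no tokenizer, the response should contain a single
token, `le pere noel`: the input is lowercased and its accents are folded,
but the whitespace is preserved rather than splitting the value into
several tokens.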