[[analysis-charfilters]]
== Character Filters

_Character filters_ are used to preprocess the stream of characters before it
is passed to the <<analysis-tokenizers,tokenizer>>.

A character filter receives the original text as a stream of characters and
can transform the stream by adding, removing, or changing characters. For
instance, a character filter could be used to convert Arabic-Indic digits
(٠١٢٣٤٥٦٧٨٩) into their Latin (Western Arabic) equivalents (0123456789), or to
strip HTML elements like `<b>` from the stream.
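
To make the idea concrete, the following Python sketch models a character filter as a plain function from string to string and runs a chain of such filters before a naive whitespace tokenizer. It is a conceptual illustration only, not Elasticsearch code. The function names (`convert_digits`, `strip_tags`, `analyze`) are hypothetical, and the regex-based tag stripping is far cruder than the built-in `html_strip` filter.

```python
import re

# Hypothetical table mapping Arabic-Indic digits (U+0660..U+0669) to ASCII 0..9.
ARABIC_INDIC_TO_LATIN = str.maketrans(
    "".join(chr(0x0660 + i) for i in range(10)), "0123456789"
)

# Naive tag matcher for illustration only; real HTML stripping is more involved.
TAG_RE = re.compile(r"<[^>]*>")


def convert_digits(text: str) -> str:
    """Character filter: replace Arabic-Indic digits with Latin digits."""
    return text.translate(ARABIC_INDIC_TO_LATIN)


def strip_tags(text: str) -> str:
    """Character filter: remove anything that looks like an HTML tag."""
    return TAG_RE.sub("", text)


def analyze(text, char_filters, tokenize=str.split):
    """Apply each character filter in order, then pass the result to the tokenizer."""
    for char_filter in char_filters:
        text = char_filter(text)
    return tokenize(text)


if __name__ == "__main__":
    # "<b>room</b> ١٢٣" -> tags stripped, digits converted, then split on whitespace
    print(analyze("<b>room</b> \u0661\u0662\u0663", [strip_tags, convert_digits]))
    # ['room', '123']
```

The point of the sketch is the ordering: every character filter sees the whole text before the tokenizer splits it into tokens.
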

Elasticsearch has a number of built-in character filters that can be used to
build <<analysis-custom-analyzer,custom analyzers>>.

<<analysis-htmlstrip-charfilter,HTML Strip Character Filter>>::
The `html_strip` character filter strips out HTML elements like `<b>` and
decodes HTML entities like `&amp;`.

<<analysis-mapping-charfilter,Mapping Character Filter>>::
The `mapping` character filter replaces any occurrences of the specified
strings with the specified replacements.

<<analysis-pattern-replace-charfilter,Pattern Replace Character Filter>>::
The `pattern_replace` character filter replaces any characters matching a
regular expression with the specified replacement.
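
The following Python sketch approximates the kind of string replacement performed by the `mapping` and `pattern_replace` filters. It is not how Elasticsearch implements them, and the function names are hypothetical. In particular, the sketch uses Python's `re` syntax (`\1` for group references), whereas Elasticsearch's `pattern_replace` filter takes a Java regular expression and `$1`-style references. When mapping keys overlap, the sketch takes the longest key that matches at each position. This choice is made here for illustration.

```python
import re


def mapping_filter(text: str, mappings: dict[str, str]) -> str:
    """Replace each occurrence of a key in `mappings` with its value.

    At each position the longest matching key wins; characters that start
    no key are copied through unchanged.
    """
    keys = sorted(mappings, key=len, reverse=True)  # try longer keys first
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if key and text.startswith(key, i):
                out.append(mappings[key])
                i += len(key)
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def pattern_replace_filter(text: str, pattern: str, replacement: str) -> str:
    """Replace every match of `pattern` with `replacement` (Python re syntax)."""
    return re.sub(pattern, replacement, text)


if __name__ == "__main__":
    print(mapping_filter("I'm delighted :) or sad :(", {":)": "_happy_", ":(": "_sad_"}))
    # I'm delighted _happy_ or sad _sad_
    print(pattern_replace_filter("123-456-789", r"(\d+)-(?=\d)", r"\1_"))
    # 123_456_789
```

The mapping sketch scans left to right and never rescans its own output. As a result, a replacement value is not itself subject to further mapping.
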

include::charfilters/htmlstrip-charfilter.asciidoc[]

include::charfilters/mapping-charfilter.asciidoc[]

include::charfilters/pattern-replace-charfilter.asciidoc[]