- [[analysis-charfilters]]
- == Character Filters
- _Character filters_ are used to preprocess the stream of characters before it
- is passed to the <<analysis-tokenizers,tokenizer>>.
- A character filter receives the original text as a stream of characters and
- can transform the stream by adding, removing, or changing characters. For
- instance, a character filter could be used to convert Arabic-Indic digits
- (٠١٢٣٤٥٦٧٨٩) into their Latin (ASCII) equivalents (0123456789), or to strip HTML
- elements like `<b>` from the stream.
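The digit conversion mentioned above can be sketched outside Elasticsearch in a few lines of Python. This is only an illustration of the idea of rewriting a character stream before tokenization, not how Elasticsearch implements it; the function name `normalize_digits` is hypothetical, and the code assumes the source digits are the code points U+0660 to U+0669.

```python
# Illustrative sketch only: Elasticsearch does this inside the analysis chain.
# Map the ten digits U+0660..U+0669 to the ASCII digits "0".."9".
DIGIT_TABLE = {0x0660 + d: str(d) for d in range(10)}


def normalize_digits(text: str) -> str:
    """Return `text` with U+0660..U+0669 replaced by ASCII digits."""
    # str.translate accepts a dict keyed by code point.
    return text.translate(DIGIT_TABLE)


if __name__ == "__main__":
    sample = "\u0661\u0662\u0663 items"
    print(normalize_digits(sample))
```

Because the replacement happens before tokenization, the tokenizer only ever sees the normalized digits.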
- Elasticsearch has a number of built-in character filters that can be used to build
- <<analysis-custom-analyzer,custom analyzers>>.
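As a rough sketch of how character filters might be wired into a custom analyzer, the Python snippet below builds an index settings body as a plain dictionary and prints it as JSON. The names `my_analyzer` and `digit_map` are made up for illustration, and the choice of the `standard` tokenizer is an assumption; consult the custom analyzer reference for the authoritative set of options.

```python
import json

# "digit_map" and "my_analyzer" are hypothetical names chosen for this sketch.
# Each mapping entry uses the "key => value" form of the `mapping` filter.
digit_mappings = [f"{chr(0x0660 + d)} => {d}" for d in range(10)]

settings_body = {
    "settings": {
        "analysis": {
            "char_filter": {
                "digit_map": {
                    "type": "mapping",
                    "mappings": digit_mappings,
                }
            },
            "analyzer": {
                "my_analyzer": {
                    "type": "custom",
                    # Character filters are applied in the order listed.
                    "char_filter": ["html_strip", "digit_map"],
                    "tokenizer": "standard",
                }
            },
        }
    }
}

if __name__ == "__main__":
    # The printed JSON could serve as the body of an index creation request.
    print(json.dumps(settings_body, indent=2, ensure_ascii=False))
```

Built-in filters such as `html_strip` can be referenced by name directly, while configurable ones such as `mapping` are defined under `char_filter` first and then referenced by the name given there.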
- <<analysis-htmlstrip-charfilter,HTML Strip Character Filter>>::
- The `html_strip` character filter strips out HTML elements like `<b>` and
- decodes HTML entities like `&amp;`.
- <<analysis-mapping-charfilter,Mapping Character Filter>>::
- The `mapping` character filter replaces any occurrences of the specified
- strings with the specified replacements.
- <<analysis-pattern-replace-charfilter,Pattern Replace Character Filter>>::
- The `pattern_replace` character filter replaces any characters matching a
- regular expression with the specified replacement.
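To make the three behaviors above concrete, here is a minimal Python approximation of each one. These functions are stand-ins written for illustration and are not the Elasticsearch implementations: the real `html_strip` filter handles more cases than a single regular expression can, and `pattern_replace` uses Java regular expressions (with `$1`-style group references) rather than Python's `\1` syntax. All function names are hypothetical.

```python
import html
import re


def html_strip(text: str) -> str:
    """Approximate `html_strip`: remove tags, then decode HTML entities."""
    without_tags = re.sub(r"<[^>]+>", "", text)
    return html.unescape(without_tags)


def mapping(text: str, mappings: dict) -> str:
    """Approximate `mapping`: replace each key with its value in one pass."""
    if not mappings:
        return text
    # Try longer keys first so that overlapping keys prefer the longest match.
    keys = sorted(mappings, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mappings[m.group(0)], text)


def pattern_replace(text: str, pattern: str, replacement: str) -> str:
    """Approximate `pattern_replace` using Python regex syntax."""
    return re.sub(pattern, replacement, text)


if __name__ == "__main__":
    print(html_strip("<b>Tom &amp; Jerry</b>"))
    print(mapping(":) and :(", {":)": "_happy_", ":(": "_sad_"}))
    print(pattern_replace("123-456-789", r"(\d+)-(?=\d)", r"\1_"))
```

Each function takes and returns a plain string, mirroring the way a character filter transforms the character stream before the tokenizer sees it.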
- include::charfilters/htmlstrip-charfilter.asciidoc[]
- include::charfilters/mapping-charfilter.asciidoc[]
- include::charfilters/pattern-replace-charfilter.asciidoc[]