[[analysis-analyzers]]
== Built-in analyzer reference

Elasticsearch ships with a wide range of built-in analyzers, which can be used
in any index without further configuration (see the example after this list for
a quick way to try any of them out):

<<analysis-standard-analyzer,Standard Analyzer>>::

The `standard` analyzer divides text into terms on word boundaries, as defined
by the Unicode Text Segmentation algorithm. It removes most punctuation,
lowercases terms, and supports removing stop words.

<<analysis-simple-analyzer,Simple Analyzer>>::

The `simple` analyzer divides text into terms whenever it encounters a
character which is not a letter. It lowercases all terms.

<<analysis-whitespace-analyzer,Whitespace Analyzer>>::

The `whitespace` analyzer divides text into terms whenever it encounters any
whitespace character. It does not lowercase terms.

<<analysis-stop-analyzer,Stop Analyzer>>::

The `stop` analyzer is like the `simple` analyzer, but also supports removal
of stop words.

<<analysis-keyword-analyzer,Keyword Analyzer>>::

The `keyword` analyzer is a ``noop'' analyzer that accepts whatever text it is
given and outputs the exact same text as a single term.

<<analysis-pattern-analyzer,Pattern Analyzer>>::

The `pattern` analyzer uses a regular expression to split the text into terms.
It supports lower-casing and stop words.

<<analysis-lang-analyzer,Language Analyzers>>::

Elasticsearch provides many language-specific analyzers like `english` or
`french`.

<<analysis-fingerprint-analyzer,Fingerprint Analyzer>>::

The `fingerprint` analyzer is a specialist analyzer which creates a
fingerprint which can be used for duplicate detection.

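To see what any of these analyzers produce, you can pass sample text to the
`_analyze` API. For example, a request along the following lines (the sample
text is purely illustrative) runs the `standard` analyzer:

[source,console]
----
POST _analyze
{
  "analyzer": "standard",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
----

The response lists the individual terms the analyzer emitted. Swapping the
`analyzer` value for any of the names above shows how the same text is
tokenized differently.
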
[float]
=== Custom analyzers

If you do not find an analyzer suitable for your needs, you can create a
<<analysis-custom-analyzer,`custom`>> analyzer which combines the appropriate
<<analysis-charfilters,character filters>>,
<<analysis-tokenizers,tokenizer>>, and <<analysis-tokenfilters,token filters>>.

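For example, a custom analyzer that strips HTML markup, splits the text on word
boundaries, and then lowercases and ASCII-folds the resulting terms could be
defined along these lines (the index and analyzer names are placeholders):

[source,console]
----
PUT my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom", <1>
          "char_filter": [ "html_strip" ], <2>
          "tokenizer": "standard", <3>
          "filter": [ "lowercase", "asciifolding" ] <4>
        }
      }
    }
  }
}
----
<1> Marks the analyzer as a `custom` combination of the components below.
<2> Zero or more character filters, applied to the raw text first.
<3> Exactly one tokenizer, which splits the filtered text into terms.
<4> Zero or more token filters, applied to each term in order.

The resulting `my_custom_analyzer` can then be referenced from `text` field
mappings in that index, or tested directly with the `_analyze` API.
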
include::analyzers/fingerprint-analyzer.asciidoc[]

include::analyzers/keyword-analyzer.asciidoc[]

include::analyzers/lang-analyzer.asciidoc[]

include::analyzers/pattern-analyzer.asciidoc[]

include::analyzers/simple-analyzer.asciidoc[]

include::analyzers/standard-analyzer.asciidoc[]

include::analyzers/stop-analyzer.asciidoc[]

include::analyzers/whitespace-analyzer.asciidoc[]