[[analysis-analyzers]]
== Analyzers

Elasticsearch ships with a wide range of built-in analyzers, which can be used
in any index without further configuration:

<<analysis-standard-analyzer,Standard Analyzer>>::

The `standard` analyzer divides text into terms on word boundaries, as defined
by the Unicode Text Segmentation algorithm. It removes most punctuation,
lowercases terms, and supports removing stop words.

<<analysis-simple-analyzer,Simple Analyzer>>::

The `simple` analyzer divides text into terms whenever it encounters a
character which is not a letter. It lowercases all terms.

<<analysis-whitespace-analyzer,Whitespace Analyzer>>::

The `whitespace` analyzer divides text into terms whenever it encounters any
whitespace character. It does not lowercase terms.

<<analysis-stop-analyzer,Stop Analyzer>>::

The `stop` analyzer is like the `simple` analyzer, but also supports removal
of stop words.

<<analysis-keyword-analyzer,Keyword Analyzer>>::

The `keyword` analyzer is a ``noop'' analyzer that accepts whatever text it is
given and outputs the exact same text as a single term.

<<analysis-pattern-analyzer,Pattern Analyzer>>::

The `pattern` analyzer uses a regular expression to split the text into terms.
It supports lower-casing and stop words.

<<analysis-lang-analyzer,Language Analyzers>>::

Elasticsearch provides many language-specific analyzers like `english` or
`french`.

<<analysis-fingerprint-analyzer,Fingerprint Analyzer>>::

The `fingerprint` analyzer is a specialist analyzer which creates a
fingerprint which can be used for duplicate detection.
[float]
=== Custom analyzers

If you do not find an analyzer suitable for your needs, you can create a
<<analysis-custom-analyzer,`custom`>> analyzer which combines the appropriate
<<analysis-charfilters,character filters>>,
<<analysis-tokenizers,tokenizer>>, and <<analysis-tokenfilters,token filters>>.
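
As a sketch of how those pieces combine (the index name `my-index` and
analyzer name `my_custom_analyzer` are illustrative), a `custom` analyzer is
declared in the index settings with one tokenizer plus optional character
filters and token filters:

[source,console]
----
PUT my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": [ "html_strip" ],
          "tokenizer": "standard",
          "filter": [ "lowercase", "asciifolding" ]
        }
      }
    }
  }
}
----

Here the `html_strip` character filter removes HTML markup before
tokenization, the `standard` tokenizer splits the cleaned text on word
boundaries, and the `lowercase` and `asciifolding` token filters normalize the
resulting terms.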
include::analyzers/configuring.asciidoc[]

include::analyzers/standard-analyzer.asciidoc[]

include::analyzers/simple-analyzer.asciidoc[]

include::analyzers/whitespace-analyzer.asciidoc[]

include::analyzers/stop-analyzer.asciidoc[]

include::analyzers/keyword-analyzer.asciidoc[]

include::analyzers/pattern-analyzer.asciidoc[]

include::analyzers/lang-analyzer.asciidoc[]

include::analyzers/fingerprint-analyzer.asciidoc[]

include::analyzers/custom-analyzer.asciidoc[]