[[analysis-classic-tokenizer]]
=== Classic Tokenizer

added[1.3.0]

A tokenizer of type `classic` is a grammar-based tokenizer that works well
for English-language documents. It has heuristics for special treatment of
acronyms, company names, email addresses, and internet host names. However,
these rules don't always work, and the tokenizer doesn't work well for most
languages other than English.

The following settings can be configured for a `classic` tokenizer:

[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`max_token_length` |The maximum token length. A token that exceeds this
length is discarded. Defaults to `255`.
|=======================================================================
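
As an illustration, a custom tokenizer of type `classic` with a lower
`max_token_length` could be declared in the index settings roughly as
sketched below. The names `my_classic_tokenizer` and `my_analyzer` and the
value `100` are placeholders chosen for this example, not defaults.

[source,js]
--------------------------------------------------
{
    "settings" : {
        "analysis" : {
            "tokenizer" : {
                "my_classic_tokenizer" : {
                    "type" : "classic",
                    "max_token_length" : 100
                }
            },
            "analyzer" : {
                "my_analyzer" : {
                    "type" : "custom",
                    "tokenizer" : "my_classic_tokenizer"
                }
            }
        }
    }
}
--------------------------------------------------

With settings like these, fields analyzed with `my_analyzer` would be
tokenized by the `classic` grammar, using the token length limit set here
in place of the default of `255`.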