
[[analysis-standard-tokenizer]]
=== Standard Tokenizer
A tokenizer of type `standard` that provides grammar-based tokenization,
which works well for most European-language documents. The tokenizer
implements the Unicode Text Segmentation algorithm, as specified in
http://unicode.org/reports/tr29/[Unicode Standard Annex #29].
The following settings can be configured for a `standard` tokenizer:
[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`max_token_length` |The maximum token length. If a token exceeds this
length, it is split at `max_token_length` intervals. Defaults to `255`.
|=======================================================================
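
As a sketch, the tokenizer can be configured in the analysis section of the
index settings; the tokenizer and analyzer names below are illustrative:

[source,js]
--------------------------------------------------
{
    "settings": {
        "analysis": {
            "tokenizer": {
                "my_standard_tokenizer": {
                    "type": "standard",
                    "max_token_length": 5
                }
            },
            "analyzer": {
                "my_analyzer": {
                    "type": "custom",
                    "tokenizer": "my_standard_tokenizer"
                }
            }
        }
    }
}
--------------------------------------------------

With `max_token_length` set to `5` as above, a token such as `jumping`
would be split into `jumpi` and `ng` rather than emitted whole.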