[[analysis-letter-tokenizer]]
=== Letter Tokenizer
A tokenizer of type `letter` that divides text at non-letters. That is,
it defines tokens as maximal strings of adjacent letters. It works
reasonably well for most European languages, but poorly for some Asian
languages, where words are not separated by spaces.
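
The behavior described above can be approximated with a short Python sketch. This is only an illustration, not the tokenizer's actual implementation: the function name `letter_tokenize` is hypothetical, and it assumes that Python's `str.isalpha` is a close enough stand-in for the tokenizer's notion of a "letter".

```python
from itertools import groupby


def letter_tokenize(text):
    """Return the maximal runs of adjacent letters found in `text`.

    Illustrative sketch only: `str.isalpha` is assumed to approximate
    the tokenizer's definition of a letter.
    """
    # groupby splits the text into alternating runs of letters and
    # non-letters; only the letter runs are kept as tokens.
    return [
        "".join(run)
        for is_letter, run in groupby(text, key=str.isalpha)
        if is_letter
    ]


if __name__ == "__main__":
    # Digits, hyphens, apostrophes, and whitespace all act as separators.
    print(letter_tokenize("The 2 QUICK Brown-Foxes aren't lazy."))
    # A sentence written without spaces comes back as one long token.
    print(letter_tokenize("東京都に住んでいます"))
```

In this sketch the apostrophe in `aren't` splits the word into `aren` and `t`, while the Japanese sentence, which contains no non-letter characters, is returned as a single token. That second case is the limitation noted above for languages that do not separate words with spaces.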