1234567 |
- [[analysis-letter-tokenizer]]
- === Letter Tokenizer
- A tokenizer of type `letter` that divides text at non-letters. That is,
- it defines tokens as maximal strings of adjacent letters. This works
- reasonably well for most European languages, but poorly for some Asian
- languages, where words are not separated by spaces.
|
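The splitting rule described above can be illustrated with a small standalone sketch. This is not the tokenizer's actual implementation: the function name `letter_tokenize` is hypothetical, and it assumes that Python's `str.isalpha()` is a close enough stand-in for the tokenizer's notion of a "letter".

```python
def letter_tokenize(text: str) -> list[str]:
    """Split text into maximal runs of adjacent letters.

    Hypothetical helper for illustration only; "letter" here means
    whatever str.isalpha() accepts, which is an assumption.
    """
    tokens = []
    current = []  # characters of the token being built
    for ch in text:
        if ch.isalpha():
            current.append(ch)
        elif current:
            # A non-letter ends the current token.
            tokens.append("".join(current))
            current = []
    if current:
        # Flush a token that runs to the end of the text.
        tokens.append("".join(current))
    return tokens


# Digits, hyphens, apostrophes, and punctuation all act as separators.
print(letter_tokenize("The 2 QUICK Brown-Foxes jumped, didn't they?"))
# → ['The', 'QUICK', 'Brown', 'Foxes', 'jumped', 'didn', 't', 'they']

# Text without spaces between words comes back as a single token.
print(letter_tokenize("我爱北京天安门"))
# → ['我爱北京天安门']
```

The second call shows why the approach suits space-delimited languages better: with no non-letter characters between words, the whole phrase is returned as one token.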