[[analysis-pattern-tokenizer]]
=== Pattern Tokenizer

A tokenizer of type `pattern` that can flexibly separate text into terms
via a regular expression. Accepts the following settings:

[cols="<,<",options="header",]
|======================================================================
|Setting |Description
|`pattern` |The regular expression pattern, defaults to `\W+`.
|`flags` |The regular expression flags.
|`group` |Which group to extract into tokens. Defaults to `-1` (split).
|======================================================================

*IMPORTANT*: With the default `group` of `-1`, the regular expression
should match the *token separators*, not the tokens themselves.

*********************************************
Note that you may need to escape the `pattern` string literal according
to your client language rules. For example, in many programming languages
a string literal for the `\W+` pattern is written as `"\\W+"`.
There is nothing special about `pattern` (you may have to escape other
string literals as well); escaping `pattern` is common just because it
often contains characters that should be escaped.
*********************************************

`group` set to `-1` (the default) is equivalent to "split". Using a group
>= 0 selects the matching group as the token. For example, if you have:

------------------------
pattern = '([^']+)'
group   = 0
input   = aaa 'bbb' 'ccc'
------------------------

the output will be two tokens: `'bbb'` and `'ccc'` (including the `'`
marks). With the same input but using group=1, the output would be:
`bbb` and `ccc` (no `'` marks).
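
The `group` behaviour described above can be approximated outside
Elasticsearch with a short Python sketch. `pattern_tokenize` is a
hypothetical helper written for illustration, not part of any
Elasticsearch API. It uses Python's `re` module, whose syntax differs in
places from the Java regular expressions the tokenizer uses, and it
assumes that empty tokens are dropped.

```python
import re


def pattern_tokenize(text, pattern=r"\W+", group=-1):
    """Illustrative approximation of the `pattern` and `group` settings.

    Hypothetical helper: not the tokenizer's actual implementation.
    """
    regex = re.compile(pattern)
    if group < 0:
        # Split mode: the pattern matches separators, so the tokens are
        # the stretches of text between consecutive matches.
        tokens, start = [], 0
        for match in regex.finditer(text):
            tokens.append(text[start:match.start()])
            start = match.end()
        tokens.append(text[start:])
    else:
        # Group mode: the pattern matches tokens, and the selected group
        # of each match becomes the token.
        tokens = [match.group(group) for match in regex.finditer(text)]
    # Assumption: empty (or non-participating) tokens are discarded.
    return [token for token in tokens if token]


text = "aaa 'bbb' 'ccc'"
quoted = r"'([^']+)'"

print(pattern_tokenize(text))                   # ['aaa', 'bbb', 'ccc']
print(pattern_tokenize(text, quoted, group=0))  # ["'bbb'", "'ccc'"]
print(pattern_tokenize(text, quoted, group=1))  # ['bbb', 'ccc']

# The escaping note in practice: a raw string and an escaped literal
# denote the same two-character-plus-quantifier pattern.
print(r"\W+" == "\\W+")                         # True
```

The split branch walks the matches by hand rather than calling
`re.split`, because `re.split` also returns the text of capturing groups,
which would mix separators into the token list.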