[[analysis-simplepattern-tokenizer]]
=== Simple Pattern Tokenizer

experimental[This functionality is marked as experimental in Lucene]

The `simple_pattern` tokenizer uses a regular expression to capture matching
text as terms. The set of regular expression features it supports is more
limited than the <<analysis-pattern-tokenizer,`pattern`>> tokenizer, but the
tokenization is generally faster.

This tokenizer does not support splitting the input on a pattern match, unlike
the <<analysis-pattern-tokenizer,`pattern`>> tokenizer. To split on pattern
matches using the same restricted regular expression subset, see the
<<analysis-simplepatternsplit-tokenizer,`simple_pattern_split`>> tokenizer.

This tokenizer uses {lucene-core-javadoc}/org/apache/lucene/util/automaton/RegExp.html[Lucene regular expressions].
For an explanation of the supported features and syntax, see
<<regexp-syntax,Regular Expression Syntax>>.

The default pattern is the empty string, which produces no terms. This
tokenizer should always be configured with a non-default pattern.

[float]
=== Configuration

The `simple_pattern` tokenizer accepts the following parameters:

[horizontal]
`pattern`::

    {lucene-core-javadoc}/org/apache/lucene/util/automaton/RegExp.html[Lucene regular expression], defaults to the empty string.

[float]
=== Example configuration

This example configures the `simple_pattern` tokenizer to produce terms that
are three-digit numbers:

[source,console]
----------------------------
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "simple_pattern",
          "pattern": "[0123456789]{3}"
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "fd-786-335-514-x"
}
----------------------------

/////////////////////

[source,console-result]
----------------------------
{
  "tokens" : [
    {
      "token" : "786",
      "start_offset" : 3,
      "end_offset" : 6,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "335",
      "start_offset" : 7,
      "end_offset" : 10,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "514",
      "start_offset" : 11,
      "end_offset" : 14,
      "type" : "word",
      "position" : 2
    }
  ]
}
----------------------------

/////////////////////

The above example produces these terms:

[source,text]
---------------------------
[ 786, 335, 514 ]
---------------------------
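
To try out a pattern without creating an index, a tokenizer definition can
also be passed inline to the `_analyze` API. The request below is a minimal
sketch (the pattern `[a-z]+` and the reuse of the sample text are illustrative
choices, not part of the example above); it captures runs of lowercase letters
and would produce the terms `fd` and `x`:

[source,console]
----------------------------
POST _analyze
{
  "tokenizer": {
    "type": "simple_pattern",
    "pattern": "[a-z]+"
  },
  "text": "fd-786-335-514-x"
}
----------------------------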