[[analysis-chargroup-tokenizer]]
=== Character group tokenizer
++++
<titleabbrev>Character group</titleabbrev>
++++

The `char_group` tokenizer breaks text into terms whenever it encounters a
character which is in a defined set. It is mostly useful for cases where simple,
custom tokenization is desired and the overhead of the
<<analysis-pattern-tokenizer,`pattern` tokenizer>> is not acceptable.

[discrete]
=== Configuration

The `char_group` tokenizer accepts the following parameters:

[horizontal]
`tokenize_on_chars`::
    A list of characters to tokenize the string on. Whenever a character
    from this list is encountered, a new token is started. This accepts either
    single characters, such as `-`, or character groups: `whitespace`, `letter`,
    `digit`, `punctuation`, `symbol`.

`max_token_length`::
    The maximum token length. If a token exceeds this length, it is split at
    `max_token_length` intervals. Defaults to `255`.

[discrete]
=== Example output

[source,console]
---------------------------
POST _analyze
{
  "tokenizer": {
    "type": "char_group",
    "tokenize_on_chars": [
      "whitespace",
      "-",
      "\n"
    ]
  },
  "text": "The QUICK brown-fox"
}
---------------------------

The request returns:

[source,console-result]
---------------------------
{
  "tokens": [
    {
      "token": "The",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 0
    },
    {
      "token": "QUICK",
      "start_offset": 4,
      "end_offset": 9,
      "type": "word",
      "position": 1
    },
    {
      "token": "brown",
      "start_offset": 10,
      "end_offset": 15,
      "type": "word",
      "position": 2
    },
    {
      "token": "fox",
      "start_offset": 16,
      "end_offset": 19,
      "type": "word",
      "position": 3
    }
  ]
}
---------------------------
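For readers who want to reason about the splitting behavior outside a cluster, the logic can be sketched in plain Python. This is an illustrative approximation, not the actual Lucene implementation; the function name and the handling of the named character groups are assumptions for the sketch (only `whitespace`, `letter`, and `digit` are modeled here).

```python
def char_group_tokenize(text, tokenize_on_chars, max_token_length=255):
    """Approximate the char_group tokenizer: break on any character in
    the configured set, then split oversize tokens at fixed intervals.
    Illustrative sketch only, not the Lucene implementation."""
    # Map a subset of the named character groups to membership tests.
    groups = {
        "whitespace": str.isspace,
        "letter": str.isalpha,
        "digit": str.isdigit,
    }

    def is_break(ch):
        for spec in tokenize_on_chars:
            test = groups.get(spec)
            # A named group matches via its test; anything else is a literal.
            if test(ch) if test else ch == spec:
                return True
        return False

    tokens, current = [], []
    for ch in text:
        if is_break(ch):
            # A break character ends the current token and emits nothing itself.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))

    # Split any token longer than max_token_length at fixed intervals.
    out = []
    for tok in tokens:
        out.extend(tok[i:i + max_token_length]
                   for i in range(0, len(tok), max_token_length))
    return out

print(char_group_tokenize("The QUICK brown-fox", ["whitespace", "-", "\n"]))
# ['The', 'QUICK', 'brown', 'fox']
```

Run against the example request above, the sketch produces the same four tokens as the `_analyze` response, and shortening `max_token_length` demonstrates the interval-splitting behavior described in the configuration section.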
 |