[[analysis-keyword-tokenizer]]
=== Keyword tokenizer
++++
<titleabbrev>Keyword</titleabbrev>
++++

The `keyword` tokenizer is a ``noop'' tokenizer that accepts whatever text it
is given and outputs the exact same text as a single term. It can be combined
with token filters to normalise output, e.g. lower-casing email addresses.

[discrete]
=== Example output

[source,console]
---------------------------
POST _analyze
{
  "tokenizer": "keyword",
  "text": "New York"
}
---------------------------

/////////////////////

[source,console-result]
----------------------------
{
  "tokens": [
    {
      "token": "New York",
      "start_offset": 0,
      "end_offset": 8,
      "type": "word",
      "position": 0
    }
  ]
}
----------------------------

/////////////////////

The request produces the following term:

[source,text]
---------------------------
[ New York ]
---------------------------

[discrete]
[[analysis-keyword-tokenizer-token-filters]]
=== Combine with token filters

You can combine the `keyword` tokenizer with token filters to normalise
structured data, such as product IDs or email addresses.

For example, the following <<indices-analyze,analyze API>> request uses the
`keyword` tokenizer and <<analysis-lowercase-tokenfilter,`lowercase`>> filter to
convert an email address to lowercase.

[source,console]
---------------------------
POST _analyze
{
  "tokenizer": "keyword",
  "filter": [ "lowercase" ],
  "text": "john.SMITH@example.COM"
}
---------------------------

/////////////////////

[source,console-result]
----------------------------
{
  "tokens": [
    {
      "token": "john.smith@example.com",
      "start_offset": 0,
      "end_offset": 22,
      "type": "word",
      "position": 0
    }
  ]
}
----------------------------

/////////////////////

The request produces the following token:

[source,text]
---------------------------
[ john.smith@example.com ]
---------------------------

[discrete]
=== Configuration

The `keyword` tokenizer accepts the following parameters:

[horizontal]
`buffer_size`::

    The number of characters read into the term buffer in a single pass.
    Defaults to `256`. The term buffer will grow by this size until all the
    text has been consumed. It is advisable not to change this setting.
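
[discrete]
=== Example configuration

For reference, the sketch below shows where `buffer_size` would be set if you
did need to override it: a custom tokenizer of type `keyword` is declared in
the index's analysis settings and wired into an analyzer together with the
<<analysis-lowercase-tokenfilter,`lowercase`>> filter. The index name
`my-index-000001` and the `my_analyzer`/`my_tokenizer` names are illustrative
placeholders, and the `buffer_size` value shown is simply the default.

[source,console]
---------------------------
PUT my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer",
          "filter": [ "lowercase" ]
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "keyword",
          "buffer_size": 256
        }
      }
    }
  }
}

POST my-index-000001/_analyze
{
  "analyzer": "my_analyzer",
  "text": "john.SMITH@example.COM"
}
---------------------------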