
[[analysis-keyword-tokenizer]]
=== Keyword Tokenizer

The `keyword` tokenizer is a ``noop'' tokenizer that accepts whatever text it
is given and outputs the exact same text as a single term. It can be combined
with token filters to normalise output, e.g. lower-casing email addresses.
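
For instance, a hedged sketch of the normalisation use case: the `_analyze`
API lets you pair the `keyword` tokenizer with the `lowercase` token filter,
so the whole input is emitted as one lower-cased term. The sample email
address below is illustrative only.

[source,console]
---------------------------
POST _analyze
{
  "tokenizer": "keyword",
  "filter": [ "lowercase" ],
  "text": "John.SMITH@example.COM"
}
---------------------------

This request should produce the single term `john.smith@example.com`.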
[float]
=== Example output

[source,console]
---------------------------
POST _analyze
{
  "tokenizer": "keyword",
  "text": "New York"
}
---------------------------
/////////////////////

[source,console-result]
----------------------------
{
  "tokens": [
    {
      "token": "New York",
      "start_offset": 0,
      "end_offset": 8,
      "type": "word",
      "position": 0
    }
  ]
}
----------------------------

/////////////////////
The above sentence would produce the following term:

[source,text]
---------------------------
[ New York ]
---------------------------
[float]
=== Configuration

The `keyword` tokenizer accepts the following parameters:

[horizontal]
`buffer_size`::

    The number of characters read into the term buffer in a single pass.
    Defaults to `256`. The term buffer will grow by this size until all the
    text has been consumed. It is advisable not to change this setting.
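
As a sketch of how the parameter is set, a custom tokenizer of type `keyword`
can be registered in the index analysis settings; the index and analyzer
names below (`my-index`, `my_analyzer`, `my_tokenizer`) are placeholders.

[source,console]
---------------------------
PUT my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "keyword",
          "buffer_size": 256
        }
      }
    }
  }
}
---------------------------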