
[[analysis-keyword-tokenizer]]
=== Keyword Tokenizer

The `keyword` tokenizer is a ``noop'' tokenizer that accepts whatever text it
is given and outputs the exact same text as a single term. It can be combined
with token filters to normalise output, e.g. lower-casing email addresses.
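
For instance, a hedged sketch of the normalisation use case: the `_analyze`
API lets you pair the `keyword` tokenizer with the `lowercase` token filter,
so the whole input is emitted as one lower-cased term. The sample email
address below is illustrative only.

[source,console]
---------------------------
POST _analyze
{
  "tokenizer": "keyword",
  "filter": [ "lowercase" ],
  "text": "John.SMITH@example.COM"
}
---------------------------

This request should produce the single term `john.smith@example.com`.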
[float]
=== Example output

[source,console]
---------------------------
POST _analyze
{
  "tokenizer": "keyword",
  "text": "New York"
}
---------------------------
/////////////////////

[source,console-result]
----------------------------
{
  "tokens": [
    {
      "token": "New York",
      "start_offset": 0,
      "end_offset": 8,
      "type": "word",
      "position": 0
    }
  ]
}
----------------------------

/////////////////////
The above sentence would produce the following term:

[source,text]
---------------------------
[ New York ]
---------------------------
[float]
=== Configuration

The `keyword` tokenizer accepts the following parameters:

[horizontal]
`buffer_size`::

    The number of characters read into the term buffer in a single pass.
    Defaults to `256`. The term buffer will grow by this size until all the
    text has been consumed. It is advisable not to change this setting.
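
As a sketch of how the parameter is set, a custom tokenizer of type `keyword`
can be registered in the index analysis settings; the index and analyzer
names below (`my-index`, `my_analyzer`, `my_tokenizer`) are placeholders.

[source,console]
---------------------------
PUT my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "keyword",
          "buffer_size": 256
        }
      }
    }
  }
}
---------------------------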