[[analysis-keyword-tokenizer]]
=== Keyword Tokenizer

The `keyword` tokenizer is a ``noop'' tokenizer that accepts whatever text it
is given and outputs the exact same text as a single term. It can be combined
with token filters to normalise output, e.g. lower-casing email addresses.
[float]
=== Example output

[source,js]
---------------------------
POST _analyze
{
  "tokenizer": "keyword",
  "text": "New York"
}
---------------------------
// CONSOLE
/////////////////////

[source,console-result]
----------------------------
{
  "tokens": [
    {
      "token": "New York",
      "start_offset": 0,
      "end_offset": 8,
      "type": "word",
      "position": 0
    }
  ]
}
----------------------------

/////////////////////
The above text would produce the following single term:

[source,text]
---------------------------
[ New York ]
---------------------------
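
As noted above, the `keyword` tokenizer is usually combined with token
filters. As a minimal sketch (the email address here is a made-up example),
the following request pairs it with the `lowercase` token filter to normalise
an email address into a single lower-cased term:

[source,js]
---------------------------
POST _analyze
{
  "tokenizer": "keyword",
  "filter": ["lowercase"],
  "text": "John.SMITH@example.COM"
}
---------------------------
// CONSOLE

This would produce the single term `john.smith@example.com`.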
[float]
=== Configuration

The `keyword` tokenizer accepts the following parameters:

[horizontal]
`buffer_size`::

    The number of characters read into the term buffer in a single pass.
    Defaults to `256`. The term buffer will grow by this size until all the
    text has been consumed. It is advisable not to change this setting.
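
If you do need a different `buffer_size`, it is set on a custom tokenizer
definition in the index settings. A minimal sketch, using made-up names
(`my_index`, `my_analyzer`, `my_tokenizer`) and an illustrative buffer size:

[source,js]
---------------------------
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "keyword",
          "buffer_size": 512
        }
      }
    }
  }
}
---------------------------
// CONSOLE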