
[[analysis-keyword-tokenizer]]
=== Keyword Tokenizer

The `keyword` tokenizer is a ``noop'' tokenizer that accepts whatever text it
is given and outputs the exact same text as a single term. It can be combined
with token filters to normalise output, e.g. lower-casing email addresses.
[float]
=== Example output

[source,js]
---------------------------
POST _analyze
{
  "tokenizer": "keyword",
  "text": "New York"
}
---------------------------
// CONSOLE
/////////////////////

[source,js]
----------------------------
{
  "tokens": [
    {
      "token": "New York",
      "start_offset": 0,
      "end_offset": 8,
      "type": "word",
      "position": 0
    }
  ]
}
----------------------------
// TESTRESPONSE

/////////////////////
The above text would produce the following single term:

[source,text]
---------------------------
[ New York ]
---------------------------
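
To see the tokenizer combined with a token filter, as mentioned above, the
same `_analyze` API can be used with a `filter` list. The following is a
minimal sketch, assuming a version of the `_analyze` API that accepts a
`filter` array alongside `tokenizer`; the sample email address is purely
illustrative:

[source,js]
---------------------------
POST _analyze
{
  "tokenizer": "keyword",
  "filter": ["lowercase"],
  "text": "john.SMITH@example.COM"
}
---------------------------
// CONSOLE

Because the `keyword` tokenizer emits the whole input as one term, the
`lowercase` filter would normalise it to a single term such as
`john.smith@example.com`.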
[float]
=== Configuration

The `keyword` tokenizer accepts the following parameters:

[horizontal]
`buffer_size`::

    The number of characters read into the term buffer in a single pass.
    Defaults to `256`. The term buffer will grow by this size until all the
    text has been consumed. It is advisable not to change this setting.
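
As a sketch of how such a parameter would be applied, a custom tokenizer can
be defined in the index analysis settings and referenced from a custom
analyzer. The index name `my_index` and the names `my_analyzer` and
`my_keyword` below are placeholder assumptions, not fixed names:

[source,js]
---------------------------
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_keyword"
        }
      },
      "tokenizer": {
        "my_keyword": {
          "type": "keyword",
          "buffer_size": 512
        }
      }
    }
  }
}
---------------------------
// CONSOLE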