
[[analysis-simple-analyzer]]
=== Simple analyzer
++++
<titleabbrev>Simple</titleabbrev>
++++

The `simple` analyzer breaks text into tokens at any non-letter character, such
as numbers, spaces, hyphens, and apostrophes, discards non-letter characters,
and changes uppercase to lowercase.

[[analysis-simple-analyzer-ex]]
==== Example

[source,console]
----
POST _analyze
{
  "analyzer": "simple",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
----
////
[source,console-result]
----
{
  "tokens": [
    {
      "token": "the",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 0
    },
    {
      "token": "quick",
      "start_offset": 6,
      "end_offset": 11,
      "type": "word",
      "position": 1
    },
    {
      "token": "brown",
      "start_offset": 12,
      "end_offset": 17,
      "type": "word",
      "position": 2
    },
    {
      "token": "foxes",
      "start_offset": 18,
      "end_offset": 23,
      "type": "word",
      "position": 3
    },
    {
      "token": "jumped",
      "start_offset": 24,
      "end_offset": 30,
      "type": "word",
      "position": 4
    },
    {
      "token": "over",
      "start_offset": 31,
      "end_offset": 35,
      "type": "word",
      "position": 5
    },
    {
      "token": "the",
      "start_offset": 36,
      "end_offset": 39,
      "type": "word",
      "position": 6
    },
    {
      "token": "lazy",
      "start_offset": 40,
      "end_offset": 44,
      "type": "word",
      "position": 7
    },
    {
      "token": "dog",
      "start_offset": 45,
      "end_offset": 48,
      "type": "word",
      "position": 8
    },
    {
      "token": "s",
      "start_offset": 49,
      "end_offset": 50,
      "type": "word",
      "position": 9
    },
    {
      "token": "bone",
      "start_offset": 51,
      "end_offset": 55,
      "type": "word",
      "position": 10
    }
  ]
}
----
////
The `simple` analyzer parses the sentence and produces the following
tokens:

[source,text]
----
[ the, quick, brown, foxes, jumped, over, the, lazy, dog, s, bone ]
----

[[analysis-simple-analyzer-definition]]
==== Definition

The `simple` analyzer is defined by one tokenizer:

Tokenizer::
* <<analysis-lowercase-tokenizer, Lowercase Tokenizer>>
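The tokenization above can be approximated outside {es} with a short sketch. The function below (a rough emulation, not the actual Lucene implementation) splits on any non-letter character and lowercases the result:

[source,python]
----
import re

def simple_analyze(text):
    """Approximate the `simple` analyzer: split text at every
    non-letter character and lowercase each resulting token."""
    # [^\W\d_] matches letters only, so digits, whitespace,
    # hyphens, and apostrophes all act as token boundaries.
    return re.findall(r"[^\W\d_]+", text.lower())

tokens = simple_analyze("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.")
print(tokens)
# ['the', 'quick', 'brown', 'foxes', 'jumped', 'over', 'the', 'lazy', 'dog', 's', 'bone']
----

Note how `2` disappears entirely and `dog's` splits into `dog` and `s`, matching the token list above.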
[[analysis-simple-analyzer-customize]]
==== Customize

To customize the `simple` analyzer, duplicate it to create the basis for
a custom analyzer. This custom analyzer can be modified as required, usually by
adding token filters.

[source,console]
----
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_simple_analyzer": {
          "tokenizer": "lowercase",
          "filter": [ <1>
          ]
        }
      }
    }
  }
}
----
<1> Add token filters here.
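
As one possible illustration, the customized analyzer below adds the built-in `asciifolding` token filter so that accented letters are folded to their ASCII equivalents. The index name `my_folded_index` is only an example:

[source,console]
----
PUT /my_folded_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_simple_analyzer": {
          "tokenizer": "lowercase",
          "filter": [ "asciifolding" ]
        }
      }
    }
  }
}
----

With this configuration, text such as `Café` would be tokenized to `cafe` rather than `café`.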