[[indices-analyze]]
== Analyze

Performs the analysis process on a text and returns the tokens breakdown
of the text.

Can be used without specifying an index against one of the many built-in
analyzers:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9200/_analyze' -d '
{
  "analyzer" : "standard",
  "text" : "this is a test"
}'
--------------------------------------------------
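
The response contains the generated tokens with their offsets, types, and
positions. For the request above it should look roughly like the following
(the values shown are what the `standard` analyzer typically produces for
this text):

[source,js]
--------------------------------------------------
{
  "tokens" : [ {
    "token" : "this",
    "start_offset" : 0,
    "end_offset" : 4,
    "type" : "<ALPHANUM>",
    "position" : 0
  }, {
    "token" : "is",
    "start_offset" : 5,
    "end_offset" : 7,
    "type" : "<ALPHANUM>",
    "position" : 1
  }, {
    "token" : "a",
    "start_offset" : 8,
    "end_offset" : 9,
    "type" : "<ALPHANUM>",
    "position" : 2
  }, {
    "token" : "test",
    "start_offset" : 10,
    "end_offset" : 14,
    "type" : "<ALPHANUM>",
    "position" : 3
  } ]
}
--------------------------------------------------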

If the `text` parameter is provided as an array of strings, it is analyzed
as a multi-valued field:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9200/_analyze' -d '
{
  "analyzer" : "standard",
  "text" : ["this is a test", "the second text"]
}'
--------------------------------------------------

Or by building a custom transient analyzer out of tokenizers,
token filters and char filters. Token filters can use the shorter `filter`
parameter name:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9200/_analyze' -d '
{
  "tokenizer" : "keyword",
  "filter" : ["lowercase"],
  "text" : "this is a test"
}'

curl -XGET 'localhost:9200/_analyze' -d '
{
  "tokenizer" : "keyword",
  "token_filter" : ["lowercase"],
  "char_filter" : ["html_strip"],
  "text" : "this is a <b>test</b>"
}'
--------------------------------------------------

deprecated[5.0.0, Use `filter`/`token_filter`/`char_filter` instead of `filters`/`token_filters`/`char_filters`]

Custom tokenizers, token filters, and character filters can be specified
in the request body as follows:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9200/_analyze' -d '
{
  "tokenizer" : "whitespace",
  "filter" : ["lowercase", {"type": "stop", "stopwords": ["a", "is", "this"]}],
  "text" : "this is a test"
}'
--------------------------------------------------
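
Here the `whitespace` tokenizer splits on spaces, `lowercase` normalizes
case, and the inline `stop` filter removes `a`, `is`, and `this`, so only
`test` should survive. A sketch of the kind of response to expect, assuming
the stop filter keeps position increments (its default behaviour), so the
surviving token retains its original position:

[source,js]
--------------------------------------------------
{
  "tokens" : [ {
    "token" : "test",
    "start_offset" : 10,
    "end_offset" : 14,
    "type" : "word",
    "position" : 3
  } ]
}
--------------------------------------------------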

It can also run against a specific index:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9200/test/_analyze' -d '
{
  "text" : "this is a test"
}'
--------------------------------------------------

The above will run an analysis on the "this is a test" text, using the
default index analyzer associated with the `test` index. An `analyzer`
can also be provided to use a different analyzer:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9200/test/_analyze' -d '
{
  "analyzer" : "whitespace",
  "text" : "this is a test"
}'
--------------------------------------------------

Also, the analyzer can be derived based on a field mapping, for example:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9200/test/_analyze' -d '
{
  "field" : "obj1.field1",
  "text" : "this is a test"
}'
--------------------------------------------------

This will cause the analysis to happen based on the analyzer configured in
the mapping for `obj1.field1` (and if none is configured, the default index
analyzer).
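
For the field-based lookup to resolve a non-default analyzer, the field
needs one configured in its mapping. A minimal sketch of such a mapping
(the type name `type1` is an arbitrary placeholder; the index name and
field path simply match the example above):

[source,js]
--------------------------------------------------
curl -XPUT 'localhost:9200/test' -d '
{
  "mappings" : {
    "type1" : {
      "properties" : {
        "obj1" : {
          "properties" : {
            "field1" : {
              "type" : "text",
              "analyzer" : "whitespace"
            }
          }
        }
      }
    }
  }
}'
--------------------------------------------------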

All parameters can also be supplied as request parameters. For example:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9200/_analyze?tokenizer=keyword&filter=lowercase&text=this+is+a+test'
--------------------------------------------------

For backwards compatibility, we also accept the text parameter as the body
of the request, provided it doesn't start with `{` :

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9200/_analyze?tokenizer=keyword&token_filter=lowercase&char_filter=html_strip' -d 'this is a <b>test</b>'
--------------------------------------------------

=== Explain Analyze

If you want to get more advanced details, set `explain` to `true` (defaults
to `false`). It will output all token attributes for each token.
You can filter the token attributes you want to output by setting the
`attributes` option.

experimental[The format of the additional detail information is experimental and can change at any time]

[source,js]
--------------------------------------------------
GET _analyze
{
  "tokenizer" : "standard",
  "filter" : ["snowball"],
  "text" : "detailed output",
  "explain" : true,
  "attributes" : ["keyword"] <1>
}
--------------------------------------------------
// CONSOLE
<1> Set "keyword" to output the "keyword" attribute only

The request returns the following result:

[source,js]
--------------------------------------------------
{
  "detail" : {
    "custom_analyzer" : true,
    "charfilters" : [ ],
    "tokenizer" : {
      "name" : "standard",
      "tokens" : [ {
        "token" : "detailed",
        "start_offset" : 0,
        "end_offset" : 8,
        "type" : "<ALPHANUM>",
        "position" : 0
      }, {
        "token" : "output",
        "start_offset" : 9,
        "end_offset" : 15,
        "type" : "<ALPHANUM>",
        "position" : 1
      } ]
    },
    "tokenfilters" : [ {
      "name" : "snowball",
      "tokens" : [ {
        "token" : "detail",
        "start_offset" : 0,
        "end_offset" : 8,
        "type" : "<ALPHANUM>",
        "position" : 0,
        "keyword" : false <1>
      }, {
        "token" : "output",
        "start_offset" : 9,
        "end_offset" : 15,
        "type" : "<ALPHANUM>",
        "position" : 1,
        "keyword" : false <1>
      } ]
    } ]
  }
}
--------------------------------------------------
// TESTRESPONSE
<1> Only the "keyword" attribute is output, since "attributes" was specified in the request.