[[analyzer]]
=== `analyzer`

[IMPORTANT]
====
Only <<text,`text`>> fields support the `analyzer` mapping parameter.
====

The `analyzer` parameter specifies the <<analyzer-anatomy,analyzer>> used for
<<analysis,text analysis>> when indexing or searching a `text` field.

Unless overridden with the <<search-analyzer,`search_analyzer`>> mapping
parameter, this analyzer is used for both <<analysis-index-search-time,index and
search analysis>>. See <<specify-analyzer>>.

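
As a minimal sketch, the following request maps a `text` field with an explicit
analyzer. The index name `my-index-000002`, the field name `body`, and the
choice of the built-in `english` analyzer are illustrative assumptions.

[source,console]
--------------------------------------------------
PUT my-index-000002
{
  "mappings": {
    "properties": {
      "body": {
        "type": "text",
        "analyzer": "english"
      }
    }
  }
}
--------------------------------------------------

Because no `search_analyzer` is set in this sketch, the same analyzer would be
used for both index and search analysis.
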
[TIP]
====
We recommend testing analyzers before using them in production. See
<<test-analyzer>>.
====

TIP: The `analyzer` setting *cannot* be updated on existing fields
using the <<indices-put-mapping,update mapping API>>.

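
One way to try out an analyzer is the analyze API. The request below is a
minimal sketch; the built-in `standard` analyzer and the sample text are
arbitrary choices made for illustration.

[source,console]
--------------------------------------------------
GET _analyze
{
  "analyzer": "standard",
  "text": "The Quick Brown Fox"
}
--------------------------------------------------

The response lists the tokens the analyzer produces, which lets you check its
output before referencing the analyzer in a mapping.
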
[[search-quote-analyzer]]
==== `search_quote_analyzer`

The `search_quote_analyzer` setting lets you specify an analyzer for phrase
queries. This is particularly useful for disabling stop word removal in
phrase queries.

To disable stop word removal for phrases, a field needs three analyzer settings:

1. An `analyzer` setting for indexing all terms, including stop words
2. A `search_analyzer` setting for non-phrase queries that removes stop words
3. A `search_quote_analyzer` setting for phrase queries that does not remove stop words

[source,console]
--------------------------------------------------
PUT my-index-000001
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": { <1>
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase"
          ]
        },
        "my_stop_analyzer": { <2>
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "english_stop"
          ]
        }
      },
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "my_analyzer", <3>
        "search_analyzer": "my_stop_analyzer", <4>
        "search_quote_analyzer": "my_analyzer" <5>
      }
    }
  }
}

PUT my-index-000001/_doc/1
{
  "title": "The Quick Brown Fox"
}

PUT my-index-000001/_doc/2
{
  "title": "A Quick Brown Fox"
}

GET my-index-000001/_search
{
  "query": {
    "query_string": {
      "query": "\"the quick brown fox\"" <6>
    }
  }
}
--------------------------------------------------

TIP: The `search_quote_analyzer` setting can be updated on existing fields
using the <<indices-put-mapping,update mapping API>>.

<1> `my_analyzer` analyzer, which tokenizes all terms including stop words
<2> `my_stop_analyzer` analyzer, which removes stop words
<3> `analyzer` setting that points to the `my_analyzer` analyzer, which is used at index time
<4> `search_analyzer` setting that points to the `my_stop_analyzer` analyzer and removes stop words for non-phrase queries
<5> `search_quote_analyzer` setting that points to the `my_analyzer` analyzer and ensures that stop words are not removed from phrase queries
<6> Because the query is wrapped in quotes, it is detected as a phrase query, so the
`search_quote_analyzer` is applied and the stop words are not removed from the query.
The `my_analyzer` analyzer returns the tokens [`the`, `quick`, `brown`, `fox`], which
match only one of the documents. Non-phrase queries, by contrast, are analyzed with the
`my_stop_analyzer` analyzer, which filters out stop words, so a search for either
`The quick brown fox` or `A quick brown fox` returns both documents because both contain
the tokens [`quick`, `brown`, `fox`]. Without the `search_quote_analyzer`, exact matching
for phrase queries would not be possible: the stop words would be removed from the phrase
query, and both documents would match.
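
The contrast described in the last callout can be checked with the requests
below. This is a sketch that assumes the index and documents from the example
above already exist. The expected results are those stated in the callouts,
not output captured from a live cluster.

[source,console]
--------------------------------------------------
GET my-index-000001/_analyze
{
  "analyzer": "my_stop_analyzer",
  "text": "The Quick Brown Fox"
}

GET my-index-000001/_search
{
  "query": {
    "query_string": {
      "query": "the quick brown fox"
    }
  }
}
--------------------------------------------------

The first request should return only the tokens `quick`, `brown`, and `fox`.
The second request omits the quotes, so it is analyzed with `my_stop_analyzer`
and should match both documents, unlike the quoted phrase query.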