[[recipes]]
== Recipes

[float]
[[mixing-exact-search-with-stemming]]
=== Mixing exact search with stemming

When building a search application, stemming is often a must as it is desirable
for a query on `skiing` to match documents that contain `ski` or `skis`. But
what if a user wants to search for `skiing` specifically? The typical way to do
this would be to use a <<multi-fields,multi-field>> in order to have the same
content indexed in two different ways:

[source,js]
--------------------------------------------------
PUT index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "english_exact": {
          "tokenizer": "standard",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "type": {
      "properties": {
        "body": {
          "type": "text",
          "analyzer": "english",
          "fields": {
            "exact": {
              "type": "text",
              "analyzer": "english_exact"
            }
          }
        }
      }
    }
  }
}

PUT index/type/1
{
  "body": "Ski resort"
}

PUT index/type/2
{
  "body": "A pair of skis"
}

POST index/_refresh
--------------------------------------------------
// CONSOLE
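
To see how the two analysis chains defined above differ, the `_analyze` API can
be run against each field. The requests below are only an illustrative sketch:
the sample text is taken from document `2`, and no response is shown because
the exact output depends on the Elasticsearch version in use.

[source,js]
--------------------------------------------------
GET index/_analyze
{
  "field": "body",
  "text": "A pair of skis"
}

GET index/_analyze
{
  "field": "body.exact",
  "text": "A pair of skis"
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

Assuming the built-in `english` analyzer applies its usual lowercasing,
stop word removal and stemming, the first request should produce the tokens
`pair` and `ski`, while the second should keep `a`, `pair`, `of` and `skis`
unchanged apart from lowercasing.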

With such a setup, searching for `ski` on `body` would return both documents:

[source,js]
--------------------------------------------------
GET index/_search
{
  "query": {
    "simple_query_string": {
      "fields": [ "body" ],
      "query": "ski"
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

[source,js]
--------------------------------------------------
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.25811607,
    "hits": [
      {
        "_index": "index",
        "_type": "type",
        "_id": "2",
        "_score": 0.25811607,
        "_source": {
          "body": "A pair of skis"
        }
      },
      {
        "_index": "index",
        "_type": "type",
        "_id": "1",
        "_score": 0.25811607,
        "_source": {
          "body": "Ski resort"
        }
      }
    ]
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 2,/"took": "$body.took",/]

On the other hand, searching for `ski` on `body.exact` would return only
document `1`, since the analysis chain of `body.exact` does not perform
stemming and the indexed term `skis` therefore does not match `ski`.

[source,js]
--------------------------------------------------
GET index/_search
{
  "query": {
    "simple_query_string": {
      "fields": [ "body.exact" ],
      "query": "ski"
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

[source,js]
--------------------------------------------------
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.25811607,
    "hits": [
      {
        "_index": "index",
        "_type": "type",
        "_id": "1",
        "_score": 0.25811607,
        "_source": {
          "body": "Ski resort"
        }
      }
    ]
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 1,/"took": "$body.took",/]

Exposing this choice of field to end users is not easy, as we would need a way
to figure out whether they are looking for an exact match or not and redirect
to the appropriate field accordingly. And what if only parts of the query need
to be matched exactly while other parts should still take stemming into
account?

Fortunately, the `query_string` and `simple_query_string` queries have a feature
that solves exactly this problem: `quote_field_suffix`. It tells Elasticsearch
that words that appear between quotes should be redirected to a different
field, as shown below:

[source,js]
--------------------------------------------------
GET index/_search
{
  "query": {
    "simple_query_string": {
      "fields": [ "body" ],
      "quote_field_suffix": ".exact",
      "query": "\"ski\""
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

[source,js]
--------------------------------------------------
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.25811607,
    "hits": [
      {
        "_index": "index",
        "_type": "type",
        "_id": "1",
        "_score": 0.25811607,
        "_source": {
          "body": "Ski resort"
        }
      }
    ]
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"took": 2,/"took": "$body.took",/]

In that case, since `ski` was between quotes, it was searched on the
`body.exact` field due to the `quote_field_suffix` parameter, so only document
`1` matched. This allows users to mix exact search with stemmed search as they
like.
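
As an illustration of mixing both kinds of terms in a single query, the sketch
below combines a quoted term with an unquoted one. The query text and the use of
`default_operator` are assumptions chosen for this example; they are not
required by `quote_field_suffix` itself.

[source,js]
--------------------------------------------------
GET index/_search
{
  "query": {
    "simple_query_string": {
      "fields": [ "body" ],
      "quote_field_suffix": ".exact",
      "default_operator": "and",
      "query": "\"ski\" resorts"
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

Here the quoted `ski` would be searched on `body.exact` without stemming, while
the unquoted `resorts` would be searched on `body` and stemmed to `resort`. With
the two sample documents indexed earlier, this should match only document `1`.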