[[query-dsl-multi-match-query]]
=== Multi-match query
++++
<titleabbrev>Multi-match</titleabbrev>
++++

The `multi_match` query builds on the <<query-dsl-match-query,`match` query>>
to allow multi-field queries:

[source,console]
--------------------------------------------------
GET /_search
{
  "query": {
    "multi_match" : {
      "query":    "this is a test", <1>
      "fields": [ "subject", "message" ] <2>
    }
  }
}
--------------------------------------------------

<1> The query string.
<2> The fields to be queried.

[discrete]
[[field-boost]]
==== `fields` and per-field boosting

Fields can be specified with wildcards, for example:

[source,console]
--------------------------------------------------
GET /_search
{
  "query": {
    "multi_match" : {
      "query":    "Will Smith",
      "fields": [ "title", "*_name" ] <1>
    }
  }
}
--------------------------------------------------

<1> Query the `title`, `first_name` and `last_name` fields.

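As a rough illustration of how a wildcard entry in `fields` resolves to concrete field names, the following Python sketch matches patterns against a hypothetical list of mapped fields. The helper `expand_field_patterns` and the field list are assumptions made for this example. Python's `fnmatchcase` also understands more pattern syntax than the `*` wildcard shown above, so treat it as a stand-in rather than a description of how Elasticsearch resolves fields.

```python
from fnmatch import fnmatchcase


def expand_field_patterns(patterns, mapped_fields):
    """Return the mapped field names matched by any pattern, in mapping order."""
    matched = []
    for name in mapped_fields:
        if any(fnmatchcase(name, pattern) for pattern in patterns):
            matched.append(name)
    return matched


# Hypothetical mapping, used only for illustration.
mapped = ["title", "first_name", "last_name", "body"]
print(expand_field_patterns(["title", "*_name"], mapped))
```

With this made-up mapping, the `*_name` pattern selects `first_name` and `last_name`, while `body` is left out.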

Individual fields can be boosted with the caret (`^`) notation:

[source,console]
--------------------------------------------------
GET /_search
{
  "query": {
    "multi_match" : {
      "query" : "this is a test",
      "fields" : [ "subject^3", "message" ] <1>
    }
  }
}
--------------------------------------------------

<1> The query multiplies the `subject` field's score by three but leaves the
`message` field's score unchanged.

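The effect of the caret notation can be sketched in a few lines of Python. The helpers `parse_boosted_field` and `apply_boosts` and the raw scores below are hypothetical and exist only to show the arithmetic; this is a simplified model, not Elasticsearch's scoring code.

```python
def parse_boosted_field(spec):
    """Split 'name^boost' into (name, boost); the boost defaults to 1.0."""
    name, sep, boost = spec.partition("^")
    return name, float(boost) if sep else 1.0


def apply_boosts(field_specs, raw_scores):
    """Multiply each field's raw score by the boost parsed from its spec."""
    boosted = {}
    for spec in field_specs:
        name, boost = parse_boosted_field(spec)
        boosted[name] = raw_scores.get(name, 0.0) * boost
    return boosted


# Hypothetical per-field scores, not real Elasticsearch output.
print(apply_boosts(["subject^3", "message"], {"subject": 1.5, "message": 2.0}))
```

In this model the `subject` score is tripled and the `message` score is passed through unchanged.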

If no `fields` are provided, the `multi_match` query defaults to the
`index.query.default_field` index setting, which in turn defaults to `*`. `*`
extracts all fields in the mapping that are eligible for term queries and
filters out the metadata fields. All extracted fields are then combined to
build a query.

include::combined-fields-query.asciidoc[tag=max-clause-limit]

[[multi-match-types]]
[discrete]
==== Types of `multi_match` query:

The way the `multi_match` query is executed internally depends on the `type`
parameter, which can be set to:

[horizontal]
`best_fields`:: (*default*) Finds documents which match any field, but
uses the `_score` from the best field. See <<type-best-fields>>.

`most_fields`:: Finds documents which match any field and combines
the `_score` from each field. See <<type-most-fields>>.

`cross_fields`:: Treats fields with the same `analyzer` as though they
were one big field. Looks for each word in *any*
field. See <<type-cross-fields>>.

`phrase`:: Runs a `match_phrase` query on each field and uses the `_score`
from the best field. See <<type-phrase>>.

`phrase_prefix`:: Runs a `match_phrase_prefix` query on each field and uses
the `_score` from the best field. See <<type-phrase>>.

`bool_prefix`:: Creates a `match_bool_prefix` query on each field and
combines the `_score` from each field. See
<<type-bool-prefix>>.

[[type-best-fields]]
==== `best_fields`

The `best_fields` type is most useful when you are searching for multiple
words best found in the same field. For instance ``brown fox'' in a single
field is more meaningful than ``brown'' in one field and ``fox'' in the other.

The `best_fields` type generates a <<query-dsl-match-query,`match` query>> for
each field and wraps them in a <<query-dsl-dis-max-query,`dis_max`>> query, to
find the single best matching field. For instance, this query:

[source,console]
--------------------------------------------------
GET /_search
{
  "query": {
    "multi_match" : {
      "query":      "brown fox",
      "type":       "best_fields",
      "fields":     [ "subject", "message" ],
      "tie_breaker": 0.3
    }
  }
}
--------------------------------------------------

would be executed as:

[source,console]
--------------------------------------------------
GET /_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "subject": "brown fox" }},
        { "match": { "message": "brown fox" }}
      ],
      "tie_breaker": 0.3
    }
  }
}
--------------------------------------------------

Normally the `best_fields` type uses the score of the *single* best matching
field, but if `tie_breaker` is specified, then it calculates the score as
follows:

* the score from the best matching field
* plus `tie_breaker * _score` for all other matching fields

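The two bullet points above amount to a small piece of arithmetic, sketched here in Python. The function name `dis_max_score` and the per-field scores are hypothetical; the sketch models only the formula described in this section and ignores everything else that contributes to a real `_score`.

```python
def dis_max_score(field_scores, tie_breaker=0.0):
    """Best field score plus tie_breaker times every other matching score."""
    matching = [score for score in field_scores if score > 0.0]
    if not matching:
        return 0.0
    best = max(matching)
    others = sum(matching) - best
    return best + tie_breaker * others


# Hypothetical scores for the `subject` and `message` fields.
scores = [2.0, 1.0]
for tie_breaker in (0.0, 0.3, 1.0):
    print(tie_breaker, dis_max_score(scores, tie_breaker))
```

A `tie_breaker` of `0.0` keeps only the best field's score, and a value of `1.0` adds all matching field scores together.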

The `best_fields` type also accepts `analyzer`, `boost`, `operator`,
`minimum_should_match`, `fuzziness`, `lenient`, `prefix_length`,
`max_expansions`, `fuzzy_rewrite`, `zero_terms_query`,
`auto_generate_synonyms_phrase_query` and `fuzzy_transpositions`,
as explained in <<query-dsl-match-query, match query>>.

[IMPORTANT]
[[operator-min]]
.`operator` and `minimum_should_match`
===================================================

The `best_fields` and `most_fields` types are _field-centric_ -- they generate
a `match` query *per field*. This means that the `operator` and
`minimum_should_match` parameters are applied to each field individually,
which is probably not what you want.

Take this query for example:

[source,console]
--------------------------------------------------
GET /_search
{
  "query": {
    "multi_match" : {
      "query":      "Will Smith",
      "type":       "best_fields",
      "fields":     [ "first_name", "last_name" ],
      "operator":   "and" <1>
    }
  }
}
--------------------------------------------------

<1> All terms must be present.

This query is executed as:

      (+first_name:will +first_name:smith)
    | (+last_name:will  +last_name:smith)

In other words, *all terms* must be present *in a single field* for a document
to match.

The <<query-dsl-combined-fields-query, `combined_fields`>> query offers a
term-centric approach that handles `operator` and `minimum_should_match` on a
per-term basis. The other multi-match mode <<type-cross-fields>> also
addresses this issue.

===================================================

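To make the field-centric behavior concrete, here is a minimal Python sketch of the matching rule described in the note above. The function `field_centric_and` is hypothetical, and lowercasing plus whitespace splitting stands in for real analysis; it models only the boolean logic, not scoring.

```python
def field_centric_and(doc, fields, terms):
    """True if at least one field contains every term (per-field `and`)."""
    for field in fields:
        tokens = doc.get(field, "").lower().split()
        if all(term in tokens for term in terms):
            return True
    return False


terms = ["will", "smith"]
fields = ["first_name", "last_name"]

# Terms split across fields: no single field holds both, so no match.
print(field_centric_and({"first_name": "Will", "last_name": "Smith"}, fields, terms))

# Both terms in one field: match.
print(field_centric_and({"first_name": "Will Smith", "last_name": "Jones"}, fields, terms))
```

Under these assumptions, a document with `Will` in `first_name` and `Smith` in `last_name` does not match, which is usually the opposite of what a person search intends.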

[[type-most-fields]]
==== `most_fields`

The `most_fields` type is most useful when querying multiple fields that
contain the same text analyzed in different ways. For instance, the main
field may contain synonyms, stemming and terms without diacritics. A second
field may contain the original terms, and a third field might contain
shingles. By combining scores from all three fields we can match as many
documents as possible with the main field, but use the second and third fields
to push the most similar results to the top of the list.

This query:

[source,console]
--------------------------------------------------
GET /_search
{
  "query": {
    "multi_match" : {
      "query":      "quick brown fox",
      "type":       "most_fields",
      "fields":     [ "title", "title.original", "title.shingles" ]
    }
  }
}
--------------------------------------------------

would be executed as:

[source,console]
--------------------------------------------------
GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title":          "quick brown fox" }},
        { "match": { "title.original": "quick brown fox" }},
        { "match": { "title.shingles": "quick brown fox" }}
      ]
    }
  }
}
--------------------------------------------------

The score from each `match` clause is added together, just like a `bool` query.

The `most_fields` type also accepts `analyzer`, `boost`, `operator`,
`minimum_should_match`, `fuzziness`, `lenient`, `prefix_length`,
`max_expansions`, `fuzzy_rewrite`, and `zero_terms_query`.

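The rewrite of a `most_fields` query into a `bool` query, shown earlier in this section, can be reproduced with a short Python sketch. The helper `most_fields_as_bool` is hypothetical and deliberately ignores boosts and every other parameter; it only builds the request body so the two forms can be compared.

```python
import json


def most_fields_as_bool(query, fields):
    """Build the `bool`/`should` body that mirrors a `most_fields` query."""
    should = [{"match": {field: query}} for field in fields]
    return {"query": {"bool": {"should": should}}}


body = most_fields_as_bool(
    "quick brown fox",
    ["title", "title.original", "title.shingles"],
)
print(json.dumps(body, indent=2))
```

Each entry in `fields` becomes one `match` clause in the `should` list.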

[[type-phrase]]
==== `phrase` and `phrase_prefix`

The `phrase` and `phrase_prefix` types behave just like <<type-best-fields>>,
but they use a `match_phrase` or `match_phrase_prefix` query instead of a
`match` query.

This query:

[source,console]
--------------------------------------------------
GET /_search
{
  "query": {
    "multi_match" : {
      "query":      "quick brown f",
      "type":       "phrase_prefix",
      "fields":     [ "subject", "message" ]
    }
  }
}
--------------------------------------------------

would be executed as:

[source,console]
--------------------------------------------------
GET /_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match_phrase_prefix": { "subject": "quick brown f" }},
        { "match_phrase_prefix": { "message": "quick brown f" }}
      ]
    }
  }
}
--------------------------------------------------

The `phrase` and `phrase_prefix` types also accept `analyzer`, `boost`,
`lenient` and `zero_terms_query` as explained in <<query-dsl-match-query>>, as
well as `slop`, which is explained in <<query-dsl-match-query-phrase>>.
The `phrase_prefix` type additionally accepts `max_expansions`.

[IMPORTANT]
[[phrase-fuzziness]]
.`phrase`, `phrase_prefix` and `fuzziness`
===================================================
The `fuzziness` parameter cannot be used with the `phrase` or `phrase_prefix` type.
===================================================

[[type-cross-fields]]
==== `cross_fields`

The `cross_fields` type is particularly useful with structured documents where
multiple fields *should* match. For instance, when querying the `first_name`
and `last_name` fields for ``Will Smith'', the best match is likely to have
``Will'' in one field and ``Smith'' in the other.

****

This sounds like a job for <<type-most-fields>> but there are two problems
with that approach. The first problem is that `operator` and
`minimum_should_match` are applied per-field, instead of per-term (see
<<operator-min,explanation above>>).

The second problem is to do with relevance: the different term frequencies in
the `first_name` and `last_name` fields can produce unexpected results.

For instance, imagine we have two people: ``Will Smith'' and ``Smith Jones''.
``Smith'' as a last name is very common (and so is of low importance) but
``Smith'' as a first name is very uncommon (and so is of great importance).

If we do a search for ``Will Smith'', the ``Smith Jones'' document will
probably appear above the better matching ``Will Smith'' because the score of
`first_name:smith` has trumped the combined scores of `first_name:will` plus
`last_name:smith`.

****

One way of dealing with these types of queries is simply to index the
`first_name` and `last_name` fields into a single `full_name` field. Of
course, this can only be done at index time.

The `cross_fields` type tries to solve these problems at query time by taking a
_term-centric_ approach. It first analyzes the query string into individual
terms, then looks for each term in any of the fields, as though they were one
big field.

A query like:

[source,console]
--------------------------------------------------
GET /_search
{
  "query": {
    "multi_match" : {
      "query":      "Will Smith",
      "type":       "cross_fields",
      "fields":     [ "first_name", "last_name" ],
      "operator":   "and"
    }
  }
}
--------------------------------------------------

is executed as:

    +(first_name:will  last_name:will)
    +(first_name:smith last_name:smith)

In other words, *all terms* must be present *in at least one field* for a
document to match. (Compare this to
<<operator-min,the logic used for `best_fields` and `most_fields`>>.)

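The term-centric rule can be modeled in Python as follows. The function `term_centric_and` is hypothetical, and lowercasing plus whitespace splitting stands in for real analysis; the sketch covers only the boolean matching logic for the `and` operator, not the blended scoring.

```python
def term_centric_and(doc, fields, terms):
    """True if every term appears in at least one of the fields."""
    tokens = {field: doc.get(field, "").lower().split() for field in fields}
    return all(any(term in tokens[field] for field in fields) for term in terms)


terms = ["will", "smith"]
fields = ["first_name", "last_name"]

# Each term is found in some field, so the document matches.
print(term_centric_and({"first_name": "Will", "last_name": "Smith"}, fields, terms))

# `smith` is missing from every field, so the document does not match.
print(term_centric_and({"first_name": "Will", "last_name": "Jones"}, fields, terms))
```

Under these assumptions, a document with `Will` in `first_name` and `Smith` in `last_name` matches even though no single field contains both terms.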

That solves one of the two problems. The problem of differing term frequencies
is solved by _blending_ the term frequencies for all fields in order to even
out the differences.

In practice, `first_name:smith` will be treated as though it has the same
frequencies as `last_name:smith`, plus one. This will make matches on
`first_name` and `last_name` have comparable scores, with a tiny advantage
for `last_name` since it is the most likely field that contains `smith`.

Note that `cross_fields` is usually only useful on short string fields
that all have a `boost` of `1`. Otherwise boosts, term frequencies and length
normalization contribute to the score in such a way that the blending of term
statistics is no longer meaningful.

If you run the above query through the <<search-validate>>, it returns this
explanation:

    +blended("will",  fields: [first_name, last_name])
    +blended("smith", fields: [first_name, last_name])

The `cross_fields` type also accepts `analyzer`, `boost`, `operator`,
`minimum_should_match`, `lenient` and `zero_terms_query`.

WARNING: The `cross_fields` type blends field statistics in a way that does
not always produce well-formed scores (for example scores can become
negative). As an alternative, you can consider the
<<query-dsl-combined-fields-query,`combined_fields`>> query, which is also
term-centric but combines field statistics in a more robust way.

[[cross-field-analysis]]
===== `cross_fields` and analysis

The `cross_fields` type can only work in term-centric mode on fields that have
the same analyzer. Fields with the same analyzer are grouped together as in
the example above. If there are multiple groups, the query will use the best
score from any group.

For instance, if we have a `first` and `last` field which have
the same analyzer, plus a `first.edge` and `last.edge` which
both use an `edge_ngram` analyzer, this query:

[source,console]
--------------------------------------------------
GET /_search
{
  "query": {
    "multi_match" : {
      "query":      "Jon",
      "type":       "cross_fields",
      "fields":     [
        "first", "first.edge",
        "last",  "last.edge"
      ]
    }
  }
}
--------------------------------------------------

would be executed as:

      blended("jon", fields: [first, last])
    | (
        blended("j",   fields: [first.edge, last.edge])
        blended("jo",  fields: [first.edge, last.edge])
        blended("jon", fields: [first.edge, last.edge])
    )

In other words, `first` and `last` would be grouped together and
treated as a single field, and `first.edge` and `last.edge` would be
grouped together and treated as a single field.

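The grouping step described above can be pictured with a small Python sketch. The function `group_fields_by_analyzer` and the analyzer names in the example mapping are hypothetical; the sketch shows only how fields that share an analyzer end up in the same group.

```python
from collections import defaultdict


def group_fields_by_analyzer(field_analyzers):
    """Group field names by analyzer name, preserving field order."""
    groups = defaultdict(list)
    for field, analyzer in field_analyzers.items():
        groups[analyzer].append(field)
    return dict(groups)


# Hypothetical field-to-analyzer mapping for the example above.
field_analyzers = {
    "first": "standard",
    "first.edge": "edge_ngram_analyzer",
    "last": "standard",
    "last.edge": "edge_ngram_analyzer",
}
print(group_fields_by_analyzer(field_analyzers))
```

Each resulting group corresponds to one set of `blended` clauses in the explanation above.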

Having multiple groups is fine, but when combined with `operator` or
`minimum_should_match`, it can suffer from the <<operator-min,same problem>>
as `most_fields` or `best_fields`.

You can easily rewrite this query yourself as two separate `cross_fields`
queries combined with a `dis_max` query, and apply the `minimum_should_match`
parameter to just one of them:

[source,console]
--------------------------------------------------
GET /_search
{
  "query": {
    "dis_max": {
      "queries": [
        {
          "multi_match" : {
            "query":      "Will Smith",
            "type":       "cross_fields",
            "fields":     [ "first", "last" ],
            "minimum_should_match": "50%" <1>
          }
        },
        {
          "multi_match" : {
            "query":      "Will Smith",
            "type":       "cross_fields",
            "fields":     [ "*.edge" ]
          }
        }
      ]
    }
  }
}
--------------------------------------------------

<1> Either `will` or `smith` must be present in either of the `first`
    or `last` fields.

You can force all fields into the same group by specifying the `analyzer`
parameter in the query:

[source,console]
--------------------------------------------------
GET /_search
{
  "query": {
    "multi_match" : {
      "query":      "Jon",
      "type":       "cross_fields",
      "analyzer":   "standard", <1>
      "fields":     [ "first", "last", "*.edge" ]
    }
  }
}
--------------------------------------------------

<1> Use the `standard` analyzer for all fields.

This query is executed as:

    blended("jon", fields: [first, first.edge, last.edge, last])

[[tie-breaker]]
===== `tie_breaker`

By default, each per-term `blended` query will use the best score returned by
any field in a group. Then when combining scores across groups, the query uses
the best score from any group. The `tie_breaker` parameter can change the
behavior for both of these steps:

[horizontal]
`0.0`:: Take the single best score out of (for example) `first_name:will`
and `last_name:will` (default).
`1.0`:: Add together the scores for (for example) `first_name:will` and
`last_name:will`.
`0.0 < n < 1.0`:: Take the single best score plus `tie_breaker` multiplied
by each of the scores from other matching fields or groups.

[IMPORTANT]
[[crossfields-fuzziness]]
.`cross_fields` and `fuzziness`
===================================================
The `fuzziness` parameter cannot be used with the `cross_fields` type.
===================================================

[[type-bool-prefix]]
==== `bool_prefix`

The `bool_prefix` type's scoring behaves like <<type-most-fields>>, but it uses a
<<query-dsl-match-bool-prefix-query,`match_bool_prefix` query>> instead of a
`match` query.

[source,console]
--------------------------------------------------
GET /_search
{
  "query": {
    "multi_match" : {
      "query":      "quick brown f",
      "type":       "bool_prefix",
      "fields":     [ "subject", "message" ]
    }
  }
}
--------------------------------------------------

The `analyzer`, `boost`, `operator`, `minimum_should_match`, `lenient`,
`zero_terms_query`, and `auto_generate_synonyms_phrase_query` parameters as
explained in <<query-dsl-match-query, match query>> are supported. The
`fuzziness`, `prefix_length`, `max_expansions`, `fuzzy_rewrite`, and
`fuzzy_transpositions` parameters are supported for the terms that are used to
construct term queries, but do not have an effect on the prefix query
constructed from the final term.

The `slop` parameter is not supported by this query type.

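The split between term queries and the final prefix query, mentioned above, can be sketched in Python. The function `bool_prefix_clauses` is hypothetical, and lowercasing plus whitespace splitting stands in for real analysis; the sketch shows only which kind of clause each analyzed term would feed, for a single field.

```python
def bool_prefix_clauses(query):
    """Return (kind, term) pairs: term clauses, then a prefix clause for the last term."""
    terms = query.lower().split()
    if not terms:
        return []
    clauses = [("term", term) for term in terms[:-1]]
    clauses.append(("prefix", terms[-1]))
    return clauses


print(bool_prefix_clauses("quick brown f"))
```

In this model, parameters such as `fuzziness` would apply to the `term` entries only, never to the trailing `prefix` entry.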