- [[query-dsl-multi-match-query]]
- === Multi Match Query
- The `multi_match` query builds on the <<query-dsl-match-query,`match` query>>
- to allow multi-field queries:
- [source,js]
- --------------------------------------------------
- GET /_search
- {
- "query": {
- "multi_match" : {
- "query": "this is a test", <1>
- "fields": [ "subject", "message" ] <2>
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- <1> The query string.
- <2> The fields to be queried.
- [float]
- ==== `fields` and per-field boosting
- Fields can be specified with wildcards, for example:
- [source,js]
- --------------------------------------------------
- GET /_search
- {
- "query": {
- "multi_match" : {
- "query": "Will Smith",
- "fields": [ "title", "*_name" ] <1>
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- <1> Query the `title`, `first_name` and `last_name` fields.
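As a rough illustration of how a wildcard pattern selects fields, the sketch below expands patterns against a list of mapped field names. The `expand_fields` helper and the sample mapping are hypothetical, and Python's `fnmatch` supports more pattern syntax than the `*` shown here, so this is only an approximation of the behaviour described above.

```python
from fnmatch import fnmatchcase


def expand_fields(patterns, mapped_fields):
    """Return the mapped field names matched by any pattern, in mapping order."""
    return [
        field
        for field in mapped_fields
        if any(fnmatchcase(field, pattern) for pattern in patterns)
    ]


# Hypothetical mapping, used only for illustration.
mapped = ["title", "first_name", "last_name", "body"]
expanded = expand_fields(["title", "*_name"], mapped)
print(expanded)
```

With this sample mapping, `body` is left out because it matches neither `title` nor `*_name`.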
- Individual fields can be boosted with the caret (`^`) notation:
- [source,js]
- --------------------------------------------------
- GET /_search
- {
- "query": {
- "multi_match" : {
- "query" : "this is a test",
- "fields" : [ "subject^3", "message" ] <1>
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- <1> The `subject` field is three times as important as the `message` field.
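The caret notation can be read as a field name followed by an optional multiplier. The helper below is a hypothetical sketch of that reading, not Elasticsearch's own parser; it assumes a missing boost means `1.0`.

```python
def parse_field(spec):
    """Split 'name^boost' into (name, boost); the boost defaults to 1.0."""
    name, sep, boost = spec.partition("^")
    return name, float(boost) if sep else 1.0


parsed = [parse_field(spec) for spec in ["subject^3", "message"]]
print(parsed)
```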
- [[multi-match-types]]
- [float]
- ==== Types of `multi_match` query:
- The way the `multi_match` query is executed internally depends on the `type`
- parameter, which can be set to:
- [horizontal]
- `best_fields`:: (*default*) Finds documents which match any field, but
- uses the `_score` from the best field. See <<type-best-fields>>.
- `most_fields`:: Finds documents which match any field and combines
- the `_score` from each field. See <<type-most-fields>>.
- `cross_fields`:: Treats fields with the same `analyzer` as though they
- were one big field. Looks for each word in *any*
- field. See <<type-cross-fields>>.
- `phrase`:: Runs a `match_phrase` query on each field and uses the `_score`
- from the best field. See <<type-phrase>>.
- `phrase_prefix`:: Runs a `match_phrase_prefix` query on each field and uses
- the `_score` from the best field. See <<type-phrase>>.
- [[type-best-fields]]
- ==== `best_fields`
- The `best_fields` type is most useful when you are searching for multiple
- words best found in the same field. For instance ``brown fox'' in a single
- field is more meaningful than ``brown'' in one field and ``fox'' in the other.
- The `best_fields` type generates a <<query-dsl-match-query,`match` query>> for
- each field and wraps them in a <<query-dsl-dis-max-query,`dis_max`>> query, to
- find the single best matching field. For instance, this query:
- [source,js]
- --------------------------------------------------
- GET /_search
- {
- "query": {
- "multi_match" : {
- "query": "brown fox",
- "type": "best_fields",
- "fields": [ "subject", "message" ],
- "tie_breaker": 0.3
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- would be executed as:
- [source,js]
- --------------------------------------------------
- GET /_search
- {
- "query": {
- "dis_max": {
- "queries": [
- { "match": { "subject": "brown fox" }},
- { "match": { "message": "brown fox" }}
- ],
- "tie_breaker": 0.3
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- Normally the `best_fields` type uses the score of the *single* best matching
- field, but if `tie_breaker` is specified, then it calculates the score as
- follows:
- * the score from the best matching field
- * plus `tie_breaker * _score` for all other matching fields
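The two bullet points above amount to a small formula. The sketch below applies it to a list of per-field scores; the function name and the convention that a non-matching field scores `0` are assumptions made for the example.

```python
def dis_max_score(field_scores, tie_breaker=0.0):
    """Best field score plus tie_breaker times every other matching score."""
    matching = [score for score in field_scores if score > 0]
    if not matching:
        return 0.0
    best = max(matching)
    return best + tie_breaker * (sum(matching) - best)


# With tie_breaker 0.0 only the best field counts;
# with 0.3 the other matching field contributes 30% of its score.
print(dis_max_score([2.0, 1.0]))
print(dis_max_score([2.0, 1.0], tie_breaker=0.3))
```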
- The `best_fields` type also accepts `analyzer`, `boost`, `operator`, `minimum_should_match`,
- `fuzziness`, `prefix_length`, `max_expansions`, `rewrite`, `zero_terms_query`
- and `cutoff_frequency`, as explained in <<query-dsl-match-query, match query>>.
- [IMPORTANT]
- [[operator-min]]
- .`operator` and `minimum_should_match`
- ===================================================
- The `best_fields` and `most_fields` types are _field-centric_ -- they generate
- a `match` query *per field*. This means that the `operator` and
- `minimum_should_match` parameters are applied to each field individually,
- which is probably not what you want.
- Take this query for example:
- [source,js]
- --------------------------------------------------
- GET /_search
- {
- "query": {
- "multi_match" : {
- "query": "Will Smith",
- "type": "best_fields",
- "fields": [ "first_name", "last_name" ],
- "operator": "and" <1>
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- <1> All terms must be present.
- This query is executed as:
- (+first_name:will +first_name:smith)
- | (+last_name:will +last_name:smith)
- In other words, *all terms* must be present *in a single field* for a document
- to match.
- See <<type-cross-fields>> for a better solution.
- ===================================================
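The field-centric behaviour of `operator: and` can be mimicked with a toy matcher. This is a simplified sketch that assumes whitespace tokenization and lowercasing; `field_centric_and` is a hypothetical helper, not part of any client library.

```python
def field_centric_and(doc, terms, fields):
    """True if at least one field contains every term (per-field `and`)."""
    return any(
        all(term in doc.get(field, "").lower().split() for term in terms)
        for field in fields
    )


fields = ["first_name", "last_name"]
split_doc = {"first_name": "Will", "last_name": "Smith"}
single_doc = {"first_name": "Will Smith", "last_name": ""}

print(field_centric_and(split_doc, ["will", "smith"], fields))
print(field_centric_and(single_doc, ["will", "smith"], fields))
```

The first document does not match even though both terms are present, because neither field contains both of them.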
- [[type-most-fields]]
- ==== `most_fields`
- The `most_fields` type is most useful when querying multiple fields that
- contain the same text analyzed in different ways. For instance, the main
- field may contain synonyms, stemming and terms without diacritics. A second
- field may contain the original terms, and a third field might contain
- shingles. By combining scores from all three fields we can match as many
- documents as possible with the main field, but use the second and third fields
- to push the most similar results to the top of the list.
- This query:
- [source,js]
- --------------------------------------------------
- GET /_search
- {
- "query": {
- "multi_match" : {
- "query": "quick brown fox",
- "type": "most_fields",
- "fields": [ "title", "title.original", "title.shingles" ]
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- would be executed as:
- [source,js]
- --------------------------------------------------
- GET /_search
- {
- "query": {
- "bool": {
- "should": [
- { "match": { "title": "quick brown fox" }},
- { "match": { "title.original": "quick brown fox" }},
- { "match": { "title.shingles": "quick brown fox" }}
- ]
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- The score from each `match` clause is added together, then divided by the
- number of `match` clauses.
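Taken literally, the scoring rule just described is an average over the clauses. The following is a minimal sketch of that arithmetic only, assuming a clause that does not match contributes `0`; the function name is hypothetical.

```python
def most_fields_score(clause_scores):
    """Sum of the per-clause scores divided by the number of clauses."""
    return sum(clause_scores) / len(clause_scores)


# Three `match` clauses, one of which did not match.
print(most_fields_score([3.0, 0.0, 1.5]))
```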
- The `most_fields` type also accepts `analyzer`, `boost`, `operator`, `minimum_should_match`,
- `fuzziness`, `prefix_length`, `max_expansions`, `rewrite`, `zero_terms_query`
- and `cutoff_frequency`, as explained in <<query-dsl-match-query,match query>>, but
- *see <<operator-min>>*.
- [[type-phrase]]
- ==== `phrase` and `phrase_prefix`
- The `phrase` and `phrase_prefix` types behave just like <<type-best-fields>>,
- but they use a `match_phrase` or `match_phrase_prefix` query instead of a
- `match` query.
- This query:
- [source,js]
- --------------------------------------------------
- GET /_search
- {
- "query": {
- "multi_match" : {
- "query": "quick brown f",
- "type": "phrase_prefix",
- "fields": [ "subject", "message" ]
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- would be executed as:
- [source,js]
- --------------------------------------------------
- GET /_search
- {
- "query": {
- "dis_max": {
- "queries": [
- { "match_phrase_prefix": { "subject": "quick brown f" }},
- { "match_phrase_prefix": { "message": "quick brown f" }}
- ]
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- The `phrase` and `phrase_prefix` types also accept `analyzer`, `boost`, `slop`
- and `zero_terms_query`, as explained in <<query-dsl-match-query>>. Type
- `phrase_prefix` additionally accepts `max_expansions`.
- [IMPORTANT]
- [[phrase-fuzziness]]
- .`phrase`, `phrase_prefix` and `fuzziness`
- ===================================================
- The `fuzziness` parameter cannot be used with the `phrase` or `phrase_prefix` type.
- ===================================================
- [[type-cross-fields]]
- ==== `cross_fields`
- The `cross_fields` type is particularly useful with structured documents where
- multiple fields *should* match. For instance, when querying the `first_name`
- and `last_name` fields for ``Will Smith'', the best match is likely to have
- ``Will'' in one field and ``Smith'' in the other.
- ****
- This sounds like a job for <<type-most-fields>> but there are two problems
- with that approach. The first problem is that `operator` and
- `minimum_should_match` are applied per-field, instead of per-term (see
- <<operator-min,explanation above>>).
- The second problem is to do with relevance: the different term frequencies in
- the `first_name` and `last_name` fields can produce unexpected results.
- For instance, imagine we have two people: ``Will Smith'' and ``Smith Jones''.
- ``Smith'' as a last name is very common (and so is of low importance) but
- ``Smith'' as a first name is very uncommon (and so is of great importance).
- If we do a search for ``Will Smith'', the ``Smith Jones'' document will
- probably appear above the better matching ``Will Smith'' because the score of
- `first_name:smith` has trumped the combined scores of `first_name:will` plus
- `last_name:smith`.
- ****
- One way of dealing with these types of queries is simply to index the
- `first_name` and `last_name` fields into a single `full_name` field. Of
- course, this can only be done at index time.
- The `cross_fields` type tries to solve these problems at query time by taking a
- _term-centric_ approach. It first analyzes the query string into individual
- terms, then looks for each term in any of the fields, as though they were one
- big field.
- A query like:
- [source,js]
- --------------------------------------------------
- GET /_search
- {
- "query": {
- "multi_match" : {
- "query": "Will Smith",
- "type": "cross_fields",
- "fields": [ "first_name", "last_name" ],
- "operator": "and"
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- is executed as:
- +(first_name:will last_name:will)
- +(first_name:smith last_name:smith)
- In other words, *all terms* must be present *in at least one field* for a
- document to match. (Compare this to
- <<operator-min,the logic used for `best_fields` and `most_fields`>>.)
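The term-centric logic can be sketched with a toy matcher that checks each term against all fields. It assumes whitespace tokenization and lowercasing, and `term_centric_and` is a hypothetical helper used only for illustration.

```python
def term_centric_and(doc, terms, fields):
    """True if every term appears in at least one of the fields."""
    return all(
        any(term in doc.get(field, "").lower().split() for field in fields)
        for term in terms
    )


fields = ["first_name", "last_name"]
will_smith = {"first_name": "Will", "last_name": "Smith"}
smith_jones = {"first_name": "Smith", "last_name": "Jones"}

print(term_centric_and(will_smith, ["will", "smith"], fields))
print(term_centric_and(smith_jones, ["will", "smith"], fields))
```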
- That solves one of the two problems. The problem of differing term frequencies
- is solved by _blending_ the term frequencies for all fields in order to even
- out the differences.
- In practice, `first_name:smith` will be treated as though it has the same
- frequencies as `last_name:smith`, plus one. This will make matches on
- `first_name` and `last_name` have comparable scores, with a tiny advantage
- for `last_name` since it is the most likely field that contains `smith`.
- Note that `cross_fields` is usually only useful on short string fields
- that all have a `boost` of `1`. Otherwise boosts, term freqs and length
- normalization contribute to the score in such a way that the blending of term
- statistics is not meaningful anymore.
- If you run the above query through the <<search-validate>>, it returns this
- explanation:
- +blended("will", fields: [first_name, last_name])
- +blended("smith", fields: [first_name, last_name])
- The `cross_fields` type also accepts `analyzer`, `boost`, `operator`, `minimum_should_match`,
- `zero_terms_query` and `cutoff_frequency`, as explained in
- <<query-dsl-match-query, match query>>.
- ===== `cross_fields` and analysis
- The `cross_fields` type can only work in term-centric mode on fields that have
- the same analyzer. Fields with the same analyzer are grouped together as in
- the example above. If there are multiple groups, they are combined with a
- `bool` query.
- For instance, if we have a `first` and `last` field which have
- the same analyzer, plus a `first.edge` and `last.edge` which
- both use an `edge_ngram` analyzer, this query:
- [source,js]
- --------------------------------------------------
- GET /_search
- {
- "query": {
- "multi_match" : {
- "query": "Jon",
- "type": "cross_fields",
- "fields": [
- "first", "first.edge",
- "last", "last.edge"
- ]
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- would be executed as:
- blended("jon", fields: [first, last])
- | (
- blended("j", fields: [first.edge, last.edge])
- blended("jo", fields: [first.edge, last.edge])
- blended("jon", fields: [first.edge, last.edge])
- )
- In other words, `first` and `last` would be grouped together and
- treated as a single field, and `first.edge` and `last.edge` would be
- grouped together and treated as a single field.
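The grouping step can be pictured as bucketing field names by analyzer. In the sketch below the field-to-analyzer table is an assumption made for the example (in particular, `standard` for `first` and `last` is a guess; the text only says they share an analyzer), and `group_by_analyzer` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_analyzer(field_analyzers):
    """Bucket field names by the analyzer assigned to each field."""
    groups = defaultdict(list)
    for field, analyzer in field_analyzers.items():
        groups[analyzer].append(field)
    return dict(groups)


# Assumed analyzers, for illustration only.
field_analyzers = {
    "first": "standard",
    "first.edge": "edge_ngram",
    "last": "standard",
    "last.edge": "edge_ngram",
}
groups = group_by_analyzer(field_analyzers)
print(groups)
```

Each resulting bucket corresponds to one set of `blended` clauses in the execution shown above.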
- Having multiple groups is fine, but when combined with `operator` or
- `minimum_should_match`, it can suffer from the <<operator-min,same problem>>
- as `most_fields` or `best_fields`.
- You can easily rewrite this query yourself as two separate `cross_fields`
- queries combined with a `bool` query, and apply the `minimum_should_match`
- parameter to just one of them:
- [source,js]
- --------------------------------------------------
- GET /_search
- {
- "query": {
- "bool": {
- "should": [
- {
- "multi_match" : {
- "query": "Will Smith",
- "type": "cross_fields",
- "fields": [ "first", "last" ],
- "minimum_should_match": "50%" <1>
- }
- },
- {
- "multi_match" : {
- "query": "Will Smith",
- "type": "cross_fields",
- "fields": [ "*.edge" ]
- }
- }
- ]
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- <1> Either `will` or `smith` must be present in either of the `first`
- or `last` fields.
- You can force all fields into the same group by specifying the `analyzer`
- parameter in the query.
- [source,js]
- --------------------------------------------------
- GET /_search
- {
- "query": {
- "multi_match" : {
- "query": "Jon",
- "type": "cross_fields",
- "analyzer": "standard", <1>
- "fields": [ "first", "last", "*.edge" ]
- }
- }
- }
- --------------------------------------------------
- // CONSOLE
- <1> Use the `standard` analyzer for all fields.
- This query will be executed as:
- blended("jon", fields: [first, first.edge, last.edge, last])
- ===== `tie_breaker`
- By default, each per-term `blended` query will use the best score returned by
- any field in a group, then these scores are added together to give the final
- score. The `tie_breaker` parameter can change the default behaviour of the
- per-term `blended` queries. It accepts:
- [horizontal]
- `0.0`:: Take the single best score out of (for example) `first_name:will`
- and `last_name:will` (*default*)
- `1.0`:: Add together the scores for (for example) `first_name:will` and
- `last_name:will`
- `0.0 < n < 1.0`:: Take the single best score plus `tie_breaker` multiplied
- by each of the scores from other matching fields.
- [IMPORTANT]
- [[crossfields-fuzziness]]
- .`cross_fields` and `fuzziness`
- ===================================================
- The `fuzziness` parameter cannot be used with the `cross_fields` type.
- ===================================================