- [[query-dsl-combined-fields-query]]
- === Combined fields
- ++++
- <titleabbrev>Combined fields</titleabbrev>
- ++++
- The `combined_fields` query supports searching multiple text fields as if their
- contents had been indexed into one combined field. It takes a term-centric
- view of the query: first it analyzes the query string into individual terms,
- then looks for each term in any of the fields. This query is particularly
- useful when a match could span multiple text fields, for example the `title`,
- `abstract` and `body` of an article:
- [source,console]
- --------------------------------------------------
- GET /_search
- {
-   "query": {
-     "combined_fields" : {
-       "query": "database systems",
-       "fields": [ "title", "abstract", "body" ],
-       "operator": "and"
-     }
-   }
- }
- --------------------------------------------------
- The `combined_fields` query takes a principled approach to scoring based on the
- simple BM25F formula described in
- http://www.staff.city.ac.uk/~sb317/papers/foundations_bm25_review.pdf[The Probabilistic Relevance Framework: BM25 and Beyond].
- When scoring matches, the query combines term and collection statistics across
- fields. This allows it to score each match as if the specified fields had been
- indexed into a single combined field. (Note that this is a best attempt --
- `combined_fields` makes some approximations and scores will not obey this
- model perfectly.)
- [WARNING]
- .Field number limit
- ===================================================
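- The number of fields multiplied by the number of query terms that can be
- searched at once is limited. The limit is defined by the
- `indices.query.bool.max_clause_count` <<search-settings,search setting>>,
- which defaults to 4096. For example, a two-term query over three fields
- counts as six toward this limit.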
- ===================================================
- ==== Per-field boosting
- Individual fields can be boosted with the caret (`^`) notation:
- [source,console]
- --------------------------------------------------
- GET /_search
- {
-   "query": {
-     "combined_fields" : {
-       "query" : "distributed consensus",
-       "fields" : [ "title^2", "body" ]
-     }
-   }
- }
- --------------------------------------------------
- Field boosts are interpreted according to the combined field model. For example,
- if the `title` field has a boost of 2, the score is calculated as if each term
- in the title appeared twice in the synthetic combined field.
- NOTE: Field boosts in the `combined_fields` query must be greater than or
- equal to 1.0. Fractional boosts are allowed.
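- As a minimal sketch of a fractional boost, the request below boosts `title`
- by 1.5. Extrapolating from the example above, each term in `title` would then
- be scored roughly as if it appeared 1.5 times in the synthetic combined field.
- The field names and query text are illustrative assumptions:
- [source,console]
- --------------------------------------------------
- GET /_search
- {
-   "query": {
-     "combined_fields" : {
-       "query" : "distributed consensus",
-       "fields" : [ "title^1.5", "body" ]
-     }
-   }
- }
- --------------------------------------------------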
- [[combined-field-top-level-params]]
- ==== Top-level parameters for `combined_fields`
- `fields`::
- (Required, array of strings) List of fields to search. Field wildcard patterns
- are allowed. Only <<text,`text`>> fields are supported, and they must all have
- the same search <<analyzer,`analyzer`>>.
- `query`::
- +
- --
- (Required, string) Text to search for in the provided `<fields>`.
- The `combined_fields` query <<analysis,analyzes>> the provided text before
- performing a search.
- --
- `auto_generate_synonyms_phrase_query`::
- +
- --
- (Optional, Boolean) If `true`, <<query-dsl-match-query-phrase,match phrase>>
- queries are automatically created for multi-term synonyms. Defaults to `true`.
- See <<query-dsl-match-query-synonyms,Use synonyms with match query>> for an
- example.
- --
- `operator`::
- +
- --
- (Optional, string) Boolean logic used to interpret text in the `query` value.
- Valid values are:
- `or` (Default)::
- For example, a `query` value of `database systems` is interpreted as `database
- OR systems`.
- `and`::
- For example, a `query` value of `database systems` is interpreted as `database
- AND systems`.
- --
- `minimum_should_match`::
- +
- --
- (Optional, string) Minimum number of clauses that must match for a document to
- be returned. See the <<query-dsl-minimum-should-match, `minimum_should_match`
- parameter>> for valid values and more information.
- --
- `zero_terms_query`::
- +
- --
- (Optional, string) Controls what is returned if the `analyzer` removes all
- tokens from the query text, such as when using a `stop` filter. Valid values are:
- `none` (Default)::
- No documents are returned if the `analyzer` removes all tokens.
- `all`::
- Returns all documents, similar to a <<query-dsl-match-all-query,`match_all`>>
- query.
- See <<query-dsl-match-query-zero>> for an example.
- --
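- The optional parameters above can be combined in one request. The following
- is a sketch only: the field names, the query text and the chosen values are
- assumptions made for illustration. With a three-term query and
- `minimum_should_match` set to `2`, a document would need to contain at least
- two of the terms across the listed fields. If the analyzer removed every
- token, `zero_terms_query` set to `all` would return all documents:
- [source,console]
- --------------------------------------------------
- GET /_search
- {
-   "query": {
-     "combined_fields" : {
-       "query": "distributed database systems",
-       "fields": [ "title", "abstract", "body" ],
-       "minimum_should_match": 2,
-       "zero_terms_query": "all"
-     }
-   }
- }
- --------------------------------------------------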
- ==== Comparison to `multi_match` query
- The `combined_fields` query provides a principled way of matching and scoring
- across multiple <<text, `text`>> fields. To support this, it requires that all
- fields have the same search <<analyzer,`analyzer`>>.
- If you want a single query that handles fields of different types like
- keywords or numbers, then the <<query-dsl-multi-match-query,`multi_match`>>
- query may be a better fit. It supports both text and non-text fields, and
- accepts text fields that do not share the same analyzer.
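- As a rough illustration, a `multi_match` request over mixed field types could
- look like the following. Here `tags` is a hypothetical `keyword` field, and
- `title` and `abstract` are assumed to be `text` fields:
- [source,console]
- --------------------------------------------------
- GET /_search
- {
-   "query": {
-     "multi_match" : {
-       "query": "database systems",
-       "fields": [ "title", "abstract", "tags" ]
-     }
-   }
- }
- --------------------------------------------------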
- The main `multi_match` modes `best_fields` and `most_fields` take a
- field-centric view of the query. In contrast, `combined_fields` is
- term-centric: `operator` and `minimum_should_match` are applied per-term,
- instead of per-field. Concretely, a query like
- [source,console]
- --------------------------------------------------
- GET /_search
- {
-   "query": {
-     "combined_fields" : {
-       "query": "database systems",
-       "fields": [ "title", "abstract" ],
-       "operator": "and"
-     }
-   }
- }
- --------------------------------------------------
- is executed as
-     +(combined("database", fields:["title", "abstract"]))
-     +(combined("systems", fields:["title", "abstract"]))
- In other words, each term must be present in at least one field for a
- document to match.
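- One way to picture this term-centric behavior is to write out each term as
- its own required clause. The request below is an illustrative sketch of the
- matching logic, not a description of how the query is implemented, and its
- scores are not guaranteed to be identical:
- [source,console]
- --------------------------------------------------
- GET /_search
- {
-   "query": {
-     "bool": {
-       "must": [
-         { "combined_fields": { "query": "database", "fields": [ "title", "abstract" ] } },
-         { "combined_fields": { "query": "systems", "fields": [ "title", "abstract" ] } }
-       ]
-     }
-   }
- }
- --------------------------------------------------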
- The `cross_fields` `multi_match` mode also takes a term-centric approach and
- applies `operator` and `minimum_should_match` per-term. The main advantage of
- `combined_fields` over `cross_fields` is its robust and interpretable approach
- to scoring based on the BM25F algorithm.
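- For comparison, the same search expressed with the `cross_fields` mode might
- look like the sketch below. It reuses the fields and query text from the
- example above. It should match the same way, term by term, but its scoring
- differs from that of `combined_fields`:
- [source,console]
- --------------------------------------------------
- GET /_search
- {
-   "query": {
-     "multi_match" : {
-       "query": "database systems",
-       "type": "cross_fields",
-       "fields": [ "title", "abstract" ],
-       "operator": "and"
-     }
-   }
- }
- --------------------------------------------------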
- [NOTE]
- .Custom similarities
- ===================================================
- The `combined_fields` query currently only supports the `BM25` similarity
- (which is the default unless a <<index-modules-similarity, custom similarity>>
- is configured). <<similarity, Per-field similarities>> are also not allowed.
- Using `combined_fields` in either of these cases will result in an error.
- ===================================================