[[query-dsl-combined-fields-query]]
=== Combined fields
++++
<titleabbrev>Combined fields</titleabbrev>
++++

The `combined_fields` query supports searching multiple text fields as if their
contents had been indexed into one combined field. The query takes a term-centric
view of the input string: first it analyzes the query string into individual terms,
then looks for each term in any of the fields. This query is particularly
useful when a match could span multiple text fields, for example the `title`,
`abstract`, and `body` of an article:

[source,console]
----
GET /_search
{
  "query": {
    "combined_fields" : {
      "query":      "database systems",
      "fields":     [ "title", "abstract", "body"],
      "operator":   "and"
    }
  }
}
----

The `combined_fields` query takes a principled approach to scoring based on the
simple BM25F formula described in
http://www.staff.city.ac.uk/~sb317/papers/foundations_bm25_review.pdf[The Probabilistic Relevance Framework: BM25 and Beyond].
When scoring matches, the query combines term and collection statistics across
fields to score each match as if the specified fields had been indexed into a
single, combined field. This scoring is a best attempt; `combined_fields` makes
some approximations and scores will not obey the BM25F model perfectly.
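
As a rough guide to what "scoring as if the fields were combined" means, the
simple BM25F model from the paper above can be sketched as follows. This is the
paper's model, not Elasticsearch's exact computation, which makes approximations.
The notation is chosen for this sketch. `w_f` is the weight of field `f` (its
boost, `1` by default). `tf_{t,f}` is the number of times term `t` occurs in
field `f` of the document, and `len_f` is that field's length. `avgdl` is the
average combined length across documents. `k_1` and `b` are the usual BM25
parameters, and `idf(t)` stands in for the paper's per-term weight.

[latexmath]
++++
\begin{aligned}
\widetilde{tf}_t &= \sum_{f} w_f \, tf_{t,f} \\
\widetilde{dl} &= \sum_{f} w_f \, len_f \\
score(t) &= idf(t) \cdot \frac{\widetilde{tf}_t}{\widetilde{tf}_t + k_1 \left(1 - b + b \, \frac{\widetilde{dl}}{avgdl}\right)}
\end{aligned}
++++

A document's score sums `score(t)` over the query terms. Field weights scale
both the term frequency and the field length, so a weight behaves like repeating
that field's contents inside one synthetic field.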

:query-type: combined fields
// tag::max-clause-limit[]
[WARNING]
.Field number limit
===================================================
By default, there is a limit to the number of clauses a query can contain. This
limit is defined by the
<<indices-query-bool-max-clause-count,`indices.query.bool.max_clause_count`>>
setting, which defaults to `4096`. For {query-type} queries, the number of
clauses is calculated as the number of fields multiplied by the number of terms.
===================================================
// end::max-clause-limit[]
:query-type!:
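
As a worked illustration of this limit, the first line below applies the rule to
the introductory request, which searches three fields for two terms. The second
line assumes a hypothetical wildcard pattern that expands to 100 fields and a
query string that analyzes to 50 terms. Both of those numbers are invented for
the example.

[latexmath]
++++
\begin{aligned}
3 \text{ fields} \times 2 \text{ terms} &= 6 \text{ clauses} \le 4096 \\
100 \text{ fields} \times 50 \text{ terms} &= 5000 \text{ clauses} > 4096
\end{aligned}
++++

The second query would exceed the default limit. A narrower field pattern or
shorter query text would keep it within bounds.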

==== Per-field boosting

Field boosts are interpreted according to the combined field model. For example,
if the `title` field has a boost of 2, the score is calculated as if each term
in the title appeared twice in the synthetic combined field.

[source,console]
----
GET /_search
{
  "query": {
    "combined_fields" : {
      "query" : "distributed consensus",
      "fields" : [ "title^2", "body" ] <1>
    }
  }
}
----
<1> Individual fields can be boosted with the caret (`^`) notation.

NOTE: The `combined_fields` query requires field boosts to be greater than or
equal to 1.0. Field boosts can be fractional.
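
To make the boost rule concrete, consider a hypothetical document matched by the
request above. Its `title` has 4 terms and contains `consensus` once. Its `body`
has 100 terms and contains `consensus` three times. All of these counts are
invented for illustration. If the `title` boost of 2 is treated as repeating each
title term twice, the combined statistics under the BM25F model work out as
shown below. The values Elasticsearch uses internally may differ because of its
approximations.

[latexmath]
++++
\begin{aligned}
\text{combined tf}(\text{consensus}) &= 2 \cdot 1 + 1 \cdot 3 = 5 \\
\text{combined length} &= 2 \cdot 4 + 1 \cdot 100 = 108
\end{aligned}
++++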

[[combined-field-top-level-params]]
==== Top-level parameters for `combined_fields`

`fields`::
(Required, array of strings) List of fields to search. Field wildcard patterns
are allowed. Only <<text,`text`>> fields are supported, and they must all have
the same search <<analyzer,`analyzer`>>.

`query`::
+
--
(Required, string) Text to search for in the provided `<fields>`.

The `combined_fields` query <<analysis,analyzes>> the provided text before
performing a search.
--

`auto_generate_synonyms_phrase_query`::
+
--
(Optional, Boolean) If `true`, <<query-dsl-match-query-phrase,match phrase>>
queries are automatically created for multi-term synonyms. Defaults to `true`.

See <<query-dsl-match-query-synonyms,Use synonyms with match query>> for an
example.
--

`operator`::
+
--
(Optional, string) Boolean logic used to interpret text in the `query` value.
Valid values are:

`or` (Default)::
For example, a `query` value of `database systems` is interpreted as `database
OR systems`.

`and`::
For example, a `query` value of `database systems` is interpreted as `database
AND systems`.
--

`minimum_should_match`::
+
--
(Optional, string) Minimum number of clauses that must match for a document to
be returned. See the <<query-dsl-minimum-should-match, `minimum_should_match`
parameter>> for valid values and more information.
--

`zero_terms_query`::
+
--
(Optional, string) Indicates whether no documents are returned if the `analyzer`
removes all tokens, such as when using a `stop` filter. Valid values are:

`none` (Default)::
No documents are returned if the `analyzer` removes all tokens.

`all`::
Returns all documents, similar to a <<query-dsl-match-all-query,`match_all`>>
query.

See <<query-dsl-match-query-zero>> for an example.
--
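
For illustration, the following request combines a field wildcard pattern with
`minimum_should_match`. The `title*` pattern and the `body` field belong to a
hypothetical index. The example assumes that every matching field is a `text`
field and that all of them share one search analyzer. The query string and the
`minimum_should_match` value were chosen only as an example.

[source,console]
----
GET /_search
{
  "query": {
    "combined_fields" : {
      "query":                "distributed consensus protocols",
      "fields":               [ "title*", "body" ],
      "minimum_should_match": "2"
    }
  }
}
----

Assume the search analyzer produces the three terms `distributed`, `consensus`,
and `protocols`. Because `combined_fields` is term-centric, a document would then
need at least two of those terms to match. Each matching term must appear in at
least one of the listed fields.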

===== Comparison to `multi_match` query

The `combined_fields` query provides a principled way of matching and scoring
across multiple <<text, `text`>> fields. To support this, it requires that all
fields have the same search <<analyzer,`analyzer`>>.

If you want a single query that handles fields of different types like
keywords or numbers, then the <<query-dsl-multi-match-query,`multi_match`>>
query may be a better fit. It supports both text and non-text fields, and
accepts text fields that do not share the same analyzer.
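
As a sketch of that alternative, the request below runs `multi_match` over a
`text` field and a `keyword` field. The `tags` field is hypothetical and is
assumed to be mapped as `keyword`. `combined_fields` does not support that field
because only `text` fields are allowed.

[source,console]
----
GET /_search
{
  "query": {
    "multi_match" : {
      "query":  "database systems",
      "fields": [ "title", "tags" ]
    }
  }
}
----

Here, `title` is analyzed as usual. The whole query string is compared against
`tags` as a single keyword value.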

The main `multi_match` modes `best_fields` and `most_fields` take a
field-centric view of the query. In contrast, `combined_fields` is
term-centric: `operator` and `minimum_should_match` are applied per-term,
instead of per-field. Concretely, a query like

[source,console]
----
GET /_search
{
  "query": {
    "combined_fields" : {
      "query":      "database systems",
      "fields":     [ "title", "abstract"],
      "operator":   "and"
    }
  }
}
----

is executed as:

[source,txt]
----
+(combined("database", fields:["title", "abstract"]))
+(combined("systems", fields:["title", "abstract"]))
----

In other words, each term must be present in at least one field for a
document to match.

The `cross_fields` `multi_match` mode also takes a term-centric approach and
applies `operator` and `minimum_should_match` per-term. The main advantage of
`combined_fields` over `cross_fields` is its robust and interpretable approach
to scoring based on the BM25F algorithm.
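
For side-by-side comparison, the `cross_fields` request below should have the
same matching logic as the `combined_fields` example above. This holds if
`title` and `abstract` share a search analyzer, which `combined_fields` requires
anyway. Both requests apply `and` per term. As described above, they differ
mainly in how matches are scored.

[source,console]
----
GET /_search
{
  "query": {
    "multi_match" : {
      "query":    "database systems",
      "type":     "cross_fields",
      "fields":   [ "title", "abstract" ],
      "operator": "and"
    }
  }
}
----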

[NOTE]
.Custom similarities
===================================================
The `combined_fields` query currently only supports the BM25 similarity,
which is the default unless a <<index-modules-similarity, custom similarity>>
is configured. <<similarity, Per-field similarities>> are also not allowed.
Using `combined_fields` in either of these cases will result in an error.
===================================================