
[DOCS] Prune `Search your data` content (#61303)

Changes:
* Removes narrative around URI searches. These aren't commonly used in production. The `q` param is already covered in the search API docs: https://www.elastic.co/guide/en/elasticsearch/reference/master/search-search.html#search-api-query-params-q
* Adds a common options section that highlights narrative docs for query DSL, aggregations, multi-index search, search fields, pagination, sorting, and async search.
* Adds a `Search shard routing` page. Moves narrative docs for adaptive replica selection, preference, routing, and shard limits to that section.
* Moves search timeout and cancellation content to the `Search your data` page.
* Creates a `Search multiple data streams and indices` page. Moves related narrative docs for multi-target syntax searches and `indices_boost` to that page.
* Removes narrative examples for the `search_type` parameters. Moves documentation for this parameter to the search API docs.
James Rodewig 5 years ago
commit d46931840b

+ 1 - 1
docs/java-rest/high-level/document/multi-get.asciidoc

@@ -65,7 +65,7 @@ include-tagged::{doc-tests-file}[{api}-request-item-extras]
 <2> Version
 <3> Version type
 
-{ref}/search-your-data.html#search-preference[`preference`],
+{ref}/search-search.html#search-preference[`preference`],
 {ref}/docs-get.html#realtime[`realtime`]
 and
 {ref}/docs-get.html#get-refresh[`refresh`] can be set on the main request but

+ 33 - 138
docs/reference/search.asciidoc

@@ -1,156 +1,51 @@
 [[search]]
 == Search APIs
 
+Search APIs are used to search and aggregate data stored in {es} indices and
+data streams. For an overview and related tutorials, see <<search-your-data>>.
+
 Most search APIs support <<multi-index,multi-target syntax>>, with the
-exception of the <<search-explain>> endpoints.
+exception of the <<search-explain,explain API>>.
 
 [discrete]
-[[search-routing]]
-=== Routing
-
-When executing a search, Elasticsearch will pick the "best" copy of the data
-based on the <<search-adaptive-replica,adaptive replica selection>> formula.
-Which shards will be searched on can also be controlled by providing the
-`routing` parameter.
-
-For example, the following indexing request routes documents to shard `1`:
-
-[source,console]
---------------------------------------------------
-POST /my-index-000001/_doc?routing=1
-{
-  "@timestamp": "2099-11-15T13:12:00",
-  "message": "GET /search HTTP/1.1 200 1070000",
-  "user": {
-    "id": "kimchy"
-  }
-}
---------------------------------------------------
-
-Later, you can use the `routing` parameter in a search request to search only
-the specified shard. The following search requests hits only shard `1`.
-
-[source,console]
---------------------------------------------------
-POST /my-index-000001/_search?routing=1
-{
-  "query": {
-    "bool": {
-      "must": {
-        "query_string": {
-          "query": "some query string here"
-        }
-      },
-      "filter": {
-        "term": { "user.id": "kimchy" }
-      }
-    }
-  }
-}
---------------------------------------------------
-// TEST[continued]
-
-The routing parameter can be multi valued represented as a comma
-separated string. This will result in hitting the relevant shards where
-the routing values match to.
+[[core-search-apis]]
+=== Core search
 
-[discrete]
-[[search-adaptive-replica]]
-=== Adaptive Replica Selection
-
-By default, Elasticsearch will use what is called adaptive replica selection.
-This allows the coordinating node to send the request to the copy deemed "best"
-based on a number of criteria:
-
-- Response time of past requests between the coordinating node and the node
-  containing the copy of the data
-- Time past search requests took to execute on the node containing the data
-- The queue size of the search threadpool on the node containing the data
-
-This can be turned off by changing the dynamic cluster setting
-`cluster.routing.use_adaptive_replica_selection` from `true` to `false`:
-
-[source,console]
---------------------------------------------------
-PUT /_cluster/settings
-{
-  "transient": {
-    "cluster.routing.use_adaptive_replica_selection": false
-  }
-}
---------------------------------------------------
-
-If adaptive replica selection is turned off, searches are sent to the
-index/indices shards in a round robin fashion between all copies of the data
-(primaries and replicas).
+* <<search-search>>
+* <<search-multi-search>>
+* <<async-search>>
+* <<scroll-api>>
+* <<clear-scroll-api>>
+* <<search-suggesters>>
 
 [discrete]
-[[stats-groups]]
-=== Stats Groups
-
-A search can be associated with stats groups, which maintains a
-statistics aggregation per group. It can later be retrieved using the
-<<indices-stats,indices stats>> API
-specifically. For example, here is a search body request that associate
-the request with two different groups:
-
-[source,console]
---------------------------------------------------
-POST /_search
-{
-  "query" : {
-    "match_all" : {}
-  },
-  "stats" : ["group1", "group2"]
-}
---------------------------------------------------
-// TEST[setup:my_index]
+[[search-testing-apis]]
+=== Search testing
 
-[discrete]
-[[global-search-timeout]]
-=== Global Search Timeout
-
-Individual searches can have a timeout as part of the
-<<search-request-body>>. Since search requests can originate from many
-sources, Elasticsearch has a dynamic cluster-level setting for a global
-search timeout that applies to all search requests that do not set a
-timeout in the request body. These requests will be cancelled after
-the specified time using the mechanism described in the following section on
-<<global-search-cancellation>>. Therefore the same caveats about timeout
-responsiveness apply.
-
-The setting key is `search.default_search_timeout` and can be set using the
-<<cluster-update-settings>> endpoints. The default value is no global timeout.
-Setting this value to `-1` resets the global search timeout to no timeout.
+* <<search-explain>>
+* <<search-field-caps>>
+* <<search-profile>>
+* <<search-rank-eval>>
+* <<search-shards>>
+* <<search-validate>>
 
 [discrete]
-[[global-search-cancellation]]
-=== Search Cancellation
+[[search-template-apis]]
+=== Search templates
 
-Searches can be cancelled using standard <<task-cancellation,task cancellation>>
-mechanism and are also automatically cancelled when the http connection used to
-perform the request is closed by the client. It is fundamental that the http
-client sending requests closes connections whenever requests time out or are
-aborted.
+* <<search-template>>
+* <<multi-search-template>>
 
 [discrete]
-[[search-concurrency-and-parallelism]]
-=== Search concurrency and parallelism
-
-By default Elasticsearch doesn't reject any search requests based on the number
-of shards the request hits. While Elasticsearch will optimize the search
-execution on the coordinating node a large number of shards can have a
-significant impact CPU and memory wise. It is usually a better idea to organize
-data in such a way that there are fewer larger shards. In case you would like to
-configure a soft limit, you can update the `action.search.shard_count.limit`
-cluster setting in order to reject search requests that hit too many shards.
-
-The request parameter `max_concurrent_shard_requests` can be used to control the
-maximum number of concurrent shard requests the search API will execute per node
-for the request. This parameter should be used to protect a single request from
-overloading a cluster (e.g., a default request will hit all indices in a cluster
-which could cause shard request rejections if the number of shards per node is
-high). This default value is `5`.
+[[eql-search-apis]]
+=== EQL search
+
+For an overview of EQL and related tutorials, see <<eql>>.
+
+* <<eql-search-api>>
+* <<get-async-eql-search-api>>
+* <<delete-async-eql-search-api>>
+
 
 include::search/search.asciidoc[]
 
 

+ 0 - 36
docs/reference/search/request/index-boost.asciidoc

@@ -1,36 +0,0 @@
-[discrete]
-[[index-boost]]
-=== Index boost
-
-When searching multiple indices, you can use the `indices_boost` parameter to
-boost results from one or more specified indices. This is useful when hits
-coming from one index matter more than hits coming from another index.
-
-[source,console]
---------------------------------------------------
-GET /_search
-{
-  "indices_boost": [
-    { "my-index-000001": 1.4 },
-    { "my-index-000002": 1.3 }
-  ]
-}
---------------------------------------------------
-// TEST[s/^/PUT my-index-000001\nPUT my-index-000002\n/]
-
-Index aliases and wildcard expressions can also be used:
-
-[source,console]
---------------------------------------------------
-GET /_search
-{
-  "indices_boost": [
-    { "my-alias":  1.4 },
-    { "my-index*": 1.3 }
-  ]
-}
---------------------------------------------------
-// TEST[s/^/PUT my-index-000001\nPUT my-index-000001\/_alias\/my-alias\n/]
-
-If multiple matches are found, the first match will be used. For example, if an
-index is included in both `alias1` and `index*`, boost value of `1.4` is applied.

+ 0 - 80
docs/reference/search/request/preference.asciidoc

@@ -1,80 +0,0 @@
-[discrete]
-[[search-preference]]
-=== Preference
-
-You can use the `preference` parameter to control the shard copies on which a search runs. By
-default, Elasticsearch selects from the available shard copies in an
-unspecified order, taking the <<shard-allocation-awareness,allocation awareness>> and
-<<search-adaptive-replica,adaptive replica selection>> configuration into
-account. However, it may sometimes be desirable to try and route certain
-searches to certain sets of shard copies.
-
-A possible use case would be to make use of per-copy caches like the
-<<shard-request-cache,request cache>>. Doing this, however, runs contrary to the
-idea of search parallelization and can create hotspots on certain nodes because
-the load might not be evenly distributed anymore.
-
-The `preference` is a query string parameter which can be set to:
-
-[horizontal]
-`_only_local`::
-	The operation will be executed only on shards allocated to the local
-	node.
-
-`_local`::
-	The operation will be executed on shards allocated to the local node if
-	possible, and will fall back to other shards if not.
-
-`_prefer_nodes:abc,xyz`::
-	The operation will be executed on nodes with one of the provided node
-	ids (`abc` or `xyz` in this case) if possible. If suitable shard copies
-	exist on more than one of the selected nodes then the order of
-	preference between these copies is unspecified.
-
-`_shards:2,3`::
-	Restricts the operation to the specified shards. (`2` and `3` in this
-	case).  This preference can be combined with other preferences but it
-	has to appear first: `_shards:2,3|_local`
-
-`_only_nodes:abc*,x*yz,...`::
-	Restricts the operation to nodes specified according to the
-	<<cluster,node specification>>. If suitable shard copies exist on more
-	than one of the selected nodes then the order of preference between
-	these copies is unspecified.
-
-Custom (string) value::
-	Any value that does not start with `_`. If two searches both give the same
-	custom string value for their preference and the underlying cluster state
-	does not change then the same ordering of shards will be used for the
-	searches. This does not guarantee that the exact same shards will be used
-	each time: the cluster state, and therefore the selected shards, may change
-	for a number of reasons including shard relocations and shard failures, and
-	nodes may sometimes reject searches causing fallbacks to alternative nodes.
-	However, in practice the ordering of shards tends to remain stable for long
-	periods of time. A good candidate for a custom preference value is something
-	like the web session id or the user name.
-
-For instance, use the user's session ID `xyzabc123` as follows:
-
-[source,console]
-------------------------------------------------
-GET /_search?preference=xyzabc123
-{
-  "query": {
-    "match": {
-      "title": "elasticsearch"
-    }
-  }
-}
-------------------------------------------------
-
-This can be an effective strategy to increase usage of e.g. the request cache for
-unique users running similar searches repeatedly by always hitting the same cache, while
-requests of different users are still spread across all shard copies.
-
-NOTE: The `_only_local` preference guarantees only to use shard copies on the
-local node, which is sometimes useful for troubleshooting. All other options do
-not _fully_ guarantee that any particular shard copies are used in a search,
-and on a changing index this may mean that repeated searches may yield
-different results if they are executed on different shard copies which are in
-different refresh states.

+ 0 - 78
docs/reference/search/request/search-type.asciidoc

@@ -1,78 +0,0 @@
-[discrete]
-[[search-type]]
-=== Search type
-
-There are different execution paths that can be done when executing a
-distributed search. The distributed search operation needs to be
-scattered to all the relevant shards and then all the results are
-gathered back. When doing scatter/gather type execution, there are
-several ways to do that, specifically with search engines.
-
-One of the questions when executing a distributed search is how many
-results to retrieve from each shard. For example, if we have 10 shards,
-the 1st shard might hold the most relevant results from 0 till 10, with
-other shards results ranking below it. For this reason, when executing a
-request, we will need to get results from 0 till 10 from all shards,
-sort them, and then return the results if we want to ensure correct
-results.
-
-Another question, which relates to the search engine, is the fact that each
-shard stands on its own. When a query is executed on a specific shard,
-it does not take into account term frequencies and other search engine
-information from the other shards. If we want to support accurate
-ranking, we would need to first gather the term frequencies from all
-shards to calculate global term frequencies, then execute the query on
-each shard using these global frequencies.
-
-Also, because of the need to sort the results, getting back a large
-document set, or even scrolling it, while maintaining the correct sorting
-behavior can be a very expensive operation. For large result set
-scrolling, it is best to sort by `_doc` if the order in which documents
-are returned is not important.
-
-Elasticsearch is very flexible and allows to control the type of search
-to execute on a *per search request* basis. The type can be configured
-by setting the *search_type* parameter in the query string. The types
-are:
-
-[discrete]
-[[query-then-fetch]]
-==== Query Then Fetch
-
-Parameter value: *query_then_fetch*.
-
-The request is processed in two phases. In the first phase, the query
-is forwarded to *all involved shards*. Each shard executes the search
-and generates a sorted list of results, local to that shard. Each
-shard returns *just enough information* to the coordinating node
-to allow it to merge and re-sort the shard level results into a globally
-sorted set of results, of maximum length `size`. 
-
-During the second phase, the coordinating node requests the document
-content (and highlighted snippets, if any) from *only the relevant
-shards*.
-
-[source,console]
---------------------------------------------------
-GET my-index-000001/_search?search_type=query_then_fetch
---------------------------------------------------
-// TEST[setup:my_index]
-
-NOTE: This is the default setting, if you do not specify a `search_type`
-      in your request.
-
-[discrete]
-[[dfs-query-then-fetch]]
-==== Dfs, Query Then Fetch
-
-Parameter value: *dfs_query_then_fetch*.
-
-Same as "Query Then Fetch", except for an initial scatter phase which
-goes and computes the distributed term frequencies for more accurate
-scoring.
-
-[source,console]
---------------------------------------------------
-GET my-index-000001/_search?search_type=dfs_query_then_fetch
---------------------------------------------------
-// TEST[setup:my_index]

+ 117 - 0
docs/reference/search/search-multiple-indices.asciidoc

@@ -0,0 +1,117 @@
+[[search-multiple-indices]]
+== Search multiple data streams and indices
+
+To search multiple data streams and indices, add them as comma-separated values
+in the <<search-search,search API>>'s request path.
+
+The following request searches the `my-index-000001` and `my-index-000002`
+indices.
+
+[source,console]
+----
+GET /my-index-000001,my-index-000002/_search
+{
+  "query": {
+    "match": {
+      "user.id": "kimchy"
+    }
+  }
+}
+----
+// TEST[setup:my_index]
+// TEST[s/^/PUT my-index-000002\n/]
+
+You can also search multiple data streams and indices using an index pattern.
+
+The following request targets the `my-index-*` index pattern. The request
+searches any data streams or indices in the cluster that start with `my-index-`.
+
+[source,console]
+----
+GET /my-index-*/_search
+{
+  "query": {
+    "match": {
+      "user.id": "kimchy"
+    }
+  }
+}
+----
+// TEST[setup:my_index]
+
+To search all data streams and indices in a cluster, omit the target from the
+request path. Alternatively, you can use `_all` or `*`.
+
+The following requests are equivalent and search all data streams and indices in
+the cluster.
+
+[source,console]
+----
+GET /_search
+{
+  "query": {
+    "match": {
+      "user.id": "kimchy"
+    }
+  }
+}
+
+GET /_all/_search
+{
+  "query": {
+    "match": {
+      "user.id": "kimchy"
+    }
+  }
+}
+
+GET /*/_search
+{
+  "query": {
+    "match": {
+      "user.id": "kimchy"
+    }
+  }
+}
+----
+// TEST[setup:my_index]
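The multi-target rules shown above (comma-separated names, `*` patterns, and the equivalent empty target, `_all`, and `*` forms) can be sketched in Python. This is a simplified model, not how Elasticsearch resolves targets: `resolve_targets` is a hypothetical helper, and it ignores exclusions, aliases, hidden indices, and the errors a real request may return for names that don't exist.

```python
from fnmatch import fnmatchcase


def resolve_targets(expression, names):
    """Return the names a multi-target expression selects (hypothetical helper).

    Handles comma-separated names and `*` patterns, and treats an empty
    target, `_all`, or `*` as "every data stream and index". Exclusions,
    aliases, and hidden indices are deliberately left out of this sketch.
    """
    if expression in ("", "_all", "*"):
        return sorted(names)
    selected = set()
    for part in expression.split(","):
        part = part.strip()
        if "*" in part:
            # Pattern: keep every name that matches, case-sensitively.
            selected.update(n for n in names if fnmatchcase(n, part))
        elif part in names:
            # Concrete name: this sketch silently skips unknown names.
            selected.add(part)
    return sorted(selected)


cluster = ["my-index-000001", "my-index-000002", "logs-000001"]
print(resolve_targets("my-index-000001,my-index-000002", cluster))  # ['my-index-000001', 'my-index-000002']
print(resolve_targets("my-index-*", cluster))  # ['my-index-000001', 'my-index-000002']
print(resolve_targets("", cluster) == resolve_targets("_all", cluster))  # True
```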
+
+[discrete]
+[[index-boost]]
+=== Index boost
+
+When searching multiple indices, you can use the `indices_boost` parameter to
+boost results from one or more specified indices. This is useful when hits
+coming from some indices matter more than hits from others.
+
+NOTE: You cannot use `indices_boost` with data streams.
+
+[source,console]
+--------------------------------------------------
+GET /_search
+{
+  "indices_boost": [
+    { "my-index-000001": 1.4 },
+    { "my-index-000002": 1.3 }
+  ]
+}
+--------------------------------------------------
+// TEST[s/^/PUT my-index-000001\nPUT my-index-000002\n/]
+
+Index aliases and index patterns can also be used:
+
+[source,console]
+--------------------------------------------------
+GET /_search
+{
+  "indices_boost": [
+    { "my-alias":  1.4 },
+    { "my-index*": 1.3 }
+  ]
+}
+--------------------------------------------------
+// TEST[s/^/PUT my-index-000001\nPUT my-index-000001\/_alias\/my-alias\n/]
+
+If multiple entries match an index, the first match is used. For example, if an
+index is included in `my-alias` and matches the `my-index*` pattern, a boost
+value of `1.4` is applied.
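The first-match rule for `indices_boost` can be sketched as a lookup over the ordered entries. This is an illustration under stated assumptions, not Elasticsearch's code: `index_boost` is a hypothetical helper, the alias map is made up for the example, and returning `1.0` for unmatched indices is an assumption of the sketch.

```python
from fnmatch import fnmatchcase


def index_boost(index, indices_boost, aliases):
    """Return the boost applied to hits from `index` (hypothetical helper).

    `indices_boost` mirrors the request body: an ordered list of single-key
    objects. `aliases` maps an alias name to the indices it points to.
    Entries are checked in order and the first match wins.
    """
    for entry in indices_boost:
        target, boost = next(iter(entry.items()))
        if target in aliases:
            matched = index in aliases[target]
        else:
            matched = fnmatchcase(index, target)  # exact name or `*` pattern
        if matched:
            return boost
    return 1.0  # assumption: indices with no matching entry are not boosted


boosts = [{"my-alias": 1.4}, {"my-index*": 1.3}]
aliases = {"my-alias": ["my-index-000001"]}
print(index_boost("my-index-000001", boosts, aliases))  # 1.4 (alias entry comes first)
print(index_boost("my-index-000002", boosts, aliases))  # 1.3 (only the pattern matches)
```

Swapping the order of the two entries would give `my-index-000001` a boost of `1.3`, which is why the request body uses an array rather than an object.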

+ 184 - 0
docs/reference/search/search-shard-routing.asciidoc

@@ -0,0 +1,184 @@
+[[search-shard-routing]]
+== Search shard routing
+
+To protect against hardware failure and increase search capacity, {es} can store
+copies of an index's data across multiple shards on multiple nodes. When running
+a search request, {es} selects a node containing a copy of the index's data and
+forwards the search request to that node's shards. This process is known as
+_search shard routing_ or _routing_.
+
+[discrete]
+[[search-adaptive-replica]]
+=== Adaptive replica selection
+
+By default, {es} uses _adaptive replica selection_ to route search requests.
+This method selects an eligible node using <<allocation-awareness,allocation
+awareness>> and the following criteria:
+
+* Response time of prior requests between the coordinating node
+and the eligible node
+* How long the eligible node took to run previous searches
+* Queue size of the eligible node's `search` <<modules-threadpool,threadpool>>
+
+Adaptive replica selection is designed to decrease search latency. However, you
+can disable adaptive replica selection by setting
+`cluster.routing.use_adaptive_replica_selection` to `false` using the
+<<cluster-update-settings,cluster settings API>>. If disabled, {es} routes
+search requests using a round-robin method, which may result in slower searches.
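A toy model can make the adaptive replica selection criteria above more concrete: the three signals are folded into one score, and the lowest score wins. The `CopyStats` type, the `rank_copies` helper, and the scoring formula are assumptions made for illustration only; Elasticsearch's actual ranking formula differs, so read this as intuition for why a slow or busy node moves down the list.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class CopyStats:
    node: str
    response_ms: float  # recent round-trip time from the coordinating node
    service_ms: float   # recent time the node took to run searches
    queue_size: int     # current length of the node's `search` thread pool queue


def rank_copies(copies):
    """Order shard copies best-first with a toy score (not Elasticsearch's formula).

    A copy is penalized for slow responses and for slow searches, and the
    search-time penalty grows with the queue the request would wait in.
    """
    return sorted(copies, key=lambda c: c.response_ms + c.service_ms * (1 + c.queue_size))


copies = [
    CopyStats("node-a", response_ms=12, service_ms=8, queue_size=4),
    CopyStats("node-b", response_ms=15, service_ms=6, queue_size=0),
]
print([c.node for c in rank_copies(copies)])  # ['node-b', 'node-a']

# With adaptive replica selection disabled, copies are simply taken in turn.
round_robin = cycle(c.node for c in copies)
print([next(round_robin) for _ in range(3)])  # ['node-a', 'node-b', 'node-a']
```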
+
+[discrete]
+[[shard-and-node-preference]]
+=== Set a preference
+
+By default, adaptive replica selection chooses from all eligible nodes and
+shards. However, you may only want data from a local node or want to route
+searches to a specific node based on its hardware. Or you may want to send
+repeated searches to the same shard to take advantage of caching.
+
+To limit the set of nodes and shards eligible for a search request, use
+the search API's <<search-preference,`preference`>> query parameter.
+
+For example, the following request searches `my-index-000001` with a
+`preference` of `_local`. This restricts the search to shards on the
+local node. If the local node contains no shard copies of the index's data, the
+request falls back to adaptive replica selection and is routed to another
+eligible node.
+
+[source,console]
+----
+GET /my-index-000001/_search?preference=_local
+{
+  "query": {
+    "match": {
+      "user.id": "kimchy"
+    }
+  }
+}
+----
+// TEST[setup:my_index]
+
+You can also use the `preference` parameter to route searches to specific shards
+based on a provided string. If the cluster state and selected shards
+do not change, searches using the same `preference` string are routed to the
+same shards in the same order.
+
+We recommend using a unique `preference` string, such as a user name or web
+session ID. This string cannot start with a `_`.
+
+TIP: You can use this option to serve cached results for frequently used and
+resource-intensive searches. If the shard's data doesn't change, repeated
+searches with the same `preference` string retrieve results from the same
+<<shard-request-cache,shard request cache>>. For time-series use cases, such as
+logging, data in older indices is rarely updated and can be served directly from
+this cache.
+
+The following request searches `my-index-000001` with a `preference` string of
+`my-custom-shard-string`.
+
+[source,console]
+----
+GET /my-index-000001/_search?preference=my-custom-shard-string
+{
+  "query": {
+    "match": {
+      "user.id": "kimchy"
+    }
+  }
+}
+----
+// TEST[setup:my_index]
+
+NOTE: If the cluster state or selected shards change, the same `preference`
+string may not route searches to the same shards in the same order. This can
+occur for a number of reasons, including shard relocations and shard failures. A
+node can also reject a search request, which {es} would re-route to another
+node.
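The behavior of a custom `preference` string (same string, same cluster state, same shard order) can be sketched by rotating a stable list of copies by a hash of the string. This is not Elasticsearch's implementation: `order_copies` is hypothetical, `zlib.crc32` stands in for whatever hashing Elasticsearch uses internally, and the sorted `copies` list stands in for an unchanged cluster state.

```python
import zlib


def order_copies(preference, copies):
    """Deterministically order shard copies for a custom `preference` string.

    Hypothetical helper: crc32 stands in for Elasticsearch's internal
    hashing, and `copies` stands in for the copies an unchanged cluster
    state would offer.
    """
    if preference.startswith("_"):
        raise ValueError("custom preference strings can't start with '_'")
    ordered = sorted(copies)  # a stable base order, independent of input order
    start = zlib.crc32(preference.encode("utf-8")) % len(ordered)
    return ordered[start:] + ordered[:start]  # rotate by the string's hash


copies = ["node-a", "node-b", "node-c"]
first = order_copies("my-custom-shard-string", copies)
second = order_copies("my-custom-shard-string", list(reversed(copies)))
print(first == second)  # True: same string, same copies, same order
```

If the set of copies changes, for example after a relocation or a node failure, the rotation can start from a different copy, which mirrors the caveat in the NOTE above.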
+
+[discrete]
+[[search-routing]]
+=== Use a routing value
+
+When you index a document, you can specify an optional
+<<mapping-routing-field,routing value>>, which routes the document to a
+specific shard.
+
+For example, the following indexing request routes a document using
+`my-routing-value`.
+
+[source,console]
+----
+POST /my-index-000001/_doc?routing=my-routing-value
+{
+  "@timestamp": "2099-11-15T13:12:00",
+  "message": "GET /search HTTP/1.1 200 1070000",
+  "user": {
+    "id": "kimchy"
+  }
+}
+----
+
+You can use the same routing value in the search API's `routing` query
+parameter. This ensures the search runs on the same shard used to index the
+document.
+
+[source,console]
+----
+GET /my-index-000001/_search?routing=my-routing-value
+{
+  "query": {
+    "match": {
+      "user.id": "kimchy"
+    }
+  }
+}
+----
+// TEST[setup:my_index]
+
+You can also provide multiple comma-separated routing values:
+
+[source,console]
+----
+GET /my-index-000001/_search?routing=my-routing-value,my-routing-value-2
+{
+  "query": {
+    "match": {
+      "user.id": "kimchy"
+    }
+  }
+}
+----
+// TEST[setup:my_index]
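Why the same routing value on indexing and search lands on the same shard can be sketched with the simplified scheme `shard_num = hash(routing) % num_primary_shards`. This is an assumption-labeled sketch: `shard_for_routing` and `shards_for_search` are hypothetical helpers, and `zlib.crc32` stands in for the hash Elasticsearch actually uses, so the shard numbers will not match a real cluster.

```python
import zlib


def shard_for_routing(routing, num_primary_shards):
    """Map a routing value to a primary shard number (hypothetical helper).

    Uses the simplified scheme shard_num = hash(routing) % num_primary_shards,
    with crc32 standing in for the hash Elasticsearch actually uses.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def shards_for_search(routing_param, num_primary_shards):
    """Shards searched for a comma-separated `routing` query parameter."""
    values = (v.strip() for v in routing_param.split(","))
    return sorted({shard_for_routing(v, num_primary_shards) for v in values})


indexed_on = shard_for_routing("my-routing-value", 5)
print(indexed_on in shards_for_search("my-routing-value", 5))  # True
print(shards_for_search("my-routing-value,my-routing-value-2", 5))
```

Because the mapping is deterministic, a search with the routing value used at index time only needs to visit that one shard; each extra comma-separated value can add at most one more shard.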
+
+[discrete]
+[[search-concurrency-and-parallelism]]
+=== Search concurrency and parallelism
+
+By default, {es} doesn't reject search requests based on the number of shards
+the request hits. However, hitting a large number of shards can significantly
+increase CPU and memory usage.
+
+TIP: For tips on preventing indices with large numbers of shards, see
+<<avoid-oversharding>>.
+
+You can use the `max_concurrent_shard_requests` query parameter to control the
+maximum number of concurrent shard requests a search request can run per node.
+This prevents a single request from overloading a cluster. The parameter
+defaults to `5`.
+
+[source,console]
+----
+GET /my-index-000001/_search?max_concurrent_shard_requests=3
+{
+  "query": {
+    "match": {
+      "user.id": "kimchy"
+    }
+  }
+}
+----
+// TEST[setup:my_index]
+
+You can also use the `action.search.shard_count.limit` cluster setting to set a
+search shard limit and reject requests that hit too many shards. You can
+configure `action.search.shard_count.limit` using the
+<<cluster-update-settings,cluster settings API>>.

+ 104 - 128
docs/reference/search/search-your-data.asciidoc

@@ -2,7 +2,7 @@
 = Search your data
 
 [[search-query]]
-A _search query_, or _query_, is a request for information about data in 
+A _search query_, or _query_, is a request for information about data in
 {es} data streams or indices.
 
 You can think of a query as a question, written in a way {es} understands.
@@ -24,55 +24,30 @@ a specific number of results.
 [[run-an-es-search]]
 == Run a search
 
-You can use the <<search-search,search API>> to search data stored in
-{es} data streams or indices.
-
-The API can run two types of searches, depending on how you provide
-queries:
-
-<<run-uri-search,URI searches>>::
-  Queries are provided through a query parameter. URI searches tend to be
-  simpler and best suited for testing.
-
-<<run-request-body-search,Request body searches>>::
-  Queries are provided through the JSON body of the API request. These queries
-  are written in <<query-dsl,Query DSL>>. We recommend using request body
-  searches in most production use cases.
-
-[WARNING]
-====
-If you specify a query in both the URI and request body, the search API request
-runs only the URI query.
-====
-
-[discrete]
-[[run-uri-search]]
-=== Run a URI search
-
-You can use the search API's <<search-api-query-params-q,`q` query string
-parameter>> to run a search in the request's URI. The `q` parameter only accepts
-queries written in Lucene's <<query-string-syntax,query string syntax>>.
+You can use the <<search-search,search API>> to search and
+<<search-aggregations,aggregate>> data stored in {es} data streams or indices.
+The API's `query` request body parameter accepts queries written in
+<<query-dsl,Query DSL>>.
 
-The following URI search matches documents with a `user.id` value of `kimchy`.
+The following request searches `my-index-000001` using a
+<<query-dsl-match-query,`match`>> query. This query matches documents with a
+`user.id` value of `kimchy`.
 
 [source,console]
 ----
-GET /my-index-000001/_search?q=user.id:kimchy
+GET /my-index-000001/_search
+{
+  "query": {
+    "match": {
+      "user.id": "kimchy"
+    }
+  }
+}
 ----
 // TEST[setup:my_index]
 
-The API returns the following response.
-
-By default, the `hits.hits` property returns the top 10 documents matching the
-query. To retrieve more documents, see <<paginate-search-results>>.
-
-The response sorts documents in `hits.hits` by `_score`, a
-<<relevance-scores,relevance score>> that measures how well each document
-matches the query.
-
-The `hit.hits` property also includes the <<mapping-source-field,`_source`>> for
-each matching document. To retrieve only a subset of the `_source` or other
-fields, see <<search-fields>>.
+The API response returns the top 10 documents matching the query in the
+`hits.hits` property.
 
 [source,console-result]
 ----
@@ -125,63 +100,84 @@ fields, see <<search-fields>>.
 // TESTRESPONSE[s/"_id": "kxWFcnMByiguvud1Z8vC"/"_id": "$body.hits.hits.0._id"/]
 
 [discrete]
-[[run-request-body-search]]
-=== Run a request body search
-
-You can use the search API's <<request-body-search-query,`query` request
-body parameter>> to provide a query as a JSON object, written in
-<<query-dsl,Query DSL>>.
-
-The following request body search uses the <<query-dsl-match-query,`match`>>
-query to match documents with a `user.id` value of `kimchy`.
-
-[source,console]
-----
-GET /my-index-000001/_search
-{
-  "query": {
-    "match": {
-      "user.id": "kimchy"
-    }
-  }
-}
-----
-// TEST[setup:my_index]
+[[common-search-options]]
+=== Common search options
+
+You can use the following options to customize your searches.
+
+*Query DSL* +
+<<query-dsl,Query DSL>> supports a variety of query types you can mix and match
+to get the results you want. Query types include:
+
+* <<query-dsl-bool-query,Boolean>> and other <<compound-queries,compound
+queries>>, which let you combine queries and match results based on multiple
+criteria
+* <<term-level-queries,Term-level queries>> for filtering and finding exact matches
+* <<full-text-queries,Full text queries>>, which are commonly used in search
+engines
+* <<geo-queries,Geo>> and <<shape-queries,spatial queries>>
+
+*Aggregations* +
+You can use <<search-aggregations,search aggregations>> to get statistics and
+other analytics for your search results. Aggregations help you answer questions
+like:
+
+* What's the average response time for my servers?
+* What are the top IP addresses hit by users on my network?
+* What is the total transaction revenue by customer?
+
+*Search multiple data streams and indices* +
+You can use comma-separated values and wildcard (`*`) index patterns to search
+several data streams and indices in the same request. You can even boost search
+results from specific indices. See <<search-multiple-indices>>.
+
+*Paginate search results* +
+By default, searches return only the top 10 matching hits. To retrieve
+more or fewer documents, see <<paginate-search-results>>.
+
+*Retrieve selected fields* +
+The search response's `hits.hits` property includes the full document
+<<mapping-source-field,`_source`>> for each hit. To retrieve only a subset of
+the `_source` or other fields, see <<search-fields>>.
+
+*Sort search results* +
+By default, search hits are sorted by `_score`, a <<relevance-scores,relevance
+score>> that measures how well each document matches the query. To customize the
+calculation of these scores, use the
+<<query-dsl-script-score-query,`script_score`>> query. To sort search hits by
+other field values, see <<sort-search-results>>.
+
+*Run an async search* +
+{es} searches are designed to run on large volumes of data quickly, often
+returning results in milliseconds. For this reason, searches are
+_synchronous_ by default. The search request waits for complete results before
+returning a response.
+
+However, complete results can take longer for searches across
+<<frozen-indices,frozen indices>> or <<modules-cross-cluster-search,multiple
+clusters>>.
+
+To avoid long waits, you can run an _asynchronous_, or _async_, search
+instead. An <<async-search-intro,async search>> lets you retrieve partial
+results for a long-running search now and get complete results later.
 
 [discrete]
-[[search-multiple-indices]]
-=== Search multiple data streams and indices
+[[search-timeout]]
+=== Search timeout
 
-To search multiple data streams and indices, add them as comma-separated values
-in the search API request path.
+By default, search requests don't time out. The request waits for complete
+results before returning a response.
 
-The following request searches the `my-index-000001` and `my-index-000002`
-indices.
+While <<async-search-intro,async search>> is designed for long-running
+searches, you can also use the `timeout` parameter to specify how long to wait
+for each shard to complete. Each shard collects hits within the specified
+period. If collection isn't finished when the period ends, {es} returns only
+the hits accumulated up to that point and sets `timed_out` to `true` in the
+response.
 
 [source,console]
 ----
-GET /my-index-000001,my-index-000002/_search
-{
-  "query": {
-    "match": {
-      "user.id": "kimchy"
-    }
-  }
-}
-----
-// TEST[setup:my_index]
-// TEST[s/^/PUT my-index-000002\n/]
-
-You can also search multiple data streams and indices using a wildcard (`*`)
-pattern.
-
-The following request targets the wildcard pattern `my-index-*`. The request
-searches any data streams or indices in the cluster that start with `my-index-`.
-
-[source,console]
-----
-GET /my-index-*/_search
+GET /my-index-000001/_search
 {
+  "timeout": "2s",
   "query": {
     "match": {
       "user.id": "kimchy"
@@ -191,45 +187,23 @@ GET /my-index-*/_search
 ----
 // TEST[setup:my_index]
 
-To search all data streams and indices in a cluster, omit the target from the
-request path. Alternatively, you can use `_all` or `*`.
-
-The following requests are equivalent and search all data streams and indices in the cluster.
+To set a cluster-wide default timeout for all search requests, configure
+`search.default_search_timeout` using the <<cluster-update-settings,cluster
+settings API>>. This global timeout duration is used if no `timeout` parameter
+is passed in the request. If the global search timeout expires before the
+search request finishes, the request is cancelled using <<task-cancellation,task
+cancellation>>. The `search.default_search_timeout` setting defaults to `-1`
+(no timeout).
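+
+For example, the following request sets a cluster-wide default search timeout.
+The `30s` value is illustrative only; choose a duration that suits your
+workload.
+
+[source,console]
+----
+PUT /_cluster/settings
+{
+  "persistent": {
+    "search.default_search_timeout": "30s"
+  }
+}
+----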
 
-[source,console]
-----
-GET /_search
-{
-  "query": {
-    "match": {
-      "user.id": "kimchy"
-    }
-  }
-}
-
-GET /_all/_search
-{
-  "query": {
-    "match": {
-      "user.id": "kimchy"
-    }
-  }
-}
+[discrete]
+[[global-search-cancellation]]
+=== Search cancellation
 
-GET /*/_search
-{
-  "query": {
-    "match": {
-      "user.id": "kimchy"
-    }
-  }
-}
-----
-// TEST[setup:my_index]
+You can cancel a search request using the <<task-cancellation,task management
+API>>. {es} also automatically cancels a search request when your client's HTTP
+connection closes. We recommend you set up your client to close HTTP connections
+when a search request is aborted or times out.
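+
+For example, the following requests first list running search tasks and then
+cancel one of them. The task ID `oTUltX4IQMOUUVeiohTt8A:12345` is a
+hypothetical placeholder; replace it with an ID returned by the first request.
+
+[source,console]
+----
+GET /_tasks?actions=*search&detailed
+
+POST /_tasks/oTUltX4IQMOUUVeiohTt8A:12345/_cancel
+----
+// TEST[skip:hypothetical task ID]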
 
-include::request/index-boost.asciidoc[]
-include::request/preference.asciidoc[]
-include::request/search-type.asciidoc[]
 include::request/track-total-hits.asciidoc[]
 include::quickly-check-for-matching-docs.asciidoc[]
 
@@ -242,4 +216,6 @@ include::paginate-search-results.asciidoc[]
 include::request/inner-hits.asciidoc[]
 include::search-fields.asciidoc[]
 include::{es-repo-dir}/modules/cross-cluster-search.asciidoc[]
+include::search-multiple-indices.asciidoc[]
+include::search-shard-routing.asciidoc[]
 include::request/sort.asciidoc[]

+ 85 - 4
docs/reference/search/search.asciidoc

@@ -129,9 +129,44 @@ When unspecified, the pre-filter phase is executed if any of these conditions is
  - The request targets one or more read-only index.
  - The primary sort of the query targets an indexed field.
 
+[[search-preference]]
 `preference`::
-(Optional, string) Specifies the node or shard the operation should be
-performed on. Random by default.
+(Optional, string)
+Nodes and shards used for the search. By default, {es} selects from eligible
+nodes and shards using <<search-adaptive-replica,adaptive replica selection>>,
+accounting for <<shard-allocation-awareness,allocation awareness>>.
++
+.Valid values for `preference`
+[%collapsible%open]
+====
+`_only_local`::
+Run the search only on shards on the local node.
+
+`_local`::
+If possible, run the search on shards on the local node. If not, select shards
+using the default method.
+
+`_only_nodes:<node-id>,<node-id>`::
+Run the search only on the specified node IDs. If suitable shards exist on more
+than one selected node, use shards on those nodes using the default method. If
+none of the specified nodes are available, select shards from any available node
+using the default method.
+
+`_prefer_nodes:<node-id>,<node-id>`::
+If possible, run the search on the specified node IDs. If not, select shards
+using the default method.
+
+`_shards:<shard>,<shard>`::
+Run the search only on the specified shards. This value can be combined with
+other `preference` values, but this value must come first. For example:
+`_shards:2,3|_local`
+
+`<custom-string>`::
+Any string that does not start with `_`. If the cluster state and selected
+shards do not change, searches using the same `<custom-string>` value are routed
+to the same shards in the same order.
+====
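++
+For example, the following request uses a custom string, such as a user's
+session ID, so that repeated searches with the same value are routed to the
+same shards, assuming the cluster state and selected shards don't change. The
+`my-custom-shard-string` value is an illustrative placeholder.
++
+[source,console]
+----
+GET /my-index-000001/_search?preference=my-custom-shard-string
+{
+  "query": {
+    "match": {
+      "user.id": "kimchy"
+    }
+  }
+}
+----
+// TEST[setup:my_index]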
+
 
 [[search-api-query-params-q]]
 include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=search-q]
@@ -164,7 +199,28 @@ Period to retain the <<scroll-search-context,search context>> for scrolling. See
 By default, this value cannot exceed `1d` (24 hours). You can change
 this limit using the `search.max_keep_alive` cluster-level setting.
 
-include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=search_type]
+[[search-type]]
+`search_type`::
+(Optional, string)
+How {wikipedia}/Tf–idf[distributed term frequencies] are calculated for
+<<relevance-scores,relevance scoring>>.
++
+.Valid values for `search_type`
+[%collapsible%open]
+====
+`query_then_fetch`::
+(Default)
+Distributed term frequencies are calculated locally for each shard running the
+search. We recommend this option for faster searches with potentially less
+accurate scoring.
+
+[[dfs-query-then-fetch]]
+`dfs_query_then_fetch`::
+Distributed term frequencies are calculated globally, using information gathered
+from all shards running the search. While this option increases the accuracy of
+scoring, it adds a round-trip to each shard, which can result in slower
+searches.
+====
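++
+For example, the following request uses `dfs_query_then_fetch` to trade some
+search speed for more accurate scoring across shards. The query itself is
+illustrative.
++
+[source,console]
+----
+GET /my-index-000001/_search?search_type=dfs_query_then_fetch
+{
+  "query": {
+    "match": {
+      "user.id": "kimchy"
+    }
+  }
+}
+----
+// TEST[setup:my_index]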
 
 `seq_no_primary_term`::
 (Optional, boolean) If `true`, returns sequence number and primary term of the
@@ -284,7 +340,7 @@ You can specify items in the array as a string or object.
 See <<docvalue-fields>>.
 +
 .Properties of `docvalue_fields` objects
-[%collapsible]
+[%collapsible%open]
 ====
 `field`::
 (Required, string)
@@ -326,6 +382,24 @@ As an alternative to deep paging, we recommend using
 <<search-after,`search_after`>> parameter.
 --
 
+`indices_boost`::
+(Optional, array of objects)
+Boosts the <<relevance-scores,`_score`>> of documents from specified indices.
++
+.Properties of `indices_boost` objects
+[%collapsible%open]
+====
+`<index>: <boost-value>`::
+(Required, float)
+`<index>` is the name of the index or index alias. Wildcard (`*`) expressions
+are supported.
++
+`<boost-value>` is the factor by which scores are multiplied.
++
+A boost value greater than `1.0` increases the score. A boost value between
+`0` and `1.0` decreases the score. 
+====
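++
+For example, the following request ranks documents from `my-index-000001`
+slightly higher than otherwise equally relevant documents from
+`my-index-000002`. The index names and boost values are illustrative.
++
+[source,console]
+----
+GET /_search
+{
+  "indices_boost": [
+    { "my-index-000001": 1.4 },
+    { "my-index-000002": 1.3 }
+  ]
+}
+----
+// TEST[setup:my_index]
+// TEST[s/^/PUT my-index-000002\n/]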
+
 [[search-api-min-score]]
 `min_score`::
 (Optional, float)
@@ -409,6 +483,13 @@ exclude fields from this subset using the `excludes` property.
 =====
 ====
 
+[[stats-groups]]
+`stats`::
+(Optional, array of strings)
+Stats groups to associate with the search. Each group maintains a statistics
+aggregation for its associated searches. You can retrieve these stats using the
+<<indices-stats,indices stats API>>.
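++
+For example, the following requests associate a search with two stats groups
+and then retrieve search statistics for those groups. The `group1` and `group2`
+names are hypothetical.
++
+[source,console]
+----
+GET /my-index-000001/_search
+{
+  "stats": [ "group1", "group2" ],
+  "query": {
+    "match": {
+      "user.id": "kimchy"
+    }
+  }
+}
+
+GET /my-index-000001/_stats/search?groups=group1,group2
+----
+// TEST[setup:my_index]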
+
 include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=terminate_after]
 +
 Defaults to `0`, which does not terminate query execution early.