|
@@ -0,0 +1,458 @@
|
|
|
+[[search-your-data]]
|
|
|
+= Search your data
|
|
|
+
|
|
|
+[[search-query]]
|
|
|
+A _search query_, or _query_, is a request for information about data in
|
|
|
+{es} data streams or indices.
|
|
|
+
|
|
|
+You can think of a query as a question, written in a way {es} understands.
|
|
|
+Depending on your data, you can use a query to get answers to questions like:
|
|
|
+
|
|
|
+* What processes on my server take longer than 500 milliseconds to respond?
|
|
|
+* What users on my network ran `regsvr32.exe` within the last week?
|
|
|
+* What pages on my website contain a specific word or phrase?
|
|
|
+
|
|
|
+A _search_ consists of one or more queries that are combined and sent to {es}.
|
|
|
+Documents that match a search's queries are returned in the _hits_, or
|
|
|
+_search results_, of the response.
|
|
|
+
|
|
|
+A search may also contain additional information used to better process its
|
|
|
+queries. For example, a search may be limited to a specific index or only return
|
|
|
+a specific number of results.
|
|
|
+
|
|
|
+[discrete]
|
|
|
+[[run-an-es-search]]
|
|
|
+== Run a search
|
|
|
+
|
|
|
+You can use the <<search-search,search API>> to search and
|
|
|
+<<search-aggregations,aggregate>> data stored in {es} data streams or indices.
|
|
|
+The API's `query` request body parameter accepts queries written in
|
|
|
+<<query-dsl,Query DSL>>.
|
|
|
+
|
|
|
+The following request searches `my-index-000001` using a
|
|
|
+<<query-dsl-match-query,`match`>> query. This query matches documents with a
|
|
|
+`user.id` value of `kimchy`.
|
|
|
+
|
|
|
+[source,console]
|
|
|
+----
|
|
|
+GET /my-index-000001/_search
|
|
|
+{
|
|
|
+ "query": {
|
|
|
+ "match": {
|
|
|
+ "user.id": "kimchy"
|
|
|
+ }
|
|
|
+ }
|
|
|
+}
|
|
|
+----
|
|
|
+// TEST[setup:my_index]
|
|
|
+
|
|
|
+The API response returns the top 10 documents matching the query in the
|
|
|
+`hits.hits` property.
|
|
|
+
|
|
|
+[source,console-result]
|
|
|
+----
|
|
|
+{
|
|
|
+ "took": 5,
|
|
|
+ "timed_out": false,
|
|
|
+ "_shards": {
|
|
|
+ "total": 1,
|
|
|
+ "successful": 1,
|
|
|
+ "skipped": 0,
|
|
|
+ "failed": 0
|
|
|
+ },
|
|
|
+ "hits": {
|
|
|
+ "total": {
|
|
|
+ "value": 1,
|
|
|
+ "relation": "eq"
|
|
|
+ },
|
|
|
+ "max_score": 1.3862942,
|
|
|
+ "hits": [
|
|
|
+ {
|
|
|
+ "_index": "my-index-000001",
|
|
|
+ "_id": "kxWFcnMByiguvud1Z8vC",
|
|
|
+ "_score": 1.3862942,
|
|
|
+ "_source": {
|
|
|
+ "@timestamp": "2099-11-15T14:12:12",
|
|
|
+ "http": {
|
|
|
+ "request": {
|
|
|
+ "method": "get"
|
|
|
+ },
|
|
|
+ "response": {
|
|
|
+ "bytes": 1070000,
|
|
|
+ "status_code": 200
|
|
|
+ },
|
|
|
+ "version": "1.1"
|
|
|
+ },
|
|
|
+ "message": "GET /search HTTP/1.1 200 1070000",
|
|
|
+ "source": {
|
|
|
+ "ip": "127.0.0.1"
|
|
|
+ },
|
|
|
+ "user": {
|
|
|
+ "id": "kimchy"
|
|
|
+ }
|
|
|
+ }
|
|
|
+ }
|
|
|
+ ]
|
|
|
+ }
|
|
|
+}
|
|
|
+----
|
|
|
+// TESTRESPONSE[s/"took": 5/"took": "$body.took"/]
|
|
|
+// TESTRESPONSE[s/"_id": "kxWFcnMByiguvud1Z8vC"/"_id": "$body.hits.hits.0._id"/]
|
|
|
+
|
|
|
+[discrete]
|
|
|
+[[common-search-options]]
|
|
|
+=== Common search options
|
|
|
+
|
|
|
+You can use the following options to customize your searches.
|
|
|
+
|
|
|
+*Query DSL* +
|
|
|
+<<query-dsl,Query DSL>> supports a variety of query types you can mix and match
|
|
|
+to get the results you want. Query types include:
|
|
|
+
|
|
|
+* <<query-dsl-bool-query,Boolean>> and other <<compound-queries,compound
|
|
|
+queries>>, which let you combine queries and match results based on multiple
|
|
|
+criteria
|
|
|
+* <<term-level-queries,Term-level queries>> for filtering and finding exact matches
|
|
|
+* <<full-text-queries,Full text queries>>, which are commonly used in search
|
|
|
+engines
|
|
|
+* <<geo-queries,Geo>> and <<shape-queries,spatial queries>>
|
|
|
+
|
|
|
+*Aggregations* +
|
|
|
+You can use <<search-aggregations,search aggregations>> to get statistics and
|
|
|
+other analytics for your search results. Aggregations help you answer questions
|
|
|
+like:
|
|
|
+
|
|
|
+* What's the average response time for my servers?
|
|
|
+* What are the top IP addresses hit by users on my network?
|
|
|
+* What is the total transaction revenue by customer?
|
|
|
+
|
|
|
+*Search multiple data streams and indices* +
|
|
|
+You can use comma-separated values and grep-like index patterns to search
|
|
|
+several data streams and indices in the same request. You can even boost search
|
|
|
+results from specific indices. See <<search-multiple-indices>>.
|
|
|
+
|
|
|
+*Paginate search results* +
|
|
|
+By default, searches return only the top 10 matching hits. To retrieve
|
|
|
+more or fewer documents, see <<paginate-search-results>>.
|
|
|
+
|
|
|
+*Retrieve selected fields* +
|
|
|
+The search response's `hit.hits` property includes the full document
|
|
|
+<<mapping-source-field,`_source`>> for each hit. To retrieve only a subset of
|
|
|
+the `_source` or other fields, see <<search-fields>>.
|
|
|
+
|
|
|
+*Sort search results* +
|
|
|
+By default, search hits are sorted by `_score`, a <<relevance-scores,relevance
|
|
|
+score>> that measures how well each document matches the query. To customize the
|
|
|
+calculation of these scores, use the
|
|
|
+<<query-dsl-script-score-query,`script_score`>> query. To sort search hits by
|
|
|
+other field values, see <<sort-search-results>>.
|
|
|
+
|
|
|
+*Run an async search* +
|
|
|
+{es} searches are designed to run on large volumes of data quickly, often
|
|
|
+returning results in milliseconds. For this reason, searches are
|
|
|
+_synchronous_ by default. The search request waits for complete results before
|
|
|
+returning a response.
|
|
|
+
|
|
|
+However, complete results can take longer for searches across
|
|
|
+<<frozen-indices,frozen indices>> or <<modules-cross-cluster-search,multiple
|
|
|
+clusters>>.
|
|
|
+
|
|
|
+To avoid long waits, you can use run an _asynchronous_, or _async_, search
|
|
|
+instead. An <<async-search-intro,async search>> lets you retrieve partial
|
|
|
+results for a long-running search now and get complete results later.
|
|
|
+
|
|
|
+[discrete]
|
|
|
+[[search-timeout]]
|
|
|
+=== Search timeout
|
|
|
+
|
|
|
+By default, search requests don't time out. The request waits for complete
|
|
|
+results before returning a response.
|
|
|
+
|
|
|
+While <<async-search-intro,async search>> is designed for long-running
|
|
|
+searches, you can also use the `timeout` parameter to specify a duration you'd
|
|
|
+like to wait for a search to complete. If no response is received before this
|
|
|
+period ends, the request fails and returns an error.
|
|
|
+
|
|
|
+[source,console]
|
|
|
+----
|
|
|
+GET /my-index-000001/_search
|
|
|
+{
|
|
|
+ "timeout": "2s",
|
|
|
+ "query": {
|
|
|
+ "match": {
|
|
|
+ "user.id": "kimchy"
|
|
|
+ }
|
|
|
+ }
|
|
|
+}
|
|
|
+----
|
|
|
+// TEST[setup:my_index]
|
|
|
+
|
|
|
+To set a cluster-wide default timeout for all search requests, configure
|
|
|
+`search.default_search_timeout` using the <<cluster-update-settings,cluster
|
|
|
+settings API>>. This global timeout duration is used if no `timeout` argument is
|
|
|
+passed in the request. If the global search timeout expires before the search
|
|
|
+request finishes, the request is cancelled using <<task-cancellation,task
|
|
|
+cancellation>>. The `search.default_search_timeout` setting defaults to `-1` (no
|
|
|
+timeout).
|
|
|
+
|
|
|
+[discrete]
|
|
|
+[[global-search-cancellation]]
|
|
|
+=== Search cancellation
|
|
|
+
|
|
|
+You can cancel a search request using the <<task-cancellation,task management
|
|
|
+API>>. {es} also automatically cancels a search request when your client's HTTP
|
|
|
+connection closes. We recommend you set up your client to close HTTP connections
|
|
|
+when a search request is aborted or times out.
|
|
|
+
|
|
|
+[discrete]
|
|
|
+[[track-total-hits]]
|
|
|
+=== Track total hits
|
|
|
+
|
|
|
+Generally the total hit count can't be computed accurately without visiting all
|
|
|
+matches, which is costly for queries that match lots of documents. The
|
|
|
+`track_total_hits` parameter allows you to control how the total number of hits
|
|
|
+should be tracked.
|
|
|
+Given that it is often enough to have a lower bound of the number of hits,
|
|
|
+such as "there are at least 10000 hits", the default is set to `10,000`.
|
|
|
+This means that requests will count the total hit accurately up to `10,000` hits.
|
|
|
+It's is a good trade off to speed up searches if you don't need the accurate number
|
|
|
+of hits after a certain threshold.
|
|
|
+
|
|
|
+When set to `true` the search response will always track the number of hits that
|
|
|
+match the query accurately (e.g. `total.relation` will always be equal to `"eq"`
|
|
|
+when `track_total_hits` is set to true). Otherwise the `"total.relation"` returned
|
|
|
+in the `"total"` object in the search response determines how the `"total.value"`
|
|
|
+should be interpreted. A value of `"gte"` means that the `"total.value"` is a
|
|
|
+lower bound of the total hits that match the query and a value of `"eq"` indicates
|
|
|
+that `"total.value"` is the accurate count.
|
|
|
+
|
|
|
+[source,console]
|
|
|
+--------------------------------------------------
|
|
|
+GET my-index-000001/_search
|
|
|
+{
|
|
|
+ "track_total_hits": true,
|
|
|
+ "query": {
|
|
|
+ "match" : {
|
|
|
+ "user.id" : "elkbee"
|
|
|
+ }
|
|
|
+ }
|
|
|
+}
|
|
|
+--------------------------------------------------
|
|
|
+// TEST[setup:my_index]
|
|
|
+
|
|
|
+\... returns:
|
|
|
+
|
|
|
+[source,console-result]
|
|
|
+--------------------------------------------------
|
|
|
+{
|
|
|
+ "_shards": ...
|
|
|
+ "timed_out": false,
|
|
|
+ "took": 100,
|
|
|
+ "hits": {
|
|
|
+ "max_score": 1.0,
|
|
|
+ "total" : {
|
|
|
+ "value": 2048, <1>
|
|
|
+ "relation": "eq" <2>
|
|
|
+ },
|
|
|
+ "hits": ...
|
|
|
+ }
|
|
|
+}
|
|
|
+--------------------------------------------------
|
|
|
+// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
|
|
|
+// TESTRESPONSE[s/"took": 100/"took": $body.took/]
|
|
|
+// TESTRESPONSE[s/"max_score": 1\.0/"max_score": $body.hits.max_score/]
|
|
|
+// TESTRESPONSE[s/"value": 2048/"value": $body.hits.total.value/]
|
|
|
+// TESTRESPONSE[s/"hits": \.\.\./"hits": "$body.hits.hits"/]
|
|
|
+
|
|
|
+<1> The total number of hits that match the query.
|
|
|
+<2> The count is accurate (e.g. `"eq"` means equals).
|
|
|
+
|
|
|
+It is also possible to set `track_total_hits` to an integer.
|
|
|
+For instance the following query will accurately track the total hit count that match
|
|
|
+the query up to 100 documents:
|
|
|
+
|
|
|
+[source,console]
|
|
|
+--------------------------------------------------
|
|
|
+GET my-index-000001/_search
|
|
|
+{
|
|
|
+ "track_total_hits": 100,
|
|
|
+ "query": {
|
|
|
+ "match": {
|
|
|
+ "user.id": "elkbee"
|
|
|
+ }
|
|
|
+ }
|
|
|
+}
|
|
|
+--------------------------------------------------
|
|
|
+// TEST[continued]
|
|
|
+
|
|
|
+The `hits.total.relation` in the response will indicate if the
|
|
|
+value returned in `hits.total.value` is accurate (`"eq"`) or a lower
|
|
|
+bound of the total (`"gte"`).
|
|
|
+
|
|
|
+For instance the following response:
|
|
|
+
|
|
|
+[source,console-result]
|
|
|
+--------------------------------------------------
|
|
|
+{
|
|
|
+ "_shards": ...
|
|
|
+ "timed_out": false,
|
|
|
+ "took": 30,
|
|
|
+ "hits": {
|
|
|
+ "max_score": 1.0,
|
|
|
+ "total": {
|
|
|
+ "value": 42, <1>
|
|
|
+ "relation": "eq" <2>
|
|
|
+ },
|
|
|
+ "hits": ...
|
|
|
+ }
|
|
|
+}
|
|
|
+--------------------------------------------------
|
|
|
+// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
|
|
|
+// TESTRESPONSE[s/"took": 30/"took": $body.took/]
|
|
|
+// TESTRESPONSE[s/"max_score": 1\.0/"max_score": $body.hits.max_score/]
|
|
|
+// TESTRESPONSE[s/"value": 42/"value": $body.hits.total.value/]
|
|
|
+// TESTRESPONSE[s/"hits": \.\.\./"hits": "$body.hits.hits"/]
|
|
|
+
|
|
|
+<1> 42 documents match the query
|
|
|
+<2> and the count is accurate (`"eq"`)
|
|
|
+
|
|
|
+\... indicates that the number of hits returned in the `total`
|
|
|
+is accurate.
|
|
|
+
|
|
|
+If the total number of hits that match the query is greater than the
|
|
|
+value set in `track_total_hits`, the total hits in the response
|
|
|
+will indicate that the returned value is a lower bound:
|
|
|
+
|
|
|
+[source,console-result]
|
|
|
+--------------------------------------------------
|
|
|
+{
|
|
|
+ "_shards": ...
|
|
|
+ "hits": {
|
|
|
+ "max_score": 1.0,
|
|
|
+ "total": {
|
|
|
+ "value": 100, <1>
|
|
|
+ "relation": "gte" <2>
|
|
|
+ },
|
|
|
+ "hits": ...
|
|
|
+ }
|
|
|
+}
|
|
|
+--------------------------------------------------
|
|
|
+// TESTRESPONSE[skip:response is already tested in the previous snippet]
|
|
|
+
|
|
|
+<1> There are at least 100 documents that match the query
|
|
|
+<2> This is a lower bound (`"gte"`).
|
|
|
+
|
|
|
+If you don't need to track the total number of hits at all you can improve query
|
|
|
+times by setting this option to `false`:
|
|
|
+
|
|
|
+[source,console]
|
|
|
+--------------------------------------------------
|
|
|
+GET my-index-000001/_search
|
|
|
+{
|
|
|
+ "track_total_hits": false,
|
|
|
+ "query": {
|
|
|
+ "match": {
|
|
|
+ "user.id": "elkbee"
|
|
|
+ }
|
|
|
+ }
|
|
|
+}
|
|
|
+--------------------------------------------------
|
|
|
+// TEST[continued]
|
|
|
+
|
|
|
+\... returns:
|
|
|
+
|
|
|
+[source,console-result]
|
|
|
+--------------------------------------------------
|
|
|
+{
|
|
|
+ "_shards": ...
|
|
|
+ "timed_out": false,
|
|
|
+ "took": 10,
|
|
|
+ "hits": { <1>
|
|
|
+ "max_score": 1.0,
|
|
|
+ "hits": ...
|
|
|
+ }
|
|
|
+}
|
|
|
+--------------------------------------------------
|
|
|
+// TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/]
|
|
|
+// TESTRESPONSE[s/"took": 10/"took": $body.took/]
|
|
|
+// TESTRESPONSE[s/"max_score": 1\.0/"max_score": $body.hits.max_score/]
|
|
|
+// TESTRESPONSE[s/"hits": \.\.\./"hits": "$body.hits.hits"/]
|
|
|
+
|
|
|
+<1> The total number of hits is unknown.
|
|
|
+
|
|
|
+Finally you can force an accurate count by setting `"track_total_hits"`
|
|
|
+to `true` in the request.
|
|
|
+
|
|
|
+[discrete]
|
|
|
+[[quickly-check-for-matching-docs]]
|
|
|
+=== Quickly check for matching docs
|
|
|
+
|
|
|
+If you only want to know if there are any documents matching a
|
|
|
+specific query, you can set the `size` to `0` to indicate that we are not
|
|
|
+interested in the search results. You can also set `terminate_after` to `1`
|
|
|
+to indicate that the query execution can be terminated whenever the first
|
|
|
+matching document was found (per shard).
|
|
|
+
|
|
|
+[source,console]
|
|
|
+--------------------------------------------------
|
|
|
+GET /_search?q=user.id:elkbee&size=0&terminate_after=1
|
|
|
+--------------------------------------------------
|
|
|
+// TEST[setup:my_index]
|
|
|
+
|
|
|
+NOTE: `terminate_after` is always applied **after** the
|
|
|
+<<post-filter,`post_filter`>> and stops the query as well as the aggregation
|
|
|
+executions when enough hits have been collected on the shard. Though the doc
|
|
|
+count on aggregations may not reflect the `hits.total` in the response since
|
|
|
+aggregations are applied **before** the post filtering.
|
|
|
+
|
|
|
+The response will not contain any hits as the `size` was set to `0`. The
|
|
|
+`hits.total` will be either equal to `0`, indicating that there were no
|
|
|
+matching documents, or greater than `0` meaning that there were at least
|
|
|
+as many documents matching the query when it was early terminated.
|
|
|
+Also if the query was terminated early, the `terminated_early` flag will
|
|
|
+be set to `true` in the response.
|
|
|
+
|
|
|
+[source,console-result]
|
|
|
+--------------------------------------------------
|
|
|
+{
|
|
|
+ "took": 3,
|
|
|
+ "timed_out": false,
|
|
|
+ "terminated_early": true,
|
|
|
+ "_shards": {
|
|
|
+ "total": 1,
|
|
|
+ "successful": 1,
|
|
|
+ "skipped" : 0,
|
|
|
+ "failed": 0
|
|
|
+ },
|
|
|
+ "hits": {
|
|
|
+ "total" : {
|
|
|
+ "value": 1,
|
|
|
+ "relation": "eq"
|
|
|
+ },
|
|
|
+ "max_score": null,
|
|
|
+ "hits": []
|
|
|
+ }
|
|
|
+}
|
|
|
+--------------------------------------------------
|
|
|
+// TESTRESPONSE[s/"took": 3/"took": $body.took/]
|
|
|
+
|
|
|
+
|
|
|
+The `took` time in the response contains the milliseconds that this request
|
|
|
+took for processing, beginning quickly after the node received the query, up
|
|
|
+until all search related work is done and before the above JSON is returned
|
|
|
+to the client. This means it includes the time spent waiting in thread pools,
|
|
|
+executing a distributed search across the whole cluster and gathering all the
|
|
|
+results.
|
|
|
+
|
|
|
+include::collapse-search-results.asciidoc[]
|
|
|
+include::filter-search-results.asciidoc[]
|
|
|
+include::highlighting.asciidoc[]
|
|
|
+include::long-running-searches.asciidoc[]
|
|
|
+include::near-real-time.asciidoc[]
|
|
|
+include::paginate-search-results.asciidoc[]
|
|
|
+include::retrieve-inner-hits.asciidoc[]
|
|
|
+include::retrieve-selected-fields.asciidoc[]
|
|
|
+include::search-across-clusters.asciidoc[]
|
|
|
+include::search-multiple-indices.asciidoc[]
|
|
|
+include::search-shard-routing.asciidoc[]
|
|
|
+include::sort-search-results.asciidoc[]
|