- [[search-aggregations-metrics-top-hits-aggregation]]
- === Top Hits Aggregation
- A `top_hits` metric aggregator keeps track of the most relevant documents being aggregated. This aggregator is intended
- to be used as a sub-aggregator, so that the top matching documents can be aggregated per bucket.
- The `top_hits` aggregator can effectively be used to group result sets by certain fields via a bucket aggregator.
- One or more bucket aggregators determine which properties a result set is sliced by.
- ==== Options
- * `from` - The offset from the first result you want to fetch.
- * `size` - The maximum number of top matching hits to return per bucket. By default the top three matching hits are returned.
- * `sort` - How the top matching hits should be sorted. By default the hits are sorted by the score of the main query.
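To make the interplay of `from`, `size`, and `sort` concrete, here is a minimal Python sketch that mimics, on in-memory dictionaries, what a bucket aggregator wrapping `top_hits` returns for each bucket. It is a simplified model, not how Elasticsearch executes the aggregation. The `simulate_top_hits` and `group_by` helpers, the sample documents, and their `_score` values are hypothetical, and `descending` stands in for the sort `order`.

```python
from typing import Any, Callable, Dict, List, Optional

Doc = Dict[str, Any]


def simulate_top_hits(
    docs: List[Doc],
    sort_key: Optional[Callable[[Doc], Any]] = None,
    descending: bool = True,
    from_: int = 0,
    size: int = 3,
) -> List[Doc]:
    """Hypothetical model of the top_hits options applied to one bucket."""
    # Without an explicit sort, hits are ordered by relevance score, highest first.
    key = sort_key if sort_key is not None else (lambda d: d["_score"])
    ordered = sorted(docs, key=key, reverse=descending)
    # `from` skips that many hits; `size` caps how many are returned (default 3).
    return ordered[from_:from_ + size]


def group_by(docs: List[Doc], field: str) -> Dict[Any, List[Doc]]:
    """Hypothetical stand-in for a terms bucket aggregator (bucket order ignored)."""
    buckets: Dict[Any, List[Doc]] = {}
    for doc in docs:
        buckets.setdefault(doc[field], []).append(doc)
    return buckets


# Hypothetical sales documents with made-up relevance scores.
sales = [
    {"type": "hat", "date": "2015/01/01", "price": 200, "_score": 1.0},
    {"type": "hat", "date": "2015/02/01", "price": 50, "_score": 0.5},
    {"type": "hat", "date": "2015/03/01", "price": 120, "_score": 0.8},
    {"type": "bag", "date": "2015/01/01", "price": 150, "_score": 0.9},
]

# Latest sale per type: sort each bucket by date (descending) and keep one hit.
latest = {
    key: simulate_top_hits(bucket, sort_key=lambda d: d["date"], size=1)
    for key, bucket in group_by(sales, "type").items()
}

# Second-best "hat" by score: skip the first hit with `from`, then take one.
second_best_hat = simulate_top_hits(group_by(sales, "type")["hat"], from_=1, size=1)
```

The options are applied independently inside every bucket, which is why `size` bounds the hits per bucket rather than for the whole response.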
- ==== Supported per hit features
- The `top_hits` aggregation returns regular search hits, so many per-hit features are supported:
- * <<request-body-search-highlighting,Highlighting>>
- * <<request-body-search-explain,Explain>>
- * <<request-body-search-queries-and-filters,Named filters and queries>>
- * <<source-filtering,Source filtering>>
- * <<request-body-search-stored-fields,Stored fields>>
- * <<request-body-search-script-fields,Script fields>>
- * <<docvalue-fields,Doc value fields>>
- * <<request-body-search-version,Include versions>>
- * <<request-body-search-seq-no-primary-term,Include Sequence Numbers and Primary Terms>>
- IMPORTANT: If you *only* need `docvalue_fields`, `size`, and `sort` then
- <<search-aggregations-metrics-top-metrics>> might be a more efficient choice than the Top Hits Aggregation.
- ==== Example
- In the following example, we group the sales by type and, for each type, show the most recent sale.
- For each sale, only the `date` and `price` fields are included in the source.
- [source,console]
- --------------------------------------------------
- POST /sales/_search?size=0
- {
- "aggs": {
- "top_tags": {
- "terms": {
- "field": "type",
- "size": 3
- },
- "aggs": {
- "top_sales_hits": {
- "top_hits": {
- "sort": [
- {
- "date": {
- "order": "desc"
- }
- }
- ],
- "_source": {
- "includes": [ "date", "price" ]
- },
- "size" : 1
- }
- }
- }
- }
- }
- }
- --------------------------------------------------
- // TEST[setup:sales]
- Possible response:
- [source,console-result]
- --------------------------------------------------
- {
- ...
- "aggregations": {
- "top_tags": {
- "doc_count_error_upper_bound": 0,
- "sum_other_doc_count": 0,
- "buckets": [
- {
- "key": "hat",
- "doc_count": 3,
- "top_sales_hits": {
- "hits": {
- "total" : {
- "value": 3,
- "relation": "eq"
- },
- "max_score": null,
- "hits": [
- {
- "_index": "sales",
- "_id": "AVnNBmauCQpcRyxw6ChK",
- "_source": {
- "date": "2015/03/01 00:00:00",
- "price": 200
- },
- "sort": [
- 1425168000000
- ],
- "_score": null
- }
- ]
- }
- }
- },
- {
- "key": "t-shirt",
- "doc_count": 3,
- "top_sales_hits": {
- "hits": {
- "total" : {
- "value": 3,
- "relation": "eq"
- },
- "max_score": null,
- "hits": [
- {
- "_index": "sales",
- "_id": "AVnNBmauCQpcRyxw6ChL",
- "_source": {
- "date": "2015/03/01 00:00:00",
- "price": 175
- },
- "sort": [
- 1425168000000
- ],
- "_score": null
- }
- ]
- }
- }
- },
- {
- "key": "bag",
- "doc_count": 1,
- "top_sales_hits": {
- "hits": {
- "total" : {
- "value": 1,
- "relation": "eq"
- },
- "max_score": null,
- "hits": [
- {
- "_index": "sales",
- "_id": "AVnNBmatCQpcRyxw6ChH",
- "_source": {
- "date": "2015/01/01 00:00:00",
- "price": 150
- },
- "sort": [
- 1420070400000
- ],
- "_score": null
- }
- ]
- }
- }
- }
- ]
- }
- }
- }
- --------------------------------------------------
- // TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,/]
- // TESTRESPONSE[s/AVnNBmauCQpcRyxw6ChK/$body.aggregations.top_tags.buckets.0.top_sales_hits.hits.hits.0._id/]
- // TESTRESPONSE[s/AVnNBmauCQpcRyxw6ChL/$body.aggregations.top_tags.buckets.1.top_sales_hits.hits.hits.0._id/]
- // TESTRESPONSE[s/AVnNBmatCQpcRyxw6ChH/$body.aggregations.top_tags.buckets.2.top_sales_hits.hits.hits.0._id/]
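On the client side, the most recent sale per type can be read out of a response shaped like the one above. The sketch below works on the decoded JSON body (for example, the result of `json.loads` on the HTTP response body). The function name `latest_sale_per_type` is hypothetical, and it assumes the aggregation names `top_tags` and `top_sales_hits` used in the request.

```python
import json
from typing import Any, Dict


def latest_sale_per_type(response: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Map each bucket key to the `_source` of its first (most recent) top hit."""
    result: Dict[str, Dict[str, Any]] = {}
    for bucket in response["aggregations"]["top_tags"]["buckets"]:
        hits = bucket["top_sales_hits"]["hits"]["hits"]
        if hits:  # defensive: skip a bucket that returned no hits
            result[bucket["key"]] = hits[0]["_source"]
    return result


# A trimmed copy of the example response, keeping only the fields read above.
body = json.loads("""
{
  "aggregations": {
    "top_tags": {
      "buckets": [
        {"key": "hat", "top_sales_hits": {"hits": {"hits": [
          {"_source": {"date": "2015/03/01 00:00:00", "price": 200}}]}}},
        {"key": "t-shirt", "top_sales_hits": {"hits": {"hits": [
          {"_source": {"date": "2015/03/01 00:00:00", "price": 175}}]}}},
        {"key": "bag", "top_sales_hits": {"hits": {"hits": [
          {"_source": {"date": "2015/01/01 00:00:00", "price": 150}}]}}}
      ]
    }
  }
}
""")

latest = latest_sale_per_type(body)
```

Because the request restricts `_source` to `date` and `price`, those are the only fields available in each returned source.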
- ==== Field collapse example
- Field collapsing, or result grouping, is a feature that logically groups a result set and returns the top documents per
- group. The ordering of the groups is determined by the relevancy of the first document in a group. In
- Elasticsearch this can be implemented via a bucket aggregator that wraps a `top_hits` aggregator as a sub-aggregator.
- In the example below, we search across crawled webpages. For each webpage we store the body and the domain the webpage
- belongs to. By defining a `terms` aggregator on the `domain` field, we group the result set of webpages by domain. The
- `top_hits` aggregator is then defined as a sub-aggregator, so that the top matching hits are collected per bucket.
- A `max` aggregator is also defined; the `terms` aggregator's `order` option uses it to sort the buckets by the
- relevancy of the most relevant document in each bucket.
- [source,console]
- --------------------------------------------------
- POST /sales/_search
- {
- "query": {
- "match": {
- "body": "elections"
- }
- },
- "aggs": {
- "top_sites": {
- "terms": {
- "field": "domain",
- "order": {
- "top_hit": "desc"
- }
- },
- "aggs": {
- "top_tags_hits": {
- "top_hits": {}
- },
- "top_hit" : {
- "max": {
- "script": {
- "source": "_score"
- }
- }
- }
- }
- }
- }
- }
- --------------------------------------------------
- // TEST[setup:sales]
- At the moment the `max` (or `min`) aggregator is needed to make sure the buckets from the `terms` aggregator are
- ordered according to the score of the most relevant webpage per domain. Unfortunately the `top_hits` aggregator
- can't be used in the `order` option of the `terms` aggregator yet.
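The following Python sketch models what the request above computes, to show why the `max` sub-aggregation is needed: hits are grouped by `domain`, each group keeps its best hits (three by default, like `top_hits`), and the groups are ordered by the highest `_score` in each group, which is the value the `top_hit` `max` aggregation exposes to the `terms` `order` option. The `collapse_by` helper and the scored hits are hypothetical, and the real `terms` aggregation also caps how many buckets it returns, which the sketch ignores.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Hit = Dict[str, Any]


def collapse_by(hits: List[Hit], field: str, size: int = 3) -> List[Tuple[Any, float, List[Hit]]]:
    """Group hits by `field` and order the groups by their best score (hypothetical)."""
    groups: Dict[Any, List[Hit]] = defaultdict(list)
    for hit in hits:
        groups[hit[field]].append(hit)

    collapsed = []
    for key, group in groups.items():
        # Within a group, keep the top `size` hits by score, like `top_hits`.
        top = sorted(group, key=lambda h: h["_score"], reverse=True)[:size]
        # The group's best score plays the role of the `max` (top_hit) sub-aggregation.
        collapsed.append((key, top[0]["_score"], top))

    # Order the groups by that best score, like `"order": {"top_hit": "desc"}`.
    collapsed.sort(key=lambda g: g[1], reverse=True)
    return collapsed


# Hypothetical scored hits for the query `body: elections`.
hits = [
    {"domain": "news.example.com", "url": "/a", "_score": 0.9},
    {"domain": "blog.example.com", "url": "/b", "_score": 1.4},
    {"domain": "news.example.com", "url": "/c", "_score": 1.1},
    {"domain": "wiki.example.com", "url": "/d", "_score": 0.4},
]

for domain, best, top in collapse_by(hits, "domain"):
    print(domain, best, [h["url"] for h in top])
```

Ordering the groups by their document count instead (the `terms` default) would put `news.example.com` first even though `blog.example.com` holds the single most relevant page.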
- ==== top_hits support in a nested or reverse_nested aggregator
- If the `top_hits` aggregator is wrapped in a `nested` or `reverse_nested` aggregator, then nested hits are returned.
- Nested hits are, in a sense, hidden mini-documents that are part of a regular document whose mapping configures a `nested`
- field type. The `top_hits` aggregator can un-hide these documents if it is wrapped in a `nested`
- or `reverse_nested` aggregator. Read more about nested in the <<nested,nested type mapping>>.
- If a `nested` type has been configured, a single document is actually indexed as multiple Lucene documents that share
- the same id. Because the id alone is not enough to determine the identity of a nested hit, nested hits also include
- their nested identity. The nested identity is kept under the `_nested` field in the search hit
- and includes the array field and the offset in the array field the nested hit belongs to. The offset is zero-based.
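As a mental model only (Lucene's actual storage layout is not modeled here), the Python sketch below splits one source document into one entry per element of a nested array plus a root entry, tagging each nested entry with a `_nested` identity of the same shape as in search hits: the array field name and the zero-based offset. Every entry shares the parent's id, which is why the id alone cannot tell them apart. The `explode_nested` function and the sample post are hypothetical, and only a single level of nesting is handled.

```python
from typing import Any, Dict, List

Source = Dict[str, Any]


def explode_nested(doc_id: str, source: Source, nested_field: str) -> List[Dict[str, Any]]:
    """Split a document into nested entries plus a root entry (hypothetical model)."""
    entries: List[Dict[str, Any]] = []
    for offset, obj in enumerate(source.get(nested_field, [])):
        entries.append({
            "_id": doc_id,  # same id as the parent document
            "_nested": {"field": nested_field, "offset": offset},  # zero-based position
            "_source": obj,  # only the nested object's own fields
        })
    # The root entry has no `_nested` identity and keeps the full source.
    entries.append({"_id": doc_id, "_source": source})
    return entries


# Hypothetical document with a two-element nested array.
post = {
    "title": "Hypothetical post",
    "comments": [
        {"username": "alice", "comment": "First!"},
        {"username": "bob", "comment": "Second"},
    ],
}

entries = explode_nested("42", post, "comments")
```

Only the combination of `_id` and `_nested` (field plus offset) distinguishes the two comment entries from each other and from the root.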
- Let's see how this works with an example. Consider the following mapping:
- [source,console]
- --------------------------------------------------
- PUT /sales
- {
- "mappings": {
- "properties" : {
- "tags" : { "type" : "keyword" },
- "comments" : { <1>
- "type" : "nested",
- "properties" : {
- "username" : { "type" : "keyword" },
- "comment" : { "type" : "text" }
- }
- }
- }
- }
- }
- --------------------------------------------------
- <1> `comments` is an array field that holds nested documents.
- And a document:
- [source,console]
- --------------------------------------------------
- PUT /sales/_doc/1?refresh
- {
- "tags": ["car", "auto"],
- "comments": [
- {"username": "baddriver007", "comment": "This car could have better brakes"},
- {"username": "dr_who", "comment": "Where's the autopilot? Can't find it"},
- {"username": "ilovemotorbikes", "comment": "This car has two extra wheels"}
- ]
- }
- --------------------------------------------------
- // TEST[continued]
- It's now possible to execute the following `top_hits` aggregation (wrapped in a `nested` aggregation):
- [source,console]
- --------------------------------------------------
- POST /sales/_search
- {
- "query": {
- "term": { "tags": "car" }
- },
- "aggs": {
- "by_sale": {
- "nested" : {
- "path" : "comments"
- },
- "aggs": {
- "by_user": {
- "terms": {
- "field": "comments.username",
- "size": 1
- },
- "aggs": {
- "by_nested": {
- "top_hits":{}
- }
- }
- }
- }
- }
- }
- }
- --------------------------------------------------
- // TEST[continued]
- // TEST[s/_search/_search\?filter_path=aggregations.by_sale.by_user.buckets/]
- Top hits response snippet with a nested hit, which resides in the first slot of the array field `comments`:
- [source,console-result]
- --------------------------------------------------
- {
- ...
- "aggregations": {
- "by_sale": {
- "by_user": {
- "buckets": [
- {
- "key": "baddriver007",
- "doc_count": 1,
- "by_nested": {
- "hits": {
- "total" : {
- "value": 1,
- "relation": "eq"
- },
- "max_score": 0.3616575,
- "hits": [
- {
- "_index": "sales",
- "_id": "1",
- "_nested": {
- "field": "comments", <1>
- "offset": 0 <2>
- },
- "_score": 0.3616575,
- "_source": {
- "comment": "This car could have better brakes", <3>
- "username": "baddriver007"
- }
- }
- ]
- }
- }
- }
- ...
- ]
- }
- }
- }
- }
- --------------------------------------------------
- // TESTRESPONSE[s/\.\.\.//]
- <1> Name of the array field containing the nested hit
- <2> Position of the nested hit in the containing array
- <3> Source of the nested hit
- If `_source` is requested, then only the part of the source belonging to the nested object is returned, not the entire source of the document.
- Stored fields at the *nested* inner object level are also accessible via a `top_hits` aggregator residing in a `nested` or `reverse_nested` aggregator.
- Only nested hits have a `_nested` field in the hit; non-nested (regular) hits do not.
- The information in `_nested` can also be used to parse the original source somewhere else if `_source` isn't enabled.
- If there are multiple levels of nested object types defined in mappings, then the `_nested` information can also be hierarchical
- in order to express the identity of nested hits that are two or more layers deep.
- In the example below, a nested hit resides in the first slot of the field `nested_grand_child_field`, which in turn resides in
- the second slot of the `nested_child_field` field:
- [source,js]
- --------------------------------------------------
- ...
- "hits": {
- "total" : {
- "value": 2565,
- "relation": "eq"
- },
- "max_score": 1,
- "hits": [
- {
- "_index": "a",
- "_id": "1",
- "_score": 1,
- "_nested" : {
- "field" : "nested_child_field",
- "offset" : 1,
- "_nested" : {
- "field" : "nested_grand_child_field",
- "offset" : 0
- }
- },
- "_source": ...
- },
- ...
- ]
- }
- ...
- --------------------------------------------------
- // NOTCONSOLE
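Because `_nested` records the field name and offset at each level, a client that holds the full parent source (for example, fetched separately by `_id`, or stored outside Elasticsearch when `_source` is disabled) can walk the identity to recover the nested object. The Python sketch below follows the `_nested` chain recursively, so it handles both a single-level identity like the one in the `comments` example and a two-level one like the snippet above. The `resolve_nested` name and the sample parent source are hypothetical, and it assumes each `field` is a key directly on the enclosing object.

```python
from typing import Any, Dict


def resolve_nested(source: Dict[str, Any], nested: Dict[str, Any]) -> Dict[str, Any]:
    """Follow a (possibly multi-level) `_nested` identity into a parent source."""
    value = source[nested["field"]]
    # A nested field holding a single object is treated as a one-element array.
    items = value if isinstance(value, list) else [value]
    obj = items[nested["offset"]]
    child = nested.get("_nested")
    # Recurse while a deeper `_nested` level is present.
    return resolve_nested(obj, child) if child is not None else obj


# Hypothetical parent source with two levels of nested objects.
parent = {
    "nested_child_field": [
        {"name": "first child", "nested_grand_child_field": [{"v": 1}]},
        {"name": "second child", "nested_grand_child_field": [{"v": 2}, {"v": 3}]},
    ]
}

# Same shape as the hierarchical `_nested` identity shown above.
identity = {
    "field": "nested_child_field",
    "offset": 1,
    "_nested": {"field": "nested_grand_child_field", "offset": 0},
}

print(resolve_nested(parent, identity))  # {'v': 2}
```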