[[search-aggregations-bucket-sampler-aggregation]]
=== Sampler Aggregation

experimental[]

A filtering aggregation used to limit any sub aggregations' processing to a sample of the top-scoring documents.

.Example use cases:
* Tightening the focus of analytics to high-relevance matches rather than the potentially very long tail of low-quality matches
* Reducing the running cost of aggregations that can produce useful results using only samples, e.g. `significant_terms`

Example:

[source,js]
--------------------------------------------------
{
    "query": {
        "match": {
            "text": "iphone"
        }
    },
    "aggs": {
        "sample": {
            "sampler": {
                "shard_size": 200
            },
            "aggs": {
                "keywords": {
                    "significant_terms": {
                        "field": "text"
                    }
                }
            }
        }
    }
}
--------------------------------------------------

Response:

[source,js]
--------------------------------------------------
{
    ...
    "aggregations": {
        "sample": {
            "doc_count": 1000,<1>
            "keywords": {<2>
                "doc_count": 1000,
                "buckets": [
                    ...
                    {
                        "key": "bend",
                        "doc_count": 58,
                        "score": 37.982536582524276,
                        "bg_count": 103
                    },
                    ....
}
--------------------------------------------------
<1> 1000 documents were sampled in total because we asked for a maximum of 200 from an index with 5 shards. The cost of performing the nested `significant_terms` aggregation was therefore limited rather than unbounded.
<2> The results of the nested `significant_terms` aggregation are computed using only the sampled, top-scoring documents.

==== shard_size

The `shard_size` parameter limits how many top-scoring documents are collected in the sample processed on each shard.
The default value is 100.
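
For example, to collect at most 50 top-scoring documents per shard rather than the default 100, set `shard_size` explicitly. This is a minimal sketch that simply reuses the query and field names from the example above:

[source,js]
--------------------------------------------------
{
    "query": {
        "match": {
            "text": "iphone"
        }
    },
    "aggs": {
        "sample": {
            "sampler": {
                "shard_size": 50
            },
            "aggs": {
                "keywords": {
                    "significant_terms": {
                        "field": "text"
                    }
                }
            }
        }
    }
}
--------------------------------------------------

Because the sample is gathered independently on each shard, the total number of documents sampled is at most `shard_size` multiplied by the number of shards, e.g. up to 1000 documents for a `shard_size` of 200 on an index with 5 shards.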

==== Limitations

===== Cannot be nested under `breadth_first` aggregations

Being a quality-based filter, the sampler aggregation needs access to the relevance score produced for each document.
It therefore cannot be nested under a `terms` aggregation which has its `collect_mode` switched from the default `depth_first` mode to `breadth_first`, as this discards scores.
In this situation an error will be thrown.
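
For illustration, a request along the lines of the following sketch, which nests the sampler under a `terms` aggregation running in `breadth_first` collect mode, would be rejected (the `category` field and the aggregation names are only illustrative):

[source,js]
--------------------------------------------------
{
    "aggs": {
        "categories": {
            "terms": {
                "field": "category",
                "collect_mode": "breadth_first"
            },
            "aggs": {
                "sample": {
                    "sampler": {
                        "shard_size": 200
                    },
                    "aggs": {
                        "keywords": {
                            "significant_terms": {
                                "field": "text"
                            }
                        }
                    }
                }
            }
        }
    }
}
--------------------------------------------------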