[[search-aggregations-bucket-sampler-aggregation]]
=== Sampler Aggregation

experimental[]

A filtering aggregation that limits the processing of any sub-aggregations to a sample of the top-scoring documents.

.Example use cases:
* Tightening the focus of analytics to high-relevance matches rather than the potentially very long tail of low-quality matches
* Reducing the running cost of aggregations that can produce useful results from a sample alone, such as `significant_terms`

Example:

[source,js]
--------------------------------------------------
{
    "query": {
        "match": {
            "text": "iphone"
        }
    },
    "aggs": {
        "sample": {
            "sampler": {
                "shard_size": 200
            },
            "aggs": {
                "keywords": {
                    "significant_terms": {
                        "field": "text"
                    }
                }
            }
        }
    }
}
--------------------------------------------------

Response:

[source,js]
--------------------------------------------------
{
    ...
    "aggregations": {
        "sample": {
            "doc_count": 1000,<1>
            "keywords": {<2>
                "doc_count": 1000,
                "buckets": [
                    ...
                    {
                        "key": "bend",
                        "doc_count": 58,
                        "score": 37.982536582524276,
                        "bg_count": 103
                    },
                    ....
}
--------------------------------------------------

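<1> 1000 documents were sampled in total because we asked for a maximum of 200 per shard from an index with 5 shards. The cost of performing the nested `significant_terms` aggregation was therefore limited rather than unbounded.
<2> The nested `significant_terms` aggregation only sees the sampled documents, which is why its `doc_count` is also 1000.
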
==== shard_size

The `shard_size` parameter limits how many top-scoring documents are collected in the sample processed on each shard.
The default value is 100.

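
As a sketch of the default in action, the request below reuses the earlier example but omits `shard_size` (assuming an empty `sampler` object is accepted), so each shard contributes at most 100 top-scoring documents. On the same 5-shard index this would cap the sample at 500 documents; the reported `doc_count` can be lower if some shards match fewer than 100 documents.

[source,js]
--------------------------------------------------
{
    "query": {
        "match": {
            "text": "iphone"
        }
    },
    "aggs": {
        "sample": {
            "sampler": {}, <1>
            "aggs": {
                "keywords": {
                    "significant_terms": {
                        "field": "text"
                    }
                }
            }
        }
    }
}
--------------------------------------------------
<1> No `shard_size` is given, so the default of 100 documents per shard applies.
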
==== Limitations

===== Cannot be nested under `breadth_first` aggregations

Being a quality-based filter, the sampler aggregation needs access to the relevance score produced for each document.
It therefore cannot be nested under a `terms` aggregation whose `collect_mode` has been switched from the default `depth_first` mode to `breadth_first`, because breadth-first collection discards scores.
In this situation, an error is thrown.
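
For illustration, the following request places the sampler from the earlier example under a `terms` aggregation in `breadth_first` mode, which is the combination described above as unsupported. The `author` field is hypothetical. Removing the `collect_mode` line, so that the default `depth_first` mode applies, keeps scores available to the nested `sampler`.

[source,js]
--------------------------------------------------
{
    "query": {
        "match": {
            "text": "iphone"
        }
    },
    "aggs": {
        "authors": {
            "terms": {
                "field": "author", <1>
                "collect_mode": "breadth_first" <2>
            },
            "aggs": {
                "sample": {
                    "sampler": {
                        "shard_size": 200
                    },
                    "aggs": {
                        "keywords": {
                            "significant_terms": {
                                "field": "text"
                            }
                        }
                    }
                }
            }
        }
    }
}
--------------------------------------------------
<1> `author` is a hypothetical field used only for illustration.
<2> Breadth-first collection discards document scores, so the nested `sampler` aggregation cannot rank documents and the request fails with an error.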