// This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.

[discrete]
[[esql-agg-count-distinct-approximate]]
==== Counts are approximate

Computing exact counts requires loading values into a set and returning its
size. This doesn't scale when working on high-cardinality sets and/or large
values, because the memory required and the need to communicate those
per-shard sets between nodes would consume too many cluster resources.
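
The exact approach described above can be sketched in a few lines of Python. This is an illustration only, not Elasticsearch code: the `exact_distinct_count` function and the sample shard data are hypothetical names chosen for the example.

```python
# Hypothetical sketch of exact distinct counting across shards.
# Neither the function name nor the sample data comes from Elasticsearch.
def exact_distinct_count(shards):
    """Union every shard's distinct values, then return the size of the union."""
    merged = set()
    for shard_values in shards:
        # Each shard has to ship its full set of distinct values to the
        # coordinating node, so memory and network cost grow with cardinality.
        merged |= set(shard_values)
    return len(merged)


shards = [
    ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
    ["10.0.0.2", "10.0.0.4"],
    ["10.0.0.1", "10.0.0.5"],
]
print(exact_distinct_count(shards))  # 5
```

The merged set holds every distinct value at once, which is the memory and transfer cost that makes this approach impractical at high cardinality.
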

This `COUNT_DISTINCT` function is based on the
https://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf[HyperLogLog++]
algorithm, which counts based on the hashes of the values with some interesting
properties:

include::../../../aggregations/metrics/cardinality-aggregation.asciidoc[tag=explanation]
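
To make the hash-based idea concrete, here is a minimal sketch of the classic HyperLogLog estimator in Python. It is a simplified stand-in, not the HyperLogLog++ variant and not the Elasticsearch implementation: the `ToyHyperLogLog` class, the SHA-1 hash, and the register count are all assumptions made for illustration.

```python
import hashlib
import math


class ToyHyperLogLog:
    """Simplified classic HyperLogLog (illustrative; not HyperLogLog++)."""

    def __init__(self, p=12):
        self.p = p                  # number of hash bits used to pick a register
        self.m = 1 << p             # number of registers (4096 by default)
        self.registers = [0] * self.m

    def add(self, value):
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")         # 64-bit hash of the value
        index = h >> (64 - self.p)                    # first p bits: register index
        rest = h & ((1 << (64 - self.p)) - 1)         # remaining 64 - p bits
        # Rank: position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other):
        # Sketches merge by taking the register-wise maximum.
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)         # bias constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # linear counting for small counts
        return raw


full = ToyHyperLogLog()
for i in range(50_000):
    full.add(f"user-{i}")
    full.add(f"user-{i}")  # duplicates never change the registers

shard_a, shard_b = ToyHyperLogLog(), ToyHyperLogLog()
for i in range(50_000):
    (shard_a if i % 2 == 0 else shard_b).add(f"user-{i}")
shard_a.merge(shard_b)

# Both estimates are identical and approximate the true count of 50000.
print(round(full.estimate()), round(shard_a.estimate()))
```

Each sketch stores a fixed number of small registers regardless of how many values it has seen, and two sketches merge by taking a register-wise maximum, so only the registers need to travel between nodes rather than the values themselves.
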

The `COUNT_DISTINCT` function takes an optional second parameter to configure
the precision threshold. The `precision_threshold` option lets you trade memory
for accuracy, and defines a unique count below which counts are expected to be
close to accurate. Above this value, counts might become less accurate. The
maximum supported value is `40000`; thresholds above this number have the
same effect as a threshold of `40000`. The default value is `3000`.
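
The default and the cap described above amount to a simple rule, sketched here in Python. The `effective_precision_threshold` helper and the constant names are hypothetical and exist only to restate the paragraph; they are not part of any Elasticsearch API.

```python
# Values taken from the paragraph above; the helper itself is hypothetical.
DEFAULT_PRECISION_THRESHOLD = 3000
MAX_PRECISION_THRESHOLD = 40000


def effective_precision_threshold(requested=None):
    """Return the threshold that actually applies for a requested value."""
    if requested is None:
        # No second parameter given: the default applies.
        return DEFAULT_PRECISION_THRESHOLD
    # Anything above the maximum behaves like the maximum.
    return min(requested, MAX_PRECISION_THRESHOLD)


print(effective_precision_threshold())        # 3000
print(effective_precision_threshold(80000))   # 40000
print(effective_precision_threshold(5))       # 5
```
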