count-distinct.asciidoc 1.5 KB

[discrete]
[[esql-agg-count-distinct]]
=== `COUNT_DISTINCT`
The approximate number of distinct values.

[source.merge.styled,esql]
----
include::{esql-specs}/stats_count_distinct.csv-spec[tag=count-distinct]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats_count_distinct.csv-spec[tag=count-distinct-result]
|===

Can take any field type as input. The result is always a `long`, no matter
the input type.

[discrete]
==== Counts are approximate

Computing exact counts requires loading every value into a set and returning its
size. This doesn't scale for high-cardinality sets or large values: the memory
required, plus the need to send those per-shard sets between nodes, would
consume too many cluster resources.
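
To make the cost concrete, here is a minimal Python sketch of the exact approach described above. It is an illustration only, not how the function is implemented: the `exact_count_distinct` helper and the sample shard data are hypothetical. Every shard has to hand over its complete set of values, so memory and network traffic grow with the number of distinct values.

```python
def exact_count_distinct(shards):
    """Exact distinct count: union every per-shard set, then take the size."""
    merged = set()
    for shard_values in shards:
        # Each shard must ship its complete set of values to the coordinator,
        # so the cost grows with the number (and size) of distinct values.
        merged |= set(shard_values)
    return len(merged)


# Hypothetical data: three shards holding overlapping values.
shards = [
    ["10.0.0.1", "10.0.0.2"],
    ["10.0.0.2", "10.0.0.3"],
    ["10.0.0.3", "10.0.0.4", "10.0.0.1"],
]
distinct = exact_count_distinct(shards)
print(distinct)  # 4
```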

This `COUNT_DISTINCT` function is based on the
https://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf[HyperLogLog++]
algorithm, which counts based on the hashes of the values and has some interesting
properties:

include::../../aggregations/metrics/cardinality-aggregation.asciidoc[tag=explanation]
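
The idea of counting from hashes can be sketched with a basic HyperLogLog in Python. This is the plain algorithm, not HyperLogLog++ and not the implementation behind `COUNT_DISTINCT`; the `HyperLogLog` class, its `p` parameter (number of index bits), and the choice of SHA-1 as the hash are all assumptions made for illustration. The point it shows is that memory depends only on the number of registers, and that per-shard sketches merge by taking the maximum of each register.

```python
import hashlib
import math


class HyperLogLog:
    """Basic HyperLogLog sketch (illustrative; not HyperLogLog++)."""

    def __init__(self, p=12):
        self.p = p                    # number of hash bits used as register index
        self.m = 1 << p               # number of registers; memory is fixed by p
        self.registers = [0] * self.m

    def add(self, value):
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")        # 64-bit hash
        index = h >> (64 - self.p)                   # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)        # remaining 64 - p bits
        # Rank: position of the leftmost 1 bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other):
        # Sketches with the same p combine by keeping the larger register value.
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)             # bias constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return round(m * math.log(m / zeros))
        return round(raw)


# Hypothetical usage: two shards with overlapping values, 10000 distinct in total.
shard_a = HyperLogLog()
shard_b = HyperLogLog()
for i in range(6000):
    shard_a.add(f"user-{i}")
for i in range(4000, 10000):
    shard_b.add(f"user-{i}")
shard_a.merge(shard_b)
estimate = shard_a.count()
print(estimate)  # an approximation of the true distinct count, 10000
```

Each shard sends only its fixed-size register array, regardless of how many values it has seen, which is what makes the approach practical across nodes.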

[discrete]
==== Precision is configurable

The `COUNT_DISTINCT` function takes an optional second parameter that configures the
precision discussed previously.

[source.merge.styled,esql]
----
include::{esql-specs}/stats_count_distinct.csv-spec[tag=count-distinct-precision]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats_count_distinct.csv-spec[tag=count-distinct-precision-result]
|===