12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273 |
- [discrete]
- [[esql-agg-count-distinct]]
- === `COUNT_DISTINCT`
- *Syntax*
- [source,esql]
- ----
- COUNT_DISTINCT(column[, precision_threshold])
- ----
- *Parameters*
- `column`::
- Column for which to count the number of distinct values.
- `precision_threshold`::
- Precision threshold. Refer to <<esql-agg-count-distinct-approximate>>. The
- maximum supported value is 40000. Thresholds above this number will have the
- same effect as a threshold of 40000. The default value is 3000.
- *Description*
- Returns the approximate number of distinct values.
- [discrete]
- [[esql-agg-count-distinct-approximate]]
- ==== Counts are approximate
- Computing exact counts requires loading values into a set and returning its
- size. This doesn't scale when working on high-cardinality sets and/or large
- values as the required memory usage and the need to communicate those
- per-shard sets between nodes would utilize too many resources of the cluster.
- This `COUNT_DISTINCT` function is based on the
- https://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf[HyperLogLog++]
- algorithm, which counts based on the hashes of the values with some interesting
- properties:
- include::../../aggregations/metrics/cardinality-aggregation.asciidoc[tag=explanation]
- The `COUNT_DISTINCT` function takes an optional second parameter to configure
- the precision threshold. The precision_threshold options allows to trade memory
- for accuracy, and defines a unique count below which counts are expected to be
- close to accurate. Above this value, counts might become a bit more fuzzy. The
- maximum supported value is 40000, thresholds above this number will have the
- same effect as a threshold of 40000. The default value is `3000`.
- *Supported types*
- Can take any field type as input.
- *Examples*
- [source.merge.styled,esql]
- ----
- include::{esql-specs}/stats_count_distinct.csv-spec[tag=count-distinct]
- ----
- [%header.monospaced.styled,format=dsv,separator=|]
- |===
- include::{esql-specs}/stats_count_distinct.csv-spec[tag=count-distinct-result]
- |===
- With the optional second parameter to configure the precision threshold:
- [source.merge.styled,esql]
- ----
- include::{esql-specs}/stats_count_distinct.csv-spec[tag=count-distinct-precision]
- ----
- [%header.monospaced.styled,format=dsv,separator=|]
- |===
- include::{esql-specs}/stats_count_distinct.csv-spec[tag=count-distinct-precision-result]
- |===
|