% This is generated by ESQL's AbstractFunctionTestCase. Do not edit it. See ../README.md for how to regenerate it.
Computing exact counts requires loading values into a set and returning its size. This doesn’t scale for high-cardinality sets or large values: the memory required, and the need to send those per-shard sets between nodes, would consume too many cluster resources.
The `COUNT_DISTINCT` function is based on the HyperLogLog++ algorithm, which counts based on hashes of the values and has some interesting properties:
:::{include} /reference/aggregations/_snippets/search-aggregations-metrics-cardinality-aggregation-explanation.md
:::
The `COUNT_DISTINCT` function takes an optional second parameter to configure the precision threshold. The `precision_threshold` option lets you trade memory for accuracy: it defines a unique count below which counts are expected to be close to accurate. Above this value, counts might become less accurate. The maximum supported value is `40000`; thresholds above this number have the same effect as a threshold of `40000`. The default value is `3000`.
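
As a minimal sketch of how the optional second parameter might be used, the query below computes the same distinct count twice, once with the default threshold and once with the maximum. The index name `web_logs`, the field `client_ip`, and the output column names are hypothetical and chosen only for illustration.

```esql
// Hypothetical index and field names, for illustration only.
FROM web_logs
| STATS approx_default = COUNT_DISTINCT(client_ip),        // default threshold (3000)
        approx_precise = COUNT_DISTINCT(client_ip, 40000)  // maximum supported threshold
```

If the field has fewer distinct values than the lower threshold, both columns would be expected to be close to the exact count. The higher threshold matters only for larger cardinalities, and it uses more memory.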