
[discrete]
[[esql-agg-count-distinct]]
=== `COUNT_DISTINCT`

*Syntax*

[source,esql]
----
COUNT_DISTINCT(expression[, precision_threshold])
----

*Parameters*

`expression`::
Expression that outputs the values on which to perform a distinct count.

`precision_threshold`::
Precision threshold. Refer to <<esql-agg-count-distinct-approximate>>. The
maximum supported value is 40000. Thresholds above this number will have the
same effect as a threshold of 40000. The default value is `3000`.

*Description*

Returns the approximate number of distinct values.

*Supported types*

Can take any field type as input.

*Examples*

[source.merge.styled,esql]
----
include::{esql-specs}/stats_count_distinct.csv-spec[tag=count-distinct]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats_count_distinct.csv-spec[tag=count-distinct-result]
|===

With the optional second parameter to configure the precision threshold:

[source.merge.styled,esql]
----
include::{esql-specs}/stats_count_distinct.csv-spec[tag=count-distinct-precision]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats_count_distinct.csv-spec[tag=count-distinct-precision-result]
|===

The expression can use inline functions. This example splits a string into
multiple values using the `SPLIT` function and counts the unique values:

[source.merge.styled,esql]
----
include::{esql-specs}/stats_count_distinct.csv-spec[tag=docsCountDistinctWithExpression]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats_count_distinct.csv-spec[tag=docsCountDistinctWithExpression-result]
|===

[discrete]
[[esql-agg-count-distinct-approximate]]
==== Counts are approximate

Computing exact counts requires loading values into a set and returning its
size. This doesn't scale to high-cardinality sets or large values: the memory
required, and the need to send those per-shard sets between nodes, would
consume too many cluster resources.
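
As a rough illustration of the exact approach described above, the following
Python sketch merges per-shard sets and returns the size of the result. It is
not Elasticsearch code: the function name and the shard lists are hypothetical
and only show why memory and network cost grow with the number of distinct
values.

```python
from typing import Iterable, List


def exact_count_distinct(shards: Iterable[List[str]]) -> int:
    """Exact distinct count: hold every distinct value in one merged set."""
    merged = set()
    for shard_values in shards:
        # Each shard has to ship its full set of distinct values, and the
        # coordinating side has to keep all of them in memory at once.
        merged |= set(shard_values)
    return len(merged)


# Hypothetical per-shard values, for illustration only.
shards = [["a", "b", "c"], ["b", "c", "d"], ["d", "e"]]
print(exact_count_distinct(shards))  # 5
```

The merged set holds one entry per distinct value, so its size is unbounded
for high-cardinality data.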

The `COUNT_DISTINCT` function is based on the
https://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf[HyperLogLog++]
algorithm, which counts based on the hashes of the values and has some
interesting properties:

include::../../aggregations/metrics/cardinality-aggregation.asciidoc[tag=explanation]
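
To make the idea of counting from hashes more concrete, here is a minimal
sketch of plain HyperLogLog in Python. It is not the HyperLogLog++ variant and
not the implementation behind `COUNT_DISTINCT`. The class name, the use of
SHA-1 as the hash, and the default of 4096 registers are assumptions made for
illustration.

```python
import hashlib
import math


class HyperLogLogSketch:
    """Plain HyperLogLog with 2**p registers (illustrative, not HLL++)."""

    def __init__(self, p: int = 12) -> None:
        self.p = p
        self.m = 1 << p                  # number of registers
        self.registers = [0] * self.m    # fixed memory, whatever the input size

    def add(self, value: str) -> None:
        # Derive a 64-bit hash from the first 8 bytes of SHA-1 (assumed hash).
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        index = h >> (64 - self.p)                 # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def merge(self, other: "HyperLogLogSketch") -> None:
        # Sketches with the same p combine by taking per-register maxima.
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self) -> int:
        alpha = 0.7213 / (1 + 1.079 / self.m)      # bias constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros > 0:
            # Small-range correction: fall back to linear counting.
            return round(self.m * math.log(self.m / zeros))
        return round(raw)


# Two hypothetical shards whose values overlap; 10000 distinct values overall.
shard_a = HyperLogLogSketch()
shard_b = HyperLogLogSketch()
for i in range(6000):
    shard_a.add(f"user-{i}")
for i in range(4000, 10000):
    shard_b.add(f"user-{i}")
shard_a.merge(shard_b)
print(shard_a.estimate())  # an approximation of 10000, not an exact count
```

Memory use depends only on the number of registers, and merging two sketches
only needs their registers, never the underlying values.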

The `COUNT_DISTINCT` function takes an optional second parameter to configure
the precision threshold. The `precision_threshold` option lets you trade memory
for accuracy: it defines a unique count below which counts are expected to be
close to accurate. Above this value, counts may become less accurate. The
maximum supported value is 40000. Thresholds above this number have the
same effect as a threshold of 40000. The default value is `3000`.
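
The threshold rules above can be summarized in a short Python sketch. The
function and constant names are hypothetical. The memory estimate of about
`c * 8` bytes for a threshold `c` is the figure given in the cardinality
aggregation documentation and is assumed here to carry over to
`COUNT_DISTINCT`.

```python
from typing import Optional

MAX_PRECISION_THRESHOLD = 40000      # documented maximum
DEFAULT_PRECISION_THRESHOLD = 3000   # documented default


def effective_precision_threshold(requested: Optional[int] = None) -> int:
    """Return the threshold that takes effect for a requested value."""
    if requested is None:
        return DEFAULT_PRECISION_THRESHOLD
    # Values above the maximum behave like the maximum.
    return min(requested, MAX_PRECISION_THRESHOLD)


def approximate_memory_bytes(threshold: int) -> int:
    """Assumed estimate of about c * 8 bytes, per the cardinality aggregation docs."""
    return effective_precision_threshold(threshold) * 8


print(effective_precision_threshold())       # 3000
print(effective_precision_threshold(80000))  # 40000
print(approximate_memory_bytes(80000))       # 320000
```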