stats.asciidoc 3.0 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107
  1. [discrete]
  2. [[esql-stats-by]]
  3. === `STATS ... BY`
  4. **Syntax**
  5. [source,esql]
  6. ----
  7. STATS [column1 =] expression1[, ..., [columnN =] expressionN] [BY grouping_column1[, ..., grouping_columnN]]
  8. ----
  9. *Parameters*
  10. `columnX`::
  11. The name by which the aggregated value is returned. If omitted, the name is
  12. equal to the corresponding expression (`expressionX`).
  13. `expressionX`::
  14. An expression that computes an aggregated value.
  15. `grouping_columnX`::
  16. The column containing the values to group by.
  17. *Description*
  18. The `STATS ... BY` processing command groups rows according to a common value
  19. and calculate one or more aggregated values over the grouped rows. If `BY` is
  20. omitted, the output table contains exactly one row with the aggregations applied
  21. over the entire dataset.
  22. The following aggregation functions are supported:
  23. include::../functions/aggregation-functions.asciidoc[tag=agg_list]
  24. NOTE: `STATS` without any groups is much much faster than adding a group.
  25. NOTE: Grouping on a single column is currently much more optimized than grouping
  26. on many columns. In some tests we have seen grouping on a single `keyword`
  27. column to be five times faster than grouping on two `keyword` columns. Do
  28. not try to work around this by combining the two columns together with
  29. something like <<esql-concat>> and then grouping - that is not going to be
  30. faster.
  31. *Examples*
  32. Calculating a statistic and grouping by the values of another column:
  33. [source.merge.styled,esql]
  34. ----
  35. include::{esql-specs}/stats.csv-spec[tag=stats]
  36. ----
  37. [%header.monospaced.styled,format=dsv,separator=|]
  38. |===
  39. include::{esql-specs}/stats.csv-spec[tag=stats-result]
  40. |===
  41. Omitting `BY` returns one row with the aggregations applied over the entire
  42. dataset:
  43. [source.merge.styled,esql]
  44. ----
  45. include::{esql-specs}/stats.csv-spec[tag=statsWithoutBy]
  46. ----
  47. [%header.monospaced.styled,format=dsv,separator=|]
  48. |===
  49. include::{esql-specs}/stats.csv-spec[tag=statsWithoutBy-result]
  50. |===
  51. It's possible to calculate multiple values:
  52. [source,esql]
  53. ----
  54. include::{esql-specs}/stats.csv-spec[tag=statsCalcMultipleValues]
  55. ----
  56. It's also possible to group by multiple values (only supported for long and
  57. keyword family fields):
  58. [source,esql]
  59. ----
  60. include::{esql-specs}/stats.csv-spec[tag=statsGroupByMultipleValues]
  61. ----
  62. Specifying the output column name is optional. If not specified, the new column
  63. name is equal to the expression. The following query returns a column named
  64. `AVG(salary)`:
  65. [source.merge.styled,esql]
  66. ----
  67. include::{esql-specs}/stats.csv-spec[tag=statsUnnamedColumn]
  68. ----
  69. [%header.monospaced.styled,format=dsv,separator=|]
  70. |===
  71. include::{esql-specs}/stats.csv-spec[tag=statsUnnamedColumn-result]
  72. |===
  73. Because this name contains special characters, <<esql-identifiers,it needs to be
  74. quoted>> with backticks (+{backtick}+) when using it in subsequent commands:
  75. [source.merge.styled,esql]
  76. ----
  77. include::{esql-specs}/stats.csv-spec[tag=statsUnnamedColumnEval]
  78. ----
  79. [%header.monospaced.styled,format=dsv,separator=|]
  80. |===
  81. include::{esql-specs}/stats.csv-spec[tag=statsUnnamedColumnEval-result]
  82. |===