stats.asciidoc 5.2 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173
  1. [discrete]
  2. [[esql-stats-by]]
  3. === `STATS ... BY`
  4. The `STATS ... BY` processing command groups rows according to a common value
  5. and calculates one or more aggregated values over the grouped rows.
  6. **Syntax**
  7. [source,esql]
  8. ----
  9. STATS [column1 =] expression1[, ..., [columnN =] expressionN]
  10. [BY grouping_expression1[, ..., grouping_expressionN]]
  11. ----
  12. *Parameters*
  13. `columnX`::
  14. The name by which the aggregated value is returned. If omitted, the name is
  15. equal to the corresponding expression (`expressionX`).
  16. If multiple columns have the same name, all but the rightmost column with this
  17. name will be ignored.
  18. `expressionX`::
  19. An expression that computes an aggregated value.
  20. `grouping_expressionX`::
  21. An expression that outputs the values to group by.
  22. If its name coincides with one of the computed columns, that column will be ignored.
  23. NOTE: Individual `null` values are skipped when computing aggregations.
  24. *Description*
  25. The `STATS ... BY` processing command groups rows according to a common value
  26. and calculate one or more aggregated values over the grouped rows. If `BY` is
  27. omitted, the output table contains exactly one row with the aggregations applied
  28. over the entire dataset.
  29. The following <<esql-agg-functions,aggregation functions>> are supported:
  30. include::../functions/aggregation-functions.asciidoc[tag=agg_list]
  31. The following <<esql-group-functions,grouping functions>> are supported:
  32. include::../functions/grouping-functions.asciidoc[tag=group_list]
  33. NOTE: `STATS` without any groups is much much faster than adding a group.
  34. NOTE: Grouping on a single expression is currently much more optimized than grouping
  35. on many expressions. In some tests we have seen grouping on a single `keyword`
  36. column to be five times faster than grouping on two `keyword` columns. Do
  37. not try to work around this by combining the two columns together with
  38. something like <<esql-concat>> and then grouping - that is not going to be
  39. faster.
  40. *Examples*
  41. Calculating a statistic and grouping by the values of another column:
  42. [source.merge.styled,esql]
  43. ----
  44. include::{esql-specs}/stats.csv-spec[tag=stats]
  45. ----
  46. [%header.monospaced.styled,format=dsv,separator=|]
  47. |===
  48. include::{esql-specs}/stats.csv-spec[tag=stats-result]
  49. |===
  50. Omitting `BY` returns one row with the aggregations applied over the entire
  51. dataset:
  52. [source.merge.styled,esql]
  53. ----
  54. include::{esql-specs}/stats.csv-spec[tag=statsWithoutBy]
  55. ----
  56. [%header.monospaced.styled,format=dsv,separator=|]
  57. |===
  58. include::{esql-specs}/stats.csv-spec[tag=statsWithoutBy-result]
  59. |===
  60. It's possible to calculate multiple values:
  61. [source.merge.styled,esql]
  62. ----
  63. include::{esql-specs}/stats.csv-spec[tag=statsCalcMultipleValues]
  64. ----
  65. [%header.monospaced.styled,format=dsv,separator=|]
  66. |===
  67. include::{esql-specs}/stats.csv-spec[tag=statsCalcMultipleValues-result]
  68. |===
  69. [[esql-stats-mv-group]]
  70. If the grouping key is multivalued then the input row is in all groups:
  71. [source.merge.styled,esql]
  72. ----
  73. include::{esql-specs}/stats.csv-spec[tag=mv-group]
  74. ----
  75. [%header.monospaced.styled,format=dsv,separator=|]
  76. |===
  77. include::{esql-specs}/stats.csv-spec[tag=mv-group-result]
  78. |===
  79. It's also possible to group by multiple values:
  80. [source,esql]
  81. ----
  82. include::{esql-specs}/stats.csv-spec[tag=statsGroupByMultipleValues]
  83. ----
  84. If the all grouping keys are multivalued then the input row is in all groups:
  85. [source.merge.styled,esql]
  86. ----
  87. include::{esql-specs}/stats.csv-spec[tag=multi-mv-group]
  88. ----
  89. [%header.monospaced.styled,format=dsv,separator=|]
  90. |===
  91. include::{esql-specs}/stats.csv-spec[tag=multi-mv-group-result]
  92. |===
  93. Both the aggregating functions and the grouping expressions accept other
  94. functions. This is useful for using `STATS...BY` on multivalue columns.
  95. For example, to calculate the average salary change, you can use `MV_AVG` to
  96. first average the multiple values per employee, and use the result with the
  97. `AVG` function:
  98. [source.merge.styled,esql]
  99. ----
  100. include::{esql-specs}/stats.csv-spec[tag=docsStatsAvgNestedExpression]
  101. ----
  102. [%header.monospaced.styled,format=dsv,separator=|]
  103. |===
  104. include::{esql-specs}/stats.csv-spec[tag=docsStatsAvgNestedExpression-result]
  105. |===
  106. An example of grouping by an expression is grouping employees on the first
  107. letter of their last name:
  108. [source.merge.styled,esql]
  109. ----
  110. include::{esql-specs}/stats.csv-spec[tag=docsStatsByExpression]
  111. ----
  112. [%header.monospaced.styled,format=dsv,separator=|]
  113. |===
  114. include::{esql-specs}/stats.csv-spec[tag=docsStatsByExpression-result]
  115. |===
  116. Specifying the output column name is optional. If not specified, the new column
  117. name is equal to the expression. The following query returns a column named
  118. `AVG(salary)`:
  119. [source.merge.styled,esql]
  120. ----
  121. include::{esql-specs}/stats.csv-spec[tag=statsUnnamedColumn]
  122. ----
  123. [%header.monospaced.styled,format=dsv,separator=|]
  124. |===
  125. include::{esql-specs}/stats.csv-spec[tag=statsUnnamedColumn-result]
  126. |===
  127. Because this name contains special characters, <<esql-identifiers,it needs to be
  128. quoted>> with backticks (+{backtick}+) when using it in subsequent commands:
  129. [source.merge.styled,esql]
  130. ----
  131. include::{esql-specs}/stats.csv-spec[tag=statsUnnamedColumnEval]
  132. ----
  133. [%header.monospaced.styled,format=dsv,separator=|]
  134. |===
  135. include::{esql-specs}/stats.csv-spec[tag=statsUnnamedColumnEval-result]
  136. |===