stats.asciidoc 6.0 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202
  1. [discrete]
  2. [[esql-stats-by]]
  3. === `STATS`
  4. The `STATS` processing command groups rows according to a common value
  5. and calculates one or more aggregated values over the grouped rows.
  6. **Syntax**
  7. [source,esql]
  8. ----
  9. STATS [column1 =] expression1 [WHERE boolean_expression1][,
  10. ...,
  11. [columnN =] expressionN [WHERE boolean_expressionN]]
  12. [BY grouping_expression1[, ..., grouping_expressionN]]
  13. ----
  14. *Parameters*
  15. `columnX`::
  16. The name by which the aggregated value is returned. If omitted, the name is
  17. equal to the corresponding expression (`expressionX`).
  18. If multiple columns have the same name, all but the rightmost column with this
  19. name will be ignored.
  20. `expressionX`::
  21. An expression that computes an aggregated value.
  22. `grouping_expressionX`::
  23. An expression that outputs the values to group by.
  24. If its name coincides with one of the computed columns, that column will be ignored.
  25. `boolean_expressionX`::
  26. The condition that must be met for a row to be included in the evaluation of `expressionX`.
  27. NOTE: Individual `null` values are skipped when computing aggregations.
  28. *Description*
  29. The `STATS` processing command groups rows according to a common value
  30. and calculates one or more aggregated values over the grouped rows. For the
  31. calculation of each aggregated value, the rows in a group can be filtered with
  32. `WHERE`. If `BY` is omitted, the output table contains exactly one row with
  33. the aggregations applied over the entire dataset.
  34. The following <<esql-agg-functions,aggregation functions>> are supported:
  35. include::../functions/aggregation-functions.asciidoc[tag=agg_list]
  36. The following <<esql-group-functions,grouping functions>> are supported:
  37. include::../functions/grouping-functions.asciidoc[tag=group_list]
  38. NOTE: `STATS` without any groups is much much faster than adding a group.
  39. NOTE: Grouping on a single expression is currently much more optimized than grouping
  40. on many expressions. In some tests we have seen grouping on a single `keyword`
  41. column to be five times faster than grouping on two `keyword` columns. Do
  42. not try to work around this by combining the two columns together with
  43. something like <<esql-concat>> and then grouping - that is not going to be
  44. faster.
  45. *Examples*
  46. Calculating a statistic and grouping by the values of another column:
  47. [source.merge.styled,esql]
  48. ----
  49. include::{esql-specs}/stats.csv-spec[tag=stats]
  50. ----
  51. [%header.monospaced.styled,format=dsv,separator=|]
  52. |===
  53. include::{esql-specs}/stats.csv-spec[tag=stats-result]
  54. |===
  55. Omitting `BY` returns one row with the aggregations applied over the entire
  56. dataset:
  57. [source.merge.styled,esql]
  58. ----
  59. include::{esql-specs}/stats.csv-spec[tag=statsWithoutBy]
  60. ----
  61. [%header.monospaced.styled,format=dsv,separator=|]
  62. |===
  63. include::{esql-specs}/stats.csv-spec[tag=statsWithoutBy-result]
  64. |===
  65. It's possible to calculate multiple values:
  66. [source.merge.styled,esql]
  67. ----
  68. include::{esql-specs}/stats.csv-spec[tag=statsCalcMultipleValues]
  69. ----
  70. [%header.monospaced.styled,format=dsv,separator=|]
  71. |===
  72. include::{esql-specs}/stats.csv-spec[tag=statsCalcMultipleValues-result]
  73. |===
  74. To filter the rows that go into an aggregation, use the `WHERE` clause:
  75. [source.merge.styled,esql]
  76. ----
  77. include::{esql-specs}/stats.csv-spec[tag=aggFiltering]
  78. ----
  79. [%header.monospaced.styled,format=dsv,separator=|]
  80. |===
  81. include::{esql-specs}/stats.csv-spec[tag=aggFiltering-result]
  82. |===
  83. The aggregations can be mixed, with and without a filter and grouping is
  84. optional as well:
  85. [source.merge.styled,esql]
  86. ----
  87. include::{esql-specs}/stats.csv-spec[tag=aggFilteringNoGroup]
  88. ----
  89. [%header.monospaced.styled,format=dsv,separator=|]
  90. |===
  91. include::{esql-specs}/stats.csv-spec[tag=aggFilteringNoGroup-result]
  92. |===
  93. [[esql-stats-mv-group]]
  94. If the grouping key is multivalued then the input row is in all groups:
  95. [source.merge.styled,esql]
  96. ----
  97. include::{esql-specs}/stats.csv-spec[tag=mv-group]
  98. ----
  99. [%header.monospaced.styled,format=dsv,separator=|]
  100. |===
  101. include::{esql-specs}/stats.csv-spec[tag=mv-group-result]
  102. |===
  103. It's also possible to group by multiple values:
  104. [source,esql]
  105. ----
  106. include::{esql-specs}/stats.csv-spec[tag=statsGroupByMultipleValues]
  107. ----
  108. If all the grouping keys are multivalued then the input row is in all groups:
  109. [source.merge.styled,esql]
  110. ----
  111. include::{esql-specs}/stats.csv-spec[tag=multi-mv-group]
  112. ----
  113. [%header.monospaced.styled,format=dsv,separator=|]
  114. |===
  115. include::{esql-specs}/stats.csv-spec[tag=multi-mv-group-result]
  116. |===
  117. Both the aggregating functions and the grouping expressions accept other
  118. functions. This is useful for using `STATS` on multivalue columns.
  119. For example, to calculate the average salary change, you can use `MV_AVG` to
  120. first average the multiple values per employee, and use the result with the
  121. `AVG` function:
  122. [source.merge.styled,esql]
  123. ----
  124. include::{esql-specs}/stats.csv-spec[tag=docsStatsAvgNestedExpression]
  125. ----
  126. [%header.monospaced.styled,format=dsv,separator=|]
  127. |===
  128. include::{esql-specs}/stats.csv-spec[tag=docsStatsAvgNestedExpression-result]
  129. |===
  130. An example of grouping by an expression is grouping employees on the first
  131. letter of their last name:
  132. [source.merge.styled,esql]
  133. ----
  134. include::{esql-specs}/stats.csv-spec[tag=docsStatsByExpression]
  135. ----
  136. [%header.monospaced.styled,format=dsv,separator=|]
  137. |===
  138. include::{esql-specs}/stats.csv-spec[tag=docsStatsByExpression-result]
  139. |===
  140. Specifying the output column name is optional. If not specified, the new column
  141. name is equal to the expression. The following query returns a column named
  142. `AVG(salary)`:
  143. [source.merge.styled,esql]
  144. ----
  145. include::{esql-specs}/stats.csv-spec[tag=statsUnnamedColumn]
  146. ----
  147. [%header.monospaced.styled,format=dsv,separator=|]
  148. |===
  149. include::{esql-specs}/stats.csv-spec[tag=statsUnnamedColumn-result]
  150. |===
  151. Because this name contains special characters, <<esql-identifiers,it needs to be
  152. quoted>> with backticks (+{backtick}+) when using it in subsequent commands:
  153. [source.merge.styled,esql]
  154. ----
  155. include::{esql-specs}/stats.csv-spec[tag=statsUnnamedColumnEval]
  156. ----
  157. [%header.monospaced.styled,format=dsv,separator=|]
  158. |===
  159. include::{esql-specs}/stats.csv-spec[tag=statsUnnamedColumnEval-result]
  160. |===