stats.asciidoc 4.2 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141
  1. [discrete]
  2. [[esql-stats-by]]
  3. === `STATS ... BY`
  4. **Syntax**
  5. [source,esql]
  6. ----
  7. STATS [column1 =] expression1[, ..., [columnN =] expressionN]
  8. [BY grouping_expression1[, ..., grouping_expressionN]]
  9. ----
  10. *Parameters*
  11. `columnX`::
  12. The name by which the aggregated value is returned. If omitted, the name is
  13. equal to the corresponding expression (`expressionX`).
  14. `expressionX`::
  15. An expression that computes an aggregated value.
  16. `grouping_expressionX`::
  17. An expression that outputs the values to group by.
  18. NOTE: Individual `null` values are skipped when computing aggregations.
  19. *Description*
  20. The `STATS ... BY` processing command groups rows according to a common value
  21. and calculate one or more aggregated values over the grouped rows. If `BY` is
  22. omitted, the output table contains exactly one row with the aggregations applied
  23. over the entire dataset.
  24. The following <<esql-agg-functions,aggregation functions>> are supported:
  25. include::../functions/aggregation-functions.asciidoc[tag=agg_list]
  26. NOTE: `STATS` without any groups is much much faster than adding a group.
  27. NOTE: Grouping on a single expression is currently much more optimized than grouping
  28. on many expressions. In some tests we have seen grouping on a single `keyword`
  29. column to be five times faster than grouping on two `keyword` columns. Do
  30. not try to work around this by combining the two columns together with
  31. something like <<esql-concat>> and then grouping - that is not going to be
  32. faster.
  33. *Examples*
  34. Calculating a statistic and grouping by the values of another column:
  35. [source.merge.styled,esql]
  36. ----
  37. include::{esql-specs}/stats.csv-spec[tag=stats]
  38. ----
  39. [%header.monospaced.styled,format=dsv,separator=|]
  40. |===
  41. include::{esql-specs}/stats.csv-spec[tag=stats-result]
  42. |===
  43. Omitting `BY` returns one row with the aggregations applied over the entire
  44. dataset:
  45. [source.merge.styled,esql]
  46. ----
  47. include::{esql-specs}/stats.csv-spec[tag=statsWithoutBy]
  48. ----
  49. [%header.monospaced.styled,format=dsv,separator=|]
  50. |===
  51. include::{esql-specs}/stats.csv-spec[tag=statsWithoutBy-result]
  52. |===
  53. It's possible to calculate multiple values:
  54. [source.merge.styled,esql]
  55. ----
  56. include::{esql-specs}/stats.csv-spec[tag=statsCalcMultipleValues]
  57. ----
  58. [%header.monospaced.styled,format=dsv,separator=|]
  59. |===
  60. include::{esql-specs}/stats.csv-spec[tag=statsCalcMultipleValues-result]
  61. |===
  62. It's also possible to group by multiple values (only supported for long and
  63. keyword family fields):
  64. [source,esql]
  65. ----
  66. include::{esql-specs}/stats.csv-spec[tag=statsGroupByMultipleValues]
  67. ----
  68. Both the aggregating functions and the grouping expressions accept other
  69. functions. This is useful for using `STATS...BY` on multivalue columns.
  70. For example, to calculate the average salary change, you can use `MV_AVG` to
  71. first average the multiple values per employee, and use the result with the
  72. `AVG` function:
  73. [source.merge.styled,esql]
  74. ----
  75. include::{esql-specs}/stats.csv-spec[tag=docsStatsAvgNestedExpression]
  76. ----
  77. [%header.monospaced.styled,format=dsv,separator=|]
  78. |===
  79. include::{esql-specs}/stats.csv-spec[tag=docsStatsAvgNestedExpression-result]
  80. |===
  81. An example of grouping by an expression is grouping employees on the first
  82. letter of their last name:
  83. [source.merge.styled,esql]
  84. ----
  85. include::{esql-specs}/stats.csv-spec[tag=docsStatsByExpression]
  86. ----
  87. [%header.monospaced.styled,format=dsv,separator=|]
  88. |===
  89. include::{esql-specs}/stats.csv-spec[tag=docsStatsByExpression-result]
  90. |===
  91. Specifying the output column name is optional. If not specified, the new column
  92. name is equal to the expression. The following query returns a column named
  93. `AVG(salary)`:
  94. [source.merge.styled,esql]
  95. ----
  96. include::{esql-specs}/stats.csv-spec[tag=statsUnnamedColumn]
  97. ----
  98. [%header.monospaced.styled,format=dsv,separator=|]
  99. |===
  100. include::{esql-specs}/stats.csv-spec[tag=statsUnnamedColumn-result]
  101. |===
  102. Because this name contains special characters, <<esql-identifiers,it needs to be
  103. quoted>> with backticks (+{backtick}+) when using it in subsequent commands:
  104. [source.merge.styled,esql]
  105. ----
  106. include::{esql-specs}/stats.csv-spec[tag=statsUnnamedColumnEval]
  107. ----
  108. [%header.monospaced.styled,format=dsv,separator=|]
  109. |===
  110. include::{esql-specs}/stats.csv-spec[tag=statsUnnamedColumnEval-result]
  111. |===