123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141 |
- [discrete]
- [[esql-stats-by]]
- === `STATS ... BY`
- **Syntax**
- [source,esql]
- ----
- STATS [column1 =] expression1[, ..., [columnN =] expressionN]
- [BY grouping_expression1[, ..., grouping_expressionN]]
- ----
- *Parameters*
- `columnX`::
- The name by which the aggregated value is returned. If omitted, the name is
- equal to the corresponding expression (`expressionX`).
- `expressionX`::
- An expression that computes an aggregated value.
- `grouping_expressionX`::
- An expression that outputs the values to group by.
- NOTE: Individual `null` values are skipped when computing aggregations.
- *Description*
- The `STATS ... BY` processing command groups rows according to a common value
- and calculate one or more aggregated values over the grouped rows. If `BY` is
- omitted, the output table contains exactly one row with the aggregations applied
- over the entire dataset.
- The following <<esql-agg-functions,aggregation functions>> are supported:
- include::../functions/aggregation-functions.asciidoc[tag=agg_list]
- NOTE: `STATS` without any groups is much much faster than adding a group.
- NOTE: Grouping on a single expression is currently much more optimized than grouping
- on many expressions. In some tests we have seen grouping on a single `keyword`
- column to be five times faster than grouping on two `keyword` columns. Do
- not try to work around this by combining the two columns together with
- something like <<esql-concat>> and then grouping - that is not going to be
- faster.
- *Examples*
- Calculating a statistic and grouping by the values of another column:
- [source.merge.styled,esql]
- ----
- include::{esql-specs}/stats.csv-spec[tag=stats]
- ----
- [%header.monospaced.styled,format=dsv,separator=|]
- |===
- include::{esql-specs}/stats.csv-spec[tag=stats-result]
- |===
- Omitting `BY` returns one row with the aggregations applied over the entire
- dataset:
- [source.merge.styled,esql]
- ----
- include::{esql-specs}/stats.csv-spec[tag=statsWithoutBy]
- ----
- [%header.monospaced.styled,format=dsv,separator=|]
- |===
- include::{esql-specs}/stats.csv-spec[tag=statsWithoutBy-result]
- |===
- It's possible to calculate multiple values:
- [source.merge.styled,esql]
- ----
- include::{esql-specs}/stats.csv-spec[tag=statsCalcMultipleValues]
- ----
- [%header.monospaced.styled,format=dsv,separator=|]
- |===
- include::{esql-specs}/stats.csv-spec[tag=statsCalcMultipleValues-result]
- |===
- It's also possible to group by multiple values (only supported for long and
- keyword family fields):
- [source,esql]
- ----
- include::{esql-specs}/stats.csv-spec[tag=statsGroupByMultipleValues]
- ----
- Both the aggregating functions and the grouping expressions accept other
- functions. This is useful for using `STATS...BY` on multivalue columns.
- For example, to calculate the average salary change, you can use `MV_AVG` to
- first average the multiple values per employee, and use the result with the
- `AVG` function:
- [source.merge.styled,esql]
- ----
- include::{esql-specs}/stats.csv-spec[tag=docsStatsAvgNestedExpression]
- ----
- [%header.monospaced.styled,format=dsv,separator=|]
- |===
- include::{esql-specs}/stats.csv-spec[tag=docsStatsAvgNestedExpression-result]
- |===
- An example of grouping by an expression is grouping employees on the first
- letter of their last name:
- [source.merge.styled,esql]
- ----
- include::{esql-specs}/stats.csv-spec[tag=docsStatsByExpression]
- ----
- [%header.monospaced.styled,format=dsv,separator=|]
- |===
- include::{esql-specs}/stats.csv-spec[tag=docsStatsByExpression-result]
- |===
- Specifying the output column name is optional. If not specified, the new column
- name is equal to the expression. The following query returns a column named
- `AVG(salary)`:
- [source.merge.styled,esql]
- ----
- include::{esql-specs}/stats.csv-spec[tag=statsUnnamedColumn]
- ----
- [%header.monospaced.styled,format=dsv,separator=|]
- |===
- include::{esql-specs}/stats.csv-spec[tag=statsUnnamedColumn-result]
- |===
- Because this name contains special characters, <<esql-identifiers,it needs to be
- quoted>> with backticks (+{backtick}+) when using it in subsequent commands:
- [source.merge.styled,esql]
- ----
- include::{esql-specs}/stats.csv-spec[tag=statsUnnamedColumnEval]
- ----
- [%header.monospaced.styled,format=dsv,separator=|]
- |===
- include::{esql-specs}/stats.csv-spec[tag=statsUnnamedColumnEval-result]
- |===
|