lqb
/
elasticsearch
mirror of https://gitee.com/mirrors/elasticsearch.git


			
				
					
						
						
							123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107
							[discrete]
[[esql-stats-by]]
=== `STATS ... BY`

**Syntax**

[source,esql]
----
STATS [column1 =] expression1[, ..., [columnN =] expressionN] [BY grouping_column1[, ..., grouping_columnN]]
----

*Parameters*

`columnX`::
The name by which the aggregated value is returned. If omitted, the name is
equal to the corresponding expression (`expressionX`).

`expressionX`::
An expression that computes an aggregated value.

`grouping_columnX`::
The column containing the values to group by.

*Description*

The `STATS ... BY` processing command groups rows according to a common value
and calculate one or more aggregated values over the grouped rows. If `BY` is
omitted, the output table contains exactly one row with the aggregations applied
over the entire dataset.

The following aggregation functions are supported:

include::../functions/aggregation-functions.asciidoc[tag=agg_list]

NOTE: `STATS` without any groups is much much faster than adding a group.

NOTE: Grouping on a single column is currently much more optimized than grouping
      on many columns. In some tests we have seen grouping on a single `keyword`
      column to be five times faster than grouping on two `keyword` columns. Do 
      not try to work around this by combining the two columns together with 
      something like <<esql-concat>> and then grouping - that is not going to be
      faster.

*Examples*

Calculating a statistic and grouping by the values of another column:

[source.merge.styled,esql]
----
include::{esql-specs}/stats.csv-spec[tag=stats]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats.csv-spec[tag=stats-result]
|===

Omitting `BY` returns one row with the aggregations applied over the entire
dataset:

[source.merge.styled,esql]
----
include::{esql-specs}/stats.csv-spec[tag=statsWithoutBy]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats.csv-spec[tag=statsWithoutBy-result]
|===

It's possible to calculate multiple values:

[source,esql]
----
include::{esql-specs}/stats.csv-spec[tag=statsCalcMultipleValues]
----

It's also possible to group by multiple values (only supported for long and
keyword family fields):

[source,esql]
----
include::{esql-specs}/stats.csv-spec[tag=statsGroupByMultipleValues]
----

Specifying the output column name is optional. If not specified, the new column
name is equal to the expression. The following query returns a column named
`AVG(salary)`:

[source.merge.styled,esql]
----
include::{esql-specs}/stats.csv-spec[tag=statsUnnamedColumn]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats.csv-spec[tag=statsUnnamedColumn-result]
|===

Because this name contains special characters, <<esql-identifiers,it needs to be
quoted>> with backticks (+{backtick}+) when using it in subsequent commands:

[source.merge.styled,esql]
----
include::{esql-specs}/stats.csv-spec[tag=statsUnnamedColumnEval]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats.csv-spec[tag=statsUnnamedColumnEval-result]
|===