12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091 |
- [role="xpack"]
- [testenv="basic"]
- [[sql-functions-grouping]]
- === Grouping Functions
- Functions for creating special __grouping__s (also known as _bucketing_); as such these need to be used
- as part of the <<sql-syntax-group-by, grouping>>.
- [[sql-functions-grouping-histogram]]
- ==== `HISTOGRAM`
- .Synopsis:
- [source, sql]
- ----
- HISTOGRAM(
- numeric_exp, <1>
- numeric_interval) <2>
- HISTOGRAM(
- date_exp, <3>
- date_time_interval) <4>
- ----
- *Input*:
- <1> numeric expression (typically a field)
- <2> numeric interval
- <3> date/time expression (typically a field)
- <4> date/time <<sql-functions-datetime-interval, interval>>
- *Output*: non-empty buckets or groups of the given expression divided according to the given interval
- .Description
- The histogram function takes all matching values and divides them into buckets with fixed size matching the given interval, using (roughly) the following formula:
- [source, sql]
- ----
- bucket_key = Math.floor(value / interval) * interval
- ----
- NOTE:: The histogram in SQL does *NOT* return empty buckets for missing intervals as the traditional <<search-aggregations-bucket-histogram-aggregation, histogram>> and <<search-aggregations-bucket-datehistogram-aggregation, date histogram>>. Such behavior does not fit conceptually in SQL which treats all missing values as `NULL`; as such the histogram places all missing values in the `NULL` group.
- `Histogram` can be applied on either numeric fields:
- [source, sql]
- ----
- include-tagged::{sql-specs}/docs/docs.csv-spec[histogramNumeric]
- ----
- or date/time fields:
- [source, sql]
- ----
- include-tagged::{sql-specs}/docs/docs.csv-spec[histogramDateTime]
- ----
- Expressions inside the histogram are also supported as long as the
- return type is numeric:
- [source, sql]
- ----
- include-tagged::{sql-specs}/docs/docs.csv-spec[histogramNumericExpression]
- ----
- Do note that histograms (and grouping functions in general) allow custom expressions but cannot have any functions applied to them in the `GROUP BY`. In other words, the following statement is *NOT* allowed:
- [source, sql]
- ----
- include-tagged::{sql-specs}/docs/docs.csv-spec[expressionOnHistogramNotAllowed]
- ----
- as it requires two groupings (one for histogram followed by a second for applying the function on top of the histogram groups).
- Instead one can rewrite the query to move the expression on the histogram _inside_ of it:
- [source, sql]
- ----
- include-tagged::{sql-specs}/docs/docs.csv-spec[histogramDateTimeExpression]
- ----
- [IMPORTANT]
- When the histogram in SQL is applied on **DATE** type instead of **DATETIME**, the interval specified is truncated to
- the multiple of a day. E.g.: for `HISTOGRAM(CAST(birth_date AS DATE), INTERVAL '2 3:04' DAY TO MINUTE)` the interval
- actually used will be `INTERVAL '2' DAY`. If the interval specified is less than 1 day, e.g.:
- `HISTOGRAM(CAST(birth_date AS DATE), INTERVAL '20' HOUR)` then the interval used will be `INTERVAL '1' DAY`.
- [IMPORTANT]
- Histogram in SQL cannot be applied applied on **TIME** type.
- E.g.: `HISTOGRAM(CAST(birth_date AS TIME), INTERVAL '10' MINUTES)` is currently not supported.
|