123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229 |
- [role="xpack"]
- [testenv="basic"]
- [[sql-limitations]]
- == SQL Limitations
- [discrete]
- [[large-parsing-trees]]
- === Large queries may throw `ParsingExpection`
- Extremely large queries can consume too much memory during the parsing phase, in which case the {es-sql} engine will
- abort parsing and throw an error. In such cases, consider reducing the query to a smaller size by potentially
- simplifying it or splitting it into smaller queries.
- [discrete]
- [[sys-columns-describe-table-nested-fields]]
- === Nested fields in `SYS COLUMNS` and `DESCRIBE TABLE`
- {es} has a special type of relationship fields called `nested` fields. In {es-sql} they can be used by referencing their inner
- sub-fields. Even though `SYS COLUMNS` in non-driver mode (in the CLI and in REST calls) and `DESCRIBE TABLE` will still display
- them as having the type `NESTED`, they cannot be used in a query. One can only reference its sub-fields in the form:
- [source, sql]
- --------------------------------------------------
- [nested_field_name].[sub_field_name]
- --------------------------------------------------
- For example:
- [source, sql]
- --------------------------------------------------
- SELECT dep.dep_name.keyword FROM test_emp GROUP BY languages;
- --------------------------------------------------
- [discrete]
- === Scalar functions on nested fields are not allowed in `WHERE` and `ORDER BY` clauses
- {es-sql} doesn't support the usage of scalar functions on top of nested fields in `WHERE`
- and `ORDER BY` clauses with the exception of comparison and logical operators.
- For example:
- [source, sql]
- --------------------------------------------------
- SELECT * FROM test_emp WHERE LENGTH(dep.dep_name.keyword) > 5;
- --------------------------------------------------
- and
- [source, sql]
- --------------------------------------------------
- SELECT * FROM test_emp ORDER BY YEAR(dep.start_date);
- --------------------------------------------------
- are not supported but:
- [source, sql]
- --------------------------------------------------
- SELECT * FROM test_emp WHERE dep.start_date >= CAST('2020-01-01' AS DATE) OR dep.dep_end_date IS NULL;
- --------------------------------------------------
- is supported.
- [discrete]
- === Multi-nested fields
- {es-sql} doesn't support multi-nested documents, so a query cannot reference more than one nested field in an index.
- This applies to multi-level nested fields, but also multiple nested fields defined on the same level. For example, for this index:
- [source, sql]
- ----------------------------------------------------
- column | type | mapping
- ----------------------+---------------+-------------
- nested_A |STRUCT |NESTED
- nested_A.nested_X |STRUCT |NESTED
- nested_A.nested_X.text|VARCHAR |KEYWORD
- nested_A.text |VARCHAR |KEYWORD
- nested_B |STRUCT |NESTED
- nested_B.text |VARCHAR |KEYWORD
- ----------------------------------------------------
- `nested_A` and `nested_B` cannot be used at the same time, nor `nested_A`/`nested_B` and `nested_A.nested_X` combination.
- For such situations, {es-sql} will display an error message.
- [discrete]
- === Paginating nested inner hits
- When SELECTing a nested field, pagination will not work as expected, {es-sql} will return __at least__ the page size records.
- This is because of the way nested queries work in {es}: the root nested field will be returned and it's matching inner nested fields as well,
- pagination taking place on the **root nested document and not on its inner hits**.
- [discrete]
- [[normalized-keyword-fields]]
- === Normalized `keyword` fields
- `keyword` fields in {es} can be normalized by defining a `normalizer`. Such fields are not supported in {es-sql}.
- [discrete]
- === Array type of fields
- Array fields are not supported due to the "invisible" way in which {es} handles an array of values: the mapping doesn't indicate whether
- a field is an array (has multiple values) or not, so without reading all the data, {es-sql} cannot know whether a field is a single or multi value.
- When multiple values are returned for a field, by default, {es-sql} will throw an exception. However, it is possible to change this behavior through `field_multi_value_leniency` parameter in REST (disabled by default) or
- `field.multi.value.leniency` in drivers (enabled by default).
- [discrete]
- === Sorting by aggregation
- When doing aggregations (`GROUP BY`) {es-sql} relies on {es}'s `composite` aggregation for its support for paginating results.
- However this type of aggregation does come with a limitation: sorting can only be applied on the key used for the aggregation's buckets.
- {es-sql} overcomes this limitation by doing client-side sorting however as a safety measure, allows only up to *65535* rows.
- It is recommended to use `LIMIT` for queries that use sorting by aggregation, essentially indicating the top N results that are desired:
- [source, sql]
- --------------------------------------------------
- SELECT * FROM test GROUP BY age ORDER BY COUNT(*) LIMIT 100;
- --------------------------------------------------
- It is possible to run the same queries without a `LIMIT` however in that case if the maximum size (*10000*) is passed,
- an exception will be returned as {es-sql} is unable to track (and sort) all the results returned.
- Moreover, the aggregation(s) used in the `ORDER BY` must be only plain aggregate functions. No scalar
- functions or operators can be used, and therefore no complex columns that combine two ore more aggregate
- functions can be used for ordering. Here are some examples of queries that are *not allowed*:
- [source, sql]
- --------------------------------------------------
- SELECT age, ROUND(AVG(salary)) AS avg FROM test GROUP BY age ORDER BY avg;
- SELECT age, MAX(salary) - MIN(salary) AS diff FROM test GROUP BY age ORDER BY diff;
- --------------------------------------------------
- [discrete]
- === Using a sub-select
- Using sub-selects (`SELECT X FROM (SELECT Y)`) is **supported to a small degree**: any sub-select that can be "flattened" into a single
- `SELECT` is possible with {es-sql}. For example:
- ["source","sql",subs="attributes,macros"]
- --------------------------------------------------
- include-tagged::{sql-specs}/docs/docs.csv-spec[limitationSubSelect]
- --------------------------------------------------
- The query above is possible because it is equivalent with:
- ["source","sql",subs="attributes,macros"]
- --------------------------------------------------
- include-tagged::{sql-specs}/docs/docs.csv-spec[limitationSubSelectRewritten]
- --------------------------------------------------
- But, if the sub-select would include a `GROUP BY` or `HAVING` or the enclosing `SELECT` would be more complex than `SELECT X
- FROM (SELECT ...) WHERE [simple_condition]`, this is currently **un-supported**.
- [discrete]
- [[first-last-agg-functions-having-clause]]
- === Using <<sql-functions-aggs-first, `FIRST`>>/<<sql-functions-aggs-last,`LAST`>> aggregation functions in `HAVING` clause
- Using `FIRST` and `LAST` in the `HAVING` clause is not supported. The same applies to
- <<sql-functions-aggs-min,`MIN`>> and <<sql-functions-aggs-max,`MAX`>> when their target column
- is of type <<keyword, `keyword`>> as they are internally translated to `FIRST` and `LAST`.
- [discrete]
- [[group-by-time]]
- === Using TIME data type in GROUP BY or <<sql-functions-grouping-histogram>>
- Using `TIME` data type as a grouping key is currently not supported. For example:
- [source, sql]
- -------------------------------------------------------------
- SELECT count(*) FROM test GROUP BY CAST(date_created AS TIME);
- -------------------------------------------------------------
- On the other hand, it can still be used if it's wrapped with a scalar function that returns another data type,
- for example:
- [source, sql]
- -------------------------------------------------------------
- SELECT count(*) FROM test GROUP BY MINUTE((CAST(date_created AS TIME));
- -------------------------------------------------------------
- `TIME` data type is also currently not supported in histogram grouping function. For example:
- [source, sql]
- -------------------------------------------------------------
- SELECT HISTOGRAM(CAST(birth_date AS TIME), INTERVAL '10' MINUTES) as h, COUNT(*) FROM t GROUP BY h
- -------------------------------------------------------------
- [discrete]
- [[geo-sql-limitations]]
- === Geo-related functions
- Since `geo_shape` fields don't have doc values these fields cannot be used for filtering, grouping or sorting.
- By default,`geo_points` fields are indexed and have doc values. However only latitude and longitude are stored and
- indexed with some loss of precision from the original values (4.190951585769653E-8 for the latitude and
- 8.381903171539307E-8 for longitude). The altitude component is accepted but not stored in doc values nor indexed.
- Therefore calling `ST_Z` function in the filtering, grouping or sorting will return `null`.
- [discrete]
- [[using-fields-api]]
- === Retrieving using the `fields` search parameter
- {es-sql} retrieves column values using the <<search-fields-param,search API's
- `fields` parameter>>. Any limitations on the `fields` parameter also apply to
- {es-sql} queries. For example, if `_source` is disabled
- for any of the returned fields or at index level, the values cannot be retrieved.
- [discrete]
- [[aggs-in-pivot]]
- === Aggregations in the <<sql-syntax-pivot, `PIVOT`>> clause
- The aggregation expression in <<sql-syntax-pivot, `PIVOT`>> will currently accept only one aggregation. It is thus not possible to obtain multiple aggregations for any one pivoted column.
- [discrete]
- [[subquery-in-pivot]]
- === Using a subquery in <<sql-syntax-pivot, `PIVOT`>>'s `IN`-subclause
- The values that the <<sql-syntax-pivot, `PIVOT`>> query could pivot must be provided in the query as a list of literals; providing a subquery instead to build this list is not currently supported. For example, in this query:
- [source, sql]
- -------------------------------------------------------------
- SELECT * FROM test_emp PIVOT (SUM(salary) FOR languages IN (1, 2))
- -------------------------------------------------------------
- the `languages` of interest must be listed explicitly: `IN (1, 2)`. On the other hand, this example would **not work**:
- [source, sql]
- -------------------------------------------------------------
- SELECT * FROM test_emp PIVOT (SUM(salary) FOR languages IN (SELECT languages FROM test_emp WHERE languages <=2 GROUP BY languages))
- -------------------------------------------------------------
|