|
@@ -0,0 +1,228 @@
|
|
|
+[role="xpack"]
|
|
|
+[testenv="basic"]
|
|
|
+[[sql-lexical-structure]]
|
|
|
+== Lexical Structure
|
|
|
+
|
|
|
+This section covers the major lexical structure of SQL, which for the most part, is going to resemble that of ANSI SQL itself hence why low-levels details are not discussed in depth.
|
|
|
+
|
|
|
+{es-sql} currently accepts only one _command_ at a time. A command is a sequence of _tokens_ terminated by the end of input stream.
|
|
|
+
|
|
|
+A token can be a __key word__, an _identifier_ (_quoted_ or _unquoted_), a _literal_ (or constant) or a special character symbol (typically a delimiter). Tokens are typically separated by whitespace (be it space, tab) though in some cases, where there is no ambiguity (typically due to a character symbol) this is not needed - however for readability purposes this should be avoided.
|
|
|
+
|
|
|
+[[sql-syntax-keywords]]
|
|
|
+[float]
|
|
|
+=== Key Words
|
|
|
+
|
|
|
+Take the following example:
|
|
|
+
|
|
|
+[source, sql]
|
|
|
+----
|
|
|
+SELECT * FROM table
|
|
|
+----
|
|
|
+
|
|
|
+This query has four tokens: `SELECT`, `\*`, `FROM` and `table`. The first three, namely `SELECT`, `*` and `FROM` are __key words__ meaning words that have a fixed meaning in SQL. The token `table` is an _identifier_ meaning it identifies (by name) an entity inside SQL such as a table (in this case), a column, etc...
|
|
|
+
|
|
|
+As one can see, both key words and identifiers have the _same_ lexical structure and thus one cannot know whether a token is one or the other without knowing the SQL language; the complete list of key words is available in the <<sql-syntax-reserved, reserved appendix>>.
|
|
|
+Do note that key words are case-insensitive meaning the previous example can be written as:
|
|
|
+
|
|
|
+[source, sql]
|
|
|
+----
|
|
|
+select * fRoM table;
|
|
|
+----
|
|
|
+
|
|
|
+Identifiers however are not - as {es} is case sensitive, {es-sql} uses the received value verbatim.
|
|
|
+
|
|
|
+To help differentiate between the two, through-out the documentation the SQL key words are upper-cased a convention we find increases readability and thus recommend to others.
|
|
|
+
|
|
|
+[[sql-syntax-identifiers]]
|
|
|
+[float]
|
|
|
+=== Identifiers
|
|
|
+
|
|
|
+Identifiers can be of two types: __quoted__ and __unquoted__:
|
|
|
+
|
|
|
+[source, sql]
|
|
|
+----
|
|
|
+SELECT ip_address FROM "hosts-*"
|
|
|
+----
|
|
|
+
|
|
|
+This query has two identifiers, `ip_address` and `hosts-\*` (an <<multi-index,index pattern>>). As `ip_address` does not clash with any key words it can be used verbatim, `hosts-*` on the other hand cannot as it clashes with `-` (minus operation) and `*` hence the double quotes.
|
|
|
+
|
|
|
+Another example:
|
|
|
+
|
|
|
+[source, sql]
|
|
|
+----
|
|
|
+SELECT "from" FROM "<logstash-{now/d}>"
|
|
|
+----
|
|
|
+
|
|
|
+The first identifier from needs to quoted as otherwise it clashes with the `FROM` key word (which is case insensitive as thus can be written as `from`) while the second identifier using {es} <<date-math-index-names>> would have otherwise confuse the parser.
|
|
|
+
|
|
|
+Hence why in general, *especially* when dealing with user input it is *highly* recommended to use quotes for identifiers. It adds minimal increase to your queries and in return offers clarity and disambiguation.
|
|
|
+
|
|
|
+[[sql-syntax-literals]]
|
|
|
+[float]
|
|
|
+=== Literals (Constants)
|
|
|
+
|
|
|
+{es-sql} supports two kind of __implicitly-typed__ literals: strings and numbers.
|
|
|
+
|
|
|
+[[sql-syntax-string-literals]]
|
|
|
+[float]
|
|
|
+==== String Literals
|
|
|
+
|
|
|
+A string literal is an arbitrary number of characters bounded by single quotes `'`: `'Giant Robot'`.
|
|
|
+To include a single quote in the string, escape it using another single quote: `'Captain EO''s Voyage'`.
|
|
|
+
|
|
|
+NOTE: An escaped single quote is *not* a double quote (`"`), but a single quote `'` _repeated_ (`''`).
|
|
|
+
|
|
|
+[sql-syntax-numeric-literals]
|
|
|
+[float]
|
|
|
+==== Numeric Literals
|
|
|
+
|
|
|
+Numeric literals are accepted both in decimal and scientific notation with exponent marker (`e` or `E`), starting either with a digit or decimal point `.`:
|
|
|
+
|
|
|
+[source, sql]
|
|
|
+----
|
|
|
+1969 -- integer notation
|
|
|
+3.14 -- decimal notation
|
|
|
+.1234 -- decimal notation starting with decimal point
|
|
|
+4E5 -- scientific notation (with exponent marker)
|
|
|
+1.2e-3 -- scientific notation with decimal point
|
|
|
+----
|
|
|
+
|
|
|
+Numeric literals that contain a decimal point are always interpreted as being of type `double`. Those without are considered `integer` if they fit otherwise their type is `long` (or `BIGINT` in ANSI SQL types).
|
|
|
+
|
|
|
+[[sql-syntax-generic-literals]]
|
|
|
+[float]
|
|
|
+==== Generic Literals
|
|
|
+
|
|
|
+When dealing with arbitrary type literal, one creates the object by casting, typically, the string representation to the desired type. This can be achieved through the dedicated <<sql-operators-cast, cast operator>> and <<sql-functions-type-conversion, functions>>:
|
|
|
+
|
|
|
+[source, sql]
|
|
|
+----
|
|
|
+123::LONG -- cast 123 to a LONG
|
|
|
+CAST('1969-05-13T12:34:56' AS TIMESTAMP) -- cast the given string to datetime
|
|
|
+CONVERT('10.0.0.1', IP) -- cast '10.0.0.1' to an IP
|
|
|
+----
|
|
|
+
|
|
|
+Do note that {es-sql} provides functions that out of the box return popular literals (like `E()`) or provide dedicated parsing for certain strings.
|
|
|
+
|
|
|
+[[sql-syntax-single-vs-double-quotes]]
|
|
|
+[float]
|
|
|
+=== Single vs Double Quotes
|
|
|
+
|
|
|
+It is worth pointing out that in SQL, single quotes `'` and double quotes `"` have different meaning and *cannot* be used interchangeably.
|
|
|
+Single quotes are used to declare a <<sql-syntax-string-literals, string literal>> while double quotes for <<sql-syntax-identifiers, identifiers>>.
|
|
|
+
|
|
|
+To wit:
|
|
|
+
|
|
|
+[source, sql]
|
|
|
+----
|
|
|
+SELECT "first_name" <1>
|
|
|
+ FROM "musicians" <1>
|
|
|
+ WHERE "last_name" <1>
|
|
|
+ = 'Carroll' <2>
|
|
|
+----
|
|
|
+
|
|
|
+<1> Double quotes `"` used for column and table identifiers
|
|
|
+<2> Single quotes `'` used for a string literal
|
|
|
+
|
|
|
+[[sql-syntax-special-chars]]
|
|
|
+[float]
|
|
|
+=== Special characters
|
|
|
+
|
|
|
+A few characters that are not alphanumeric have a dedicated meaning different from that of an operator. For completeness these are specified below:
|
|
|
+
|
|
|
+
|
|
|
+[cols="^m,^15"]
|
|
|
+
|
|
|
+|===
|
|
|
+
|
|
|
+s|Char
|
|
|
+s|Description
|
|
|
+
|
|
|
+|* | The asterisk (or wildcard) is used in some contexts to denote all fields for a table. Can be also used as an argument to some aggregate functions.
|
|
|
+|, | Commas are used to enumerate the elements of a list.
|
|
|
+|. | Used in numeric constants or to separate identifiers qualifiers (catalog, table, column names, etc...).
|
|
|
+|()| Parentheses are used for specific SQL commands, function declarations or to enforce precedence.
|
|
|
+|===
|
|
|
+
|
|
|
+[[sql-syntax-operators]]
|
|
|
+[float]
|
|
|
+=== Operators
|
|
|
+
|
|
|
+Most operators in {es-sql} have the same precedence and are left-associative. As this is done at parsing time, parenthesis need to be used to enforce a different precedence.
|
|
|
+
|
|
|
+The following table indicates the supported operators and their precendence (highest to lowest);
|
|
|
+
|
|
|
+[cols="^2m,^,^3"]
|
|
|
+
|
|
|
+|===
|
|
|
+
|
|
|
+s|Operator/Element
|
|
|
+s|Associativity
|
|
|
+s|Description
|
|
|
+
|
|
|
+|.
|
|
|
+|left
|
|
|
+|qualifier separator
|
|
|
+
|
|
|
+|::
|
|
|
+|left
|
|
|
+|PostgreSQL-style type cast
|
|
|
+
|
|
|
+|+ -
|
|
|
+|right
|
|
|
+|unary plus and minus (numeric literal sign)
|
|
|
+
|
|
|
+|* / %
|
|
|
+|left
|
|
|
+|multiplication, division, modulo
|
|
|
+
|
|
|
+|+ -
|
|
|
+|left
|
|
|
+|addition, substraction
|
|
|
+
|
|
|
+|BETWEEN IN LIKE
|
|
|
+|
|
|
|
+|range containment, string matching
|
|
|
+
|
|
|
+|< > <= >= = <=> <> !=
|
|
|
+|
|
|
|
+|comparison
|
|
|
+
|
|
|
+|NOT
|
|
|
+|right
|
|
|
+|logical negation
|
|
|
+
|
|
|
+|AND
|
|
|
+|left
|
|
|
+|logical conjunction
|
|
|
+
|
|
|
+|OR
|
|
|
+|left
|
|
|
+|logical disjunction
|
|
|
+
|
|
|
+|===
|
|
|
+
|
|
|
+
|
|
|
+[[sql-syntax-comments]]
|
|
|
+[float]
|
|
|
+=== Comments
|
|
|
+
|
|
|
+{es-sql} allows comments which are sequence of characters ignored by the parsers.
|
|
|
+
|
|
|
+Two styles are supported:
|
|
|
+
|
|
|
+Single Line:: Comments start with a double dash `--` and continue until the end of the line.
|
|
|
+Multi line:: Comments that start with `/\*` and end with `*/` (also known as C-style).
|
|
|
+
|
|
|
+
|
|
|
+[source, sql]
|
|
|
+----
|
|
|
+-- single line comment
|
|
|
+/* multi
|
|
|
+ line
|
|
|
+ comment
|
|
|
+ that supports /* nested comments */
|
|
|
+ */
|
|
|
+----
|
|
|
+
|