index.asciidoc 7.2 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228
  1. [role="xpack"]
  2. [testenv="basic"]
  3. [[sql-lexical-structure]]
  4. == Lexical Structure
  5. This section covers the major lexical structure of SQL, which for the most part, is going to resemble that of ANSI SQL itself hence why low-levels details are not discussed in depth.
  6. {es-sql} currently accepts only one _command_ at a time. A command is a sequence of _tokens_ terminated by the end of input stream.
  7. A token can be a __key word__, an _identifier_ (_quoted_ or _unquoted_), a _literal_ (or constant) or a special character symbol (typically a delimiter). Tokens are typically separated by whitespace (be it space, tab) though in some cases, where there is no ambiguity (typically due to a character symbol) this is not needed - however for readability purposes this should be avoided.
  8. [[sql-syntax-keywords]]
  9. [float]
  10. === Key Words
  11. Take the following example:
  12. [source, sql]
  13. ----
  14. SELECT * FROM table
  15. ----
  16. This query has four tokens: `SELECT`, `*`, `FROM` and `table`. The first three, namely `SELECT`, `*` and `FROM` are __key words__ meaning words that have a fixed meaning in SQL. The token `table` is an _identifier_ meaning it identifies (by name) an entity inside SQL such as a table (in this case), a column, etc...
  17. As one can see, both key words and identifiers have the _same_ lexical structure and thus one cannot know whether a token is one or the other without knowing the SQL language; the complete list of key words is available in the <<sql-syntax-reserved, reserved appendix>>.
  18. Do note that key words are case-insensitive meaning the previous example can be written as:
  19. [source, sql]
  20. ----
  21. select * fRoM table;
  22. ----
  23. Identifiers however are not - as {es} is case sensitive, {es-sql} uses the received value verbatim.
  24. To help differentiate between the two, through-out the documentation the SQL key words are upper-cased a convention we find increases readability and thus recommend to others.
  25. [[sql-syntax-identifiers]]
  26. [float]
  27. === Identifiers
  28. Identifiers can be of two types: __quoted__ and __unquoted__:
  29. [source, sql]
  30. ----
  31. SELECT ip_address FROM "hosts-*"
  32. ----
  33. This query has two identifiers, `ip_address` and `hosts-*` (an <<multi-index,index pattern>>). As `ip_address` does not clash with any key words it can be used verbatim, `hosts-*` on the other hand cannot as it clashes with `-` (minus operation) and `*` hence the double quotes.
  34. Another example:
  35. [source, sql]
  36. ----
  37. SELECT "from" FROM "<logstash-{now/d}>"
  38. ----
  39. The first identifier from needs to quoted as otherwise it clashes with the `FROM` key word (which is case insensitive as thus can be written as `from`) while the second identifier using {es} <<date-math-index-names>> would have otherwise confuse the parser.
  40. Hence why in general, *especially* when dealing with user input it is *highly* recommended to use quotes for identifiers. It adds minimal increase to your queries and in return offers clarity and disambiguation.
  41. [[sql-syntax-literals]]
  42. [float]
  43. === Literals (Constants)
  44. {es-sql} supports two kind of __implicitly-typed__ literals: strings and numbers.
  45. [[sql-syntax-string-literals]]
  46. [float]
  47. ==== String Literals
  48. A string literal is an arbitrary number of characters bounded by single quotes `'`: `'Giant Robot'`.
  49. To include a single quote in the string, escape it using another single quote: `'Captain EO''s Voyage'`.
  50. NOTE: An escaped single quote is *not* a double quote (`"`), but a single quote `'` _repeated_ (`''`).
  51. [sql-syntax-numeric-literals]
  52. [float]
  53. ==== Numeric Literals
  54. Numeric literals are accepted both in decimal and scientific notation with exponent marker (`e` or `E`), starting either with a digit or decimal point `.`:
  55. [source, sql]
  56. ----
  57. 1969 -- integer notation
  58. 3.14 -- decimal notation
  59. .1234 -- decimal notation starting with decimal point
  60. 4E5 -- scientific notation (with exponent marker)
  61. 1.2e-3 -- scientific notation with decimal point
  62. ----
  63. Numeric literals that contain a decimal point are always interpreted as being of type `double`. Those without are considered `integer` if they fit otherwise their type is `long` (or `BIGINT` in ANSI SQL types).
  64. [[sql-syntax-generic-literals]]
  65. [float]
  66. ==== Generic Literals
  67. When dealing with arbitrary type literal, one creates the object by casting, typically, the string representation to the desired type. This can be achieved through the dedicated <<sql-operators-cast, cast operator>> and <<sql-functions-type-conversion, functions>>:
  68. [source, sql]
  69. ----
  70. 123::LONG -- cast 123 to a LONG
  71. CAST('1969-05-13T12:34:56' AS TIMESTAMP) -- cast the given string to datetime
  72. CONVERT('10.0.0.1', IP) -- cast '10.0.0.1' to an IP
  73. ----
  74. Do note that {es-sql} provides functions that out of the box return popular literals (like `E()`) or provide dedicated parsing for certain strings.
  75. [[sql-syntax-single-vs-double-quotes]]
  76. [float]
  77. === Single vs Double Quotes
  78. It is worth pointing out that in SQL, single quotes `'` and double quotes `"` have different meaning and *cannot* be used interchangeably.
  79. Single quotes are used to declare a <<sql-syntax-string-literals, string literal>> while double quotes for <<sql-syntax-identifiers, identifiers>>.
  80. To wit:
  81. [source, sql]
  82. ----
  83. SELECT "first_name" <1>
  84. FROM "musicians" <1>
  85. WHERE "last_name" <1>
  86. = 'Carroll' <2>
  87. ----
  88. <1> Double quotes `"` used for column and table identifiers
  89. <2> Single quotes `'` used for a string literal
  90. [[sql-syntax-special-chars]]
  91. [float]
  92. === Special characters
  93. A few characters that are not alphanumeric have a dedicated meaning different from that of an operator. For completeness these are specified below:
  94. [cols="^m,^15"]
  95. |===
  96. s|Char
  97. s|Description
  98. |* | The asterisk (or wildcard) is used in some contexts to denote all fields for a table. Can be also used as an argument to some aggregate functions.
  99. |, | Commas are used to enumerate the elements of a list.
  100. |. | Used in numeric constants or to separate identifiers qualifiers (catalog, table, column names, etc...).
  101. |()| Parentheses are used for specific SQL commands, function declarations or to enforce precedence.
  102. |===
  103. [[sql-syntax-operators]]
  104. [float]
  105. === Operators
  106. Most operators in {es-sql} have the same precedence and are left-associative. As this is done at parsing time, parenthesis need to be used to enforce a different precedence.
  107. The following table indicates the supported operators and their precendence (highest to lowest);
  108. [cols="^2m,^,^3"]
  109. |===
  110. s|Operator/Element
  111. s|Associativity
  112. s|Description
  113. |.
  114. |left
  115. |qualifier separator
  116. |::
  117. |left
  118. |PostgreSQL-style type cast
  119. |+ -
  120. |right
  121. |unary plus and minus (numeric literal sign)
  122. |* / %
  123. |left
  124. |multiplication, division, modulo
  125. |+ -
  126. |left
  127. |addition, substraction
  128. |BETWEEN IN LIKE
  129. |
  130. |range containment, string matching
  131. |< > <= >= = <=> <> !=
  132. |
  133. |comparison
  134. |NOT
  135. |right
  136. |logical negation
  137. |AND
  138. |left
  139. |logical conjunction
  140. |OR
  141. |left
  142. |logical disjunction
  143. |===
  144. [[sql-syntax-comments]]
  145. [float]
  146. === Comments
  147. {es-sql} allows comments which are sequence of characters ignored by the parsers.
  148. Two styles are supported:
  149. Single Line:: Comments start with a double dash `--` and continue until the end of the line.
  150. Multi line:: Comments that start with `/*` and end with `*/` (also known as C-style).
  151. [source, sql]
  152. ----
  153. -- single line comment
  154. /* multi
  155. line
  156. comment
  157. that supports /* nested comments */
  158. */
  159. ----