
[[esql-getting-started]]
== Getting started with {esql} queries

++++
<titleabbrev>Getting started</titleabbrev>
++++

This guide shows how you can use {esql} to query and aggregate your data.

[TIP]
====
This getting-started guide is also available as an https://github.com/elastic/elasticsearch-labs/blob/main/notebooks/esql/esql-getting-started.ipynb[interactive Python notebook] in the `elasticsearch-labs` GitHub repository.
====

[discrete]
[[esql-getting-started-prerequisites]]
=== Prerequisites

To follow along with the queries in this guide, you can either set up your own
deployment, or use Elastic's public {esql} demo environment.

include::{es-ref-dir}/tab-widgets/esql/esql-getting-started-widget-sample-data.asciidoc[]

[discrete]
[[esql-getting-started-running-queries]]
=== Run an {esql} query

In {kib}, you can use Console or Discover to run {esql} queries:

include::{es-ref-dir}/tab-widgets/esql/esql-getting-started-widget-discover-console.asciidoc[]

[discrete]
[[esql-getting-started-first-query]]
=== Your first {esql} query

Each {esql} query starts with a <<esql-source-commands,source command>>. A
source command produces a table, typically with data from {es}.

image::images/esql/source-command.svg[A source command producing a table from {es},align="center"]

The <<esql-from>> source command returns a table with documents from a data
stream, index, or alias. Each row in the resulting table represents a document.
This query returns up to 1000 documents from the `sample_data` index:

[source,esql]
----
include::{esql-specs}/docs.csv-spec[tag=gs-from]
----
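Spelled out, the simplest possible query is just the source command on its own (a minimal sketch, assuming the `sample_data` index from the prerequisites):

[source,esql]
----
FROM sample_data
----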
Each column corresponds to a field, and can be accessed by the name of that
field.

[TIP]
====
{esql} keywords are case-insensitive. The following query is identical to the
previous one:

[source,esql]
----
include::{esql-specs}/docs.csv-spec[tag=gs-from-lowercase]
----
====

[discrete]
[[esql-getting-started-limit]]
=== Processing commands

A source command can be followed by one or more
<<esql-processing-commands,processing commands>>, separated by a pipe character:
`|`. Processing commands change an input table by adding, removing, or changing
rows and columns. Processing commands can perform filtering, projection,
aggregation, and more.

image::images/esql/esql-limit.png[A processing command changing an input table,align="center",width="60%"]

For example, you can use the <<esql-limit>> command to limit the number of rows
that are returned, up to a maximum of 10,000 rows:

[source,esql]
----
include::{esql-specs}/docs.csv-spec[tag=gs-limit]
----
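As a sketch, a `LIMIT` pipeline against the `sample_data` index looks like this:

[source,esql]
----
FROM sample_data
| LIMIT 10
----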
[TIP]
====
For readability, you can put each command on a separate line. However, you don't
have to. The following query is identical to the previous one:

[source,esql]
----
include::{esql-specs}/docs.csv-spec[tag=gs-limit-one-line]
----
====

[discrete]
[[esql-getting-started-sort]]
==== Sort a table

image::images/esql/esql-sort.png[A processing command sorting an input table,align="center",width="60%"]

Another processing command is the <<esql-sort>> command. By default, the rows
returned by `FROM` don't have a defined sort order. Use the `SORT` command to
sort rows on one or more columns:

[source,esql]
----
include::{esql-specs}/docs.csv-spec[tag=gs-sort]
----
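A sketch of such a sort, assuming the `@timestamp` column from the sample data:

[source,esql]
----
FROM sample_data
| SORT @timestamp DESC
----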
[discrete]
[[esql-getting-started-where]]
==== Query the data

Use the <<esql-where>> command to query the data. For example, to find all
events with a duration longer than 5ms:

[source,esql]
----
include::{esql-specs}/where.csv-spec[tag=gs-where]
----
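A sketch of such a filter, assuming `event_duration` is stored in nanoseconds (so 5ms is 5,000,000):

[source,esql]
----
FROM sample_data
| WHERE event_duration > 5000000
----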
`WHERE` supports several <<esql-operators,operators>>. For example, you can use <<esql-like-operator>> to run a wildcard query against the `message` column:

[source,esql]
----
include::{esql-specs}/where-like.csv-spec[tag=gs-like]
----
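A sketch of such a wildcard filter, where `*` matches zero or more characters:

[source,esql]
----
FROM sample_data
| WHERE message LIKE "Connected*"
----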
[discrete]
[[esql-getting-started-more-commands]]
==== More processing commands

There are many other processing commands, like <<esql-keep>> and <<esql-drop>>
to keep or drop columns, <<esql-enrich>> to enrich a table with data from
indices in {es}, and <<esql-dissect>> and <<esql-grok>> to process data. Refer
to <<esql-processing-commands>> for an overview of all processing commands.

[discrete]
[[esql-getting-started-chaining]]
=== Chain processing commands

You can chain processing commands, separated by a pipe character: `|`. Each
processing command works on the output table of the previous command. The result
of a query is the table produced by the final processing command.

image::images/esql/esql-sort-limit.png[Processing commands can be chained,align="center"]

The following example first sorts the table on `@timestamp`, and next limits the
result set to 3 rows:

[source,esql]
----
include::{esql-specs}/docs.csv-spec[tag=gs-chaining]
----
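A sketch of such a sort-then-limit chain, assuming a descending sort on `@timestamp`:

[source,esql]
----
FROM sample_data
| SORT @timestamp DESC
| LIMIT 3
----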
NOTE: The order of processing commands is important. First limiting the result
set to 3 rows and then sorting those 3 rows would most likely return a result
different from this example, where the sorting comes before the limit.

[discrete]
[[esql-getting-started-eval]]
=== Compute values

Use the <<esql-eval>> command to append columns to a table, with calculated
values. For example, the following query appends a `duration_ms` column. The
values in the column are computed by dividing `event_duration` by 1,000,000. In
other words: `event_duration` converted from nanoseconds to milliseconds.

[source,esql]
----
include::{esql-specs}/eval.csv-spec[tag=gs-eval]
----
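A sketch of such an `EVAL`, dividing by a float literal so the result keeps its fractional part:

[source,esql]
----
FROM sample_data
| EVAL duration_ms = event_duration / 1000000.0
----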
`EVAL` supports several <<esql-functions,functions>>. For example, use the
<<esql-round>> function to round a number to a specified number of digits:

[source,esql]
----
include::{esql-specs}/eval.csv-spec[tag=gs-round]
----
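A sketch combining `EVAL` with `ROUND`, rounding the millisecond value to one decimal place:

[source,esql]
----
FROM sample_data
| EVAL duration_ms = ROUND(event_duration / 1000000.0, 1)
----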
[discrete]
[[esql-getting-started-stats]]
=== Calculate statistics

You can use {esql} not only to query your data, but also to aggregate it. Use
the <<esql-stats-by>> command to calculate statistics. For example, the median
duration:

[source,esql]
----
include::{esql-specs}/stats.csv-spec[tag=gs-stats]
----

You can calculate multiple stats with one command:

[source,esql]
----
include::{esql-specs}/stats.csv-spec[tag=gs-two-stats]
----

Use `BY` to group calculated stats by one or more columns. For example, to
calculate the median duration per client IP:

[source,esql]
----
include::{esql-specs}/stats.csv-spec[tag=gs-stats-by]
----
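A sketch of a grouped aggregation, assuming the `client_ip` column from the sample data:

[source,esql]
----
FROM sample_data
| STATS median_duration = MEDIAN(event_duration) BY client_ip
----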
[discrete]
[[esql-getting-started-access-columns]]
=== Access columns

You can access columns by their name. If a name contains special characters,
<<esql-identifiers,it needs to be quoted>> with backticks (+{backtick}+).

Assigning an explicit name to a column created by `EVAL` or `STATS` is optional.
If you don't provide a name, the new column name is equal to the function
expression. For example:

[source,esql]
----
include::{esql-specs}/eval.csv-spec[tag=gs-eval-no-column-name]
----

In this query, `EVAL` adds a new column named `event_duration/1000000.0`.
Because its name contains special characters, quote it with backticks to access
it:

[source,esql]
----
include::{esql-specs}/eval.csv-spec[tag=gs-eval-stats-backticks]
----
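A sketch of the pattern: the unnamed column created by `EVAL` takes its expression as its name, so `STATS` references it in backticks:

[source,esql]
----
FROM sample_data
| EVAL event_duration/1000000.0
| STATS MEDIAN(`event_duration/1000000.0`)
----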
[discrete]
[[esql-getting-started-histogram]]
=== Create a histogram

To track statistics over time, {esql} enables you to create histograms using the
<<esql-bucket>> function. `BUCKET` creates human-friendly bucket sizes
and returns a value for each row that corresponds to the resulting bucket the
row falls into.

Combine `BUCKET` with <<esql-stats-by>> to create a histogram. For example,
to count the number of events per hour:

[source,esql]
----
include::{esql-specs}/bucket.csv-spec[tag=gs-bucket-stats-by]
----
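A sketch of an hourly histogram; this assumes the fixed-span form of `BUCKET` (the function also accepts a target bucket count plus a time range):

[source,esql]
----
FROM sample_data
| STATS event_count = COUNT(*) BY hour = BUCKET(@timestamp, 1 hour)
----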
Or the median duration per hour:

[source,esql]
----
include::{esql-specs}/bucket.csv-spec[tag=gs-bucket-stats-by-median]
----

[discrete]
[[esql-getting-started-enrich]]
=== Enrich data

{esql} enables you to <<esql-enrich-data,enrich>> a table with data from indices
in {es}, using the <<esql-enrich>> command.

image::images/esql/esql-enrich.png[align="center"]

Before you can use `ENRICH`, you first need to
<<esql-create-enrich-policy,create>> and <<esql-execute-enrich-policy,execute>>
an <<esql-enrich-policy,enrich policy>>.

include::{es-ref-dir}/tab-widgets/esql/esql-getting-started-widget-enrich-policy.asciidoc[]

After creating and executing a policy, you can use it with the `ENRICH`
command:

[source,esql]
----
include::{esql-specs}/enrich.csv-spec[tag=gs-enrich]
----
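A sketch, assuming a hypothetical policy named `clientip_policy` that matches on a `client_ip` keyword field and adds an `env` field; `TO_STRING` converts the IP value so it can match the policy's keyword match field:

[source,esql]
----
FROM sample_data
| KEEP @timestamp, client_ip, event_duration
| EVAL client_ip = TO_STRING(client_ip)
| ENRICH clientip_policy ON client_ip WITH env
----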
You can use the new `env` column that's added by the `ENRICH` command in
subsequent commands. For example, to calculate the median duration per
environment:

[source,esql]
----
include::{esql-specs}/enrich.csv-spec[tag=gs-enrich-stats-by]
----

For more about data enrichment with {esql}, refer to <<esql-enrich-data>>.

[discrete]
[[esql-getting-started-process-data]]
=== Process data

Your data may contain unstructured strings that you want to
<<esql-process-data-with-dissect-and-grok,structure>> to make it easier to
analyze the data. For example, the sample data contains log messages like:

[source,txt]
----
"Connected to 10.1.0.3"
----

By extracting the IP address from these messages, you can determine which IP has
accepted the most client connections.

To structure unstructured strings at query time, you can use the {esql}
<<esql-dissect>> and <<esql-grok>> commands. `DISSECT` works by breaking up a
string using a delimiter-based pattern. `GROK` works similarly, but uses regular
expressions. This makes `GROK` more powerful, but generally also slower.

In this case, no regular expressions are needed, as the `message` is
straightforward: "Connected to ", followed by the server IP. To match this
string, you can use the following `DISSECT` command:

[source,esql]
----
include::{esql-specs}/dissect.csv-spec[tag=gs-dissect]
----
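A sketch of such a command: everything after the literal prefix is captured into a new `server_ip` column:

[source,esql]
----
FROM sample_data
| DISSECT message "Connected to %{server_ip}"
----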
This adds a `server_ip` column to those rows that have a `message` that matches
this pattern. For other rows, the value of `server_ip` is `null`.

You can use the new `server_ip` column that's added by the `DISSECT` command in
subsequent commands. For example, to determine how many connections each server
has accepted:

[source,esql]
----
include::{esql-specs}/dissect.csv-spec[tag=gs-dissect-stats-by]
----

For more about data processing with {esql}, refer to
<<esql-process-data-with-dissect-and-grok>>.

[discrete]
[[esql-getting-learn-more]]
=== Learn more

To learn more about {esql}, refer to <<esql-language>> and <<esql-using>>.