
[[esql-getting-started]]
== Getting started with {esql} queries
++++
<titleabbrev>Getting started</titleabbrev>
++++

This guide shows how you can use {esql} to query and aggregate your data.

[TIP]
====
This getting-started guide is also available as an https://github.com/elastic/elasticsearch-labs/blob/main/notebooks/esql/esql-getting-started.ipynb[interactive Python notebook] in the `elasticsearch-labs` GitHub repository.
====

[discrete]
[[esql-getting-started-prerequisites]]
=== Prerequisites

To follow along with the queries in this guide, you'll need an {es} deployment with our sample data.

include::{es-ref-dir}/tab-widgets/esql/esql-getting-started-sample-data.asciidoc[tag=own-deployment]

[discrete]
[[esql-getting-started-running-queries]]
=== Run an {esql} query

In {kib}, you can use Console or Discover to run {esql} queries:

include::{es-ref-dir}/tab-widgets/esql/esql-getting-started-widget-discover-console.asciidoc[]

[discrete]
[[esql-getting-started-first-query]]
=== Your first {esql} query

Each {esql} query starts with a <<esql-source-commands,source command>>. A source command produces a table, typically with data from {es}.

image::images/esql/source-command.svg[A source command producing a table from {es},align="center"]

The <<esql-from>> source command returns a table with documents from a data stream, index, or alias. Each row in the resulting table represents a document. This query returns up to 1000 documents from the `sample_data` index:

[source,esql]
----
include::{esql-specs}/docs.csv-spec[tag=gs-from]
----
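As a sketch, the simplest possible {esql} query of this shape looks like the following (assuming the `sample_data` index from the prerequisites):

[source,esql]
----
FROM sample_data
----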
Each column corresponds to a field, and can be accessed by the name of that field.

[TIP]
====
{esql} keywords are case-insensitive. The following query is identical to the previous one:

[source,esql]
----
include::{esql-specs}/docs.csv-spec[tag=gs-from-lowercase]
----
====
[discrete]
[[esql-getting-started-limit]]
=== Processing commands

A source command can be followed by one or more <<esql-processing-commands,processing commands>>, separated by a pipe character: `|`. Processing commands change an input table by adding, removing, or changing rows and columns. Processing commands can perform filtering, projection, aggregation, and more.

image::images/esql/esql-limit.png[A processing command changing an input table,align="center",width="60%"]

For example, you can use the <<esql-limit>> command to limit the number of rows that are returned, up to a maximum of 10,000 rows:

[source,esql]
----
include::{esql-specs}/docs.csv-spec[tag=gs-limit]
----
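A sketch of such a query (assuming the `sample_data` index):

[source,esql]
----
FROM sample_data
| LIMIT 3
----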
[TIP]
====
For readability, you can put each command on a separate line. However, you don't have to. The following query is identical to the previous one:

[source,esql]
----
include::{esql-specs}/docs.csv-spec[tag=gs-limit-one-line]
----
====

[discrete]
[[esql-getting-started-sort]]
==== Sort a table

image::images/esql/esql-sort.png[A processing command sorting an input table,align="center",width="60%"]

Another processing command is the <<esql-sort>> command. By default, the rows returned by `FROM` don't have a defined sort order. Use the `SORT` command to sort rows on one or more columns:

[source,esql]
----
include::{esql-specs}/docs.csv-spec[tag=gs-sort]
----
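For instance, to sort on the `@timestamp` column in descending order, a query along these lines would work (a sketch, assuming the sample data's `@timestamp` field):

[source,esql]
----
FROM sample_data
| SORT @timestamp DESC
----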
[discrete]
[[esql-getting-started-where]]
==== Query the data

Use the <<esql-where>> command to query the data. For example, to find all events with a duration longer than 5ms:

[source,esql]
----
include::{esql-specs}/where.csv-spec[tag=gs-where]
----
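Because the sample data stores `event_duration` in nanoseconds, 5ms corresponds to 5,000,000ns. A sketch of such a filter (assuming that field):

[source,esql]
----
FROM sample_data
| WHERE event_duration > 5000000
----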
`WHERE` supports several <<esql-operators,operators>>. For example, you can use <<esql-like-operator>> to run a wildcard query against the `message` column:

[source,esql]
----
include::{esql-specs}/where-like.csv-spec[tag=gs-like]
----
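With `LIKE`, the `*` wildcard matches any number of characters. A sketch of a wildcard filter on `message` (assuming the sample data's log messages, which start with "Connected to"):

[source,esql]
----
FROM sample_data
| WHERE message LIKE "Connected*"
----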
[discrete]
[[esql-getting-started-more-commands]]
==== More processing commands

There are many other processing commands, like <<esql-keep>> and <<esql-drop>> to keep or drop columns, <<esql-enrich>> to enrich a table with data from indices in {es}, and <<esql-dissect>> and <<esql-grok>> to process data. Refer to <<esql-processing-commands>> for an overview of all processing commands.

[discrete]
[[esql-getting-started-chaining]]
=== Chain processing commands

You can chain processing commands, separated by a pipe character: `|`. Each processing command works on the output table of the previous command. The result of a query is the table produced by the final processing command.

image::images/esql/esql-sort-limit.png[Processing commands can be chained,align="center"]

The following example first sorts the table on `@timestamp`, and next limits the result set to 3 rows:

[source,esql]
----
include::{esql-specs}/docs.csv-spec[tag=gs-chaining]
----
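A sketch of such a chained query (assuming the `sample_data` index and its `@timestamp` field):

[source,esql]
----
FROM sample_data
| SORT @timestamp DESC
| LIMIT 3
----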
NOTE: The order of processing commands matters. Limiting the result set to 3 rows first, and only then sorting those 3 rows, would most likely return a different result than this example, where the sort comes before the limit.

[discrete]
[[esql-getting-started-eval]]
=== Compute values

Use the <<esql-eval>> command to append columns to a table, with calculated values. For example, the following query appends a `duration_ms` column. The values in the column are computed by dividing `event_duration` by 1,000,000. In other words: `event_duration` converted from nanoseconds to milliseconds.

[source,esql]
----
include::{esql-specs}/eval.csv-spec[tag=gs-eval]
----
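A sketch of such an `EVAL` (assuming the sample data's `event_duration` field, in nanoseconds; the `duration_ms` column name is illustrative):

[source,esql]
----
FROM sample_data
| EVAL duration_ms = event_duration/1000000.0
----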
`EVAL` supports several <<esql-functions,functions>>. For example, to round a number to a specified number of digits, use the <<esql-round>> function:

[source,esql]
----
include::{esql-specs}/eval.csv-spec[tag=gs-round]
----
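For example, `ROUND(value, 1)` rounds to one decimal digit. A sketch (assuming `event_duration` in nanoseconds, as above):

[source,esql]
----
FROM sample_data
| EVAL duration_ms = ROUND(event_duration/1000000.0, 1)
----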
[discrete]
[[esql-getting-started-stats]]
=== Calculate statistics

You can use {esql} not only to query your data, but also to aggregate it. Use the <<esql-stats-by>> command to calculate statistics. For example, the median duration:

[source,esql]
----
include::{esql-specs}/stats.csv-spec[tag=gs-stats]
----
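A sketch of such an aggregation (assuming `event_duration`; the `median_duration` column name is illustrative):

[source,esql]
----
FROM sample_data
| STATS median_duration = MEDIAN(event_duration)
----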
You can calculate multiple stats with one command:

[source,esql]
----
include::{esql-specs}/stats.csv-spec[tag=gs-two-stats]
----

Use `BY` to group calculated stats by one or more columns. For example, to calculate the median duration per client IP:

[source,esql]
----
include::{esql-specs}/stats.csv-spec[tag=gs-stats-by]
----
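A sketch of a grouped aggregation (assuming the sample data's `client_ip` field; the `median_duration` name is illustrative):

[source,esql]
----
FROM sample_data
| STATS median_duration = MEDIAN(event_duration) BY client_ip
----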
[discrete]
[[esql-getting-started-access-columns]]
=== Access columns

You can access columns by their name. If a name contains special characters, <<esql-identifiers,it needs to be quoted>> with backticks (+{backtick}+).

Assigning an explicit name to a column created by `EVAL` or `STATS` is optional. If you don't provide a name, the new column name is equal to the function expression. For example:

[source,esql]
----
include::{esql-specs}/eval.csv-spec[tag=gs-eval-no-column-name]
----

In this query, `EVAL` adds a new column named `event_duration/1000000.0`. Because its name contains special characters, to access this column, quote it with backticks:

[source,esql]
----
include::{esql-specs}/eval.csv-spec[tag=gs-eval-stats-backticks]
----
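To illustrate the pattern, this sketch computes a value without naming it, then references the auto-generated column name with backticks in a later command (the column name follows the convention described above):

[source,esql]
----
FROM sample_data
| EVAL event_duration/1000000.0
| STATS MEDIAN(`event_duration/1000000.0`)
----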
[discrete]
[[esql-getting-started-histogram]]
=== Create a histogram

To track statistics over time, {esql} enables you to create histograms using the <<esql-bucket>> function. `BUCKET` creates human-friendly bucket sizes and returns, for each row, a value corresponding to the bucket that the row falls into.

Combine `BUCKET` with <<esql-stats-by>> to create a histogram. For example, to count the number of events per hour:

[source,esql]
----
include::{esql-specs}/bucket.csv-spec[tag=gs-bucket-stats-by]
----
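A sketch of such a per-hour histogram (assuming the `@timestamp` field and passing a one-hour bucket size; the `c` and `bucket` names are illustrative):

[source,esql]
----
FROM sample_data
| STATS c = COUNT(*) BY bucket = BUCKET(@timestamp, 1 hour)
----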
Or the median duration per hour:

[source,esql]
----
include::{esql-specs}/bucket.csv-spec[tag=gs-bucket-stats-by-median]
----

[discrete]
[[esql-getting-started-enrich]]
=== Enrich data

{esql} enables you to <<esql-enrich-data,enrich>> a table with data from indices in {es}, using the <<esql-enrich>> command.

image::images/esql/esql-enrich.png[The ENRICH command adding columns from an enrich index,align="center"]

Before you can use `ENRICH`, you first need to <<esql-create-enrich-policy,create>> and <<esql-execute-enrich-policy,execute>> an <<esql-enrich-policy,enrich policy>>.

include::{es-ref-dir}/tab-widgets/esql/esql-getting-started-enrich-policy.asciidoc[tag=own-deployment]

After creating and executing a policy, you can use it with the `ENRICH` command:

[source,esql]
----
include::{esql-specs}/enrich.csv-spec[tag=gs-enrich]
----
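`ENRICH` references a policy by name, matches on a field with `ON`, and selects the columns to add with `WITH`. A sketch (the policy name `clientip_policy` is an assumption; the `env` column matches the surrounding text):

[source,esql]
----
FROM sample_data
| ENRICH clientip_policy ON client_ip WITH env
----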
You can use the new `env` column that's added by the `ENRICH` command in subsequent commands. For example, to calculate the median duration per environment:

[source,esql]
----
include::{esql-specs}/enrich.csv-spec[tag=gs-enrich-stats-by]
----

For more about data enrichment with {esql}, refer to <<esql-enrich-data>>.

[discrete]
[[esql-getting-started-process-data]]
=== Process data

Your data may contain unstructured strings that you want to <<esql-process-data-with-dissect-and-grok,structure>> to make it easier to analyze the data. For example, the sample data contains log messages like:

[source,txt]
----
"Connected to 10.1.0.3"
----

By extracting the IP address from these messages, you can determine which IP has accepted the most client connections.

To structure unstructured strings at query time, you can use the {esql} <<esql-dissect>> and <<esql-grok>> commands. `DISSECT` works by breaking up a string using a delimiter-based pattern. `GROK` works similarly, but uses regular expressions. This makes `GROK` more powerful, but generally also slower.

In this case, no regular expressions are needed, as the `message` is straightforward: "Connected to ", followed by the server IP. To match this string, you can use the following `DISSECT` command:

[source,esql]
----
include::{esql-specs}/dissect.csv-spec[tag=gs-dissect]
----
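A sketch of such a `DISSECT`, where the `%{server_ip}` key captures whatever follows the literal "Connected to " prefix (assuming the sample data's `message` field):

[source,esql]
----
FROM sample_data
| DISSECT message "Connected to %{server_ip}"
----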
This adds a `server_ip` column to those rows that have a `message` that matches this pattern. For other rows, the value of `server_ip` is `null`.

You can use the new `server_ip` column that's added by the `DISSECT` command in subsequent commands. For example, to determine how many connections each server has accepted:

[source,esql]
----
include::{esql-specs}/dissect.csv-spec[tag=gs-dissect-stats-by]
----

For more about data processing with {esql}, refer to <<esql-process-data-with-dissect-and-grok>>.

[discrete]
[[esql-getting-learn-more]]
=== Learn more

To learn more about {esql}, refer to <<esql-language>> and <<esql-using>>.