
[[esql-getting-started]]
== Getting started with {esql} queries

++++
<titleabbrev>Getting started</titleabbrev>
++++

This guide shows how you can use {esql} to query and aggregate your data.

TIP: To get started with {esql} without setting up your own deployment, visit
the public {esql} demo environment at
https://esql.demo.elastic.co/[esql.demo.elastic.co]. It comes with preloaded
data sets and sample queries.

[discrete]
[[esql-getting-started-prerequisites]]
=== Prerequisites

To follow along with the queries in this getting started guide, first ingest
some sample data using the following requests:

[source,console]
----
PUT sample_data
{
  "mappings": {
    "properties": {
      "client.ip": {
        "type": "ip"
      },
      "message": {
        "type": "keyword"
      }
    }
  }
}

PUT sample_data/_bulk
{"index": {}}
{"@timestamp": "2023-10-23T12:15:03.360Z", "client.ip": "172.21.2.162", "message": "Connected to 10.1.0.3", "event.duration": 3450233}
{"index": {}}
{"@timestamp": "2023-10-23T12:27:28.948Z", "client.ip": "172.21.2.113", "message": "Connected to 10.1.0.2", "event.duration": 2764889}
{"index": {}}
{"@timestamp": "2023-10-23T13:33:34.937Z", "client.ip": "172.21.0.5", "message": "Disconnected", "event.duration": 1232382}
{"index": {}}
{"@timestamp": "2023-10-23T13:51:54.732Z", "client.ip": "172.21.3.15", "message": "Connection error", "event.duration": 725448}
{"index": {}}
{"@timestamp": "2023-10-23T13:52:55.015Z", "client.ip": "172.21.3.15", "message": "Connection error", "event.duration": 8268153}
{"index": {}}
{"@timestamp": "2023-10-23T13:53:55.832Z", "client.ip": "172.21.3.15", "message": "Connection error", "event.duration": 5033755}
{"index": {}}
{"@timestamp": "2023-10-23T13:55:01.543Z", "client.ip": "172.21.3.15", "message": "Connected to 10.1.0.1", "event.duration": 1756467}
----

[discrete]
[[esql-getting-started-running-queries]]
=== Run an {esql} query

In {kib}, you can use Console or Discover to run {esql} queries:

include::{es-repo-dir}/tab-widgets/esql/esql-getting-started-widget.asciidoc[]
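
You can also run {esql} queries through the REST API. A minimal sketch, assuming
the {esql} `_query` endpoint and its `format` parameter (check the API reference
for your {es} version):

[source,console]
----
POST /_query?format=txt
{
  "query": "FROM sample_data | LIMIT 3"
}
----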

[discrete]
[[esql-getting-started-first-query]]
=== Your first {esql} query

Each {esql} query starts with a <<esql-source-commands,source command>>. A
source command produces a table, typically with data from {es}.

image::images/esql/source-command.svg[A source command producing a table from {es},align="center"]

The <<esql-from>> source command returns a table with documents from a data
stream, index, or alias. Each row in the resulting table represents a document.
This query returns up to 500 documents from the `sample_data` index:

[source,esql]
----
FROM sample_data
----

Each column corresponds to a field, and can be accessed by the name of that
field.
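
For example, the row for the first sample document would contain these columns
(a sketch; the actual column order and row order in your results may differ):

[source,txt]
----
@timestamp               | client.ip    | event.duration | message
2023-10-23T12:15:03.360Z | 172.21.2.162 | 3450233        | Connected to 10.1.0.3
----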

[TIP]
====
{esql} keywords are case-insensitive. The following query is identical to the
previous one:

[source,esql]
----
from sample_data
----
====

[discrete]
[[esql-getting-started-limit]]
=== Processing commands

A source command can be followed by one or more
<<esql-processing-commands,processing commands>>, separated by a pipe character:
`|`. Processing commands change an input table by adding, removing, or changing
rows and columns. Processing commands can perform filtering, projection,
aggregation, and more.

image::images/esql/esql-limit.png[A processing command changing an input table,align="center",width="60%"]

For example, you can use the <<esql-limit>> command to limit the number of rows
that are returned, up to a maximum of 10,000 rows:

[source,esql]
----
FROM sample_data
| LIMIT 3
----

[TIP]
====
For readability, you can put each command on a separate line. However, you don't
have to. The following query is identical to the previous one:

[source,esql]
----
FROM sample_data | LIMIT 3
----
====

[discrete]
[[esql-getting-started-sort]]
==== Sort a table

image::images/esql/esql-sort.png[A processing command sorting an input table,align="center",width="60%"]

Another processing command is the <<esql-sort>> command. By default, the rows
returned by `FROM` don't have a defined sort order. Use the `SORT` command to
sort rows on one or more columns:

[source,esql]
----
FROM sample_data
| SORT @timestamp DESC
----
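
`SORT` accepts multiple columns, in order of precedence, each with its own sort
direction. A sketch (this particular column combination is illustrative):

[source,esql]
----
FROM sample_data
| SORT client.ip ASC, @timestamp DESC
----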

[discrete]
[[esql-getting-started-where]]
==== Query the data

Use the <<esql-where>> command to query the data. For example, to find all
events with a duration longer than 5ms (the `event.duration` values are in
nanoseconds):

[source,esql]
----
FROM sample_data
| WHERE event.duration > 5000000
----

`WHERE` supports several <<esql-operators,operators>>. For example, you can use
<<esql-like-operator>> to run a wildcard query against the `message` column:

[source,esql]
----
FROM sample_data
| WHERE message LIKE "Connected*"
----

[discrete]
[[esql-getting-started-more-commands]]
==== More processing commands

There are many other processing commands, like <<esql-keep>> and <<esql-drop>>
to keep or drop columns, <<esql-enrich>> to enrich a table with data from
indices in {es}, and <<esql-dissect>> and <<esql-grok>> to process data. Refer
to <<esql-processing-commands>> for an overview of all processing commands.
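
For example, a minimal sketch of `KEEP`, which limits the result table to the
listed columns:

[source,esql]
----
FROM sample_data
| KEEP @timestamp, message
----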

[discrete]
[[esql-getting-started-chaining]]
=== Chain processing commands

You can chain processing commands, separated by a pipe character: `|`. Each
processing command works on the output table of the previous command. The result
of a query is the table produced by the final processing command.

image::images/esql/esql-sort-limit.png[Processing commands can be chained,align="center"]

The following example first sorts the table on `@timestamp`, and then limits the
result set to 3 rows:

[source,esql]
----
FROM sample_data
| SORT @timestamp DESC
| LIMIT 3
----

NOTE: The order of processing commands is important. Limiting the result set to
3 rows before sorting would most likely return a different result, because
`SORT` would then only order the 3 arbitrary rows that `LIMIT` kept.
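
For comparison, this sketch reverses the order, so `SORT` only orders whichever
3 rows `LIMIT` happened to keep:

[source,esql]
----
FROM sample_data
| LIMIT 3
| SORT @timestamp DESC
----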

[discrete]
[[esql-getting-started-eval]]
=== Compute values

Use the <<esql-eval>> command to append columns with calculated values to a
table. For example, the following query appends a `duration_ms` column. The
values in the column are computed by dividing `event.duration` by 1,000,000. In
other words: `event.duration` converted from nanoseconds to milliseconds.

[source,esql]
----
FROM sample_data
| EVAL duration_ms = event.duration / 1000000.0
----

`EVAL` supports several <<esql-functions,functions>>. For example, to round a
number to a specified number of decimal places, use the <<esql-round>>
function:

[source,esql]
----
FROM sample_data
| EVAL duration_ms = ROUND(event.duration / 1000000.0, 1)
----

[discrete]
[[esql-getting-started-stats]]
=== Calculate statistics

You can use {esql} not only to query your data, but also to aggregate it. Use
the <<esql-stats-by>> command to calculate statistics. For example, the median
duration:

[source,esql]
----
FROM sample_data
| STATS median_duration = MEDIAN(event.duration)
----

You can calculate multiple stats with one command:

[source,esql]
----
FROM sample_data
| STATS median_duration = MEDIAN(event.duration), max_duration = MAX(event.duration)
----

Use `BY` to group calculated stats by one or more columns. For example, to
calculate the median duration per client IP:

[source,esql]
----
FROM sample_data
| STATS median_duration = MEDIAN(event.duration) BY client.ip
----
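
You can combine both: multiple aggregations, grouped by a column. A sketch:

[source,esql]
----
FROM sample_data
| STATS median_duration = MEDIAN(event.duration), max_duration = MAX(event.duration) BY client.ip
----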

[discrete]
[[esql-getting-started-histogram]]
=== Create a histogram

To track statistics over time, {esql} enables you to create histograms using the
<<esql-auto_bucket>> function. `AUTO_BUCKET` creates human-friendly bucket sizes
and returns a value for each row that corresponds to the resulting bucket the
row falls into.

For example, to create hourly buckets for the data on October 23rd:

[source,esql]
----
FROM sample_data
| KEEP @timestamp
| EVAL bucket = AUTO_BUCKET(@timestamp, 24, "2023-10-23T00:00:00Z", "2023-10-23T23:59:59Z")
----

Combine `AUTO_BUCKET` with <<esql-stats-by>> to create a histogram. For example,
to count the number of events per hour:

[source,esql]
----
FROM sample_data
| KEEP @timestamp, event.duration
| EVAL bucket = AUTO_BUCKET(@timestamp, 24, "2023-10-23T00:00:00Z", "2023-10-23T23:59:59Z")
| STATS COUNT(*) BY bucket
----

Or the median duration per hour:

[source,esql]
----
FROM sample_data
| KEEP @timestamp, event.duration
| EVAL bucket = AUTO_BUCKET(@timestamp, 24, "2023-10-23T00:00:00Z", "2023-10-23T23:59:59Z")
| STATS median_duration = MEDIAN(event.duration) BY bucket
----

[discrete]
[[esql-getting-started-enrich]]
=== Enrich data

{esql} enables you to <<esql-enrich-data,enrich>> a table with data from indices
in {es}, using the <<esql-enrich>> command.

image::images/esql/esql-enrich.png[align="center"]

Before you can use `ENRICH`, you first need to
<<esql-create-enrich-policy,create>> and <<esql-execute-enrich-policy,execute>>
an <<esql-enrich-policy,enrich policy>>. The following requests create and
execute a policy that links an IP address to an environment ("Development",
"QA", or "Production"):

[source,console]
----
PUT clientips
{
  "mappings": {
    "properties": {
      "client.ip": {
        "type": "keyword"
      },
      "env": {
        "type": "keyword"
      }
    }
  }
}

PUT clientips/_bulk
{ "index" : {}}
{ "client.ip": "172.21.0.5", "env": "Development" }
{ "index" : {}}
{ "client.ip": "172.21.2.113", "env": "QA" }
{ "index" : {}}
{ "client.ip": "172.21.2.162", "env": "QA" }
{ "index" : {}}
{ "client.ip": "172.21.3.15", "env": "Production" }
{ "index" : {}}
{ "client.ip": "172.21.3.16", "env": "Production" }

PUT /_enrich/policy/clientip_policy
{
  "match": {
    "indices": "clientips",
    "match_field": "client.ip",
    "enrich_fields": ["env"]
  }
}

PUT /_enrich/policy/clientip_policy/_execute
----

////
[source,console]
----
DELETE /_enrich/policy/clientip_policy
----
// TEST[continued]
////

After creating and executing a policy, you can use it with the `ENRICH`
command:

[source,esql]
----
FROM sample_data
| KEEP @timestamp, client.ip, event.duration
| EVAL client.ip = TO_STRING(client.ip)
| ENRICH clientip_policy ON client.ip WITH env
----

You can use the new `env` column that's added by the `ENRICH` command in
subsequent commands. For example, to calculate the median duration per
environment:

[source,esql]
----
FROM sample_data
| KEEP @timestamp, client.ip, event.duration
| EVAL client.ip = TO_STRING(client.ip)
| ENRICH clientip_policy ON client.ip WITH env
| STATS median_duration = MEDIAN(event.duration) BY env
----

For more about data enrichment with {esql}, refer to <<esql-enrich-data>>.

[discrete]
[[esql-getting-started-process-data]]
=== Process data

Your data may contain unstructured strings that you want to
<<esql-process-data-with-dissect-and-grok,structure>> to make it easier to
analyze the data. For example, the sample data contains log messages like:

[source,txt]
----
"Connected to 10.1.0.3"
----

By extracting the IP address from these messages, you can determine which IP has
accepted the most client connections.

To structure unstructured strings at query time, you can use the {esql}
<<esql-dissect>> and <<esql-grok>> commands. `DISSECT` works by breaking up a
string using a delimiter-based pattern. `GROK` works similarly, but uses regular
expressions. This makes `GROK` more powerful, but generally also slower.

In this case, no regular expressions are needed, as the `message` is
straightforward: "Connected to ", followed by the server IP. To match this
string, you can use the following `DISSECT` command:

[source,esql]
----
FROM sample_data
| DISSECT message "Connected to %{server.ip}"
----

This adds a `server.ip` column to those rows that have a `message` that matches
this pattern. For other rows, the value of `server.ip` is `null`.
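
To keep only the rows where the pattern matched, you can filter out the `null`
values. A sketch, assuming the `IS NOT NULL` predicate:

[source,esql]
----
FROM sample_data
| DISSECT message "Connected to %{server.ip}"
| WHERE server.ip IS NOT NULL
----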

You can use the new `server.ip` column that's added by the `DISSECT` command in
subsequent commands. For example, to determine how many connections each server
has accepted:

[source,esql]
----
FROM sample_data
| WHERE STARTS_WITH(message, "Connected to")
| DISSECT message "Connected to %{server.ip}"
| STATS COUNT(*) BY server.ip
----
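
To answer the earlier question of which IP has accepted the most client
connections, a sketch that names the count and sorts on it:

[source,esql]
----
FROM sample_data
| WHERE STARTS_WITH(message, "Connected to")
| DISSECT message "Connected to %{server.ip}"
| STATS connections = COUNT(*) BY server.ip
| SORT connections DESC
----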

For more about data processing with {esql}, refer to
<<esql-process-data-with-dissect-and-grok>>.

[discrete]
[[esql-getting-learn-more]]
=== Learn more

To learn more about {esql}, refer to <<esql-language>> and <<esql-using>>.