[[esql-getting-started]]
== Getting started with {esql} queries

++++
<titleabbrev>Getting started</titleabbrev>
++++

This guide shows how you can use {esql} to query and aggregate your data.

TIP: To get started with {esql} without setting up your own deployment, visit
the public {esql} demo environment at
https://esql.demo.elastic.co/[esql.demo.elastic.co]. It comes with preloaded
data sets and sample queries.

[discrete]
[[esql-getting-started-prerequisites]]
=== Prerequisites

To follow along with the queries in this getting started guide, first ingest
some sample data using the following requests:

[source,console]
----
PUT sample_data
{
  "mappings": {
    "properties": {
      "client.ip": {
        "type": "ip"
      },
      "message": {
        "type": "keyword"
      }
    }
  }
}

PUT sample_data/_bulk
{"index": {}}
{"@timestamp": "2023-10-23T12:15:03.360Z", "client.ip": "172.21.2.162", "message": "Connected to 10.1.0.3", "event.duration": 3450233}
{"index": {}}
{"@timestamp": "2023-10-23T12:27:28.948Z", "client.ip": "172.21.2.113", "message": "Connected to 10.1.0.2", "event.duration": 2764889}
{"index": {}}
{"@timestamp": "2023-10-23T13:33:34.937Z", "client.ip": "172.21.0.5", "message": "Disconnected", "event.duration": 1232382}
{"index": {}}
{"@timestamp": "2023-10-23T13:51:54.732Z", "client.ip": "172.21.3.15", "message": "Connection error", "event.duration": 725448}
{"index": {}}
{"@timestamp": "2023-10-23T13:52:55.015Z", "client.ip": "172.21.3.15", "message": "Connection error", "event.duration": 8268153}
{"index": {}}
{"@timestamp": "2023-10-23T13:53:55.832Z", "client.ip": "172.21.3.15", "message": "Connection error", "event.duration": 5033755}
{"index": {}}
{"@timestamp": "2023-10-23T13:55:01.543Z", "client.ip": "172.21.3.15", "message": "Connected to 10.1.0.1", "event.duration": 1756467}
----

[discrete]
[[esql-getting-started-running-queries]]
=== Run an {esql} query

In {kib}, you can use Console or Discover to run {esql} queries:

include::{es-repo-dir}/tab-widgets/esql/esql-getting-started-widget.asciidoc[]

[discrete]
[[esql-getting-started-first-query]]
=== Your first {esql} query

Each {esql} query starts with a <>. A source command produces a table,
typically with data from {es}.

image::images/esql/source-command.svg[A source command producing a table from {es},align="center"]

The <> source command returns a table with documents from a data stream,
index, or alias. Each row in the resulting table represents a document. This
query returns up to 500 documents from the `sample_data` index:

[source,esql]
----
FROM sample_data
----

Each column corresponds to a field and can be accessed by the name of that
field.

[TIP]
====
{esql} keywords are case-insensitive. The following query is identical to the
previous one:

[source,esql]
----
from sample_data
----
====

[discrete]
[[esql-getting-started-limit]]
=== Processing commands

A source command can be followed by one or more <>, separated by a pipe
character: `|`. Processing commands change an input table by adding, removing,
or changing rows and columns. Processing commands can perform filtering,
projection, aggregation, and more.

image::images/esql/esql-limit.png[A processing command changing an input table,align="center",width="60%"]

For example, you can use the <> command to limit the number of rows that are
returned, up to a maximum of 10,000 rows:

[source,esql]
----
FROM sample_data
| LIMIT 3
----

[TIP]
====
For readability, you can put each command on a separate line. However, you
don't have to. The following query is identical to the previous one:

[source,esql]
----
FROM sample_data | LIMIT 3
----
====

[discrete]
[[esql-getting-started-sort]]
==== Sort a table

image::images/esql/esql-sort.png[A processing command sorting an input table,align="center",width="60%"]

Another processing command is the <> command. By default, the rows returned by
`FROM` don't have a defined sort order.
Use the `SORT` command to sort rows on one or more columns:

[source,esql]
----
FROM sample_data
| SORT @timestamp DESC
----

[discrete]
[[esql-getting-started-where]]
==== Query the data

Use the <> command to query the data. For example, to find all events with a
duration longer than 5ms:

[source,esql]
----
FROM sample_data
| WHERE event.duration > 5000000
----

`WHERE` supports several <>. For example, you can use <> to run a wildcard
query against the `message` column:

[source,esql]
----
FROM sample_data
| WHERE message LIKE "Connected*"
----

[discrete]
[[esql-getting-started-more-commands]]
==== More processing commands

There are many other processing commands, like <> and <> to keep or drop
columns, <> to enrich a table with data from indices in {es}, and <> and <>
to process data. Refer to <> for an overview of all processing commands.

[discrete]
[[esql-getting-started-chaining]]
=== Chain processing commands

You can chain processing commands, separated by a pipe character: `|`. Each
processing command works on the output table of the previous command. The
result of a query is the table produced by the final processing command.

image::images/esql/esql-sort-limit.png[Processing commands can be chained,align="center"]

The following example first sorts the table on `@timestamp` and then limits
the result set to 3 rows:

[source,esql]
----
FROM sample_data
| SORT @timestamp DESC
| LIMIT 3
----

NOTE: The order of processing commands is important. Limiting the result set
to 3 rows before sorting would most likely return a result that is different
from this example, because only 3 arbitrary rows would be sorted instead of
the full data set.

[discrete]
[[esql-getting-started-eval]]
=== Compute values

Use the <> command to append columns with calculated values to a table. For
example, the following query appends a `duration_ms` column. The values in the
column are computed by dividing `event.duration` by 1,000,000.
In other words: `event.duration` converted from nanoseconds to milliseconds.

[source,esql]
----
FROM sample_data
| EVAL duration_ms = event.duration / 1000000.0
----

`EVAL` supports several <>. For example, to round a number to the closest
number with the specified number of digits, use the <> function:

[source,esql]
----
FROM sample_data
| EVAL duration_ms = ROUND(event.duration / 1000000.0, 1)
----

[discrete]
[[esql-getting-started-stats]]
=== Calculate statistics

You can use {esql} not only to query your data, but also to aggregate it. Use
the <> command to calculate statistics. For example, the median duration:

[source,esql]
----
FROM sample_data
| STATS median_duration = MEDIAN(event.duration)
----

You can calculate multiple stats with one command:

[source,esql]
----
FROM sample_data
| STATS median_duration = MEDIAN(event.duration), max_duration = MAX(event.duration)
----

Use `BY` to group calculated stats by one or more columns. For example, to
calculate the median duration per client IP:

[source,esql]
----
FROM sample_data
| STATS median_duration = MEDIAN(event.duration) BY client.ip
----

[discrete]
[[esql-getting-started-histogram]]
=== Create a histogram

To track statistics over time, {esql} enables you to create histograms using
the <> function. `AUTO_BUCKET` creates human-friendly bucket sizes and returns
a value for each row that corresponds to the bucket the row falls into.

For example, to create hourly buckets for the data on October 23rd:

[source,esql]
----
FROM sample_data
| KEEP @timestamp
| EVAL bucket = AUTO_BUCKET(@timestamp, 24, "2023-10-23T00:00:00Z", "2023-10-23T23:59:59Z")
----

Combine `AUTO_BUCKET` with <> to create a histogram.
For example, to count the number of events per hour:

[source,esql]
----
FROM sample_data
| KEEP @timestamp, event.duration
| EVAL bucket = AUTO_BUCKET(@timestamp, 24, "2023-10-23T00:00:00Z", "2023-10-23T23:59:59Z")
| STATS COUNT(*) BY bucket
----

Or the median duration per hour:

[source,esql]
----
FROM sample_data
| KEEP @timestamp, event.duration
| EVAL bucket = AUTO_BUCKET(@timestamp, 24, "2023-10-23T00:00:00Z", "2023-10-23T23:59:59Z")
| STATS median_duration = MEDIAN(event.duration) BY bucket
----

[discrete]
[[esql-getting-started-enrich]]
=== Enrich data

{esql} enables you to <> a table with data from indices in {es}, using the <>
command.

image::images/esql/esql-enrich.png[align="center"]

Before you can use `ENRICH`, you first need to <> and <> an <>. The following
requests create and execute a policy that links an IP address to an
environment ("Development", "QA", or "Production"):

[source,console]
----
PUT clientips
{
  "mappings": {
    "properties": {
      "client.ip": {
        "type": "keyword"
      },
      "env": {
        "type": "keyword"
      }
    }
  }
}

PUT clientips/_bulk
{ "index" : {}}
{ "client.ip": "172.21.0.5", "env": "Development" }
{ "index" : {}}
{ "client.ip": "172.21.2.113", "env": "QA" }
{ "index" : {}}
{ "client.ip": "172.21.2.162", "env": "QA" }
{ "index" : {}}
{ "client.ip": "172.21.3.15", "env": "Production" }
{ "index" : {}}
{ "client.ip": "172.21.3.16", "env": "Production" }

PUT /_enrich/policy/clientip_policy
{
  "match": {
    "indices": "clientips",
    "match_field": "client.ip",
    "enrich_fields": ["env"]
  }
}

PUT /_enrich/policy/clientip_policy/_execute
----

////
[source,console]
----
DELETE /_enrich/policy/clientip_policy
----
// TEST[continued]
////

After creating and executing a policy, you can use it with the `ENRICH`
command:

[source,esql]
----
FROM sample_data
| KEEP @timestamp, client.ip, event.duration
| EVAL client.ip = TO_STRING(client.ip)
| ENRICH clientip_policy ON client.ip WITH env
----

You can use the new `env` column that's added by the `ENRICH` command in
subsequent commands. For example, to calculate the median duration per
environment:

[source,esql]
----
FROM sample_data
| KEEP @timestamp, client.ip, event.duration
| EVAL client.ip = TO_STRING(client.ip)
| ENRICH clientip_policy ON client.ip WITH env
| STATS median_duration = MEDIAN(event.duration) BY env
----

For more about data enrichment with {esql}, refer to <>.

[discrete]
[[esql-getting-started-process-data]]
=== Process data

Your data may contain unstructured strings that you want to <> to make it
easier to analyze the data. For example, the sample data contains log messages
like:

[source,txt]
----
"Connected to 10.1.0.3"
----

By extracting the IP address from these messages, you can determine which IP
has accepted the most client connections.

To structure unstructured strings at query time, you can use the {esql} <> and
<> commands. `DISSECT` works by breaking up a string using a delimiter-based
pattern. `GROK` works similarly, but uses regular expressions. This makes
`GROK` more powerful, but generally also slower.

In this case, no regular expressions are needed, as the `message` is
straightforward: "Connected to ", followed by the server IP. To match this
string, you can use the following `DISSECT` command:

[source,esql]
----
FROM sample_data
| DISSECT message "Connected to %{server.ip}"
----

This adds a `server.ip` column to those rows that have a `message` that
matches this pattern. For other rows, the value of `server.ip` is `null`.

You can use the new `server.ip` column that's added by the `DISSECT` command
in subsequent commands. For example, to determine how many connections each
server has accepted:

[source,esql]
----
FROM sample_data
| WHERE STARTS_WITH(message, "Connected to")
| DISSECT message "Connected to %{server.ip}"
| STATS COUNT(*) BY server.ip
----

For more about data processing with {esql}, refer to <>.

[discrete]
[[esql-getting-learn-more]]
=== Learn more

To learn more about {esql}, refer to <> and <>.
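[discrete]
[[esql-getting-started-putting-it-together]]
=== Putting it together

As a final recap, the commands covered in this guide can be chained into a
single query. The following sketch (it assumes only the `sample_data` index
created above; the column names come from the examples in this guide) filters
the connection events, parses the server IP with `DISSECT`, aggregates with
`STATS`, and sorts the result:

[source,esql]
----
FROM sample_data
// keep only successful connection events
| WHERE STARTS_WITH(message, "Connected to")
// extract the server IP from the log message
| DISSECT message "Connected to %{server.ip}"
// median connection duration per server
| STATS median_duration = MEDIAN(event.duration) BY server.ip
| SORT median_duration DESC
| LIMIT 10
----

Each command operates on the table produced by the previous one, so the
`STATS` aggregation only sees the rows that survived the `WHERE` filter.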