@@ -1,34 +1,434 @@
[[esql-getting-started]]
-== Getting started with {esql}
+== Getting started with {esql} queries
++++
<titleabbrev>Getting started</titleabbrev>
++++

-A simple example of an {esql} query is shown below:
-[source,esql]
+This guide shows how you can use {esql} to query and aggregate your data.
+
+TIP: To get started with {esql} without setting up your own deployment, visit
+the public {esql} demo environment at
+https://esql.demo.elastic.co/[esql.demo.elastic.co]. It comes with preloaded
+data sets and sample queries.
+
+[discrete]
+[[esql-getting-started-prerequisites]]
+=== Prerequisites
+
+To follow along with the queries in this getting started guide, first ingest
+some sample data using the following requests:
+
+[source,console]
----
-FROM employees
-| EVAL age = DATE_DIFF(NOW(), birth_date, 'Y')
-| STATS AVG(age) BY department
-| SORT age DESC
+PUT sample_data
+{
+  "mappings": {
+    "properties": {
+      "client.ip": {
+        "type": "ip"
+      },
+      "message": {
+        "type": "keyword"
+      }
+    }
+  }
+}
+
+PUT sample_data/_bulk
+{"index": {}}
+{"@timestamp": "2023-10-23T12:15:03.360Z", "client.ip": "172.21.2.162", "message": "Connected to 10.1.0.3", "event.duration": 3450233}
+{"index": {}}
+{"@timestamp": "2023-10-23T12:27:28.948Z", "client.ip": "172.21.2.113", "message": "Connected to 10.1.0.2", "event.duration": 2764889}
+{"index": {}}
+{"@timestamp": "2023-10-23T13:33:34.937Z", "client.ip": "172.21.0.5", "message": "Disconnected", "event.duration": 1232382}
+{"index": {}}
+{"@timestamp": "2023-10-23T13:51:54.732Z", "client.ip": "172.21.3.15", "message": "Connection error", "event.duration": 725448}
+{"index": {}}
+{"@timestamp": "2023-10-23T13:52:55.015Z", "client.ip": "172.21.3.15", "message": "Connection error", "event.duration": 8268153}
+{"index": {}}
+{"@timestamp": "2023-10-23T13:53:55.832Z", "client.ip": "172.21.3.15", "message": "Connection error", "event.duration": 5033755}
+{"index": {}}
+{"@timestamp": "2023-10-23T13:55:01.543Z", "client.ip": "172.21.3.15", "message": "Connected to 10.1.0.1", "event.duration": 1756467}
----

-Each {esql} query starts with a <<esql-commands,source command>>. A source command produces
-a table, typically with data from {es}.
+[discrete]
+[[esql-getting-started-running-queries]]
+=== Run an {esql} query
+
+In {kib}, you can use Console or Discover to run {esql} queries:
+
+include::{es-repo-dir}/tab-widgets/esql/esql-getting-started-widget.asciidoc[]
+
+[discrete]
+[[esql-getting-started-first-query]]
+=== Your first {esql} query
+
+Each {esql} query starts with a <<esql-source-commands,source command>>. A
+source command produces a table, typically with data from {es}.

image::images/esql/source-command.svg[A source command producing a table from {es},align="center"]

+The <<esql-from>> source command returns a table with documents from a data
+stream, index, or alias. Each row in the resulting table represents a document.
+This query returns up to 500 documents from the `sample_data` index:
+
+[source,esql]
+----
+FROM sample_data
+----
+
+Each column corresponds to a field, and can be accessed by the name of that
+field.
+
+[TIP]
+====
+{esql} keywords are case-insensitive. The following query is identical to the
+previous one:
+
+[source,esql]
+----
+from sample_data
+----
+====
+
+[discrete]
+[[esql-getting-started-limit]]
+=== Processing commands
+
A source command can be followed by one or more
-<<esql-commands,processing commands>>. Processing commands change an
-input table by adding, removing, or changing rows and columns.
-Processing commands can perform filtering, projection, aggregation, and more.
+<<esql-processing-commands,processing commands>>, separated by a pipe character:
+`|`. Processing commands change an input table by adding, removing, or changing
+rows and columns. Processing commands can perform filtering, projection,
+aggregation, and more.
+
+image::images/esql/esql-limit.png[A processing command changing an input table,align="center",width="60%"]
+
+For example, you can use the <<esql-limit>> command to limit the number of rows
+that are returned, up to a maximum of 10,000 rows:
+
+[source,esql]
+----
+FROM sample_data
+| LIMIT 3
+----
+
+[TIP]
+====
+For readability, you can put each command on a separate line. However, you don't
+have to. The following query is identical to the previous one:
+
+[source,esql]
+----
+FROM sample_data | LIMIT 3
+----
+====
+
+[discrete]
+[[esql-getting-started-sort]]
+==== Sort a table
+
+image::images/esql/esql-sort.png[A processing command sorting an input table,align="center",width="60%"]
+
+Another processing command is the <<esql-sort>> command. By default, the rows
+returned by `FROM` don't have a defined sort order. Use the `SORT` command to
+sort rows on one or more columns:
+
+[source,esql]
+----
+FROM sample_data
+| SORT @timestamp DESC
+----
+
+[discrete]
+[[esql-getting-started-where]]
+==== Query the data
+
+Use the <<esql-where>> command to query the data. For example, to find all
+events with a duration longer than 5ms (`event.duration` is in nanoseconds):

-image::images/esql/processing-command.svg[A processing command changing an input table,align="center"]
+[source,esql]
+----
+FROM sample_data
+| WHERE event.duration > 5000000
+----
+
+`WHERE` supports several <<esql-operators,operators>>. For example, you can use
+<<esql-like-operator>> to run a wildcard query against the `message` column:
+
+[source,esql]
+----
+FROM sample_data
+| WHERE message LIKE "Connected*"
+----
+
+[discrete]
+[[esql-getting-started-more-commands]]
+==== More processing commands
+
+There are many other processing commands, like <<esql-keep>> and <<esql-drop>>
+to keep or drop columns, <<esql-enrich>> to enrich a table with data from
+indices in {es}, and <<esql-dissect>> and <<esql-grok>> to process data. Refer
+to <<esql-processing-commands>> for an overview of all processing commands.
+
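+For example, the following query (a minimal sketch, using the `sample_data`
+index ingested above) uses `KEEP` to return only the `@timestamp` and `message`
+columns:
+
+[source,esql]
+----
+FROM sample_data
+| KEEP @timestamp, message
+----
+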
+[discrete]
+[[esql-getting-started-chaining]]
+=== Chain processing commands

You can chain processing commands, separated by a pipe character: `|`. Each
-processing command works on the output table of the previous command.
+processing command works on the output table of the previous command. The result
+of a query is the table produced by the final processing command.
+
+image::images/esql/esql-sort-limit.png[Processing commands can be chained,align="center"]
+
+The following example first sorts the table on `@timestamp`, and next limits the
+result set to 3 rows:
+
+[source,esql]
+----
+FROM sample_data
+| SORT @timestamp DESC
+| LIMIT 3
+----
+
+NOTE: The order of processing commands matters. Limiting the result set to 3
+rows before sorting those rows would most likely return a result that is
+different from this example, where the sorting comes before the limit.
+
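+To illustrate, this variant (a sketch) limits first and then sorts, so it sorts
+only whichever 3 rows `LIMIT` happened to return, not the 3 most recent rows
+overall:
+
+[source,esql]
+----
+FROM sample_data
+| LIMIT 3
+| SORT @timestamp DESC
+----
+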
+[discrete]
+[[esql-getting-started-eval]]
+=== Compute values
+
+Use the <<esql-eval>> command to append columns to a table, with calculated
+values. For example, the following query appends a `duration_ms` column. The
+values in the column are computed by dividing `event.duration` by 1,000,000. In
+other words: `event.duration` converted from nanoseconds to milliseconds.
+
+[source,esql]
+----
+FROM sample_data
+| EVAL duration_ms = event.duration / 1000000.0
+----
+
+`EVAL` supports several <<esql-functions,functions>>. For example, to round a
+number to the closest number with the specified number of digits, use the
+<<esql-round>> function:
+
+[source,esql]
+----
+FROM sample_data
+| EVAL duration_ms = ROUND(event.duration / 1000000.0, 1)
+----
+
+[discrete]
+[[esql-getting-started-stats]]
+=== Calculate statistics
+
+You can use {esql} not only to query your data, but also to aggregate it. Use
+the <<esql-stats-by>> command to calculate statistics. For example, the median
+duration:
+
+[source,esql]
+----
+FROM sample_data
+| STATS median_duration = MEDIAN(event.duration)
+----
+
+You can calculate multiple stats with one command:
+
+[source,esql]
+----
+FROM sample_data
+| STATS median_duration = MEDIAN(event.duration), max_duration = MAX(event.duration)
+----
+
+Use `BY` to group calculated stats by one or more columns. For example, to
+calculate the median duration per client IP:
+
+[source,esql]
+----
+FROM sample_data
+| STATS median_duration = MEDIAN(event.duration) BY client.ip
+----
+
+[discrete]
+[[esql-getting-started-histogram]]
+=== Create a histogram
+
+To track statistics over time, {esql} enables you to create histograms using the
+<<esql-auto_bucket>> function. `AUTO_BUCKET` creates human-friendly bucket sizes
+and returns a value for each row that corresponds to the bucket the row falls
+into.
+
+For example, to create hourly buckets for the data on October 23rd:
+
+[source,esql]
+----
+FROM sample_data
+| KEEP @timestamp
+| EVAL bucket = AUTO_BUCKET(@timestamp, 24, "2023-10-23T00:00:00Z", "2023-10-23T23:59:59Z")
+----
+
+Combine `AUTO_BUCKET` with <<esql-stats-by>> to create a histogram. For example,
+to count the number of events per hour:
+
+[source,esql]
+----
+FROM sample_data
+| KEEP @timestamp, event.duration
+| EVAL bucket = AUTO_BUCKET(@timestamp, 24, "2023-10-23T00:00:00Z", "2023-10-23T23:59:59Z")
+| STATS COUNT(*) BY bucket
+----
+
+Or the median duration per hour:
+
+[source,esql]
+----
+FROM sample_data
+| KEEP @timestamp, event.duration
+| EVAL bucket = AUTO_BUCKET(@timestamp, 24, "2023-10-23T00:00:00Z", "2023-10-23T23:59:59Z")
+| STATS median_duration = MEDIAN(event.duration) BY bucket
+----
+
+[discrete]
+[[esql-getting-started-enrich]]
+=== Enrich data
+
+{esql} enables you to <<esql-enrich-data,enrich>> a table with data from indices
+in {es}, using the <<esql-enrich>> command.
+
+image::images/esql/esql-enrich.png[align="center"]
+
+Before you can use `ENRICH`, you first need to
+<<esql-create-enrich-policy,create>> and <<esql-execute-enrich-policy,execute>>
+an <<esql-enrich-policy,enrich policy>>. The following requests create and
+execute a policy that links an IP address to an environment ("Development",
+"QA", or "Production"):
+
+[source,console]
+----
+PUT clientips
+{
+  "mappings": {
+    "properties": {
+      "client.ip": {
+        "type": "keyword"
+      },
+      "env": {
+        "type": "keyword"
+      }
+    }
+  }
+}
+
+PUT clientips/_bulk
+{ "index" : {}}
+{ "client.ip": "172.21.0.5", "env": "Development" }
+{ "index" : {}}
+{ "client.ip": "172.21.2.113", "env": "QA" }
+{ "index" : {}}
+{ "client.ip": "172.21.2.162", "env": "QA" }
+{ "index" : {}}
+{ "client.ip": "172.21.3.15", "env": "Production" }
+{ "index" : {}}
+{ "client.ip": "172.21.3.16", "env": "Production" }
+
+PUT /_enrich/policy/clientip_policy
+{
+  "match": {
+    "indices": "clientips",
+    "match_field": "client.ip",
+    "enrich_fields": ["env"]
+  }
+}
+
+PUT /_enrich/policy/clientip_policy/_execute
+----
+
+////
+[source,console]
+----
+DELETE /_enrich/policy/clientip_policy
+----
+// TEST[continued]
+////
+
+After creating and executing a policy, you can use it with the `ENRICH`
+command:
+
+[source,esql]
+----
+FROM sample_data
+| KEEP @timestamp, client.ip, event.duration
+| EVAL client.ip = TO_STRING(client.ip)
+| ENRICH clientip_policy ON client.ip WITH env
+----
+
+You can use the new `env` column that's added by the `ENRICH` command in
+subsequent commands. For example, to calculate the median duration per
+environment:
+
+[source,esql]
+----
+FROM sample_data
+| KEEP @timestamp, client.ip, event.duration
+| EVAL client.ip = TO_STRING(client.ip)
+| ENRICH clientip_policy ON client.ip WITH env
+| STATS median_duration = MEDIAN(event.duration) BY env
+----
+
+For more about data enrichment with {esql}, refer to <<esql-enrich-data>>.
+
+[discrete]
+[[esql-getting-started-process-data]]
+=== Process data
+
+Your data may contain unstructured strings that you want to
+<<esql-process-data-with-dissect-and-grok,structure>> to make the data easier
+to analyze. For example, the sample data contains log messages like:
+
+[source,txt]
+----
+"Connected to 10.1.0.3"
+----
+
+By extracting the IP address from these messages, you can determine which IP has
+accepted the most client connections.
+
+To structure unstructured strings at query time, you can use the {esql}
+<<esql-dissect>> and <<esql-grok>> commands. `DISSECT` works by breaking up a
+string using a delimiter-based pattern. `GROK` works similarly, but uses regular
+expressions. This makes `GROK` more powerful, but generally also slower.
+
+In this case, no regular expressions are needed, as the `message` is
+straightforward: "Connected to ", followed by the server IP. To match this
+string, you can use the following `DISSECT` command:
+
+[source,esql]
+----
+FROM sample_data
+| DISSECT message "Connected to %{server.ip}"
+----
+
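+For comparison, a roughly equivalent `GROK` version would use a regular
+expression pattern instead (a sketch, assuming the standard `IP` grok pattern
+is available):
+
+[source,esql]
+----
+FROM sample_data
+| GROK message "Connected to %{IP:server.ip}"
+----
+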
+This adds a `server.ip` column to those rows that have a `message` that matches
+this pattern. For other rows, the value of `server.ip` is `null`.
+
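+To work with only the rows that matched the pattern, you can filter out the
+`null` values (a sketch, using the `IS NOT NULL` predicate):
+
+[source,esql]
+----
+FROM sample_data
+| DISSECT message "Connected to %{server.ip}"
+| WHERE server.ip IS NOT NULL
+----
+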
+You can use the new `server.ip` column that's added by the `DISSECT` command in
+subsequent commands. For example, to determine how many connections each server
+has accepted:
+
+[source,esql]
+----
+FROM sample_data
+| WHERE STARTS_WITH(message, "Connected to")
+| DISSECT message "Connected to %{server.ip}"
+| STATS COUNT(*) BY server.ip
+----
+
+For more about data processing with {esql}, refer to
+<<esql-process-data-with-dissect-and-grok>>.

-image::images/esql/chaining-processing-commands.svg[Processing commands can be chained,align="center"]
+[discrete]
+[[esql-getting-learn-more]]
+=== Learn more

-The result of a query is the table produced by the final processing command.
+To learn more about {esql}, refer to <<esql-language>> and <<esql-using>>.