
[[esql-getting-started]]
== Getting started with {esql} queries

++++
<titleabbrev>Getting started</titleabbrev>
++++

This guide shows how you can use {esql} to query and aggregate your data.

TIP: To get started with {esql} without setting up your own deployment, visit
the public {esql} demo environment at
https://esql.demo.elastic.co/[esql.demo.elastic.co]. It comes with preloaded
data sets and sample queries.

[discrete]
[[esql-getting-started-prerequisites]]
=== Prerequisites

To follow along with the queries in this getting started guide, first ingest
some sample data using the following requests:

[source,console]
----
PUT sample_data
{
  "mappings": {
    "properties": {
      "client.ip": {
        "type": "ip"
      },
      "message": {
        "type": "keyword"
      }
    }
  }
}

PUT sample_data/_bulk
{"index": {}}
{"@timestamp": "2023-10-23T12:15:03.360Z", "client.ip": "172.21.2.162", "message": "Connected to 10.1.0.3", "event.duration": 3450233}
{"index": {}}
{"@timestamp": "2023-10-23T12:27:28.948Z", "client.ip": "172.21.2.113", "message": "Connected to 10.1.0.2", "event.duration": 2764889}
{"index": {}}
{"@timestamp": "2023-10-23T13:33:34.937Z", "client.ip": "172.21.0.5", "message": "Disconnected", "event.duration": 1232382}
{"index": {}}
{"@timestamp": "2023-10-23T13:51:54.732Z", "client.ip": "172.21.3.15", "message": "Connection error", "event.duration": 725448}
{"index": {}}
{"@timestamp": "2023-10-23T13:52:55.015Z", "client.ip": "172.21.3.15", "message": "Connection error", "event.duration": 8268153}
{"index": {}}
{"@timestamp": "2023-10-23T13:53:55.832Z", "client.ip": "172.21.3.15", "message": "Connection error", "event.duration": 5033755}
{"index": {}}
{"@timestamp": "2023-10-23T13:55:01.543Z", "client.ip": "172.21.3.15", "message": "Connected to 10.1.0.1", "event.duration": 1756467}
----

[discrete]
[[esql-getting-started-running-queries]]
=== Run an {esql} query

In {kib}, you can use Console or Discover to run {esql} queries:

include::{es-repo-dir}/tab-widgets/esql/esql-getting-started-widget.asciidoc[]
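
You can also run {esql} queries through the REST API. A minimal sketch, assuming
the {esql} `_query` endpoint and its `format` parameter (check the API reference
for your {es} version):

[source,console]
----
POST /_query?format=txt
{
  "query": "FROM sample_data | LIMIT 3"
}
----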

[discrete]
[[esql-getting-started-first-query]]
=== Your first {esql} query

Each {esql} query starts with a <<esql-source-commands,source command>>. A
source command produces a table, typically with data from {es}.

image::images/esql/source-command.svg[A source command producing a table from {es},align="center"]

The <<esql-from>> source command returns a table with documents from a data
stream, index, or alias. Each row in the resulting table represents a document.
This query returns up to 500 documents from the `sample_data` index:

[source,esql]
----
FROM sample_data
----

Each column corresponds to a field, and can be accessed by the name of that
field.
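
For example, the row for the first sample document would contain these columns
(a sketch; the actual column order and row order in your results may differ):

[source,txt]
----
@timestamp               | client.ip    | event.duration | message
2023-10-23T12:15:03.360Z | 172.21.2.162 | 3450233        | Connected to 10.1.0.3
----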

[TIP]
====
{esql} keywords are case-insensitive. The following query is identical to the
previous one:

[source,esql]
----
from sample_data
----
====

[discrete]
[[esql-getting-started-limit]]
=== Processing commands

A source command can be followed by one or more
<<esql-processing-commands,processing commands>>, separated by a pipe character:
`|`. Processing commands change an input table by adding, removing, or changing
rows and columns. Processing commands can perform filtering, projection,
aggregation, and more.

image::images/esql/esql-limit.png[A processing command changing an input table,align="center",width="60%"]

For example, you can use the <<esql-limit>> command to limit the number of rows
that are returned, up to a maximum of 10,000 rows:

[source,esql]
----
FROM sample_data
| LIMIT 3
----

[TIP]
====
For readability, you can put each command on a separate line. However, you don't
have to. The following query is identical to the previous one:

[source,esql]
----
FROM sample_data | LIMIT 3
----
====

[discrete]
[[esql-getting-started-sort]]
==== Sort a table

image::images/esql/esql-sort.png[A processing command sorting an input table,align="center",width="60%"]

Another processing command is the <<esql-sort>> command. By default, the rows
returned by `FROM` don't have a defined sort order. Use the `SORT` command to
sort rows on one or more columns:

[source,esql]
----
FROM sample_data
| SORT @timestamp DESC
----
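
`SORT` accepts multiple columns, in order of precedence, each with its own sort
direction. A sketch (this particular column combination is illustrative):

[source,esql]
----
FROM sample_data
| SORT client.ip ASC, @timestamp DESC
----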

[discrete]
[[esql-getting-started-where]]
==== Query the data

Use the <<esql-where>> command to query the data. For example, to find all
events with a duration longer than 5ms (the `event.duration` values are in
nanoseconds):

[source,esql]
----
FROM sample_data
| WHERE event.duration > 5000000
----

`WHERE` supports several <<esql-operators,operators>>. For example, you can use
<<esql-like-operator>> to run a wildcard query against the `message` column:

[source,esql]
----
FROM sample_data
| WHERE message LIKE "Connected*"
----

[discrete]
[[esql-getting-started-more-commands]]
==== More processing commands

There are many other processing commands, like <<esql-keep>> and <<esql-drop>>
to keep or drop columns, <<esql-enrich>> to enrich a table with data from
indices in {es}, and <<esql-dissect>> and <<esql-grok>> to process data. Refer
to <<esql-processing-commands>> for an overview of all processing commands.
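
For example, a minimal sketch of `KEEP`, which limits the result table to the
listed columns:

[source,esql]
----
FROM sample_data
| KEEP @timestamp, message
----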

[discrete]
[[esql-getting-started-chaining]]
=== Chain processing commands

You can chain processing commands, separated by a pipe character: `|`. Each
processing command works on the output table of the previous command. The result
of a query is the table produced by the final processing command.

image::images/esql/esql-sort-limit.png[Processing commands can be chained,align="center"]

The following example first sorts the table on `@timestamp`, and then limits the
result set to 3 rows:

[source,esql]
----
FROM sample_data
| SORT @timestamp DESC
| LIMIT 3
----

NOTE: The order of processing commands is important. Limiting the result set to
3 rows before sorting would most likely return a different result, because
`SORT` would then only order the 3 arbitrary rows that `LIMIT` kept.
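
For comparison, this sketch reverses the order, so `SORT` only orders whichever
3 rows `LIMIT` happened to keep:

[source,esql]
----
FROM sample_data
| LIMIT 3
| SORT @timestamp DESC
----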

[discrete]
[[esql-getting-started-eval]]
=== Compute values

Use the <<esql-eval>> command to append columns with calculated values to a
table. For example, the following query appends a `duration_ms` column. The
values in the column are computed by dividing `event.duration` by 1,000,000. In
other words: `event.duration` converted from nanoseconds to milliseconds.

[source,esql]
----
FROM sample_data
| EVAL duration_ms = event.duration / 1000000.0
----

`EVAL` supports several <<esql-functions,functions>>. For example, to round a
number to a specified number of decimal places, use the <<esql-round>>
function:

[source,esql]
----
FROM sample_data
| EVAL duration_ms = ROUND(event.duration / 1000000.0, 1)
----

[discrete]
[[esql-getting-started-stats]]
=== Calculate statistics

You can use {esql} not only to query your data, but also to aggregate it. Use
the <<esql-stats-by>> command to calculate statistics. For example, the median
duration:

[source,esql]
----
FROM sample_data
| STATS median_duration = MEDIAN(event.duration)
----

You can calculate multiple stats with one command:

[source,esql]
----
FROM sample_data
| STATS median_duration = MEDIAN(event.duration), max_duration = MAX(event.duration)
----

Use `BY` to group calculated stats by one or more columns. For example, to
calculate the median duration per client IP:

[source,esql]
----
FROM sample_data
| STATS median_duration = MEDIAN(event.duration) BY client.ip
----
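
You can combine both: multiple aggregations, grouped by a column. A sketch:

[source,esql]
----
FROM sample_data
| STATS median_duration = MEDIAN(event.duration), max_duration = MAX(event.duration) BY client.ip
----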

[discrete]
[[esql-getting-started-histogram]]
=== Create a histogram

To track statistics over time, {esql} enables you to create histograms using the
<<esql-auto_bucket>> function. `AUTO_BUCKET` creates human-friendly bucket sizes
and returns a value for each row that corresponds to the resulting bucket the
row falls into.

For example, to create hourly buckets for the data on October 23rd:

[source,esql]
----
FROM sample_data
| KEEP @timestamp
| EVAL bucket = AUTO_BUCKET(@timestamp, 24, "2023-10-23T00:00:00Z", "2023-10-23T23:59:59Z")
----

Combine `AUTO_BUCKET` with <<esql-stats-by>> to create a histogram. For example,
to count the number of events per hour:

[source,esql]
----
FROM sample_data
| KEEP @timestamp, event.duration
| EVAL bucket = AUTO_BUCKET(@timestamp, 24, "2023-10-23T00:00:00Z", "2023-10-23T23:59:59Z")
| STATS COUNT(*) BY bucket
----

Or the median duration per hour:

[source,esql]
----
FROM sample_data
| KEEP @timestamp, event.duration
| EVAL bucket = AUTO_BUCKET(@timestamp, 24, "2023-10-23T00:00:00Z", "2023-10-23T23:59:59Z")
| STATS median_duration = MEDIAN(event.duration) BY bucket
----

[discrete]
[[esql-getting-started-enrich]]
=== Enrich data

{esql} enables you to <<esql-enrich-data,enrich>> a table with data from indices
in {es}, using the <<esql-enrich>> command.

image::images/esql/esql-enrich.png[align="center"]

Before you can use `ENRICH`, you first need to
<<esql-create-enrich-policy,create>> and <<esql-execute-enrich-policy,execute>>
an <<esql-enrich-policy,enrich policy>>. The following requests create and
execute a policy that links an IP address to an environment ("Development",
"QA", or "Production"):

[source,console]
----
PUT clientips
{
  "mappings": {
    "properties": {
      "client.ip": {
        "type": "keyword"
      },
      "env": {
        "type": "keyword"
      }
    }
  }
}

PUT clientips/_bulk
{ "index" : {}}
{ "client.ip": "172.21.0.5", "env": "Development" }
{ "index" : {}}
{ "client.ip": "172.21.2.113", "env": "QA" }
{ "index" : {}}
{ "client.ip": "172.21.2.162", "env": "QA" }
{ "index" : {}}
{ "client.ip": "172.21.3.15", "env": "Production" }
{ "index" : {}}
{ "client.ip": "172.21.3.16", "env": "Production" }

PUT /_enrich/policy/clientip_policy
{
  "match": {
    "indices": "clientips",
    "match_field": "client.ip",
    "enrich_fields": ["env"]
  }
}

PUT /_enrich/policy/clientip_policy/_execute
----

////
[source,console]
----
DELETE /_enrich/policy/clientip_policy
----
// TEST[continued]
////

After creating and executing a policy, you can use it with the `ENRICH`
command:

[source,esql]
----
FROM sample_data
| KEEP @timestamp, client.ip, event.duration
| EVAL client.ip = TO_STRING(client.ip)
| ENRICH clientip_policy ON client.ip WITH env
----

You can use the new `env` column that's added by the `ENRICH` command in
subsequent commands. For example, to calculate the median duration per
environment:

[source,esql]
----
FROM sample_data
| KEEP @timestamp, client.ip, event.duration
| EVAL client.ip = TO_STRING(client.ip)
| ENRICH clientip_policy ON client.ip WITH env
| STATS median_duration = MEDIAN(event.duration) BY env
----

For more about data enrichment with {esql}, refer to <<esql-enrich-data>>.

[discrete]
[[esql-getting-started-process-data]]
=== Process data

Your data may contain unstructured strings that you want to
<<esql-process-data-with-dissect-and-grok,structure>> to make it easier to
analyze the data. For example, the sample data contains log messages like:

[source,txt]
----
"Connected to 10.1.0.3"
----

By extracting the IP address from these messages, you can determine which IP has
accepted the most client connections.

To structure unstructured strings at query time, you can use the {esql}
<<esql-dissect>> and <<esql-grok>> commands. `DISSECT` works by breaking up a
string using a delimiter-based pattern. `GROK` works similarly, but uses regular
expressions. This makes `GROK` more powerful, but generally also slower.

In this case, no regular expressions are needed, as the `message` is
straightforward: "Connected to ", followed by the server IP. To match this
string, you can use the following `DISSECT` command:

[source,esql]
----
FROM sample_data
| DISSECT message "Connected to %{server.ip}"
----

This adds a `server.ip` column to those rows that have a `message` that matches
this pattern. For other rows, the value of `server.ip` is `null`.
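
To keep only the rows where the pattern matched, you can filter out the `null`
values. A sketch, assuming the `IS NOT NULL` predicate:

[source,esql]
----
FROM sample_data
| DISSECT message "Connected to %{server.ip}"
| WHERE server.ip IS NOT NULL
----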

You can use the new `server.ip` column that's added by the `DISSECT` command in
subsequent commands. For example, to determine how many connections each server
has accepted:

[source,esql]
----
FROM sample_data
| WHERE STARTS_WITH(message, "Connected to")
| DISSECT message "Connected to %{server.ip}"
| STATS COUNT(*) BY server.ip
----
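
To answer the earlier question of which IP has accepted the most client
connections, a sketch that names the count and sorts on it:

[source,esql]
----
FROM sample_data
| WHERE STARTS_WITH(message, "Connected to")
| DISSECT message "Connected to %{server.ip}"
| STATS connections = COUNT(*) BY server.ip
| SORT connections DESC
----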

For more about data processing with {esql}, refer to
<<esql-process-data-with-dissect-and-grok>>.

[discrete]
[[esql-getting-learn-more]]
=== Learn more

To learn more about {esql}, refer to <<esql-language>> and <<esql-using>>.