
[DOCS] Getting started with ES|QL queries guide (#101634)

* Page structure

* More getting started content

* Fix build errors

* Small improvements

* Typo

* Add link to public demo environment

* Review feedback

* Update docs/reference/esql/esql-get-started.asciidoc

Co-authored-by: Andrei Stefan <astefan@users.noreply.github.com>

* Review feedback

---------

Co-authored-by: Andrei Stefan <astefan@users.noreply.github.com>
Abdon Pijpelink 1 year ago
parent
commit
c6a1f831e0

+ 2 - 0
docs/reference/esql/esql-commands.asciidoc

@@ -5,6 +5,7 @@
 <titleabbrev>Commands</titleabbrev>
 ++++
 
+[[esql-source-commands]]
 // tag::source_commands[]
 ==== Source commands
 
@@ -20,6 +21,7 @@ image::images/esql/source-command.svg[A source command producing a table from {e
 
 // end::source_command[]
 
+[[esql-processing-commands]]
 // tag::proc_commands[]
 ==== Processing commands
 

+ 416 - 16
docs/reference/esql/esql-get-started.asciidoc

@@ -1,34 +1,434 @@
 [[esql-getting-started]]
-== Getting started with {esql}
+== Getting started with {esql} queries
 
 ++++
 <titleabbrev>Getting started</titleabbrev>
 ++++
 
-A simple example of an {esql} query is shown below:
-[source,esql]
+This guide shows how you can use {esql} to query and aggregate your data.
+
+TIP: To get started with {esql} without setting up your own deployment, visit
+the public {esql} demo environment at
+https://esql.demo.elastic.co/[esql.demo.elastic.co]. It comes with preloaded
+data sets and sample queries.
+
+[discrete]
+[[esql-getting-started-prerequisites]]
+=== Prerequisites
+
+To follow along with the queries in this getting started guide, first ingest
+some sample data using the following requests:
+
+[source,console]
 ----
-FROM employees
-| EVAL age = DATE_DIFF(NOW(), birth_date, 'Y')
-| STATS AVG(age) BY department
-| SORT age DESC
+PUT sample_data
+{
+  "mappings": {
+    "properties": {
+      "client.ip": {
+        "type": "ip"
+      },
+      "message": {
+        "type": "keyword"
+      }
+    }
+  }
+}
+
+PUT sample_data/_bulk
+{"index": {}}
+{"@timestamp": "2023-10-23T12:15:03.360Z", "client.ip": "172.21.2.162", "message": "Connected to 10.1.0.3", "event.duration": 3450233}
+{"index": {}}
+{"@timestamp": "2023-10-23T12:27:28.948Z", "client.ip": "172.21.2.113", "message": "Connected to 10.1.0.2", "event.duration": 2764889}
+{"index": {}}
+{"@timestamp": "2023-10-23T13:33:34.937Z", "client.ip": "172.21.0.5", "message": "Disconnected", "event.duration": 1232382}
+{"index": {}}
+{"@timestamp": "2023-10-23T13:51:54.732Z", "client.ip": "172.21.3.15", "message": "Connection error", "event.duration": 725448}
+{"index": {}}
+{"@timestamp": "2023-10-23T13:52:55.015Z", "client.ip": "172.21.3.15", "message": "Connection error", "event.duration": 8268153}
+{"index": {}}
+{"@timestamp": "2023-10-23T13:53:55.832Z", "client.ip": "172.21.3.15", "message": "Connection error", "event.duration": 5033755}
+{"index": {}}
+{"@timestamp": "2023-10-23T13:55:01.543Z", "client.ip": "172.21.3.15", "message": "Connected to 10.1.0.1", "event.duration": 1756467}
 ----
 
-Each {esql} query starts with a <<esql-commands,source command>>. A source command produces
-a table, typically with data from {es}.
+[discrete]
+[[esql-getting-started-running-queries]]
+=== Run an {esql} query
+
+In {kib}, you can use Console or Discover to run {esql} queries:
+
+include::{es-repo-dir}/tab-widgets/esql/esql-getting-started-widget.asciidoc[]
+
+[discrete]
+[[esql-getting-started-first-query]]
+=== Your first {esql} query
+
+Each {esql} query starts with a <<esql-source-commands,source command>>. A
+source command produces a table, typically with data from {es}.
 
 image::images/esql/source-command.svg[A source command producing a table from {es},align="center"]
 
+The <<esql-from>> source command returns a table with documents from a data
+stream, index, or alias. Each row in the resulting table represents a document.
+This query returns up to 500 documents from the `sample_data` index:
+
+[source,esql]
+----
+FROM sample_data
+----
+
+Each column corresponds to a field, and can be accessed by the name of that
+field.
+
+[TIP]
+====
+{esql} keywords are case-insensitive. The following query is identical to the
+previous one:
+
+[source,esql]
+----
+from sample_data
+----
+====
+
+[discrete]
+[[esql-getting-started-limit]]
+=== Processing commands
+
 A source command can be followed by one or more
-<<esql-commands,processing commands>>. Processing commands change an
-input table by adding, removing, or changing rows and columns.
-Processing commands can perform filtering, projection, aggregation, and more.
+<<esql-processing-commands,processing commands>>, separated by a pipe character:
+`|`. Processing commands change an input table by adding, removing, or changing
+rows and columns. Processing commands can perform filtering, projection,
+aggregation, and more.
+
+image::images/esql/esql-limit.png[A processing command changing an input table,align="center",width="60%"]
+
+For example, you can use the <<esql-limit>> command to limit the number of rows
+that are returned, up to a maximum of 10,000 rows:
+
+[source,esql]
+----
+FROM sample_data
+| LIMIT 3
+----
+
+[TIP]
+====
+For readability, you can put each command on a separate line. However, you don't
+have to. The following query is identical to the previous one:
+
+[source,esql]
+----
+FROM sample_data | LIMIT 3
+----
+====
+
+[discrete]
+[[esql-getting-started-sort]]
+==== Sort a table
+
+image::images/esql/esql-sort.png[A processing command sorting an input table,align="center",width="60%"]
+
+Another processing command is the <<esql-sort>> command. By default, the rows
+returned by `FROM` don't have a defined sort order. Use the `SORT` command to
+sort rows on one or more columns:
+
+[source,esql]
+----
+FROM sample_data
+| SORT @timestamp DESC
+----
+
+[discrete]
+[[esql-getting-started-where]]
+==== Query the data
+
+Use the <<esql-where>> command to query the data. For example, to find all
+events with a duration longer than 5ms:
 
-image::images/esql/processing-command.svg[A processing command changing an input table,align="center"]
+[source,esql]
+----
+FROM sample_data
+| WHERE event.duration > 5000000
+----
+
+`WHERE` supports several <<esql-operators,operators>>. For example, you can use <<esql-like-operator>> to run a wildcard query against the `message` column:
+
+[source,esql]
+----
+FROM sample_data
+| WHERE message LIKE "Connected*"
+----
+
+[discrete]
+[[esql-getting-started-more-commands]]
+==== More processing commands
+
+There are many other processing commands, like <<esql-keep>> and <<esql-drop>>
+to keep or drop columns, <<esql-enrich>> to enrich a table with data from
+indices in {es}, and <<esql-dissect>> and <<esql-grok>> to process data. Refer
+to <<esql-processing-commands>> for an overview of all processing commands.
+
+[discrete]
+[[esql-getting-started-chaining]]
+=== Chain processing commands
 
 You can chain processing commands, separated by a pipe character: `|`. Each
-processing command works on the output table of the previous command.
+processing command works on the output table of the previous command. The result
+of a query is the table produced by the final processing command.
+
+image::images/esql/esql-sort-limit.png[Processing commands can be chained,align="center"]
+
+The following example first sorts the table on `@timestamp`, and then limits
+the result set to 3 rows:
+
+[source,esql]
+----
+FROM sample_data
+| SORT @timestamp DESC
+| LIMIT 3
+----
+
+NOTE: The order of processing commands is important. Limiting the result set to
+3 rows before sorting those 3 rows would most likely return a different result
+than this example, where the sorting comes before the limit.
+
+[discrete]
+[[esql-getting-started-eval]]
+=== Compute values
+
+Use the <<esql-eval>> command to append columns to a table, with calculated
+values. For example, the following query appends a `duration_ms` column. The
+values in the column are computed by dividing `event.duration` by 1,000,000. In
+other words: `event.duration` converted from nanoseconds to milliseconds.
+
+[source,esql]
+----
+FROM sample_data
+| EVAL duration_ms = event.duration / 1000000.0
+----
+
+`EVAL` supports several <<esql-functions,functions>>. For example, to round a
+number to the closest number with the specified number of digits, use the
+<<esql-round>> function:
+
+[source,esql]
+----
+FROM sample_data
+| EVAL duration_ms = ROUND(event.duration / 1000000.0, 1)
+----
+
+[discrete]
+[[esql-getting-started-stats]]
+=== Calculate statistics
+
+You can use {esql} not only to query your data, but also to aggregate it. Use
+the <<esql-stats-by>> command to calculate statistics. For example, to
+calculate the median duration:
+
+[source,esql]
+----
+FROM sample_data
+| STATS median_duration = MEDIAN(event.duration)
+----
+
+You can calculate multiple stats with one command:
+
+[source,esql]
+----
+FROM sample_data
+| STATS median_duration = MEDIAN(event.duration), max_duration = MAX(event.duration)
+----
+
+Use `BY` to group calculated stats by one or more columns. For example, to
+calculate the median duration per client IP:
+
+[source,esql]
+----
+FROM sample_data
+| STATS median_duration = MEDIAN(event.duration) BY client.ip
+----
+
+[discrete]
+[[esql-getting-started-histogram]]
+=== Create a histogram
+
+To track statistics over time, {esql} enables you to create histograms using the
+<<esql-auto_bucket>> function. `AUTO_BUCKET` creates human-friendly bucket sizes
+and returns a value for each row that corresponds to the resulting bucket the
+row falls into.
+
+For example, to create hourly buckets for the data on October 23rd:
+
+[source,esql]
+----
+FROM sample_data
+| KEEP @timestamp
+| EVAL bucket = AUTO_BUCKET(@timestamp, 24, "2023-10-23T00:00:00Z", "2023-10-23T23:59:59Z")
+----
+
+Combine `AUTO_BUCKET` with <<esql-stats-by>> to create a histogram. For example,
+to count the number of events per hour:
+
+[source,esql]
+----
+FROM sample_data
+| KEEP @timestamp, event.duration
+| EVAL bucket = AUTO_BUCKET(@timestamp, 24, "2023-10-23T00:00:00Z", "2023-10-23T23:59:59Z")
+| STATS COUNT(*) BY bucket
+----
+
+Or the median duration per hour:
+
+[source,esql]
+----
+FROM sample_data
+| KEEP @timestamp, event.duration
+| EVAL bucket = AUTO_BUCKET(@timestamp, 24, "2023-10-23T00:00:00Z", "2023-10-23T23:59:59Z")
+| STATS median_duration = MEDIAN(event.duration) BY bucket
+----
+
+[discrete]
+[[esql-getting-started-enrich]]
+=== Enrich data
+
+{esql} enables you to <<esql-enrich-data,enrich>> a table with data from indices
+in {es}, using the <<esql-enrich>> command.
+
+image::images/esql/esql-enrich.png[align="center"]
+
+Before you can use `ENRICH`, you first need to
+<<esql-create-enrich-policy,create>> and <<esql-execute-enrich-policy,execute>>
+an <<esql-enrich-policy,enrich policy>>. The following requests create and
+execute a policy that links an IP address to an environment ("Development",
+"QA", or "Production"):
+
+[source,console]
+----
+PUT clientips
+{
+  "mappings": {
+    "properties": {
+      "client.ip": {
+        "type": "keyword"
+      },
+      "env": {
+        "type": "keyword"
+      }
+    }
+  }
+}
+
+PUT clientips/_bulk
+{ "index" : {}}
+{ "client.ip": "172.21.0.5", "env": "Development" }
+{ "index" : {}}
+{ "client.ip": "172.21.2.113", "env": "QA" }
+{ "index" : {}}
+{ "client.ip": "172.21.2.162", "env": "QA" }
+{ "index" : {}}
+{ "client.ip": "172.21.3.15", "env": "Production" }
+{ "index" : {}}
+{ "client.ip": "172.21.3.16", "env": "Production" }
+
+PUT /_enrich/policy/clientip_policy
+{
+  "match": {
+    "indices": "clientips",
+    "match_field": "client.ip",
+    "enrich_fields": ["env"]
+  }
+}
+
+PUT /_enrich/policy/clientip_policy/_execute
+----
+
+////
+[source,console]
+----
+DELETE /_enrich/policy/clientip_policy
+----
+// TEST[continued]
+////
+
+After creating and executing a policy, you can use it with the `ENRICH`
+command:
+
+[source,esql]
+----
+FROM sample_data
+| KEEP @timestamp, client.ip, event.duration
+| EVAL client.ip = TO_STRING(client.ip)
+| ENRICH clientip_policy ON client.ip WITH env
+----
+
+You can use the new `env` column that's added by the `ENRICH` command in
+subsequent commands. For example, to calculate the median duration per
+environment:
+
+[source,esql]
+----
+FROM sample_data
+| KEEP @timestamp, client.ip, event.duration
+| EVAL client.ip = TO_STRING(client.ip)
+| ENRICH clientip_policy ON client.ip WITH env
+| STATS median_duration = MEDIAN(event.duration) BY env
+----
+
+For more about data enrichment with {esql}, refer to <<esql-enrich-data>>.
+
+[discrete]
+[[esql-getting-started-process-data]]
+=== Process data
+
+Your data may contain unstructured strings that you want to
+<<esql-process-data-with-dissect-and-grok,structure>> to make it easier to
+analyze the data. For example, the sample data contains log messages like:
+
+[source,txt]
+----
+"Connected to 10.1.0.3"
+----
+
+By extracting the IP address from these messages, you can determine which IP has
+accepted the most client connections.
+
+To structure unstructured strings at query time, you can use the {esql}
+<<esql-dissect>> and <<esql-grok>> commands. `DISSECT` works by breaking up a
+string using a delimiter-based pattern. `GROK` works similarly, but uses regular
+expressions. This makes `GROK` more powerful, but generally also slower.
+
+In this case, no regular expressions are needed, as the `message` is
+straightforward: "Connected to ", followed by the server IP. To match this
+string, you can use the following `DISSECT` command:
+
+[source,esql]
+----
+FROM sample_data
+| DISSECT message "Connected to %{server.ip}"
+----
+
+This adds a `server.ip` column to those rows that have a `message` that matches
+this pattern. For other rows, the value of `server.ip` is `null`.
+
+You can use the new `server.ip` column that's added by the `DISSECT` command in
+subsequent commands. For example, to determine how many connections each server
+has accepted:
+
+[source,esql]
+----
+FROM sample_data
+| WHERE STARTS_WITH(message, "Connected to")
+| DISSECT message "Connected to %{server.ip}"
+| STATS COUNT(*) BY server.ip
+----
+
+For more about data processing with {esql}, refer to
+<<esql-process-data-with-dissect-and-grok>>.
 
-image::images/esql/chaining-processing-commands.svg[Processing commands can be chained,align="center"]
+[discrete]
+[[esql-getting-learn-more]]
+=== Learn more
 
-The result of a query is the table produced by the final processing command.
+To learn more about {esql}, refer to <<esql-language>> and <<esql-using>>.
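
Illustration for the guide above (not part of this commit): the claim that the order of processing commands matters can be checked without a cluster. The following is a minimal Python sketch that models a table as a list of dicts and each processing command as a function from table to table. The helper names (`sort_desc`, `limit`, `stats_median_by`) are hypothetical, the rows are copied from the guide's bulk request, and the unsorted order is assumed to be insertion order, which Elasticsearch does not guarantee.

```python
from statistics import median

# Rows copied from the sample_data bulk request in the guide (message omitted).
ROWS = [
    {"@timestamp": "2023-10-23T12:15:03.360Z", "client.ip": "172.21.2.162", "event.duration": 3450233},
    {"@timestamp": "2023-10-23T12:27:28.948Z", "client.ip": "172.21.2.113", "event.duration": 2764889},
    {"@timestamp": "2023-10-23T13:33:34.937Z", "client.ip": "172.21.0.5", "event.duration": 1232382},
    {"@timestamp": "2023-10-23T13:51:54.732Z", "client.ip": "172.21.3.15", "event.duration": 725448},
    {"@timestamp": "2023-10-23T13:52:55.015Z", "client.ip": "172.21.3.15", "event.duration": 8268153},
    {"@timestamp": "2023-10-23T13:53:55.832Z", "client.ip": "172.21.3.15", "event.duration": 5033755},
    {"@timestamp": "2023-10-23T13:55:01.543Z", "client.ip": "172.21.3.15", "event.duration": 1756467},
]


def sort_desc(table, column):
    """Model of `SORT <column> DESC`: returns a new table, input unchanged."""
    # ISO 8601 timestamps in one format sort correctly as plain strings.
    return sorted(table, key=lambda row: row[column], reverse=True)


def limit(table, n):
    """Model of `LIMIT n`: keeps the first n rows of the input table."""
    return table[:n]


def stats_median_by(table, value_column, group_column):
    """Model of `STATS MEDIAN(<value>) BY <group>` as an exact median per group."""
    groups = {}
    for row in table:
        groups.setdefault(row[group_column], []).append(row[value_column])
    return {key: median(values) for key, values in groups.items()}


# FROM sample_data | SORT @timestamp DESC | LIMIT 3
sorted_then_limited = limit(sort_desc(ROWS, "@timestamp"), 3)

# FROM sample_data | LIMIT 3 | SORT @timestamp DESC
limited_then_sorted = sort_desc(limit(ROWS, 3), "@timestamp")

# FROM sample_data | STATS median_duration = MEDIAN(event.duration) BY client.ip
median_by_ip = stats_median_by(ROWS, "event.duration", "client.ip")
```

With the sort first, the three most recent events survive; with the limit first, only the first three rows are ever sorted, so the two results share no rows here. ES|QL's own `MEDIAN` may compute an approximation, so treat the exact per-group values as a property of this sketch.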

+ 6 - 0
docs/reference/esql/esql-kibana.asciidoc

@@ -17,10 +17,12 @@ data* on the *Sample web logs* card.
 [[esql-kibana-get-started]]
 === Get started with {esql}
 
+// tag::esql-mode[]
 To get started with {esql} in Discover, open the main menu and select
 *Discover*. Next, from the Data views menu, select *Try ES|QL*.
 
 image::images/esql/esql-data-view-menu.png[align="center",width=33%]
+// end::esql-mode[]
 
 The ability to select {esql} from the Data views menu can be enabled and
 disabled using the `discover:enableESQL` setting from
@@ -48,10 +50,12 @@ limits the number of rows that are retrieved.
 TIP: Click the help icon (image:images/esql/esql-icon-help.svg[]) to open the
 in-product reference documentation for all commands and functions.
 
+// tag::autocomplete[]
 To make it easier to write queries, auto-complete offers suggestions with
 possible commands and functions:
 
 image::images/esql/esql-kibana-auto-complete.png[align="center"]
+// end::autocomplete[]
 
 [NOTE]
 ====
@@ -76,6 +80,7 @@ FROM kibana_sample_data_logs
 | LIMIT 10
 ----
 
+// tag::compact[]
 To make it easier to write multi-line queries, click the double-headed arrow
 button (image:images/esql/esql-icon-expand-query-bar.svg[]) to expand the query
 bar:
@@ -84,6 +89,7 @@ image::images/esql/esql-expanded-query-bar.png[align="center"]
 
 To return to a compact query bar, click the minimize editor button
 (image:images/esql/esql-icon-minimize-query-bar.svg[]).
+// end::compact[]
 
 [discrete]
 ==== Warnings

+ 1 - 1
docs/reference/esql/esql-process-data-with-dissect-grok.asciidoc

@@ -21,7 +21,7 @@ use the {esql} <<esql-dissect>> and <<esql-grok>> commands.
 ==== `DISSECT` or `GROK`? Or both?
 
 `DISSECT` works by breaking up a string using a delimiter-based pattern. `GROK`
-works similarly, but uses regular expressions. This make `GROK` more powerful,
+works similarly, but uses regular expressions. This makes `GROK` more powerful,
 but generally also slower. `DISSECT` works well when data is reliably repeated.
 `GROK` is a better choice when you really need the power of regular expressions,
 for example when the structure of your text varies from row to row.
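
Illustration (not part of this commit): to make "delimiter-based pattern" concrete, here is a minimal Python sketch of dissect-style matching. It is an assumption-laden model, not the Elasticsearch implementation: the function name `dissect` is hypothetical, key modifiers are ignored, and the pattern is translated into a regular expression purely for brevity.

```python
import re
from collections import Counter

# Matches a %{key} placeholder and captures the key name.
KEY = re.compile(r"%\{([^}]+)\}")


def dissect(pattern, text):
    """Return {key: value} if text matches the pattern, else None."""
    # re.split with a capturing group alternates: literal, key, literal, ...
    parts = KEY.split(pattern)
    regex = ""
    keys = []
    for index, part in enumerate(parts):
        if index % 2 == 0:
            regex += re.escape(part)  # literal delimiter text
        else:
            keys.append(part)
            regex += "(.*?)"  # shortest text up to the next delimiter
    match = re.fullmatch(regex, text)
    if match is None:
        return None
    return dict(zip(keys, match.groups()))


# Messages copied from the sample data in the getting started guide.
MESSAGES = [
    "Connected to 10.1.0.3",
    "Connected to 10.1.0.2",
    "Disconnected",
    "Connection error",
    "Connection error",
    "Connection error",
    "Connected to 10.1.0.1",
]

PATTERN = "Connected to %{server.ip}"

# Rough equivalent of: DISSECT message "..." | STATS COUNT(*) BY server.ip,
# restricted to rows where the pattern matched.
connections = Counter(
    result["server.ip"]
    for result in (dissect(PATTERN, message) for message in MESSAGES)
    if result is not None
)
```

Rows that do not match yield `None` here, which mirrors the `null` value that the guide describes for non-matching rows.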

BIN=BIN
docs/reference/images/esql/esql-keep.png


BIN=BIN
docs/reference/images/esql/esql-limit.png


BIN=BIN
docs/reference/images/esql/esql-sort-limit.png


BIN=BIN
docs/reference/images/esql/esql-sort.png


+ 39 - 0
docs/reference/tab-widgets/esql/esql-getting-started-widget.asciidoc

@@ -0,0 +1,39 @@
+++++
+<div class="tabs" data-tab-group="model">
+  <div role="tablist" aria-label="model">
+    <button role="tab"
+            aria-selected="true"
+            aria-controls="esql-tab-console"
+            id="esql-console">
+      Console
+    </button>
+    <button role="tab"
+            aria-selected="false"
+            aria-controls="esql-tab-discover"
+            id="esql-disover">
+      Discover
+    </button>
+  </div>
+  <div tabindex="0"
+       role="tabpanel"
+       id="esql-tab-console"
+       aria-labelledby="esql-console">
+++++
+
+include::esql-getting-started.asciidoc[tag=console]
+
+++++
+  </div>
+  <div tabindex="0"
+       role="tabpanel"
+       id="esql-tab-discover"
+       aria-labelledby="esql-disover"
+       hidden="">
+++++
+
+include::esql-getting-started.asciidoc[tag=discover]
+
+++++
+  </div>
+</div>
+++++

+ 44 - 0
docs/reference/tab-widgets/esql/esql-getting-started.asciidoc

@@ -0,0 +1,44 @@
+// tag::console[]
+
+To get started with {esql} in Console, open the main menu and select
+*Dev Tools*.
+
+The general structure of an <<esql-query-api,{esql} query API>> request is:
+
+[source,txt]
+----
+POST /_query?format=txt
+{
+  "query": """
+
+  """
+}
+----
+
+Enter the actual {esql} query between the two sets of triple quotes. For
+example:
+
+[source,txt]
+----
+POST /_query?format=txt
+{
+  "query": """
+FROM sample_data
+  """
+}
+----
+// end::console[]
+
+
+// tag::discover[]
+
+include::../../esql/esql-kibana.asciidoc[tag=esql-mode]
+
+After switching to {esql} mode, the query bar shows a sample query. You can
+replace this query with the queries in this getting started guide.
+
+include::../../esql/esql-kibana.asciidoc[tag=autocomplete]
+
+include::../../esql/esql-kibana.asciidoc[tag=compact]
+
+// end::discover[]
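
Illustration (not part of this commit): the Console tab above wraps the query in triple quotes, which is Console syntax. Outside Console, the same request body is ordinary JSON with the query as a regular string. The following minimal Python sketch builds such a request with the standard library without sending it. The base URL `http://localhost:9200` and the helper name `build_esql_request` are assumptions, and authentication is omitted.

```python
import json
from urllib.request import Request


def build_esql_request(base_url, query, fmt="txt"):
    """Build (but do not send) a POST /_query request for an ES|QL query."""
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(
        url=f"{base_url}/_query?format={fmt}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed local, unsecured cluster; adjust the URL and add credentials as needed.
request = build_esql_request("http://localhost:9200", "FROM sample_data | LIMIT 3")
```

Passing `request` to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without a cluster.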