
[DOCS] EQL: Document async search submits (#56704)

James Rodewig 5 years ago
parent
commit
aee5618001

+ 365 - 14
docs/reference/eql/eql-search-api.asciidoc

@@ -81,6 +81,72 @@ Defaults to `open`.
 
 include::{docdir}/rest-api/common-parms.asciidoc[tag=index-ignore-unavailable]
 
+`keep_alive`::
++
+--
+(Optional, <<time-units,time value>>)
+Period for which the search and its results are stored on the cluster. Defaults
+to `5d` (five days).
+
+When this period expires, the search and its results are deleted, even if the
+search is still ongoing.
+
+If the <<eql-search-api-keep-on-completion,`keep_on_completion`>> parameter is
+`false`, {es} only stores <<eql-search-async,async searches>> that do not
+complete within the period set by the
+<<eql-search-api-wait-for-completion-timeout,`wait_for_completion_timeout`>>
+parameter, regardless of this value.
+
+[IMPORTANT]
+====
+You can also specify this value using the `keep_alive` request body parameter.
+If both parameters are specified, only the query parameter is used.
+====
+--
+
+`keep_on_completion`::
++
+--
+(Optional, boolean)
+If `true`, the search and its results are stored on the cluster.
+
+If `false`, the search and its results are stored on the cluster only if the
+request does not complete during the period set by the
+<<eql-search-api-wait-for-completion-timeout,`wait_for_completion_timeout`>>
+parameter. Defaults to `false`.
+
+[IMPORTANT]
+====
+You can also specify this value using the `keep_on_completion` request body
+parameter. If both parameters are specified, only the query parameter is used.
+====
+--
+
+`wait_for_completion_timeout`::
++
+--
+(Optional, <<time-units,time value>>)
+Timeout duration to block other requests while waiting for the search request to
+finish. Defaults to no timeout, meaning the search request waits for complete
+search results.
+
+If this parameter is specified and the request completes before this period
+ends, complete search results are returned.
+
+If the request does not complete before this period ends, the search becomes an
+<<eql-search-async,async search>>. A
+<<eql-search-api-response-body-search-id,search ID>> is returned at the end of
+the period. This ID can be used to monitor the progress of the async search as
+it runs.
+
+[IMPORTANT]
+====
+You can also specify this value using the `wait_for_completion_timeout` request
+body parameter. If both parameters are specified, only the query parameter is
+used.
+====
+--
+
 [[eql-search-api-request-body]]
 ==== {api-request-body-title}
 
@@ -102,6 +168,48 @@ runs.
 (Optional, string)
 Reserved for future use.
 
+`keep_alive`::
++
+--
+(Optional, <<time-units,time value>>)
+Period for which the search and its results are stored on the cluster. Defaults
+to `5d` (five days).
+
+When this period expires, the search and its results are deleted, even if the
+search is still ongoing.
+
+If the <<eql-search-api-keep-on-completion,`keep_on_completion`>> parameter is
+`false`, {es} only stores <<eql-search-async,async searches>> that do not
+complete within the period set by the
+<<eql-search-api-wait-for-completion-timeout,`wait_for_completion_timeout`>>
+parameter, regardless of this value.
+
+[IMPORTANT]
+====
+You can also specify this value using the `keep_alive` query parameter.
+If both parameters are specified, only the query parameter is used.
+====
+--
+
+[[eql-search-api-keep-on-completion]]
+`keep_on_completion`::
++
+--
+(Optional, boolean)
+If `true`, the search and its results are stored on the cluster.
+
+If `false`, the search and its results are stored on the cluster only if the
+request does not complete during the period set by the
+<<eql-search-api-wait-for-completion-timeout,`wait_for_completion_timeout`>>
+parameter. Defaults to `false`.
+
+[IMPORTANT]
+====
+You can also specify this value using the `keep_on_completion` query parameter.
+If both parameters are specified, only the query parameter is used.
+====
+--
+
 `query`::
 (Required, string)
 <<eql-syntax,EQL>> query you wish to run.
@@ -134,10 +242,73 @@ milliseconds since the https://en.wikipedia.org/wiki/Unix_time[Unix epoch], in
 ascending order.
 --
 
+[[eql-search-api-wait-for-completion-timeout]]
+`wait_for_completion_timeout`::
++
+--
+(Optional, <<time-units,time value>>)
+Timeout duration to block other requests while waiting for the search request to
+finish. Defaults to no timeout, meaning the search request waits for complete
+search results.
+
+If this parameter is specified and the request completes before this period
+ends, complete search results are returned.
+
+If the request does not complete before this period ends, the search becomes an
+<<eql-search-async,async search>>. A
+<<eql-search-api-response-body-search-id,search ID>> is returned at the end of
+the period. This ID can be used to monitor the progress of the async search as
+it runs.
+
+[IMPORTANT]
+====
+You can also specify this value using the `wait_for_completion_timeout` query
+parameter. If both parameters are specified, only the query parameter is used.
+====
+--
+
 [role="child_attributes"]
 [[eql-search-api-response-body]]
 ==== {api-response-body-title}
 
+[[eql-search-api-response-body-search-id]]
+`id`::
++
+--
+Identifier for the search.
+
+This search ID is only provided if a search request does not return complete
+results within the timeout period set by the
+<<eql-search-api-wait-for-completion-timeout, `wait_for_completion_timeout`>>
+parameter and so becomes an <<eql-search-async,async search>>.
+
+You can use this ID to
+monitor the progress of an ongoing async search and retrieve complete results
+when the search finishes.
+--
+
+`is_partial`::
+(boolean)
+If `true`, the response does not contain complete search results.
+
+`is_running`::
++
+--
+(boolean)
+If `true`, the search request is still executing.
+
+[IMPORTANT]
+====
+If this parameter and the `is_partial` parameter are `true`, the search is an
+<<eql-search-async,ongoing async search>>. Unless the `keep_alive` period
+expires first, complete search results are available when the search completes.
+
+If `is_partial` is `true` but `is_running` is `false`, the search returned
+partial results due to a failure. Only some shards returned results or the node
+coordinating the search failed.
+====
+--
+
 `took`::
 +
 --
@@ -168,7 +339,7 @@ If `true`, the request timed out before completion.
 
 `hits`::
 (object)
-Contains returned events and metadata.
+Contains matching events and metadata.
 +
 .Properties of `hits`
 [%collapsible%open]
@@ -176,7 +347,7 @@ Contains returned events and metadata.
 
 `total`::
 (object)
-Metadata about the number of returned events.
+Metadata about the number of matching events.
 +
 .Properties of `total`
 [%collapsible%open]
@@ -184,7 +355,7 @@ Metadata about the number of returned events.
 
 `value`::
 (integer)
-Total number of returned events.
+Total number of matching events.
 
 `relation`::
 +
@@ -199,29 +370,80 @@ Returned values are:
 --
 =====
 
+`sequences`::
+(array of objects)
+Contains event sequences matching the query. Each object represents a
+matching sequence. This parameter is only returned for EQL queries containing
+a <<eql-sequences,sequence>>.
++
+.Properties of `sequences` objects
+[%collapsible%open]
+=====
+`join_keys`::
+(array of strings)
+Shared field values used to constrain matches in the sequence. These are defined
+using the <<eql-sequences,`by` keyword>> in the EQL query syntax.
+
 `events`::
 (array of objects)
-Contains returned events matching the query. Each object represents a
+Contains events matching the query. Each object represents a
 matching event.
 +
 .Properties of `events` objects
 [%collapsible%open]
+======
+`_index`::
+(string)
+Name of the index containing the event.
+
+`_id`::
+(string)
+(string)
+Unique identifier for the event.
+This ID is only unique within the index.
+
+`_score`::
+(float)
+Positive 32-bit floating point number used to determine the relevance of the
+event. See <<relevance-scores>>.
+
+`_source`::
+(object)
+Original JSON body passed for the event at index time.
+
+`sort`::
+(array)
+Integer used as the sort value for the event.
++
+By default, this is the event's <<eql-search-api-timestamp-field,timestamp
+value>>, converted to milliseconds since the
+https://en.wikipedia.org/wiki/Unix_time[Unix epoch].
+======
 =====
 
+[[eql-search-api-response-events]]
+`events`::
+(array of objects)
+Contains events matching the query. Each object represents a
+matching event.
++
+.Properties of `events` objects
+[%collapsible%open]
+=====
 `_index`::
 (string)
-Name of the index containing the returned event.
+Name of the index containing the event.
 
 `_id`::
 (string)
 (string)
-Unique identifier for the returned event.
-This ID is only unique within the returned index.
+Unique identifier for the event.
+This ID is only unique within the index.
 
 `_score`::
 (float)
 Positive 32-bit floating point number used to determine the relevance of the
-returned event. See <<relevance-scores>>.
+event. See <<relevance-scores>>.
 
 `_source`::
 (object)
@@ -240,11 +462,14 @@ https://en.wikipedia.org/wiki/Unix_time[Unix epoch].
 [[eql-search-api-example]]
 ==== {api-examples-title}
 
+[[eql-search-api-basic-query-ex]]
+==== Basic query example
+
 The following EQL search request searches for events with an `event.category` of
 `file` that meet the following conditions:
 
 * A `file.name` of `cmd.exe`
-* An `agent.id` that is _not_ `my_user`
+* An `agent.id` other than `my_user`
 
 [source,console]
 ----
@@ -256,13 +481,16 @@ GET my_index/_eql/search
 }
 ----
 
-The API returns the following response. Events in the response are sorted by
-<<eql-search-api-timestamp-field,timestamp>>, converted to milliseconds since
-the https://en.wikipedia.org/wiki/Unix_time[Unix epoch], in ascending order.
+The API returns the following response. Matching events in the `hits.events`
+property are sorted by <<eql-search-api-timestamp-field,timestamp>>, converted
+to milliseconds since the https://en.wikipedia.org/wiki/Unix_time[Unix epoch],
+in ascending order.
 
 [source,console-result]
 ----
 {
+  "is_partial": false,
+  "is_running": false,
   "took": 6,
   "timed_out": false,
   "hits": {
@@ -328,9 +556,132 @@ the https://en.wikipedia.org/wiki/Unix_time[Unix epoch], in ascending order.
         ]
       }
     ]
-  },
+  }
+}
+----
+// TESTRESPONSE[s/"took": 6/"took": $body.took/]
+
+[[eql-search-api-sequence-ex]]
+==== Sequence query example
+
+The following EQL search request matches a <<eql-sequences,sequence>> of events
+that:
+
+. Start with an event with:
++
+--
+* An `event.category` of `file`
+* A `file.name` of `cmd.exe`
+* An `agent.id` other than `my_user`
+--
+. Followed by an event with:
++
+--
+* An `event.category` of `process`
+* A `process.path` that contains the substring `regsvr32`
+--
+
+These events must also share the same `agent.id` value.
+
+[source,console]
+----
+GET my_index/_eql/search
+{
+  "query": """
+    sequence by agent.id
+      [ file where file.name == "cmd.exe" and agent.id != "my_user" ]
+      [ process where stringContains(process.path, "regsvr32") ]
+  """
+}
+----
+
+The API returns the following response. The `hits.sequences.join_keys` property
+contains the shared `agent.id` value for each matching event. Matching events in
+the `hits.sequences.events` property are sorted by
+<<eql-search-api-timestamp-field,timestamp>>, converted to milliseconds since
+the https://en.wikipedia.org/wiki/Unix_time[Unix epoch], in ascending order.
+
+[source,console-result]
+----
+{
   "is_partial": false,
-  "is_running": false
+  "is_running": false,
+  "took": 6,
+  "timed_out": false,
+  "hits": {
+    "total": {
+      "value": 1,
+      "relation": "eq"
+    },
+    "sequences": [
+      {
+        "join_keys": [
+          "8a4f500d"
+        ],
+        "events": [
+          {
+            "_index": "my_index",
+            "_id": "4",
+            "_score": null,
+            "_source": {
+              "@timestamp": "2020-12-07T11:07:08.000Z",
+              "agent": {
+                "id": "8a4f500d"
+              },
+              "event": {
+                "category": "file"
+              },
+              "file": {
+                "accessed": "2020-12-07T11:07:08.000Z",
+                "name": "cmd.exe",
+                "path": "C:\\Windows\\System32\\cmd.exe",
+                "type": "file",
+                "size": 16384
+              },
+              "process": {
+                "name": "cmd.exe",
+                "path": "C:\\Windows\\System32\\cmd.exe"
+              }
+            },
+            "fields": {
+              "@timestamp": [
+                "1607339228000"
+              ]
+            },
+            "sort": [
+              1607339228000
+            ]
+          },
+          {
+            "_index": "my_index",
+            "_id": "5",
+            "_score": null,
+            "_source": {
+              "@timestamp": "2020-12-07T11:07:09.000Z",
+              "agent": {
+                "id": "8a4f500d"
+              },
+              "event": {
+                "category": "process"
+              },
+              "process": {
+                "name": "regsvr32.exe",
+                "path": "C:\\Windows\\System32\\regsvr32.exe"
+              }
+            },
+            "fields": {
+              "@timestamp": [
+                "1607339229000"
+              ]
+            },
+            "sort": [
+              1607339229000
+            ]
+          }
+        ]
+      }
+    ]
+  }
 }
 ----
 // TESTRESPONSE[s/"took": 6/"took": $body.took/]
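
The `is_partial` and `is_running` combinations described in the response body section of `eql-search-api.asciidoc` can be sketched in Python as follows. This is a minimal illustration and not part of the commit. The function name `classify_eql_response` and the returned labels are hypothetical. The documentation does not describe the case where `is_running` is `true` and `is_partial` is `false`, so the sketch labels it `unspecified` rather than guessing.

```python
def classify_eql_response(response: dict) -> str:
    """Classify an EQL search response from its `is_partial` and `is_running` flags.

    The labels are illustrative only; they are not values returned by the API.
    """
    is_partial = bool(response.get("is_partial", False))
    is_running = bool(response.get("is_running", False))

    if is_partial and is_running:
        # Both flags true: the search is an ongoing async search.
        return "ongoing async search"
    if is_partial:
        # Partial but no longer running: partial results due to a failure.
        return "partial results due to a failure"
    if not is_running:
        # Neither flag set: the response contains complete results.
        return "complete"
    # is_running without is_partial is not covered by the documentation.
    return "unspecified"


if __name__ == "__main__":
    print(classify_eql_response({"is_partial": False, "is_running": False}))
```

A client could call this on each response body and keep polling with the returned search ID only while the result is `ongoing async search`.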

+ 3 - 1
docs/reference/eql/limitations.asciidoc

@@ -37,4 +37,6 @@ queries that contain:
 
 * {eql-ref}/pipes.html[Pipes]
 
-* {eql-ref}/sequences.html[Sequences]
+* {eql-ref}/sequences.html[State and timespan-related sequence keywords]:
+** `with maxspan`
+** `until`

+ 194 - 3
docs/reference/eql/search.asciidoc

@@ -70,6 +70,8 @@ https://en.wikipedia.org/wiki/Unix_time[Unix epoch], in ascending order.
 [source,console-result]
 ----
 {
+  "is_partial": false,
+  "is_running": false,
   "took": 60,
   "timed_out": false,
   "hits": {
@@ -121,9 +123,7 @@ https://en.wikipedia.org/wiki/Unix_time[Unix epoch], in ascending order.
         ]
       }
     ]
-  },
-  "is_partial": false,
-  "is_running": false
+  }
 }
 ----
 // TESTRESPONSE[s/"took": 60/"took": $body.took/]
@@ -215,3 +215,194 @@ GET /sec_logs/_eql/search
 }
 ----
 ====
+
+[discrete]
+[[eql-search-async]]
+=== Run an async EQL search
+
+EQL searches in {es} are designed to run on large volumes of data quickly,
+often returning results in milliseconds. Because of this, the EQL search API
+runs _synchronous_ searches by default. This means the search request
+blocks other requests and waits for complete results before returning a
+response.
+
+However, complete results can take longer for searches across:
+
+* <<frozen-indices,Frozen indices>>
+* <<modules-cross-cluster-search,Multiple clusters>>
+* Many shards
+
+To avoid long waits, you can use the EQL search API's
+`wait_for_completion_timeout` parameter to run an _asynchronous_, or _async_,
+search.
+
+Set the `wait_for_completion_timeout` parameter to a duration you'd like to wait
+for complete search results. If the search request does not finish within this
+period, the search becomes an async search. The EQL search
+API returns a response that includes:
+
+* A search ID, which can be used to monitor the progress of the async search and
+  retrieve complete results when it finishes.
+* An `is_partial` value of `true`, indicating the response does not contain
+  complete search results.
+* An `is_running` value of `true`, indicating the search is async and ongoing.
+
+The async search continues to run in the background without blocking
+other requests.
+
+[%collapsible]
+.*Example*
+====
+The following request searches the `frozen_sec_logs` index, which has been
+<<frozen-indices,frozen>> for storage and is rarely searched.
+
+Because searches on frozen indices are expected to take longer to complete, the
+request contains a `wait_for_completion_timeout` parameter value of `2s`
+(two seconds).
+
+If the request does not return complete results in two seconds, the search
+becomes an async search and a search ID is returned.
+
+[source,console]
+----
+GET /frozen_sec_logs/_eql/search
+{
+  "wait_for_completion_timeout": "2s",
+  "query": """
+    process where process.name == "cmd.exe"
+  """
+}
+----
+// TEST[s/frozen_sec_logs/sec_logs/]
+// TEST[s/"wait_for_completion_timeout": "2s"/"wait_for_completion_timeout": "2s"/]
+
+After two seconds, the request returns the following response.
+
+[source,console-result]
+----
+{
+  "id": "FmNJRUZ1YWZCU3dHY1BIOUhaenVSRkEaaXFlZ3h4c1RTWFNocDdnY2FSaERnUTozNDE=",
+  "is_partial": true,
+  "is_running": true,
+  "took": 2000,
+  "timed_out": false,
+  "hits": {
+    "total": {
+      "value": 2,
+      "relation": "eq"
+    },
+    "events": [
+      {
+        "_index": "frozen_sec_logs",
+        "_id": "1",
+        "_score": null,
+        "_source": {
+          "@timestamp": "2020-12-06T11:04:05.000Z",
+          "agent": {
+            "id": "8a4f500d"
+          },
+          "event": {
+            "category": "process"
+          },
+          "process": {
+            "name": "cmd.exe",
+            "path": "C:\\Windows\\System32\\cmd.exe"
+          }
+        },
+        "sort": [
+          1607252645000
+        ]
+      },
+      {
+        "_index": "frozen_sec_logs",
+        "_id": "3",
+        "_score": null,
+        "_source": {
+          "@timestamp": "2020-12-07T11:06:07.000Z",
+          "agent": {
+            "id": "8a4f500d"
+          },
+          "event": {
+            "category": "process"
+          },
+          "process": {
+            "name": "cmd.exe",
+            "path": "C:\\Windows\\System32\\cmd.exe"
+          }
+        },
+        "sort": [
+          1607339167000
+        ]
+      }
+    ]
+  }
+}
+----
+// TESTRESPONSE[s/FmNJRUZ1YWZCU3dHY1BIOUhaenVSRkEaaXFlZ3h4c1RTWFNocDdnY2FSaERnUTozNDE=/$body.id/]
+// TESTRESPONSE[s/"is_partial": true/"is_partial": $body.is_partial/]
+// TESTRESPONSE[s/"is_running": true/"is_running": $body.is_running/]
+// TESTRESPONSE[s/"took": 2000/"took": $body.took/]
+// TESTRESPONSE[s/"_index": "frozen_sec_logs"/"_index": "sec_logs"/]
+====
+
+[discrete]
+[[eql-search-store-async-eql-search]]
+=== Change the search retention period
+
+By default, the EQL search API only stores async searches and their results for
+five days. After this period, any ongoing searches or saved results are deleted.
+
+You can use the `keep_alive` parameter to change the duration of this period.
+
+[%collapsible]
+.*Example*
+====
+In the following EQL search API request, the `keep_alive` parameter is `2d` (two
+days). This means that if the search becomes async, its results
+are stored on the cluster for two days. After two days, the async
+search and its results are deleted, even if it's still ongoing.
+
+[source,console]
+----
+GET /sec_logs/_eql/search
+{
+  "keep_alive": "2d",
+  "wait_for_completion_timeout": "2s",
+  "query": """
+    process where process.name == "cmd.exe"
+  """
+}
+----
+====
+
+[discrete]
+[[eql-search-store-sync-eql-search]]
+=== Store synchronous EQL searches
+
+By default, the EQL search API only stores async searches that cannot be
+completed within the period set by the `wait_for_completion_timeout` parameter.
+
+To save the results of searches that complete during this period, set the
+`keep_on_completion` parameter to `true`. Note these saved searches are still
+subject to the storage retention period set by the `keep_alive` parameter.
+
+[%collapsible]
+.*Example*
+====
+In the following EQL search API request, the `keep_on_completion` parameter is
+`true`. This means the search results are stored on the cluster, even if
+the search completes within the `2s` (two-second) period set by the
+`wait_for_completion_timeout` parameter.
+
+[source,console]
+----
+GET /sec_logs/_eql/search
+{
+  "keep_on_completion": true,
+  "wait_for_completion_timeout": "2s",
+  "query": """
+    process where process.name == "cmd.exe"
+  """
+}
+----
+====
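
The storage rules added to `search.asciidoc` and the parameter reference can be modeled in a few lines of Python. This is a sketch of the documented behavior only, not of how {es} implements it. The helper names `effective_value` and `is_search_stored` are hypothetical. The defaults (`keep_alive` of `5d`, `keep_on_completion` of `false`) and the rule that the query parameter wins over the request body parameter come from the text above.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def effective_value(query_param: Optional[T], body_param: Optional[T], default: T) -> T:
    """Resolve a setting given as a query parameter, a body parameter, or neither.

    If both are specified, only the query parameter is used.
    """
    if query_param is not None:
        return query_param
    if body_param is not None:
        return body_param
    return default


def is_search_stored(completed_within_timeout: bool, keep_on_completion: bool = False) -> bool:
    """Return True if the search and its results are stored on the cluster.

    With keep_on_completion=True the search is always stored. Otherwise it is
    stored only if it did not complete within wait_for_completion_timeout.
    """
    if keep_on_completion:
        return True
    return not completed_within_timeout


if __name__ == "__main__":
    # Query parameter "2d" overrides body parameter "1d"; with neither, the default applies.
    print(effective_value("2d", "1d", "5d"))
    print(effective_value(None, None, "5d"))
    # A search that finishes in time is stored only when keep_on_completion is true.
    print(is_search_stored(completed_within_timeout=True))
    print(is_search_stored(completed_within_timeout=True, keep_on_completion=True))
```

A stored search is still subject to the `keep_alive` retention period, which this sketch does not model.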

+ 79 - 1
docs/reference/eql/syntax.asciidoc

@@ -341,6 +341,84 @@ dots (`.`), hyphens (`-`), or spaces, must be escaped using backticks (+++`+++).
 `my field`
 ----
 
+[discrete]
+[[eql-sequences]]
+=== Sequences
+
+You can use EQL sequences to describe and match an ordered series of events.
+Each item in a sequence is an event category and event condition,
+surrounded by square brackets. Events are listed in ascending chronological
+order, with the most recent event listed last.
+
+[source,eql]
+----
+sequence
+  [ event_category_1 where condition_1 ]
+  [ event_category_2 where condition_2 ]
+  ...
+----
+
+.*Example*
+[%collapsible]
+====
+The following EQL query matches this series of ordered events:
+
+. Start with an event with:
++
+--
+* An event category of `file`
+* A `file.extension` of `exe`
+--
+. Followed by an event with an event category of `process`
+
+[source,eql]
+----
+sequence
+  [ file where file.extension == "exe" ]
+  [ process where true ]
+----
+====
+
+You can use the `by` keyword with sequences to only match events that share the
+same field values. If a field value should be shared across all events, you
+can use `sequence by`.
+
+[source,eql]
+----
+sequence by field_foo
+  [ event_category_1 where condition_1 ] by field_baz
+  [ event_category_2 where condition_2 ] by field_bar
+  ...
+----
+
+.*Example*
+[%collapsible]
+====
+The following sequence uses the `by` keyword to constrain matching events to:
+
+* Events with the same `user.name` value
+* `file` events with a `file.path` value equal to the following `process`
+   event's `process.path` value.
+
+[source,eql]
+----
+sequence
+  [ file where file.extension == "exe" ] by user.name, file.path
+  [ process where true ] by user.name, process.path
+----
+
+Because the `user.name` field is shared across all events in the sequence, it
+can be included using `sequence by`. The following sequence is equivalent to the
+prior one.
+
+[source,eql]
+----
+sequence by user.name
+  [ file where file.extension == "exe" ] by file.path
+  [ process where true ] by process.path
+----
+====
+
 [discrete]
 [[eql-functions]]
 === Functions
@@ -394,4 +472,4 @@ file where file.extension in ("exe", "dll")
 
 We recommend testing and benchmarking any indexing changes before deploying them
 in production. See <<tune-for-indexing-speed>> and <<tune-for-search-speed>>.
-====
+====
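
The Sequences section added to `syntax.asciidoc` states that a field shared by every item can be moved from the per-item `by` lists into `sequence by`. The Python sketch below illustrates that equivalence by computing the join keys that apply to each item. It is an assumption-labeled model for illustration: `effective_join_keys` is a hypothetical helper, and it only compares key lists; it does not parse or execute EQL.

```python
from typing import List, Sequence


def effective_join_keys(
    sequence_by: Sequence[str], item_by: Sequence[Sequence[str]]
) -> List[List[str]]:
    """Return the join keys for each sequence item.

    Each item's keys are the sequence-level `sequence by` fields followed by
    the fields in that item's own `by` clause.
    """
    return [list(sequence_by) + list(keys) for keys in item_by]


# Form 1: every item repeats user.name in its own `by` clause.
explicit = effective_join_keys(
    [],
    [["user.name", "file.path"], ["user.name", "process.path"]],
)

# Form 2: user.name is hoisted into `sequence by`.
hoisted = effective_join_keys(
    ["user.name"],
    [["file.path"], ["process.path"]],
)

if __name__ == "__main__":
    # Both forms constrain each item with the same ordered list of fields.
    print(explicit == hoisted)
```

Under this model, the two example queries in the section yield identical per-item key lists, which matches the statement that they are equivalent.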