
[DOCS] EQL: Document how sequence queries handle matches (#65794)

Co-authored-by: Ross Wolf <31489089+rw-access@users.noreply.github.com>
James Rodewig 4 years ago
parent
commit
ef6fb59ec3
1 changed files with 144 additions and 0 deletions

+ 144 - 0
docs/reference/eql/syntax.asciidoc

@@ -781,3 +781,147 @@ three double quotes (`"""`) instead.
 *** {eql-ref}/pipes.html#sort[`sort`]
 *** {eql-ref}/pipes.html#unique[`unique`]
 *** {eql-ref}/pipes.html#unique-count[`unique_count`]
+
+[discrete]
+[[eql-how-sequence-queries-handle-matches]]
+==== How sequence queries handle matches
+
+<<eql-sequences,Sequence queries>> don't find all potential matches for a
+sequence. Finding every match would be too slow and costly for large event data
+sets. Instead, a sequence query handles pending sequence matches as a
+{wikipedia}/Finite-state_machine[state machine]:
+
+* Each event item in the sequence query is a state in the machine.
+* Only one pending sequence can be in each state at a time.
+* If two pending sequences are in the same state at the same time, the most
+recent sequence overwrites the older one.
+* If the query includes <<eql-by-keyword,`by` fields>>, the query uses a
+separate state machine for each unique `by` field value.
+
+.*Example*
+[%collapsible]
+====
+A data set contains the following `process` events in ascending chronological
+order:
+
+[source,js]
+----
+{ "index" : { "_id" : "1" } }
+{ "user": { "name": "root" }, "process": { "name": "attrib" }, ...}
+{ "index" : { "_id" : "2" } }
+{ "user": { "name": "root" }, "process": { "name": "attrib" }, ...}
+{ "index" : { "_id" : "3" } }
+{ "user": { "name": "elkbee" }, "process": { "name": "bash" }, ...}
+{ "index" : { "_id" : "4" } }
+{ "user": { "name": "root" }, "process": { "name": "bash" }, ...}
+{ "index" : { "_id" : "5" } }
+{ "user": { "name": "root" }, "process": { "name": "bash" }, ...}
+{ "index" : { "_id" : "6" } }
+{ "user": { "name": "elkbee" }, "process": { "name": "attrib" }, ...}
+{ "index" : { "_id" : "7" } }
+{ "user": { "name": "root" }, "process": { "name": "attrib" }, ...}
+{ "index" : { "_id" : "8" } }
+{ "user": { "name": "elkbee" }, "process": { "name": "bash" }, ...}
+{ "index" : { "_id" : "9" } }
+{ "user": { "name": "root" }, "process": { "name": "cat" }, ...}
+{ "index" : { "_id" : "10" } }
+{ "user": { "name": "elkbee" }, "process": { "name": "cat" }, ...}
+{ "index" : { "_id" : "11" } }
+{ "user": { "name": "root" }, "process": { "name": "cat" }, ...}
+----
+// NOTCONSOLE
+
+An EQL sequence query searches the data set:
+
+[source,eql]
+----
+sequence by user.name
+  [process where process.name == "attrib"]
+  [process where process.name == "bash"]
+  [process where process.name == "cat"]
+----
+
+The query's event items correspond to the following states:
+
+* State A:  `[process where process.name == "attrib"]`
+* State B:  `[process where process.name == "bash"]`
+* Complete: `[process where process.name == "cat"]`
+
+To find matching sequences, the query uses separate state machines for each
+unique `user.name` value. Pending sequence matches move through each machine's
+states as follows:
+
+[source,txt]
+----
+{ "index" : { "_id" : "1" } }
+{ "user": { "name": "root" }, "process": { "name": "attrib" }, ...}
+// Creates sequence [1] in state A for the "root" user.
+//
+// root: A=[1]
+
+{ "index" : { "_id" : "2" } }
+{ "user": { "name": "root" }, "process": { "name": "attrib" }, ...}
+// Creates sequence [2] in state A for "root", overwriting sequence [1].
+//
+// root: A=[2]
+
+{ "index" : { "_id" : "3" } }
+{ "user": { "name": "elkbee" }, "process": { "name": "bash" }, ...}
+// Nothing happens. The "elkbee" user has no pending sequence to move from state A to state B.
+
+{ "index" : { "_id" : "4" } }
+{ "user": { "name": "root" }, "process": { "name": "bash" }, ...}
+// Sequence [2, 4] moves to state B for "root".
+// State A for "root" is now empty.
+//
+// root: A=[]
+// root: B=[2, 4]
+
+{ "index" : { "_id" : "5" } }
+{ "user": { "name": "root" }, "process": { "name": "bash" }, ...}
+// Nothing happens. State A is empty for "root".
+
+{ "index" : { "_id" : "6" } }
+{ "user": { "name": "elkbee" }, "process": { "name": "attrib" }, ...}
+// Creates sequence [6] in state A for "elkbee".
+//
+// elkbee: A=[6]
+
+{ "index" : { "_id" : "7" } }
+{ "user": { "name": "root" }, "process": { "name": "attrib" }, ...}
+// Creates sequence [7] in state A for "root".
+// Sequence [2, 4] remains in state B for "root".
+//
+// root: A=[7]
+// root: B=[2, 4]
+
+{ "index" : { "_id" : "8" } }
+{ "user": { "name": "elkbee" }, "process": { "name": "bash" }, ...}
+// Sequence [6, 8] moves to state B for "elkbee".
+// State A for "elkbee" is now empty.
+//
+// elkbee: A=[]
+// elkbee: B=[6, 8]
+
+{ "index" : { "_id" : "9" } }
+{ "user": { "name": "root" }, "process": { "name": "cat" }, ...}
+// Sequence [2, 4, 9] is complete for "root".
+// State B for "root" is now empty.
+// Sequence [7] remains in state A for "root".
+//
+// root: A=[7]
+// root: B=[]
+
+{ "index" : { "_id" : "10" } }
+{ "user": { "name": "elkbee" }, "process": { "name": "cat" }, ...}
+// Sequence [6, 8, 10] is complete for "elkbee".
+// States A and B for "elkbee" are now empty.
+//
+// elkbee: A=[]
+// elkbee: B=[]
+
+{ "index" : { "_id" : "11" } }
+{ "user": { "name": "root" }, "process": { "name": "cat" }, ...}
+// Nothing happens. State B for "root" is empty.
+----
+====
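
The walkthrough in the example can be reproduced with a small simulation. The following is a minimal sketch in Python, not Elasticsearch's actual implementation. `find_sequences`, `STAGES`, and `EVENTS` are hypothetical names introduced here for illustration. The sketch assumes each event item matches on a distinct `process.name` value and ignores other EQL features such as `maxspan` and `until`.

```python
from collections import defaultdict

# Process names matched by each event item, in sequence order.
# The first two are states A and B; matching the last completes a sequence.
STAGES = ["attrib", "bash", "cat"]

# (event id, user.name, process.name) in ascending chronological order,
# copied from the example data set.
EVENTS = [
    (1, "root", "attrib"),
    (2, "root", "attrib"),
    (3, "elkbee", "bash"),
    (4, "root", "bash"),
    (5, "root", "bash"),
    (6, "elkbee", "attrib"),
    (7, "root", "attrib"),
    (8, "elkbee", "bash"),
    (9, "root", "cat"),
    (10, "elkbee", "cat"),
    (11, "root", "cat"),
]


def find_sequences(events, stages):
    """Return completed sequences as lists of event ids (illustrative only)."""
    last = len(stages) - 1
    # One state machine per `by` value. Slot i holds the single pending
    # sequence whose most recent match was event item i, or None if empty.
    machines = defaultdict(lambda: [None] * last)
    completed = []

    for event_id, key, name in events:
        if name not in stages:
            continue
        stage = stages.index(name)
        slots = machines[key]

        if stage == 0:
            # A new pending sequence overwrites any older one in state A.
            slots[0] = [event_id]
            continue

        previous = slots[stage - 1]
        if previous is None:
            # Nothing is waiting in the preceding state, so nothing happens.
            continue

        # Move the pending sequence out of the preceding state.
        slots[stage - 1] = None
        if stage == last:
            completed.append(previous + [event_id])
        else:
            slots[stage] = previous + [event_id]

    return completed


print(find_sequences(EVENTS, STAGES))
```

Under these assumptions, the sketch returns the two sequences the walkthrough marks as complete, `[2, 4, 9]` for "root" and `[6, 8, 10]` for "elkbee". Events 1, 5, 7, and 11 do not appear in any completed sequence.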