[[esql-process-data-with-dissect-and-grok]]
=== Data processing with DISSECT and GROK

++++
<titleabbrev>Data processing with DISSECT and GROK</titleabbrev>
++++

Your data may contain unstructured strings that you want to structure. This
makes it easier to analyze the data. For example, log messages may contain IP
addresses that you want to extract so you can find the most active IP addresses.

image::images/esql/unstructured-data.png[align="center",width=75%]

{es} can structure your data at index time or query time. At index time, you can
use the <<dissect-processor,Dissect>> and <<grok-processor,Grok>> ingest
processors, or the {ls} {logstash-ref}/plugins-filters-dissect.html[Dissect] and
{logstash-ref}/plugins-filters-grok.html[Grok] filters. At query time, you can
use the {esql} <<esql-dissect>> and <<esql-grok>> commands.

[[esql-grok-or-dissect]]
==== `DISSECT` or `GROK`? Or both?

`DISSECT` works by breaking up a string using a delimiter-based pattern. `GROK`
works similarly, but uses regular expressions. This makes `GROK` more powerful,
but generally also slower. `DISSECT` works well when data is reliably repeated.
`GROK` is a better choice when you really need the power of regular expressions,
for example when the structure of your text varies from row to row.

You can use both `DISSECT` and `GROK` for hybrid use cases: for example, when a
section of the line is reliably repeated, but the entire line is not. `DISSECT`
can deconstruct the section of the line that is repeated. `GROK` can process the
remaining field values using regular expressions.
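
The following query is a minimal sketch of that hybrid approach. The `ROW`
source and the column names (`message`, `rest`, `verb`, `request`, `version`)
are illustrative only:

[source,esql]
----
// Illustrative input row; in practice the data would come from an index.
ROW message = "1.2.3.4 [2023-01-23T12:15:00.000Z] GET /search HTTP/1.1"
// DISSECT handles the reliably repeated prefix of the line.
| DISSECT message "%{clientip} [%{@timestamp}] %{rest}"
// GROK applies regular expressions to the remaining, more variable part.
| GROK rest """%{WORD:verb} %{NOTSPACE:request} HTTP/%{NUMBER:version}"""
| KEEP clientip, @timestamp, verb, request, version
----
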
[[esql-process-data-with-dissect]]
==== Process data with `DISSECT`

The <<esql-dissect>> processing command matches a string against a
delimiter-based pattern, and extracts the specified keys as columns.

For example, the following pattern:
[source,txt]
----
%{clientip} [%{@timestamp}] %{status}
----

matches a log line of this format:
[source,txt]
----
1.2.3.4 [2023-01-23T12:15:00.000Z] Connected
----

and results in adding the following columns to the input table:

[%header.monospaced.styled,format=dsv,separator=|]
|===
clientip:keyword | @timestamp:keyword | status:keyword
1.2.3.4 | 2023-01-23T12:15:00.000Z | Connected
|===
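
Put together as an {esql} query, this might look like the following sketch,
which uses a `ROW` command as an illustrative inline source:

[source,esql]
----
// Illustrative input row; in practice the data would come from an index.
ROW message = "1.2.3.4 [2023-01-23T12:15:00.000Z] Connected"
| DISSECT message "%{clientip} [%{@timestamp}] %{status}"
----
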
[[esql-dissect-patterns]]
===== Dissect patterns

include::../ingest/processors/dissect.asciidoc[tag=intro-example-explanation]

An empty key (`%{}`) or <<esql-named-skip-key,named skip key>> can be used to
match values, but exclude the value from the output.

All matched values are output as keyword string data types. Use the
<<esql-type-conversion-functions>> to convert to another data type.
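
For example, the following sketch (again with an illustrative `ROW` source)
uses an empty key to drop a value and `TO_DATETIME` to convert the extracted
timestamp from a keyword string:

[source,esql]
----
// Illustrative input row; the "-" value is matched by the empty key and dropped.
ROW message = "1.2.3.4 - [2023-01-23T12:15:00.000Z] Connected"
| DISSECT message "%{clientip} %{} [%{ts}] %{status}"
// Convert the keyword string to a datetime value.
| EVAL ts = TO_DATETIME(ts)
----
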
Dissect also supports <<esql-dissect-key-modifiers,key modifiers>> that can
change dissect's default behavior. For example, you can instruct dissect to
ignore certain fields, append fields, skip over padding, etc.

[[esql-dissect-terminology]]
===== Terminology

dissect pattern::
the set of fields and delimiters describing the textual
format. Also known as a dissection.
The dissection is described using a set of `%{}` sections:
`%{a} - %{b} - %{c}`

field::
the text from `%{` to `}` inclusive.

delimiter::
the text between `}` and the next `%{` characters.
Any set of characters other than `%{`, `'not }'`, or `}` is a delimiter.

key::
+
--
the text between the `%{` and `}`, exclusive of the `?`, `+`, `&` prefixes
and the ordinal suffix.

Examples:

* `%{?aaa}` - the key is `aaa`
* `%{+bbb/3}` - the key is `bbb`
* `%{&ccc}` - the key is `ccc`
--

[[esql-dissect-examples]]
===== Examples

include::processing-commands/dissect.asciidoc[tag=examples]

[[esql-dissect-key-modifiers]]
===== Dissect key modifiers

include::../ingest/processors/dissect.asciidoc[tag=dissect-key-modifiers]

[[esql-dissect-key-modifiers-table]]
.Dissect key modifiers
[options="header",role="styled"]
|======
| Modifier | Name | Position | Example | Description | Details
| `->` | Skip right padding | (far) right | `%{keyname1->}` | Skips any repeated characters to the right | <<esql-dissect-modifier-skip-right-padding,link>>
| `+` | Append | left | `%{+keyname} %{+keyname}` | Appends two or more fields together | <<esql-append-modifier,link>>
| `+` with `/n` | Append with order | left and right | `%{+keyname/2} %{+keyname/1}` | Appends two or more fields together in the order specified | <<esql-append-order-modifier,link>>
| `?` | Named skip key | left | `%{?ignoreme}` | Skips the matched value in the output. Same behavior as `%{}` | <<esql-named-skip-key,link>>
|======

[[esql-dissect-modifier-skip-right-padding]]
====== Right padding modifier (`->`)

include::../ingest/processors/dissect.asciidoc[tag=dissect-modifier-skip-right-padding]

For example:

[source.merge.styled,esql]
----
include::{esql-specs}/docs.csv-spec[tag=dissectRightPaddingModifier]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/docs.csv-spec[tag=dissectRightPaddingModifier-result]
|===

include::../ingest/processors/dissect.asciidoc[tag=dissect-modifier-empty-right-padding]

For example:

[source.merge.styled,esql]
----
include::{esql-specs}/docs.csv-spec[tag=dissectEmptyRightPaddingModifier]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/docs.csv-spec[tag=dissectEmptyRightPaddingModifier-result]
|===

[[esql-append-modifier]]
====== Append modifier (`+`)

include::../ingest/processors/dissect.asciidoc[tag=append-modifier]

[source.merge.styled,esql]
----
include::{esql-specs}/docs.csv-spec[tag=dissectAppendModifier]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/docs.csv-spec[tag=dissectAppendModifier-result]
|===

[[esql-append-order-modifier]]
====== Append with order modifier (`+` and `/n`)

include::../ingest/processors/dissect.asciidoc[tag=append-order-modifier]

[source.merge.styled,esql]
----
include::{esql-specs}/docs.csv-spec[tag=dissectAppendWithOrderModifier]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/docs.csv-spec[tag=dissectAppendWithOrderModifier-result]
|===

[[esql-named-skip-key]]
====== Named skip key (`?`)

include::../ingest/processors/dissect.asciidoc[tag=named-skip-key]

This can be done with a named skip key using the `%{?name}` syntax. In the
following query, `ident` and `auth` are not added to the output table:

[source.merge.styled,esql]
----
include::{esql-specs}/docs.csv-spec[tag=dissectNamedSkipKey]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/docs.csv-spec[tag=dissectNamedSkipKey-result]
|===

[[esql-dissect-limitations]]
===== Limitations

// tag::dissect-limitations[]
The `DISSECT` command does not support reference keys.
// end::dissect-limitations[]

[[esql-process-data-with-grok]]
==== Process data with `GROK`

The <<esql-grok>> processing command matches a string against a pattern based on
regular expressions, and extracts the specified keys as columns.

For example, the following pattern:
[source,txt]
----
%{IP:ip} \[%{TIMESTAMP_ISO8601:@timestamp}\] %{GREEDYDATA:status}
----

matches a log line of this format:
[source,txt]
----
1.2.3.4 [2023-01-23T12:15:00.000Z] Connected
----

Putting it together as an {esql} query:

[source.merge.styled,esql]
----
include::{esql-specs}/docs.csv-spec[tag=grokWithEscapeTripleQuotes]
----

`GROK` adds the following columns to the input table:

[%header.monospaced.styled,format=dsv,separator=|]
|===
@timestamp:keyword | ip:keyword | status:keyword
2023-01-23T12:15:00.000Z | 1.2.3.4 | Connected
|===

[NOTE]
====
Special regex characters in grok patterns, like `[` and `]`, need to be escaped
with a `\`. For example, in the earlier pattern:
[source,txt]
----
%{IP:ip} \[%{TIMESTAMP_ISO8601:@timestamp}\] %{GREEDYDATA:status}
----

In {esql} queries, when using single quotes for strings, the backslash character
itself is a special character that needs to be escaped with another `\`. For this
example, the corresponding {esql} query becomes:

[source.merge.styled,esql]
----
include::{esql-specs}/docs.csv-spec[tag=grokWithEscape]
----

For this reason, it is generally more convenient to use triple quotes (`"""`) for
GROK patterns, as they do not require escaping backslashes:

[source.merge.styled,esql]
----
include::{esql-specs}/docs.csv-spec[tag=grokWithEscapeTripleQuotes]
----
====

[[esql-grok-patterns]]
===== Grok patterns

The syntax for a grok pattern is `%{SYNTAX:SEMANTIC}`.

The `SYNTAX` is the name of the pattern that matches your text. For example,
`3.44` is matched by the `NUMBER` pattern and `55.3.244.1` is matched by the
`IP` pattern. The syntax is how you match.

The `SEMANTIC` is the identifier you give to the piece of text being matched.
For example, `3.44` could be the duration of an event, so you could call it
simply `duration`. Further, a string `55.3.244.1` might identify the `client`
making a request.

By default, matched values are output as keyword string data types. To convert a
semantic's data type, suffix it with the target data type. For example,
`%{NUMBER:num:int}` converts the `num` semantic from a string to an integer.
Currently, the only supported conversions are `int` and `float`. For other
types, use the <<esql-type-conversion-functions>>.
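
For example, the following sketch (with an illustrative `ROW` source) extracts
a duration as a `float` and a status code as an `int` directly from the pattern:

[source,esql]
----
// Illustrative input row; in practice the data would come from an index.
ROW message = "took 3.44 ms with status 200"
| GROK message """took %{NUMBER:duration:float} ms with status %{NUMBER:status:int}"""
----
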
For an overview of the available patterns, refer to
{es-repo}/blob/{branch}/libs/grok/src/main/resources/patterns[GitHub]. You can
also retrieve a list of all patterns using a <<grok-processor-rest-get,REST
API>>.

[[esql-grok-regex]]
===== Regular expressions

Grok is based on regular expressions. Any regular expression is valid in grok
as well. Grok uses the Oniguruma regular expression library. Refer to
https://github.com/kkos/oniguruma/blob/master/doc/RE[the Oniguruma GitHub
repository] for the full supported regexp syntax.

[[esql-custom-patterns]]
===== Custom patterns

If grok doesn't have a pattern you need, you can use the Oniguruma syntax for
named capture, which lets you match a piece of text and save it as a column:
[source,txt]
----
(?<field_name>the pattern here)
----

For example, Postfix logs have a `queue id` that is a 10- or 11-character
hexadecimal value. This can be captured to a column named `queue_id` with:
[source,txt]
----
(?<queue_id>[0-9A-F]{10,11})
----
[[esql-grok-examples]]
===== Examples

include::processing-commands/grok.asciidoc[tag=examples]

[[esql-grok-debugger]]
===== Grok debugger

To write and debug grok patterns, you can use the
{kibana-ref}/xpack-grokdebugger.html[Grok Debugger]. It provides a UI for
testing patterns against sample data. Under the covers, it uses the same engine
as the `GROK` command.

[[esql-grok-limitations]]
===== Limitations

// tag::grok-limitations[]
The `GROK` command does not support configuring <<custom-patterns,custom
patterns>>, or <<trace-match,multiple patterns>>. The `GROK` command is not
subject to <<grok-watchdog,Grok watchdog settings>>.
// end::grok-limitations[]