[[esql-process-data-with-dissect-and-grok]]
=== Data processing with DISSECT and GROK

++++
<titleabbrev>Data processing with DISSECT and GROK</titleabbrev>
++++

Your data may contain unstructured strings that you want to structure. This
makes it easier to analyze the data. For example, log messages may contain IP
addresses that you want to extract so you can find the most active IP addresses.

image::images/esql/unstructured-data.png[align="center",width=75%]

{es} can structure your data at index time or query time. At index time, you can
use the <<dissect-processor,Dissect>> and <<grok-processor,Grok>> ingest
processors, or the {ls} {logstash-ref}/plugins-filters-dissect.html[Dissect] and
{logstash-ref}/plugins-filters-grok.html[Grok] filters. At query time, you can
use the {esql} <<esql-dissect>> and <<esql-grok>> commands.

[[esql-grok-or-dissect]]
==== `DISSECT` or `GROK`? Or both?

`DISSECT` works by breaking up a string using a delimiter-based pattern. `GROK`
works similarly, but uses regular expressions. This makes `GROK` more powerful,
but generally also slower.
`DISSECT` works well when data is reliably repeated.
`GROK` is a better choice when you really need the power of regular expressions,
for example when the structure of your text varies from row to row.

You can use both `DISSECT` and `GROK` for hybrid use cases, for example when a
section of the line is reliably repeated, but the entire line is not. `DISSECT`
can deconstruct the section of the line that is repeated. `GROK` can process the
remaining field values using regular expressions.

[[esql-process-data-with-dissect]]
==== Process data with `DISSECT`

The <<esql-dissect>> processing command matches a string against a
delimiter-based pattern, and extracts the specified keys as columns.

For example, the following pattern:
[source,txt]
----
%{clientip} [%{@timestamp}] %{status}
----

matches a log line of this format:
[source,txt]
----
1.2.3.4 [2023-01-23T12:15:00.000Z] Connected
----

and results in adding the following columns to the input table:

[%header.monospaced.styled,format=dsv,separator=|]
|===
clientip:keyword | @timestamp:keyword | status:keyword
1.2.3.4 | 2023-01-23T12:15:00.000Z | Connected
|===

[[esql-dissect-patterns]]
===== Dissect patterns

include::../ingest/processors/dissect.asciidoc[tag=intro-example-explanation]

An empty key (`%{}`) or <<esql-named-skip-key,named skip key>> can be used to
match values, but exclude the value from the output.

All matched values are output as keyword string data types. Use the
<<esql-type-conversion-functions>> to convert to another data type.

Dissect also supports <<esql-dissect-key-modifiers,key modifiers>> that can
change dissect's default behavior. For example, you can instruct dissect to
ignore certain fields, append fields, skip over padding, etc.

[[esql-dissect-terminology]]
===== Terminology

dissect pattern::
the set of fields and delimiters describing the textual
format.
Also known as a dissection.
The dissection is described using a set of `%{}` sections:
`%{a} - %{b} - %{c}`

field::
the text from `%{` to `}` inclusive.

delimiter::
the text between `}` and the next `%{` characters.
Any set of characters other than `%{`, `'not }'`, or `}` is a delimiter.

key::
+
--
the text between the `%{` and `}`, exclusive of the `?`, `+`, `&` prefixes
and the ordinal suffix.

Examples:

* `%{?aaa}` - the key is `aaa`
* `%{+bbb/3}` - the key is `bbb`
* `%{&ccc}` - the key is `ccc`
--

[[esql-dissect-examples]]
===== Examples

include::processing-commands/dissect.asciidoc[tag=examples]

[[esql-dissect-key-modifiers]]
===== Dissect key modifiers

include::../ingest/processors/dissect.asciidoc[tag=dissect-key-modifiers]

[[esql-dissect-key-modifiers-table]]
.Dissect key modifiers
[options="header",role="styled"]
|======
| Modifier      | Name               | Position       | Example                       | Description                                                | Details
| `->`          | Skip right padding | (far) right    | `%{keyname1->}`               | Skips any repeated characters to the right                 | <<esql-dissect-modifier-skip-right-padding,link>>
| `+`           | Append             | left           | `%{+keyname} %{+keyname}`     | Appends two or more fields together                        | <<esql-append-modifier,link>>
| `+` with `/n` | Append with order  | left and right | `%{+keyname/2} %{+keyname/1}` | Appends two or more fields together in the order specified | <<esql-append-order-modifier,link>>
| `?`           | Named skip key     | left           | `%{?ignoreme}`                | Skips the matched value in the output.
Same behavior as `%{}` | <<esql-named-skip-key,link>>
|======

[[esql-dissect-modifier-skip-right-padding]]
====== Right padding modifier (`->`)
include::../ingest/processors/dissect.asciidoc[tag=dissect-modifier-skip-right-padding]

For example:
[source.merge.styled,esql]
----
include::{esql-specs}/docs.csv-spec[tag=dissectRightPaddingModifier]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/docs.csv-spec[tag=dissectRightPaddingModifier-result]
|===

include::../ingest/processors/dissect.asciidoc[tag=dissect-modifier-empty-right-padding]

For example:
[source.merge.styled,esql]
----
include::{esql-specs}/docs.csv-spec[tag=dissectEmptyRightPaddingModifier]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/docs.csv-spec[tag=dissectEmptyRightPaddingModifier-result]
|===

[[esql-append-modifier]]
====== Append modifier (`+`)
include::../ingest/processors/dissect.asciidoc[tag=append-modifier]

[source.merge.styled,esql]
----
include::{esql-specs}/docs.csv-spec[tag=dissectAppendModifier]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/docs.csv-spec[tag=dissectAppendModifier-result]
|===

[[esql-append-order-modifier]]
====== Append with order modifier (`+` and `/n`)
include::../ingest/processors/dissect.asciidoc[tag=append-order-modifier]

[source.merge.styled,esql]
----
include::{esql-specs}/docs.csv-spec[tag=dissectAppendWithOrderModifier]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/docs.csv-spec[tag=dissectAppendWithOrderModifier-result]
|===

[[esql-named-skip-key]]
====== Named skip key (`?`)
include::../ingest/processors/dissect.asciidoc[tag=named-skip-key]
This can be done with a named skip key using the `%{?name}` syntax.
In the
following query, `ident` and `auth` are not added to the output table:

[source.merge.styled,esql]
----
include::{esql-specs}/docs.csv-spec[tag=dissectNamedSkipKey]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/docs.csv-spec[tag=dissectNamedSkipKey-result]
|===

[[esql-dissect-limitations]]
===== Limitations

// tag::dissect-limitations[]
The `DISSECT` command does not support reference keys.
// end::dissect-limitations[]

[[esql-process-data-with-grok]]
==== Process data with `GROK`

The <<esql-grok>> processing command matches a string against a pattern based on
regular expressions, and extracts the specified keys as columns.

For example, the following pattern:
[source,txt]
----
%{IP:ip} \[%{TIMESTAMP_ISO8601:@timestamp}\] %{GREEDYDATA:status}
----

matches a log line of this format:
[source,txt]
----
1.2.3.4 [2023-01-23T12:15:00.000Z] Connected
----

Putting it together as an {esql} query:

[source.merge.styled,esql]
----
include::{esql-specs}/docs.csv-spec[tag=grokWithEscapeTripleQuotes]
----

`GROK` adds the following columns to the input table:

[%header.monospaced.styled,format=dsv,separator=|]
|===
@timestamp:keyword | ip:keyword | status:keyword
2023-01-23T12:15:00.000Z | 1.2.3.4 | Connected
|===

[NOTE]
====
Special regex characters in grok patterns, such as `[` and `]`, need to be escaped
with a `\`. For example, in the earlier pattern:
[source,txt]
----
%{IP:ip} \[%{TIMESTAMP_ISO8601:@timestamp}\] %{GREEDYDATA:status}
----

In {esql} queries, when a string is delimited by a single pair of double quotes
rather than triple quotes, the backslash character itself is a special character that
needs to be escaped with another `\`.
For this example, the corresponding {esql}
query becomes:
[source.merge.styled,esql]
----
include::{esql-specs}/docs.csv-spec[tag=grokWithEscape]
----

For this reason, it is generally more convenient to use triple quotes `"""` for `GROK` patterns,
because they do not require backslashes to be escaped.

[source.merge.styled,esql]
----
include::{esql-specs}/docs.csv-spec[tag=grokWithEscapeTripleQuotes]
----
====

[[esql-grok-patterns]]
===== Grok patterns

The syntax for a grok pattern is `%{SYNTAX:SEMANTIC}`.

The `SYNTAX` is the name of the pattern that matches your text. For example,
`3.44` is matched by the `NUMBER` pattern and `55.3.244.1` is matched by the
`IP` pattern. The syntax is how you match.

The `SEMANTIC` is the identifier you give to the piece of text being matched.
For example, `3.44` could be the duration of an event, so you could call it
simply `duration`. Further, a string `55.3.244.1` might identify the `client`
making a request.

By default, matched values are output as keyword string data types. To convert a
semantic's data type, suffix it with the target data type. For example,
`%{NUMBER:num:int}` converts the `num` semantic from a string to an
integer. Currently the only supported conversions are `int` and `float`. For
other types, use the <<esql-type-conversion-functions>>.

For an overview of the available patterns, refer to
{es-repo}/blob/{branch}/libs/grok/src/main/resources/patterns[GitHub]. You can
also retrieve a list of all patterns using a <<grok-processor-rest-get,REST
API>>.

[[esql-grok-regex]]
===== Regular expressions

Grok is based on regular expressions. Any regular expression is valid in grok
as well. Grok uses the Oniguruma regular expression library.
Refer to
https://github.com/kkos/oniguruma/blob/master/doc/RE[the Oniguruma GitHub
repository] for the full supported regexp syntax.

[[esql-custom-patterns]]
===== Custom patterns

If grok doesn't have a pattern you need, you can use the Oniguruma syntax for
named capture, which lets you match a piece of text and save it as a column:
[source,txt]
----
(?<field_name>the pattern here)
----

For example, postfix logs have a `queue id` that is a 10- or 11-character
hexadecimal value. This can be captured to a column named `queue_id` with:
[source,txt]
----
(?<queue_id>[0-9A-F]{10,11})
----

[[esql-grok-examples]]
===== Examples

include::processing-commands/grok.asciidoc[tag=examples]

[[esql-grok-debugger]]
===== Grok debugger

To write and debug grok patterns, you can use the
{kibana-ref}/xpack-grokdebugger.html[Grok Debugger]. It provides a UI for
testing patterns against sample data. Under the covers, it uses the same engine
as the `GROK` command.

[[esql-grok-limitations]]
===== Limitations

// tag::grok-limitations[]
The `GROK` command does not support configuring <<custom-patterns,custom
patterns>>, or <<trace-match,multiple patterns>>. The `GROK` command is not
subject to <<grok-watchdog,Grok watchdog settings>>.
// end::grok-limitations[]