  1. [[esql-process-data-with-dissect-and-grok]]
  2. === Data processing with DISSECT and GROK
  3. ++++
  4. <titleabbrev>Data processing with DISSECT and GROK</titleabbrev>
  5. ++++
  6. Your data may contain unstructured strings that you want to structure. This
  7. makes it easier to analyze the data. For example, log messages may contain IP
  8. addresses that you want to extract so you can find the most active IP addresses.
  9. image::images/esql/unstructured-data.png[align="center",width=75%]
  10. {es} can structure your data at index time or query time. At index time, you can
  11. use the <<dissect-processor,Dissect>> and <<grok-processor,Grok>> ingest
  12. processors, or the {ls} {logstash-ref}/plugins-filters-dissect.html[Dissect] and
  13. {logstash-ref}/plugins-filters-grok.html[Grok] filters. At query time, you can
  14. use the {esql} <<esql-dissect>> and <<esql-grok>> commands.
  15. [[esql-grok-or-dissect]]
  16. ==== `DISSECT` or `GROK`? Or both?
  17. `DISSECT` works by breaking up a string using a delimiter-based pattern. `GROK`
  18. works similarly, but uses regular expressions. This makes `GROK` more powerful,
  19. but generally also slower. `DISSECT` works well when data is reliably repeated.
  20. `GROK` is a better choice when you really need the power of regular expressions,
  21. for example when the structure of your text varies from row to row.
  22. You can use both `DISSECT` and `GROK` for hybrid use cases, for example when a
  23. section of the line is reliably repeated, but the entire line is not. `DISSECT`
  24. can deconstruct the section of the line that is repeated. `GROK` can process the
  25. remaining field values using regular expressions.
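As a rough sketch of the hybrid approach, the following query first uses
`DISSECT` to split off the reliably structured prefix, then applies `GROK` to
the remainder. The sample log line and the column names (`line`, `ts`, `rest`,
`username`, `duration`) are hypothetical and serve only as illustration:

[source,esql]
----
// Hypothetical input row; in practice the string would come from an index field.
ROW line = "1.2.3.4 [2023-01-23T12:15:00.000Z] user=alice took 42 ms"
// The prefix is reliably repeated, so a delimiter-based pattern is enough.
| DISSECT line "%{clientip} [%{ts}] %{rest}"
// The remainder is processed with regular-expression-based grok patterns.
| GROK rest "user=%{WORD:username} took %{NUMBER:duration:int} ms"
| KEEP clientip, ts, username, duration
----

Splitting the work this way keeps the regular expression short, because
`GROK` only has to process the part of the line whose structure varies.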
  26. [[esql-process-data-with-dissect]]
  27. ==== Process data with `DISSECT`
  28. The <<esql-dissect>> processing command matches a string against a
  29. delimiter-based pattern, and extracts the specified keys as columns.
  30. For example, the following pattern:
  31. [source,txt]
  32. ----
  33. %{clientip} [%{@timestamp}] %{status}
  34. ----
  35. matches a log line of this format:
  36. [source,txt]
  37. ----
  38. 1.2.3.4 [2023-01-23T12:15:00.000Z] Connected
  39. ----
  40. and results in adding the following columns to the input table:
  41. [%header.monospaced.styled,format=dsv,separator=|]
  42. |===
  43. clientip:keyword | @timestamp:keyword | status:keyword
  44. 1.2.3.4 | 2023-01-23T12:15:00.000Z | Connected
  45. |===
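Put together as an {esql} query, the pattern above might be applied as in the
following sketch. The `ROW` command and the column name `a` are assumptions used
only to supply a sample log line; in practice the string would typically come
from a field of an index:

[source,esql]
----
// Sample row holding the log line shown above (hypothetical column name `a`).
ROW a = "1.2.3.4 [2023-01-23T12:15:00.000Z] Connected"
// Extract the three keys of the pattern as keyword columns.
| DISSECT a "%{clientip} [%{@timestamp}] %{status}"
| KEEP clientip, @timestamp, status
----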
  46. [[esql-dissect-patterns]]
  47. ===== Dissect patterns
  48. include::../ingest/processors/dissect.asciidoc[tag=intro-example-explanation]
  49. A <<esql-named-skip-key,named skip key>> can be used to match values, but
  50. exclude the value from the output.
  51. // TODO: Change back to original text when https://github.com/elastic/elasticsearch/pull/102580 is merged
  52. All matched values are output as keyword string data types. Use the
  53. <<esql-type-conversion-functions>> to convert to another data type.
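For instance, a minimal sketch of converting dissected values, assuming a
sample row with a hypothetical column `a`, could look like this. `TO_DATETIME`
and `TO_IP` are two of the type conversion functions:

[source,esql]
----
// Hypothetical sample row for illustration.
ROW a = "2023-01-23T12:15:00.000Z - some text - 127.0.0.1"
// All three extracted columns start out as keyword strings.
| DISSECT a "%{date} - %{msg} - %{ip}"
| KEEP date, msg, ip
// Convert the keyword columns to date and IP types.
| EVAL date = TO_DATETIME(date), ip = TO_IP(ip)
----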
  54. Dissect also supports <<esql-dissect-key-modifiers,key modifiers>> that can
  55. change dissect's default behavior. For example, you can instruct dissect to
  56. ignore certain fields, append fields, skip over padding, etc.
  57. [[esql-dissect-terminology]]
  58. ===== Terminology
  59. dissect pattern::
  60. the set of fields and delimiters describing the textual
  61. format. Also known as a dissection.
  62. The dissection is described using a set of `%{}` sections:
  63. `%{a} - %{b} - %{c}`
  64. field::
  65. the text from `%{` to `}` inclusive.
  66. delimiter::
  67. the text between `}` and the next `%{` characters.
  68. Any sequence of characters that contains neither `%{` nor `}` can act as a delimiter.
  69. key::
  70. +
  71. --
  72. the text between the `%{` and `}`, exclusive of the `?`, `+`, `&` prefixes
  73. and the ordinal suffix.
  74. Examples:
  75. * `%{?aaa}` - the key is `aaa`
  76. * `%{+bbb/3}` - the key is `bbb`
  77. * `%{&ccc}` - the key is `ccc`
  78. --
  79. [[esql-dissect-examples]]
  80. ===== Examples
  81. include::processing-commands/dissect.asciidoc[tag=examples]
  82. [[esql-dissect-key-modifiers]]
  83. ===== Dissect key modifiers
  84. include::../ingest/processors/dissect.asciidoc[tag=dissect-key-modifiers]
  85. [[esql-dissect-key-modifiers-table]]
  86. .Dissect key modifiers
  87. [options="header",role="styled"]
  88. |======
  89. | Modifier | Name | Position | Example | Description | Details
  90. | `->` | Skip right padding | (far) right | `%{keyname1->}` | Skips any repeated characters to the right | <<esql-dissect-modifier-skip-right-padding,link>>
  91. | `+` | Append | left | `%{+keyname} %{+keyname}` | Appends two or more fields together | <<esql-append-modifier,link>>
  92. | `+` with `/n` | Append with order | left and right | `%{+keyname/2} %{+keyname/1}` | Appends two or more fields together in the order specified | <<esql-append-order-modifier,link>>
  93. | `?` | Named skip key | left | `%{?ignoreme}` | Skips the matched value in the output. Same behavior as `%{}`| <<esql-named-skip-key,link>>
  94. |======
  95. [[esql-dissect-modifier-skip-right-padding]]
  96. ====== Right padding modifier (`->`)
  97. include::../ingest/processors/dissect.asciidoc[tag=dissect-modifier-skip-right-padding]
  98. For example:
  99. [source.merge.styled,esql]
  100. ----
  101. include::{esql-specs}/docs.csv-spec[tag=dissectRightPaddingModifier]
  102. ----
  103. [%header.monospaced.styled,format=dsv,separator=|]
  104. |===
  105. include::{esql-specs}/docs.csv-spec[tag=dissectRightPaddingModifier-result]
  106. |===
  107. ////
  108. // TODO: Re-enable when https://github.com/elastic/elasticsearch/pull/102580 is merged
  109. include::../ingest/processors/dissect.asciidoc[tag=dissect-modifier-empty-right-padding]
  110. For example:
  111. [source.merge.styled,esql]
  112. ----
  113. include::{esql-specs}/docs.csv-spec[tag=dissectEmptyRightPaddingModifier]
  114. ----
  115. [%header.monospaced.styled,format=dsv,separator=|]
  116. |===
  117. include::{esql-specs}/docs.csv-spec[tag=dissectEmptyRightPaddingModifier-result]
  118. |===
  119. ////
  120. [[esql-append-modifier]]
  121. ====== Append modifier (`+`)
  122. include::../ingest/processors/dissect.asciidoc[tag=append-modifier]
  123. [source.merge.styled,esql]
  124. ----
  125. include::{esql-specs}/docs.csv-spec[tag=dissectAppendModifier]
  126. ----
  127. [%header.monospaced.styled,format=dsv,separator=|]
  128. |===
  129. include::{esql-specs}/docs.csv-spec[tag=dissectAppendModifier-result]
  130. |===
  131. [[esql-append-order-modifier]]
  132. ====== Append with order modifier (`+` and `/n`)
  133. include::../ingest/processors/dissect.asciidoc[tag=append-order-modifier]
  134. [source.merge.styled,esql]
  135. ----
  136. include::{esql-specs}/docs.csv-spec[tag=dissectAppendWithOrderModifier]
  137. ----
  138. [%header.monospaced.styled,format=dsv,separator=|]
  139. |===
  140. include::{esql-specs}/docs.csv-spec[tag=dissectAppendWithOrderModifier-result]
  141. |===
  142. [[esql-named-skip-key]]
  143. ====== Named skip key (`?`)
  144. // include::../ingest/processors/dissect.asciidoc[tag=named-skip-key]
  145. // TODO: Re-enable when https://github.com/elastic/elasticsearch/pull/102580 is merged
  146. Dissect supports ignoring matches in the final result. This can be done with a
  147. named skip key using the `%{?name}` syntax:
  148. [source.merge.styled,esql]
  149. ----
  150. include::{esql-specs}/docs.csv-spec[tag=dissectNamedSkipKey]
  151. ----
  152. [%header.monospaced.styled,format=dsv,separator=|]
  153. |===
  154. include::{esql-specs}/docs.csv-spec[tag=dissectNamedSkipKey-result]
  155. |===
  156. [[esql-dissect-limitations]]
  157. ===== Limitations
  158. // tag::dissect-limitations[]
  159. The `DISSECT` command does not support reference keys or empty keys.
  160. // end::dissect-limitations[]
  161. [[esql-process-data-with-grok]]
  162. ==== Process data with `GROK`
  163. The <<esql-grok>> processing command matches a string against a pattern based on
  164. regular expressions, and extracts the specified keys as columns.
  165. For example, the following pattern:
  166. [source,txt]
  167. ----
  168. %{IP:ip} \[%{TIMESTAMP_ISO8601:@timestamp}\] %{GREEDYDATA:status}
  169. ----
  170. matches a log line of this format:
  171. [source,txt]
  172. ----
  173. 1.2.3.4 [2023-01-23T12:15:00.000Z] Connected
  174. ----
  175. Putting it together as an {esql} query:
  176. [source.merge.styled,esql]
  177. ----
  178. include::{esql-specs}/docs.csv-spec[tag=grokWithEscape]
  179. ----
  180. `GROK` adds the following columns to the input table:
  181. [%header.monospaced.styled,format=dsv,separator=|]
  182. |===
  183. @timestamp:keyword | ip:keyword | status:keyword
  184. 2023-01-23T12:15:00.000Z | 1.2.3.4 | Connected
  185. |===
  186. [NOTE]
  187. ====
  188. Special regex characters in grok patterns, like `[` and `]`, need to be escaped
  189. with a `\`. For example, in the earlier pattern:
  190. [source,txt]
  191. ----
  192. %{IP:ip} \[%{TIMESTAMP_ISO8601:@timestamp}\] %{GREEDYDATA:status}
  193. ----
  194. In {esql} queries, the backslash character itself is a special character that
  195. needs to be escaped with another `\`. For this example, the corresponding {esql}
  196. query becomes:
  197. [source.merge.styled,esql]
  198. ----
  199. include::{esql-specs}/docs.csv-spec[tag=grokWithEscape]
  200. ----
  201. ====
  202. [[esql-grok-patterns]]
  203. ===== Grok patterns
  204. The syntax for a grok pattern is `%{SYNTAX:SEMANTIC}`.
  205. The `SYNTAX` is the name of the pattern that matches your text. For example,
  206. `3.44` is matched by the `NUMBER` pattern and `55.3.244.1` is matched by the
  207. `IP` pattern. The syntax is how you match.
  208. The `SEMANTIC` is the identifier you give to the piece of text being matched.
  209. For example, `3.44` could be the duration of an event, so you could call it
  210. simply `duration`. Further, a string `55.3.244.1` might identify the `client`
  211. making a request.
  212. By default, matched values are output as keyword string data types. To convert a
  213. semantic's data type, suffix it with the target data type. For example,
  214. `%{NUMBER:num:int}` converts the `num` semantic from a string to an
  215. integer. Currently the only supported conversions are `int` and `float`. For
  216. other types, use the <<esql-type-conversion-functions>>.
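The following sketch combines both approaches, assuming a hypothetical sample
row in a column named `a`: the `:int` suffix converts `num` inside the pattern,
while `date` is converted afterwards with the `TO_DATETIME` function:

[source,esql]
----
// Hypothetical sample row for illustration.
ROW a = "2023-01-23T12:15:00.000Z 127.0.0.1 42"
// `num` is converted to an integer by the `:int` suffix;
// `date` and `ip` are extracted as keyword strings.
| GROK a "%{TIMESTAMP_ISO8601:date} %{IP:ip} %{NUMBER:num:int}"
| KEEP date, ip, num
// Types other than int and float need a conversion function.
| EVAL date = TO_DATETIME(date)
----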
  217. For an overview of the available patterns, refer to
  218. {es-repo}/blob/{branch}/libs/grok/src/main/resources/patterns[GitHub]. You can
  219. also retrieve a list of all patterns using a <<grok-processor-rest-get,REST
  220. API>>.
  221. [[esql-grok-regex]]
  222. ===== Regular expressions
  223. Grok is based on regular expressions. Any regular expression is valid in grok
  224. as well. Grok uses the Oniguruma regular expression library. Refer to
  225. https://github.com/kkos/oniguruma/blob/master/doc/RE[the Oniguruma GitHub
  226. repository] for the full supported regexp syntax.
  227. [[esql-custom-patterns]]
  228. ===== Custom patterns
  229. If grok doesn't have a pattern you need, you can use the Oniguruma syntax for
  230. named capture, which lets you match a piece of text and save it as a column:
  231. [source,txt]
  232. ----
  233. (?<field_name>the pattern here)
  234. ----
  235. For example, postfix logs have a `queue id` that is a 10- or 11-character
  236. hexadecimal value. This can be captured to a column named `queue_id` with:
  237. [source,txt]
  238. ----
  239. (?<queue_id>[0-9A-F]{10,11})
  240. ----
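Used in an {esql} query, the named capture is written inline in the pattern
string. The following is a sketch only; the sample log text and the column name
`line` are made up for illustration:

[source,esql]
----
// Hypothetical postfix-style log fragment with an 11-character queue id.
ROW line = "BEF25A72965 message accepted"
// The named capture saves the matched hexadecimal value as `queue_id`.
| GROK line "(?<queue_id>[0-9A-F]{10,11})"
| KEEP queue_id
----

If the expression contained a backslash, it would need to be doubled inside the
{esql} string, as described in the note on escaping above.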
  241. [[esql-grok-examples]]
  242. ===== Examples
  243. include::processing-commands/grok.asciidoc[tag=examples]
  244. [[esql-grok-debugger]]
  245. ===== Grok debugger
  246. To write and debug grok patterns, you can use the
  247. {kibana-ref}/xpack-grokdebugger.html[Grok Debugger]. It provides a UI for
  248. testing patterns against sample data. Under the covers, it uses the same engine
  249. as the `GROK` command.
  250. [[esql-grok-limitations]]
  251. ===== Limitations
  252. // tag::grok-limitations[]
  253. The `GROK` command does not support configuring <<custom-patterns,custom
  254. patterns>> or <<trace-match,multiple patterns>>. The `GROK` command is not
  255. subject to <<grok-watchdog,Grok watchdog settings>>.
  256. // end::grok-limitations[]