- [[grok-processor]]
- === Grok Processor
- Extracts structured fields from a single text field within a document. You choose which field to
- extract matched fields from, as well as the grok pattern you expect to match. A grok pattern is like a regular
- expression that supports aliased expressions that can be reused.
- This processor is well suited to syslog logs, Apache and other web server logs, MySQL logs, and, in general, any log format
- that is written for humans rather than for computer consumption.
- This processor comes packaged with many
- https://github.com/elastic/elasticsearch/blob/{branch}/libs/grok/src/main/resources/patterns[reusable patterns].
- If you need help building patterns to match your logs, you will find the {kibana-ref}/xpack-grokdebugger.html[Grok Debugger] tool quite useful! The Grok Debugger is an {xpack} feature under the Basic License and is therefore *free to use*. The Grok Constructor at <http://grokconstructor.appspot.com/> is also a useful tool.
- [[grok-basics]]
- ==== Grok Basics
- Grok sits on top of regular expressions, so any regular expressions are valid in grok as well.
- The regular expression library is Oniguruma, and you can see the full supported regexp syntax
- https://github.com/kkos/oniguruma/blob/master/doc/RE[on the Oniguruma site].
- Grok works by leveraging this regular expression language to allow naming existing patterns and combining them into more
- complex patterns that match your fields.
- The syntax for reusing a grok pattern comes in three forms: `%{SYNTAX:SEMANTIC}`, `%{SYNTAX}`, `%{SYNTAX:SEMANTIC:TYPE}`.
- The `SYNTAX` is the name of the pattern that will match your text. For example, `3.44` will be matched by the `NUMBER`
- pattern and `55.3.244.1` will be matched by the `IP` pattern. The syntax is how you match. `NUMBER` and `IP` are both
- patterns that are provided within the default patterns set.
- The `SEMANTIC` is the identifier you give to the piece of text being matched. For example, `3.44` could be the
- duration of an event, so you could call it simply `duration`. Further, a string `55.3.244.1` might identify
- the `client` making a request.
- The `TYPE` is the type to which you wish to cast your named field. `int`, `long`, `double`, `float` and `boolean` are supported types for coercion.
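To make the three reference forms concrete, here is a small Python sketch that splits grok references into their `SYNTAX`, `SEMANTIC`, and `TYPE` parts. It is not the Elasticsearch parser. The `GROK_REF` regex and the `parse_refs` helper are hypothetical, and names are assumed to contain only word characters.

```python
import re

# Matches the three grok reference forms:
#   %{SYNTAX}, %{SYNTAX:SEMANTIC}, %{SYNTAX:SEMANTIC:TYPE}
# Assumption: each name consists of word characters only.
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?(?::(\w+))?\}")


def parse_refs(expression):
    """Return (syntax, semantic, type) tuples; missing parts are None."""
    return [m.groups() for m in GROK_REF.finditer(expression)]


print(parse_refs("%{NUMBER:duration:float} %{IP:client} %{WORD}"))
# [('NUMBER', 'duration', 'float'), ('IP', 'client', None), ('WORD', None, None)]
```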
- For example, you might want to match the following text:
- [source,txt]
- --------------------------------------------------
- 3.44 55.3.244.1
- --------------------------------------------------
- You may know that the message in the example is a number followed by an IP address. You can match this text by using the following
- Grok expression.
- [source,txt]
- --------------------------------------------------
- %{NUMBER:duration} %{IP:client}
- --------------------------------------------------
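The following Python sketch shows the idea of expanding a grok expression into a plain regular expression with named groups. It assumes simplified `NUMBER` and `IP` definitions. The real built-in ones are more thorough; `IP`, for instance, also covers IPv6. It also uses Python's `re` module as a stand-in for Oniguruma, whose syntax differs in places. `PATTERNS` and `to_regex` are hypothetical names.

```python
import re

# Simplified stand-ins for two built-in patterns (assumption: the real
# NUMBER and IP definitions are more complete).
PATTERNS = {
    "NUMBER": r"[+-]?(?:\d+(?:\.\d+)?|\.\d+)",
    "IP": r"(?:\d{1,3}\.){3}\d{1,3}",
}
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?(?::(\w+))?\}")


def to_regex(expression, patterns):
    """Replace each %{SYNTAX:SEMANTIC} with a named (or unnamed) group."""
    def replace(m):
        syntax, semantic = m.group(1), m.group(2)
        body = patterns[syntax]
        return f"(?P<{semantic}>{body})" if semantic else f"(?:{body})"
    return GROK_REF.sub(replace, expression)


regex = re.compile(to_regex("%{NUMBER:duration} %{IP:client}", PATTERNS))
match = regex.search("3.44 55.3.244.1")
print(match.groupdict())  # {'duration': '3.44', 'client': '55.3.244.1'}
```

Each `SEMANTIC` becomes a named group, so the captured text comes back keyed by the name you chose. A reference without a `SEMANTIC` still has to match but is not captured.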
- [[using-grok]]
- ==== Using the Grok Processor in a Pipeline
- [[grok-options]]
- .Grok Options
- [options="header"]
- |======
- | Name | Required | Default | Description
- | `field` | yes | - | The field to use for grok expression parsing
- | `patterns` | yes | - | An ordered list of grok expressions to match and extract named captures with. Returns on the first expression in the list that matches.
- | `pattern_definitions` | no | - | A map of pattern-name and pattern tuples defining custom patterns to be used by the current processor. Patterns matching existing names will override the pre-existing definition.
- | `trace_match` | no | false | When `true`, `_ingest._grok_match_index` is inserted into the matched document's metadata, set to the index of the pattern in `patterns` that matched.
- | `ignore_missing` | no | false | If `true` and `field` does not exist or is `null`, the processor quietly exits without modifying the document.
- include::common-options.asciidoc[]
- |======
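As a rough model of how `field`, `patterns`, and `ignore_missing` interact, the Python sketch below applies an ordered list of already-compiled expressions to one top-level field. `grok_process` is a hypothetical helper, not the processor's implementation. Its error messages are invented, and it does not handle dotted field paths.

```python
import re


def grok_process(doc, field, regexes, ignore_missing=False):
    """Hypothetical model of the `field`, `patterns`, `ignore_missing` options.

    `regexes` stands in for `patterns`: an ordered list of compiled
    expressions. The document is updated in place and returned.
    """
    value = doc.get(field)
    if value is None:  # field absent or explicitly null
        if ignore_missing:
            return doc  # quietly exit without modifying the document
        raise ValueError(f"field [{field}] is missing or null")
    for regex in regexes:
        match = regex.search(value)
        if match:  # the first expression that matches wins
            doc.update(match.groupdict())  # named captures become new fields
            return doc
    raise ValueError(f"no pattern matched field [{field}]")


regexes = [
    re.compile(r"(?P<verb>GET|POST) (?P<path>\S+)"),
    re.compile(r"(?P<status>\d{3})"),
]
print(grok_process({"message": "GET /index.html"}, "message", regexes))
# {'message': 'GET /index.html', 'verb': 'GET', 'path': '/index.html'}
```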
- Here is an example of using the provided patterns to extract and name structured fields from a string field in
- a document.
- [source,js]
- --------------------------------------------------
- {
- "message": "55.3.244.1 GET /index.html 15824 0.043"
- }
- --------------------------------------------------
- // NOTCONSOLE
- The pattern for this could be:
- [source,txt]
- --------------------------------------------------
- %{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}
- --------------------------------------------------
- Here is an example pipeline for processing the above document by using Grok:
- [source,js]
- --------------------------------------------------
- {
- "description" : "...",
- "processors": [
- {
- "grok": {
- "field": "message",
- "patterns": ["%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}"]
- }
- }
- ]
- }
- --------------------------------------------------
- // NOTCONSOLE
- This pipeline will insert these named captures as new fields within the document, like so:
- [source,js]
- --------------------------------------------------
- {
- "message": "55.3.244.1 GET /index.html 15824 0.043",
- "client": "55.3.244.1",
- "method": "GET",
- "request": "/index.html",
- "bytes": "15824",
- "duration": "0.043"
- }
- --------------------------------------------------
- // NOTCONSOLE
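Neither `bytes` nor `duration` has a `TYPE` in the pattern above, so grok captures both as strings. The Python sketch below shows one way the `:TYPE` suffix could drive coercion. The `CASTS` table maps the documented type names onto Python converters; its `boolean` rule (case-insensitive `"true"`) is an assumption, and `NUMBER` is a simplified stand-in.

```python
import re

GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?(?::(\w+))?\}")
# Simplified stand-in for the built-in NUMBER pattern (assumption).
PATTERNS = {"NUMBER": r"[+-]?(?:\d+(?:\.\d+)?|\.\d+)"}
# Python converters for the documented TYPE names. The "boolean" rule
# is an assumption of this sketch.
CASTS = {
    "int": int,
    "long": int,
    "float": float,
    "double": float,
    "boolean": lambda s: s.lower() == "true",
}


def compile_typed(expression):
    """Return (compiled regex, {field name: TYPE}) for a grok expression."""
    types = {}

    def replace(m):
        syntax, semantic, type_name = m.groups()
        if semantic and type_name:
            types[semantic] = type_name
        body = PATTERNS[syntax]
        return f"(?P<{semantic}>{body})" if semantic else f"(?:{body})"

    return re.compile(GROK_REF.sub(replace, expression)), types


def extract(expression, text):
    regex, types = compile_typed(expression)
    fields = regex.search(text).groupdict()
    # Captures without a TYPE stay strings; typed captures are converted.
    return {
        name: CASTS[types[name]](value) if name in types and value is not None else value
        for name, value in fields.items()
    }


print(extract("%{NUMBER:bytes} %{NUMBER:duration}", "15824 0.043"))
# {'bytes': '15824', 'duration': '0.043'}
print(extract("%{NUMBER:bytes:int} %{NUMBER:duration:float}", "15824 0.043"))
# {'bytes': 15824, 'duration': 0.043}
```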
- [[custom-patterns]]
- ==== Custom Patterns
- The Grok processor comes pre-packaged with a base set of patterns. These patterns may not always have
- what you are looking for. Patterns have a very basic format: each entry has a name and the pattern itself.
- You can add your own patterns to a processor definition under the `pattern_definitions` option.
- Here is an example of a pipeline specifying custom pattern definitions:
- [source,js]
- --------------------------------------------------
- {
- "description" : "...",
- "processors": [
- {
- "grok": {
- "field": "message",
- "patterns": ["my %{FAVORITE_DOG:dog} is colored %{RGB:color}"],
- "pattern_definitions" : {
- "FAVORITE_DOG" : "beagle",
- "RGB" : "RED|GREEN|BLUE"
- }
- }
- }
- ]
- }
- --------------------------------------------------
- // NOTCONSOLE
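The override rule for `pattern_definitions` amounts to layering the custom map over the built-in dictionary before expanding references. The Python sketch below models that with a small stand-in dictionary. The `RGB` built-in is invented purely to show an override, and the recursion limit in `expand` is an arbitrary guard added for the sketch.

```python
import re

GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?(?::(\w+))?\}")
# Stand-in built-in dictionary. The RGB entry is invented for this sketch
# only, so that the override below has something to replace.
BUILT_IN = {"WORD": r"\b\w+\b", "RGB": r"[0-9A-Fa-f]{6}"}


def expand(expression, patterns, depth=0):
    """Recursively replace %{...} references with regex groups."""
    if depth > 25:  # arbitrary guard against self-referencing definitions
        raise ValueError("pattern definitions nest too deeply (recursive?)")

    def replace(m):
        syntax, semantic, _type = m.groups()
        body = expand(patterns[syntax], patterns, depth + 1)
        return f"(?P<{semantic}>{body})" if semantic else f"(?:{body})"

    return GROK_REF.sub(replace, expression)


def compile_with_definitions(expression, pattern_definitions):
    # Custom definitions are layered over the built-ins, so a custom entry
    # with an existing name replaces the pre-existing definition.
    patterns = {**BUILT_IN, **pattern_definitions}
    return re.compile(expand(expression, patterns))


regex = compile_with_definitions(
    "my %{FAVORITE_DOG:dog} is colored %{RGB:color}",
    {"FAVORITE_DOG": "beagle", "RGB": "RED|GREEN|BLUE"},
)
print(regex.search("my beagle is colored RED").groupdict())
# {'dog': 'beagle', 'color': 'RED'}
```

Because expansion is recursive, a custom pattern can also refer to other custom or built-in patterns by name.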
- [[trace-match]]
- ==== Providing Multiple Match Patterns
- Sometimes one pattern is not enough to capture the potential structure of a field. Let's assume we
- want to match all messages that contain our favorite pet breeds of either cats or dogs. One way to accomplish
- this is to provide two distinct patterns that can be matched, instead of one really complicated expression capturing
- the same `or` behavior.
- Here is an example of such a configuration executed against the simulate API:
- [source,js]
- --------------------------------------------------
- POST _ingest/pipeline/_simulate
- {
- "pipeline": {
- "description" : "parse multiple patterns",
- "processors": [
- {
- "grok": {
- "field": "message",
- "patterns": ["%{FAVORITE_DOG:pet}", "%{FAVORITE_CAT:pet}"],
- "pattern_definitions" : {
- "FAVORITE_DOG" : "beagle",
- "FAVORITE_CAT" : "burmese"
- }
- }
- }
- ]
- },
- "docs":[
- {
- "_source": {
- "message": "I love burmese cats!"
- }
- }
- ]
- }
- --------------------------------------------------
- // CONSOLE
- Response:
- [source,js]
- --------------------------------------------------
- {
- "docs": [
- {
- "doc": {
- "_type": "_type",
- "_index": "_index",
- "_id": "_id",
- "_source": {
- "message": "I love burmese cats!",
- "pet": "burmese"
- },
- "_ingest": {
- "timestamp": "2016-11-08T19:43:03.850+0000"
- }
- }
- }
- ]
- }
- --------------------------------------------------
- // TESTRESPONSE[s/2016-11-08T19:43:03.850\+0000/$body.docs.0.doc._ingest.timestamp/]
- Both patterns will set the field `pet` with the appropriate match, but what if we want to trace which of our
- patterns matched and populated our fields? We can do this with the `trace_match` parameter. Here is the output of
- that same pipeline, but with `"trace_match": true` configured:
- ////
- Hidden setup for example:
- [source,js]
- --------------------------------------------------
- POST _ingest/pipeline/_simulate
- {
- "pipeline": {
- "description" : "parse multiple patterns",
- "processors": [
- {
- "grok": {
- "field": "message",
- "patterns": ["%{FAVORITE_DOG:pet}", "%{FAVORITE_CAT:pet}"],
- "trace_match": true,
- "pattern_definitions" : {
- "FAVORITE_DOG" : "beagle",
- "FAVORITE_CAT" : "burmese"
- }
- }
- }
- ]
- },
- "docs":[
- {
- "_source": {
- "message": "I love burmese cats!"
- }
- }
- ]
- }
- --------------------------------------------------
- // CONSOLE
- ////
- [source,js]
- --------------------------------------------------
- {
- "docs": [
- {
- "doc": {
- "_type": "_type",
- "_index": "_index",
- "_id": "_id",
- "_source": {
- "message": "I love burmese cats!",
- "pet": "burmese"
- },
- "_ingest": {
- "_grok_match_index": "1",
- "timestamp": "2016-11-08T19:43:03.850+0000"
- }
- }
- }
- ]
- }
- --------------------------------------------------
- // TESTRESPONSE[s/2016-11-08T19:43:03.850\+0000/$body.docs.0.doc._ingest.timestamp/]
- In the above response, you can see that the index of the pattern that matched was `"1"`. This means that it was the
- second pattern in `patterns` to match (the index starts at zero).
- This trace metadata makes it possible to see which of the patterns matched. This information is stored in the ingest
- metadata and will not be indexed.
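The first-match rule and the `trace_match` index can be modeled in a few lines of Python. `first_match` is a hypothetical helper. The two regexes are hand-expanded equivalents of `%{FAVORITE_DOG:pet}` and `%{FAVORITE_CAT:pet}`, and the index is returned as a string to mirror the response above.

```python
import re


def first_match(value, regexes, trace_match=False):
    """Try each compiled pattern in order and stop at the first match.

    Returns (captures, ingest_metadata), or (None, {}) when nothing matches.
    """
    for index, regex in enumerate(regexes):
        match = regex.search(value)
        if match:
            # Zero-based index, stored as a string as in the response above.
            metadata = {"_grok_match_index": str(index)} if trace_match else {}
            return match.groupdict(), metadata
    return None, {}


# Hand-expanded equivalents of %{FAVORITE_DOG:pet} and %{FAVORITE_CAT:pet}.
regexes = [re.compile(r"(?P<pet>beagle)"), re.compile(r"(?P<pet>burmese)")]
print(first_match("I love burmese cats!", regexes, trace_match=True))
# ({'pet': 'burmese'}, {'_grok_match_index': '1'})
```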
- [[grok-processor-rest-get]]
- ==== Retrieving patterns from REST endpoint
- The Grok Processor comes packaged with its own REST endpoint for retrieving the built-in patterns that the processor ships with.
- [source,js]
- --------------------------------------------------
- GET _ingest/processor/grok
- --------------------------------------------------
- // CONSOLE
- The above request will return a response body containing a key-value representation of the built-in patterns dictionary.
- [source,js]
- --------------------------------------------------
- {
- "patterns" : {
- "BACULA_CAPACITY" : "%{INT}{1,3}(,%{INT}{3})*",
- "PATH" : "(?:%{UNIXPATH}|%{WINPATH})",
- ...
- }
- }
- --------------------------------------------------
- // NOTCONSOLE
- This can be useful to reference as the built-in patterns change across versions.
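Because the built-in dictionary can change between versions, it may help to fetch it from two clusters and compare the results. The Python sketch below assumes an unsecured cluster at a hypothetical `http://localhost:9200`. `fetch_grok_patterns`, `parse_patterns`, and `diff_patterns` are illustrative helpers, not part of any Elasticsearch client.

```python
import json
import urllib.request


def fetch_grok_patterns(base_url="http://localhost:9200"):
    """GET _ingest/processor/grok and return the name -> pattern map."""
    url = f"{base_url}/_ingest/processor/grok"
    with urllib.request.urlopen(url) as response:
        return parse_patterns(response.read().decode("utf-8"))


def parse_patterns(body):
    """Pull the `patterns` object out of a response body."""
    return json.loads(body)["patterns"]


def diff_patterns(old, new):
    """Return (added, removed, redefined) pattern names, each sorted."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    redefined = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, redefined


# Offline usage with two made-up response bodies.
old = parse_patterns('{"patterns": {"PATH": "(?:%{UNIXPATH}|%{WINPATH})", "A": "x"}}')
new = parse_patterns('{"patterns": {"PATH": "(?:%{UNIXPATH})", "B": "y"}}')
print(diff_patterns(old, new))  # (['B'], ['A'], ['PATH'])
```

A cluster with security enabled would also need authentication, which this sketch omits.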
- [[grok-watchdog]]
- ==== Grok watchdog
- Grok expressions that take too long to execute are interrupted and
- the grok processor then fails with an exception. The grok
- processor has a watchdog thread that determines when evaluation of
- a grok expression takes too long and is controlled by the following
- settings:
- [[grok-watchdog-options]]
- .Grok watchdog settings
- [options="header"]
- |======
- | Name | Default | Description
- | `ingest.grok.watchdog.interval` | 1s | How often to check whether there are grok evaluations that take longer than the maximum allowed execution time.
- | `ingest.grok.watchdog.max_execution_time` | 1s | The maximum allowed execution time of a grok expression evaluation.
- |======
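The two settings can be pictured as bookkeeping. Each evaluation records its start time, and a check that runs every `interval` flags evaluations older than `max_execution_time`. The Python class below is a hypothetical model only. Python's `re` cannot interrupt a running match, so the model reports overruns rather than stopping them.

```python
import time


class GrokWatchdogModel:
    """Hypothetical bookkeeping model of an interval-based watchdog."""

    def __init__(self, max_execution_time=1.0, clock=time.monotonic):
        self.max_execution_time = max_execution_time  # seconds
        self.clock = clock
        self.running = {}  # evaluation id -> start time

    def register(self, eval_id):
        """Record the start of an evaluation."""
        self.running[eval_id] = self.clock()

    def unregister(self, eval_id):
        """Forget an evaluation once it finishes."""
        self.running.pop(eval_id, None)

    def check(self):
        """Run once per interval; return ids that exceeded the limit."""
        now = self.clock()
        return sorted(
            eval_id
            for eval_id, start in self.running.items()
            if now - start > self.max_execution_time
        )


# Usage with a controllable fake clock instead of real time.
now = [0.0]
watchdog = GrokWatchdogModel(max_execution_time=1.0, clock=lambda: now[0])
watchdog.register("a")
now[0] = 0.5
watchdog.register("b")
now[0] = 1.2
print(watchdog.check())  # ['a']
```

In this model, an overrunning evaluation is only caught at the next check. It can therefore run for up to roughly one `interval` beyond `max_execution_time` before it is flagged.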