[[grok]]
=== Grokking grok

Grok is a regular expression dialect that supports reusable aliased
expressions. Grok works really well with syslog logs, Apache and other
webserver logs, MySQL logs, and generally any log format that is written for
humans and not computer consumption.

Grok sits on top of the https://github.com/kkos/oniguruma/blob/master/doc/RE[Oniguruma]
regular expression library, so any regular expression that Oniguruma supports
is also valid in grok. Grok uses this regular expression language to allow
naming existing patterns and combining them into more complex patterns that
match your fields.

[[grok-syntax]]
==== Grok patterns

The {stack} ships with numerous https://github.com/elastic/elasticsearch/blob/master/libs/grok/src/main/resources/patterns/legacy/grok-patterns[predefined grok patterns]
that simplify working with grok. The syntax for reusing grok patterns takes
one of the following forms:

[%autowidth]
|===
|`%{SYNTAX}` | `%{SYNTAX:ID}` |`%{SYNTAX:ID:TYPE}`
|===

`SYNTAX`::
The name of the pattern that will match your text. For example, `NUMBER` and
`IP` are both patterns that are provided within the default patterns set. The
`NUMBER` pattern matches data like `3.44`, and the `IP` pattern matches data
like `55.3.244.1`.

`ID`::
The identifier you give to the piece of text being matched.
For example, `3.44` could be the duration of an event, so you might call it
`duration`. The string `55.3.244.1` might identify the `client` making a
request.

`TYPE`::
The data type you want to cast your named field to. `int`, `long`, `double`,
`float`, and `boolean` are supported types.

For example, let's say you have message data that looks like this:

[source,txt]
----
3.44 55.3.244.1
----

The first value is a number, followed by what appears to be an IP address. You
can match this text by using the following grok expression:

[source,txt]
----
%{NUMBER:duration} %{IP:client}
----

[[grok-ecs]]
==== Migrating to Elastic Common Schema (ECS)

To ease migration to the {ecs-ref}[Elastic Common Schema (ECS)], a new set of
ECS-compliant patterns is available in addition to the existing patterns. The
new ECS pattern definitions capture event field names that are compliant with
the schema.

The ECS pattern set has all of the pattern definitions from the legacy set, and
is a drop-in replacement. Use the {logstash-ref}/plugins-filters-grok.html#plugins-filters-grok-ecs_compatibility[`ecs_compatibility`]
setting to switch modes.

New features and enhancements will be added to the ECS-compliant files. The
legacy patterns may still receive backward-compatible bug fixes.

[[grok-patterns]]
==== Use grok patterns in Painless scripts

You can incorporate predefined grok patterns into Painless scripts to extract
data. To test your script, use either the {painless}/painless-execute-api.html#painless-execute-runtime-field-context[field contexts]
of the Painless execute API or create a runtime field that includes the script.
Runtime fields offer greater flexibility and accept multiple documents, but the
Painless execute API is a great option if you don't have write access on a
cluster where you're testing a script.

TIP: If you need help building grok patterns to match your data, use the
{kibana-ref}/xpack-grokdebugger.html[Grok Debugger] tool in {kib}.

For example, if you're working with Apache log data, you can use the
`%{COMMONAPACHELOG}` syntax, which understands the structure of Apache logs. A
sample document might look like this:

// Note to contributors that the line break in the following example is
// intentional to promote better readability in the output
[source,js]
----
"timestamp":"2020-04-30T14:30:17-05:00","message":"40.135.0.0 - -
[30/Apr/2020:14:30:17 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"
----
// NOTCONSOLE

To extract the IP address from the `message` field, you can write a Painless
script that incorporates the `%{COMMONAPACHELOG}` syntax. You can test this
script using the {painless}/painless-execute-api.html#painless-runtime-ip[`ip` field context]
of the Painless execute API, but let's use a runtime field instead.

Based on the sample document, index the `@timestamp` and `message` fields.
To remain flexible, use `wildcard` as the field type for `message`:

[source,console]
----
PUT /my-index/
{
  "mappings": {
    "properties": {
      "@timestamp": {
        "format": "strict_date_optional_time||epoch_second",
        "type": "date"
      },
      "message": {
        "type": "wildcard"
      }
    }
  }
}
----

Next, use the <<docs-bulk,bulk API>> to index some log data into
`my-index`.

[source,console]
----
POST /my-index/_bulk?refresh
{"index":{}}
{"timestamp":"2020-04-30T14:30:17-05:00","message":"40.135.0.0 - - [30/Apr/2020:14:30:17 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
{"index":{}}
{"timestamp":"2020-04-30T14:30:53-05:00","message":"232.0.0.0 - - [30/Apr/2020:14:30:53 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
{"index":{}}
{"timestamp":"2020-04-30T14:31:12-05:00","message":"26.1.0.0 - - [30/Apr/2020:14:31:12 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
{"index":{}}
{"timestamp":"2020-04-30T14:31:19-05:00","message":"247.37.0.0 - - [30/Apr/2020:14:31:19 -0500] \"GET /french/splash_inet.html HTTP/1.0\" 200 3781"}
{"index":{}}
{"timestamp":"2020-04-30T14:31:22-05:00","message":"247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] \"GET /images/hm_nbg.jpg HTTP/1.0\" 304 0"}
{"index":{}}
{"timestamp":"2020-04-30T14:31:27-05:00","message":"252.0.0.0 - - [30/Apr/2020:14:31:27 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
{"index":{}}
{"timestamp":"2020-04-30T14:31:28-05:00","message":"not a valid apache log"}
----
// TEST[continued]

[[grok-patterns-runtime]]
==== Incorporate grok patterns and scripts in runtime fields

Now you can define a runtime field in the mappings that includes your Painless
script and grok pattern. If the pattern matches, the script emits the value of
the matching IP address.
If the pattern doesn't match, the null-safe `?.` operator yields `null` for
`clientip`, so the `clientip != null` check skips `emit` and the script
finishes without crashing.

[source,console]
----
PUT my-index/_mappings
{
  "runtime": {
    "http.clientip": {
      "type": "ip",
      "script": """
        String clientip=grok('%{COMMONAPACHELOG}').extract(doc["message"].value)?.clientip;
        if (clientip != null) emit(clientip);
      """
    }
  }
}
----
// TEST[continued]

Alternatively, you can define the same runtime field in the context of a
search request. The runtime definition and the script are exactly the same as
those defined previously in the index mapping. Just copy that definition into
the search request under the `runtime_mappings` section and include a query
that matches on the runtime field. This query returns the same results as if
you <<grok-pattern-results,defined a search query>> for the `http.clientip`
runtime field in your index mappings, but only in the context of this specific
search:

[source,console]
----
GET my-index/_search
{
  "runtime_mappings": {
    "http.clientip": {
      "type": "ip",
      "script": """
        String clientip=grok('%{COMMONAPACHELOG}').extract(doc["message"].value)?.clientip;
        if (clientip != null) emit(clientip);
      """
    }
  },
  "query": {
    "match": {
      "http.clientip": "40.135.0.0"
    }
  },
  "fields" : ["http.clientip"]
}
----
// TEST[continued]

[[grok-pattern-results]]
==== Return calculated results

Using the `http.clientip` runtime field, you can define a simple query to run a
search for a specific IP address and return all related fields.
The <<search-fields,`fields`>> parameter on the `_search` API works for all
fields, even those that weren't sent as part of the original `_source`:

[source,console]
----
GET my-index/_search
{
  "query": {
    "match": {
      "http.clientip": "40.135.0.0"
    }
  },
  "fields" : ["http.clientip"]
}
----
// TEST[continued]
// TEST[s/_search/_search\?filter_path=hits/]

The response includes the specific IP address indicated in your search query.
The grok pattern within the Painless script extracted this value from the
`message` field at runtime.

[source,console-result]
----
{
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my-index",
        "_id" : "1iN2a3kBw4xTzEDqyYE0",
        "_score" : 1.0,
        "_source" : {
          "timestamp" : "2020-04-30T14:30:17-05:00",
          "message" : "40.135.0.0 - - [30/Apr/2020:14:30:17 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"
        },
        "fields" : {
          "http.clientip" : [
            "40.135.0.0"
          ]
        }
      }
    ]
  }
}
----
// TESTRESPONSE[s/"_id" : "1iN2a3kBw4xTzEDqyYE0"/"_id": $body.hits.hits.0._id/]
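
To make the extraction step more concrete, the following Python sketch mimics what the grok call in the runtime field does. It expands `%{SYNTAX}`, `%{SYNTAX:ID}`, and `%{SYNTAX:ID:TYPE}` references into a regular expression with named captures, converts captures that declare a `TYPE`, and returns nothing when the text doesn't match. This is only an illustration under stated assumptions, not how grok is implemented in the {stack}. It uses Python's `re` module rather than Oniguruma. The `PATTERNS` table is a hypothetical and heavily simplified stand-in for the predefined patterns, and `SIMPLEAPACHELOG` is a made-up, reduced version of `%{COMMONAPACHELOG}`. `Grok`, `APACHE`, and `client_ip` are likewise hypothetical names.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical, heavily simplified stand-ins for a few predefined grok
# patterns. The real definitions are more thorough (the real IP pattern,
# for example, also matches IPv6 addresses).
PATTERNS: Dict[str, str] = {
    "NUMBER": r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)",
    "IP": r"(?:\d{1,3}\.){3}\d{1,3}",
    "WORD": r"\w+",
    "NOTSPACE": r"\S+",
    "HTTPDATE": r"[^\]]+",
    # Made-up, reduced take on an Apache common log line.
    "SIMPLEAPACHELOG": (
        r"%{IP:clientip} %{NOTSPACE:ident} %{NOTSPACE:auth} "
        r'\[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{NOTSPACE:request} '
        r'HTTP/%{NUMBER:httpversion}" %{NUMBER:response:int} %{NUMBER:bytes:int}'
    ),
}

# Matches %{SYNTAX}, %{SYNTAX:ID}, and %{SYNTAX:ID:TYPE}.
REFERENCE = re.compile(r"%\{(\w+)(?::(\w+))?(?::(\w+))?\}")

# Conversions for the supported TYPE values.
CASTS: Dict[str, Callable[[str], object]] = {
    "int": int,
    "long": int,
    "float": float,
    "double": float,
    "boolean": lambda s: s.lower() == "true",
}


def expand(pattern: str, casts: Dict[str, Callable[[str], object]]) -> str:
    """Recursively replace grok references with plain regex syntax."""

    def replace(ref: "re.Match[str]") -> str:
        syntax, ident, type_name = ref.groups()
        body = expand(PATTERNS[syntax], casts)
        if ident is None:
            # %{SYNTAX}: match the pattern without capturing it.
            return f"(?:{body})"
        if type_name is not None:
            # %{SYNTAX:ID:TYPE}: remember how to convert this capture.
            casts[ident] = CASTS[type_name]
        # %{SYNTAX:ID}: capture the match under the name ID.
        return f"(?P<{ident}>{body})"

    return REFERENCE.sub(replace, pattern)


class Grok:
    """Hypothetical helper that compiles a grok expression into a regex."""

    def __init__(self, expression: str) -> None:
        self.casts: Dict[str, Callable[[str], object]] = {}
        self.regex = re.compile(expand(expression, self.casts))

    def extract(self, text: str) -> Optional[Dict[str, object]]:
        """Return named captures, converted by TYPE, or None on no match."""
        # Unanchored search is an assumption of this sketch.
        match = self.regex.search(text)
        if match is None:
            return None
        return {
            name: self.casts.get(name, str)(value)
            for name, value in match.groupdict().items()
            if value is not None
        }


APACHE = Grok("%{SIMPLEAPACHELOG}")


def client_ip(message: str) -> Optional[str]:
    """Rough analogue of grok(...).extract(...)?.clientip in Painless."""
    fields = APACHE.extract(message)
    return None if fields is None else fields["clientip"]


sample = '40.135.0.0 - - [30/Apr/2020:14:30:17 -0500] "GET /images/hm_bg.jpg HTTP/1.0" 200 24736'
print(Grok("%{NUMBER:duration:float} %{IP:client}").extract("3.44 55.3.244.1"))
# {'duration': 3.44, 'client': '55.3.244.1'}
print(client_ip(sample))                    # 40.135.0.0
print(client_ip("not a valid apache log"))  # None
```

Returning `None` for the line that isn't an Apache log plays the same role as the null-safe `?.clientip` lookup in the Painless script: the caller skips the value instead of failing. The `:float` suffix in the first example is added here only to show the `TYPE` form.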