[[dissect]]
=== Dissecting data
Dissect matches a single text field against a defined pattern. A dissect
pattern is defined by the parts of the string you want to discard. Paying
special attention to each part of a string helps to build successful dissect
patterns.

If you don't need the power of regular expressions, use dissect patterns instead
of grok. Dissect uses a much simpler syntax than grok and is typically faster
overall. The syntax for dissect is transparent: tell dissect what you want and
it will return those results to you.

[[dissect-syntax]]
==== Dissect patterns
Dissect patterns are comprised of _variables_ and _separators_. Anything
defined by a percent sign and curly braces `%{}` is considered a variable,
such as `%{clientip}`. You can assign variables to any part of data in a field,
and then return only the parts that you want.
Separators are any values between
variables, which could be spaces, dashes, or other delimiters.

For example, let's say you have log data with a `message` field that looks like
this:

[source,js]
----
"message" : "247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] \"GET /images/hm_nbg.jpg HTTP/1.0\" 304 0"
----
// NOTCONSOLE

You assign variables to each part of the data to construct a successful
dissect pattern. Remember, tell dissect _exactly_ what you want to
match on.

The first part of the data looks like an IP address, so you
can assign a variable like `%{clientip}`. The next two characters are dashes
with a space on either side. You can assign a variable for each dash, or a
single variable to represent the dashes and spaces. Next is a set of brackets
containing a timestamp. The brackets are a separator, so you include those in
the dissect pattern. Thus far, the data and matching dissect pattern look like
this:

[source,js]
----
247.37.0.0 - - [30/Apr/2020:14:31:22 -0500]  <1>
%{clientip} %{ident} %{auth} [%{@timestamp}] <2>
----
// NOTCONSOLE
<1> The first chunks of data from the `message` field
<2> Dissect pattern to match on the selected data chunks

Using that same logic, you can create variables for the remaining chunks of
data. Double quotation marks are separators, so include those in your dissect
pattern. The pattern replaces `GET` with a `%{verb}` variable, but keeps `HTTP`
as part of the pattern.

[source,js]
----
\"GET /images/hm_nbg.jpg HTTP/1.0\" 304 0
"%{verb} %{request} HTTP/%{httpversion}" %{response} %{size}
----
// NOTCONSOLE

Combining the two patterns results in a dissect pattern that looks like this:

[source,js]
----
%{clientip} %{ident} %{auth} [%{@timestamp}] \"%{verb} %{request} HTTP/%{httpversion}\" %{response} %{size}
----
// NOTCONSOLE

Now that you have a dissect pattern, how do you test and use it?

[[dissect-patterns-test]]
==== Test dissect patterns with Painless
You can incorporate dissect patterns into Painless scripts to extract
data.
To test your script, use either the
{painless}/painless-execute-api.html#painless-execute-runtime-field-context[field contexts]
of the Painless execute API or create a runtime field that includes the script.
Runtime fields offer greater flexibility and accept multiple documents, but the
Painless execute API is a great option if you don't have write access on a
cluster where you're testing a script.

For example, test your dissect pattern with the Painless execute API by
including your Painless script and a single document that matches your data.
Start by indexing the `message` field as a `wildcard` data type:

[source,console]
----
PUT my-index
{
  "mappings": {
    "properties": {
      "message": {
        "type": "wildcard"
      }
    }
  }
}
----

If you want to retrieve the HTTP response code, add your dissect pattern to a
Painless script that extracts the `response` value. To extract values from a
field, use this function:

[source,painless]
----
.extract(doc["<field_name>"].value)?.<field_value>
----

In this example, `message` is the `<field_name>` and `response` is the
`<field_value>`:

[source,console]
----
POST /_scripts/painless/_execute
{
  "script": {
    "source": """
      String response=dissect('%{clientip} %{ident} %{auth} [%{@timestamp}] "%{verb} %{request} HTTP/%{httpversion}" %{response} %{size}').extract(doc["message"].value)?.response;
      if (response != null) emit(Integer.parseInt(response)); <1>
    """
  },
  "context": "long_field", <2>
  "context_setup": {
    "index": "my-index",
    "document": {          <3>
      "message": """247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] "GET /images/hm_nbg.jpg HTTP/1.0" 304 0"""
    }
  }
}
----
// TEST[continued]
<1> Runtime fields require the `emit` method to return values.
<2> Because the response code is an integer, use the `long_field` context.
<3> Include a sample document that matches your data.

The result includes the HTTP response code:

[source,console-result]
----
{
  "result" : [
    304
  ]
}
----

[[dissect-patterns-runtime]]
==== Use dissect patterns and scripts in runtime fields
If you have a functional dissect pattern, you can add it to a runtime field to
manipulate data. Because runtime fields don't require you to index fields, you
have incredible flexibility to modify your script and how it functions. If you
already <<dissect-patterns-test,tested your dissect pattern>> using the Painless
execute API, you can use that _exact_ Painless script in your runtime field.

To start, add the `message` field as a `wildcard` type like in the previous
section, but also add `@timestamp` as a `date` in case you want to operate on
that field for <<common-script-uses,other use cases>>:

[source,console]
----
PUT /my-index/
{
  "mappings": {
    "properties": {
      "@timestamp": {
        "format": "strict_date_optional_time||epoch_second",
        "type": "date"
      },
      "message": {
        "type": "wildcard"
      }
    }
  }
}
----

If you want to extract the HTTP response code using your dissect pattern, you
can create a runtime field like `http.response`:

[source,console]
----
PUT my-index/_mappings
{
  "runtime": {
    "http.response": {
      "type": "long",
      "script": """
        String response=dissect('%{clientip} %{ident} %{auth} [%{@timestamp}] "%{verb} %{request} HTTP/%{httpversion}" %{response} %{size}').extract(doc["message"].value)?.response;
        if (response != null) emit(Integer.parseInt(response));
      """
    }
  }
}
----
// TEST[continued]

After mapping the fields you want to retrieve, index a few records from
your log data into {es}.
The following request uses the <<docs-bulk,bulk API>>
to index raw log data into `my-index`:

[source,console]
----
POST /my-index/_bulk?refresh=true
{"index":{}}
{"timestamp":"2020-04-30T14:30:17-05:00","message":"40.135.0.0 - - [30/Apr/2020:14:30:17 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
{"index":{}}
{"timestamp":"2020-04-30T14:30:53-05:00","message":"232.0.0.0 - - [30/Apr/2020:14:30:53 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
{"index":{}}
{"timestamp":"2020-04-30T14:31:12-05:00","message":"26.1.0.0 - - [30/Apr/2020:14:31:12 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
{"index":{}}
{"timestamp":"2020-04-30T14:31:19-05:00","message":"247.37.0.0 - - [30/Apr/2020:14:31:19 -0500] \"GET /french/splash_inet.html HTTP/1.0\" 200 3781"}
{"index":{}}
{"timestamp":"2020-04-30T14:31:22-05:00","message":"247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] \"GET /images/hm_nbg.jpg HTTP/1.0\" 304 0"}
{"index":{}}
{"timestamp":"2020-04-30T14:31:27-05:00","message":"252.0.0.0 - - [30/Apr/2020:14:31:27 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
{"index":{}}
{"timestamp":"2020-04-30T14:31:28-05:00","message":"not a valid apache log"}
----
// TEST[continued]

You can define a simple query to run a search for a specific HTTP response and
return all related fields. Use the `fields` parameter of the search API to
retrieve the `http.response` runtime field.

[source,console]
----
GET my-index/_search
{
  "query": {
    "match": {
      "http.response": "304"
    }
  },
  "fields" : ["http.response"]
}
----
// TEST[continued]

Alternatively, you can define the same runtime field but in the context of a
search request. The runtime definition and the script are exactly the same as
the one defined previously in the index mapping. Just copy that definition into
the search request under the `runtime_mappings` section and include a query
that matches on the runtime field.
This query returns the same results as the
search query previously defined for the `http.response` runtime field in your
index mappings, but only in the context of this specific search:

[source,console]
----
GET my-index/_search
{
  "runtime_mappings": {
    "http.response": {
      "type": "long",
      "script": """
        String response=dissect('%{clientip} %{ident} %{auth} [%{@timestamp}] "%{verb} %{request} HTTP/%{httpversion}" %{response} %{size}').extract(doc["message"].value)?.response;
        if (response != null) emit(Integer.parseInt(response));
      """
    }
  },
  "query": {
    "match": {
      "http.response": "304"
    }
  },
  "fields" : ["http.response"]
}
----
// TEST[continued]
// TEST[s/_search/_search\?filter_path=hits/]

[source,console-result]
----
{
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my-index",
        "_id" : "D47UqXkBByC8cgZrkbOm",
        "_score" : 1.0,
        "_source" : {
          "timestamp" : "2020-04-30T14:31:22-05:00",
          "message" : "247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] \"GET /images/hm_nbg.jpg HTTP/1.0\" 304 0"
        },
        "fields" : {
          "http.response" : [
            304
          ]
        }
      }
    ]
  }
}
----
// TESTRESPONSE[s/"_id" : "D47UqXkBByC8cgZrkbOm"/"_id": $body.hits.hits.0._id/]
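
The same dissect pattern can return any of its other variables. The following
request is a sketch rather than part of the walkthrough above: it assumes the
`my-index` documents indexed earlier, and the search-time runtime field name
`http.clientip` is chosen here only for illustration. It extracts the
`clientip` variable and emits it as an `ip` value:

[source,console]
----
GET my-index/_search
{
  "runtime_mappings": {
    "http.clientip": {
      "type": "ip",
      "script": """
        // Same pattern as before; only the extracted variable changes.
        String clientip=dissect('%{clientip} %{ident} %{auth} [%{@timestamp}] "%{verb} %{request} HTTP/%{httpversion}" %{response} %{size}').extract(doc["message"].value)?.clientip;
        // Lines that do not match the pattern yield null, so check before emitting.
        if (clientip != null) emit(clientip);
      """
    }
  },
  "query": {
    "match": {
      "http.clientip": "247.37.0.0"
    }
  },
  "fields" : ["http.clientip"]
}
----
// TEST[continued]

With the sample data indexed earlier, this query should match the two requests
that came from `247.37.0.0`. The null check matters because one sample document
contains a message that does not match the dissect pattern.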