4 years ago · e64a029be4
--- a/docs/reference/scripting/common-script-uses.asciidoc
+++ b/docs/reference/scripting/common-script-uses.asciidoc
@@ -17,15 +17,10 @@ There are two options at your disposal:
 
				 * <<grok,Grok>> is a regular expression dialect that supports aliased
			
 
				 expressions that you can reuse. Because Grok sits on top of regular expressions
			
 
				 (regex), any regular expressions are valid in grok as well.
			
 
				-* <<dissect-processor,Dissect>> extracts structured fields out of text, using
			
 
				+* <<dissect,Dissect>> extracts structured fields out of text, using
			
 
				 delimiters to define the matching pattern. Unlike grok, dissect doesn't use regular
			
 
				 expressions.
			
 
				 
			
 
				-Regex is incredibly powerful but can be complicated. If you don't need the
			
 
				-power of regular expressions, use dissect patterns, which are simple and
			
 
				-often faster than grok patterns. Paying special attention to the parts of the string
			
 
				-you want to discard will help build successful dissect patterns.
			
 
				-
			
 
				 Let's start with a simple example by adding the `@timestamp` and `message`
			
 
				 fields to the `my-index` mapping as indexed fields. To remain flexible, use
			
 
				 `wildcard` as the field type for `message`:
			
--- a/docs/reference/scripting/dissect-syntax.asciidoc
+++ b/docs/reference/scripting/dissect-syntax.asciidoc
@@ -0,0 +1,301 @@
 
				+[[dissect]]
			
 
				+=== Dissecting data
			
 
				+Dissect matches a single text field against a defined pattern. A dissect
			
 
				+pattern is defined by the parts of the string you want to discard. Paying
			
 
				+special attention to each part of a string helps to build successful dissect
			
 
				+patterns.
			
 
				+
			
 
				+If you don't need the power of regular expressions, use dissect patterns instead
			
 
				+of grok. Dissect uses a much simpler syntax than grok and is typically faster
			
 
				+overall. The syntax for dissect is transparent: tell dissect what you want and
			
 
				+it will return those results to you.
			
 
				+
			
 
				+[[dissect-syntax]]
			
 
				+==== Dissect patterns
			
 
				+Dissect patterns consist of _variables_ and _separators_. Anything
			
 
				+defined by a percent sign and curly braces `%{}` is considered a variable, 
			
 
				+such as `%{clientip}`. You can assign variables to any part of data in a field, 
			
 
				+and then return only the parts that you want. Separators are any values between
			
 
				+variables, which could be spaces, dashes, or other delimiters.
			
 
				+
			
 
				+For example, let's say you have log data with a `message` field that looks like
			
 
				+this:
			
 
				+
			
 
				+[source,js]
			
 
				+----
			
 
				+"message" : "247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] \"GET /images/hm_nbg.jpg HTTP/1.0\" 304 0"
			
 
				+----
			
 
				+// NOTCONSOLE
			
 
				+
			
 
				+You assign variables to each part of the data to construct a successful
			
 
				+dissect pattern. Remember, tell dissect _exactly_ what you want to
			
 
				+match on.
			
 
				+
			
 
				+The first part of the data looks like an IP address, so you
			
 
				+can assign a variable like `%{clientip}`. The next two characters are dashes
			
 
				+with a space on either side. You can assign a variable for each dash, or a
			
 
				+single variable to represent the dashes and spaces. Next is a set of brackets
			
 
				+containing a timestamp. The brackets are a separator, so you include those in
			
 
				+the dissect pattern. Thus far, the data and matching dissect pattern look like
			
 
				+this:
			
 
				+
			
 
				+[source,js]
			
 
				+----
			
 
				+247.37.0.0 - - [30/Apr/2020:14:31:22 -0500]  <1>
			
 
				+
			
 
				+%{clientip} %{ident} %{auth} [%{@timestamp}] <2>
			
 
				+----
			
 
				+// NOTCONSOLE
			
 
				+<1> The first chunks of data from the `message` field
			
 
				+<2> Dissect pattern to match on the selected data chunks
			
 
				+
			
 
				+Using that same logic, you can create variables for the remaining chunks of
			
 
				+data. Double quotation marks are separators, so include those in your dissect
			
 
				+pattern. The pattern replaces `GET` with a `%{verb}` variable, but keeps `HTTP`
			
 
				+as part of the pattern. 
			
 
				+
			
 
				+[source,js]
			
 
				+----
			
 
				+\"GET /images/hm_nbg.jpg HTTP/1.0\" 304 0
			
 
				+
			
 
				+"%{verb} %{request} HTTP/%{httpversion}" %{response} %{size}
			
 
				+----
			
 
				+// NOTCONSOLE
			
 
				+
			
 
				+Combining the two patterns results in a dissect pattern that looks like this: 
			
 
				+
			
 
				+[source,js]
			
 
				+----
			
 
				+%{clientip} %{ident} %{auth} [%{@timestamp}] "%{verb} %{request} HTTP/%{httpversion}" %{response} %{size}
			
 
				+----
			
 
				+// NOTCONSOLE
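To make the variable and separator logic concrete, here is a minimal Python sketch of a dissect-style matcher. It is not Elasticsearch's implementation: the `dissect` function and `PATTERN` constant are hypothetical names, and the sketch assumes every variable is followed by a non-empty separator (or is the last item in the pattern). It ignores dissect modifiers entirely.

```python
import re

# Pattern assembled in the walkthrough above.
PATTERN = '%{clientip} %{ident} %{auth} [%{@timestamp}] "%{verb} %{request} HTTP/%{httpversion}" %{response} %{size}'

VARIABLE = re.compile(r"%\{([^}]*)\}")


def dissect(pattern, text):
    """Return a dict of variable -> value, or None if the text does not match.

    Hypothetical helper for illustration only; assumes no adjacent variables.
    """
    names = VARIABLE.findall(pattern)      # variable names, in order
    separators = VARIABLE.split(pattern)   # literal text around the variables
    # re.split returns the capture groups too, so keep only the literals.
    separators = separators[0::2]

    if not text.startswith(separators[0]):
        return None
    pos = len(separators[0])
    fields = {}
    for name, sep in zip(names, separators[1:]):
        if sep == "":
            end = len(text)                # last variable takes the rest
        else:
            end = text.find(sep, pos)      # value runs up to the next separator
            if end == -1:
                return None                # separator missing: no match
        fields[name] = text[pos:end]
        pos = end + len(sep)
    return fields


message = '247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] "GET /images/hm_nbg.jpg HTTP/1.0" 304 0'
fields = dissect(PATTERN, message)
print(fields["clientip"], fields["verb"], fields["response"])
```

The timestamp contains a space, yet it is captured whole because the separator that follows it is `] "`, not a single space. This is why the walkthrough stresses the text between variables.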
			
 
				+
			
 
				+Now that you have a dissect pattern, how do you test and use it?
			
 
				+
			
 
				+[[dissect-patterns-test]]
			
 
				+==== Test dissect patterns with Painless
			
 
				+You can incorporate dissect patterns into Painless scripts to extract
			
 
				+data. To test your script, either use the {painless}/painless-execute-api.html#painless-execute-runtime-field-context[field contexts] of the Painless
			
 
				+execute API or create a runtime field that includes the script. Runtime fields
			
 
				+offer greater flexibility and accept multiple documents, but the Painless execute
			
 
				+API is a great option if you don't have write access on a cluster where you're
			
 
				+testing a script.
			
 
				+
			
 
				+For example, test your dissect pattern with the Painless execute API by
			
 
				+including your Painless script and a single document that matches your data.
			
 
				+Start by indexing the `message` field as a `wildcard` data type:
			
 
				+
			
 
				+[source,console]
			
 
				+----
			
 
				+PUT my-index
			
 
				+{
			
 
				+  "mappings": {
			
 
				+    "properties": {
			
 
				+      "message": {
			
 
				+        "type": "wildcard"
			
 
				+      }
			
 
				+    }
			
 
				+  }
			
 
				+}
			
 
				+----
			
 
				+
			
 
				+If you want to retrieve the HTTP response code, add your dissect pattern to a
			
 
				+Painless script that extracts the `response` value. To extract values from a
			
 
				+field, use this function:
			
 
				+
			
 
				+[source,painless]
			
 
				+----
			
 
				+.extract(doc["<field_name>"].value)?.<field_value>
			
 
				+----
			
 
				+
			
 
				+In this example, `message` is the `<field_name>` and `response` is the
			
 
				+`<field_value>`:
			
 
				+
			
 
				+[source,console]
			
 
				+----
			
 
				+POST /_scripts/painless/_execute
			
 
				+{
			
 
				+  "script": {
			
 
				+    "source": """
			
 
				+      String response=dissect('%{clientip} %{ident} %{auth} [%{@timestamp}] "%{verb} %{request} HTTP/%{httpversion}" %{response} %{size}').extract(doc["message"].value)?.response;
			
 
				+      if (response != null) emit(Integer.parseInt(response)); <1>
			
 
				+    """
			
 
				+  },
			
 
				+  "context": "long_field", <2>
			
 
				+  "context_setup": {
			
 
				+    "index": "my-index",
			
 
				+    "document": {          <3>
			
 
				+      "message": """247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] "GET /images/hm_nbg.jpg HTTP/1.0" 304 0"""
			
 
				+    }
			
 
				+  }
			
 
				+}
			
 
				+----
			
 
				+// TEST[continued]
			
 
				+<1> Runtime fields require the `emit` method to return values.
			
 
				+<2> Because the response code is an integer, use the `long_field` context.
			
 
				+<3> Include a sample document that matches your data.
			
 
				+
			
 
				+The result includes the HTTP response code:
			
 
				+
			
 
				+[source,console-result]
			
 
				+----
			
 
				+{
			
 
				+  "result" : [
			
 
				+    304
			
 
				+  ]
			
 
				+}
			
 
				+----
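If you send this request from a script instead of the console, the body has to be plain JSON. The triple-quoted `"""` strings in the examples are, as far as I know, a console convenience, so a JSON encoder must escape the embedded double quotes for you. The sketch below only builds the request body with Python's standard `json` module. `build_execute_request` is a hypothetical helper and no HTTP call is made.

```python
import json

PATTERN = ('%{clientip} %{ident} %{auth} [%{@timestamp}] '
           '"%{verb} %{request} HTTP/%{httpversion}" %{response} %{size}')


def build_execute_request(pattern, variable, message, index="my-index"):
    """Build the Painless execute API body for a long_field runtime script.

    Hypothetical helper: mirrors the console example above, nothing more.
    """
    source = (
        f"String value = dissect('{pattern}')"
        f'.extract(doc["message"].value)?.{variable}; '
        "if (value != null) emit(Integer.parseInt(value));"
    )
    return {
        "script": {"source": source},
        "context": "long_field",
        "context_setup": {"index": index, "document": {"message": message}},
    }


body = build_execute_request(
    PATTERN,
    "response",
    '247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] "GET /images/hm_nbg.jpg HTTP/1.0" 304 0',
)
payload = json.dumps(body)  # quotes inside the script and message are escaped here
print(payload)
```

The resulting string can be sent as the body of `POST /_scripts/painless/_execute` with whichever HTTP client you already use.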
			
 
				+
			
 
				+[[dissect-patterns-runtime]]
			
 
				+==== Use dissect patterns and scripts in runtime fields
			
 
				+If you have a functional dissect pattern, you can add it to a runtime field to
			
 
				+manipulate data. Because runtime fields don't require you to index fields, you
			
 
				+have incredible flexibility to modify your script and how it functions. If you
			
 
				+already <<dissect-patterns-test,tested your dissect pattern>> using the Painless
			
 
				+execute API, you can use that _exact_ Painless script in your runtime field. 
			
 
				+
			
 
				+To start, add the `message` field as a `wildcard` type like in the previous
			
 
				+section, but also add `@timestamp` as a `date` in case you want to operate on
			
 
				+that field for <<common-script-uses,other use cases>>:
			
 
				+
			
 
				+[source,console]
			
 
				+----
			
 
				+PUT /my-index/
			
 
				+{
			
 
				+  "mappings": {
			
 
				+    "properties": {
			
 
				+      "@timestamp": {
			
 
				+        "format": "strict_date_optional_time||epoch_second",
			
 
				+        "type": "date"
			
 
				+      },
			
 
				+      "message": {
			
 
				+        "type": "wildcard"
			
 
				+      }
			
 
				+    }
			
 
				+  }
			
 
				+}
			
 
				+----
			
 
				+
			
 
				+If you want to extract the HTTP response code using your dissect pattern, you
			
 
				+can create a runtime field like `http.response`:
			
 
				+
			
 
				+[source,console]
			
 
				+----
			
 
				+PUT my-index/_mappings
			
 
				+{
			
 
				+  "runtime": {
			
 
				+    "http.response": {
			
 
				+      "type": "long",
			
 
				+      "script": """
			
 
				+        String response=dissect('%{clientip} %{ident} %{auth} [%{@timestamp}] "%{verb} %{request} HTTP/%{httpversion}" %{response} %{size}').extract(doc["message"].value)?.response;
			
 
				+        if (response != null) emit(Integer.parseInt(response));
			
 
				+      """
			
 
				+    }
			
 
				+  }
			
 
				+}
			
 
				+----
			
 
				+// TEST[continued]
			
 
				+
			
 
				+After mapping the fields you want to retrieve, index a few records from
			
 
				+your log data into {es}. The following request uses the <<docs-bulk,bulk API>>
			
 
				+to index raw log data into `my-index`:
			
 
				+
			
 
				+[source,console]
			
 
				+----
			
 
				+POST /my-index/_bulk?refresh=true
			
 
				+{"index":{}}
			
 
				+{"timestamp":"2020-04-30T14:30:17-05:00","message":"40.135.0.0 - - [30/Apr/2020:14:30:17 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
			
 
				+{"index":{}}
			
 
				+{"timestamp":"2020-04-30T14:30:53-05:00","message":"232.0.0.0 - - [30/Apr/2020:14:30:53 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
			
 
				+{"index":{}}
			
 
				+{"timestamp":"2020-04-30T14:31:12-05:00","message":"26.1.0.0 - - [30/Apr/2020:14:31:12 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
			
 
				+{"index":{}}
			
 
				+{"timestamp":"2020-04-30T14:31:19-05:00","message":"247.37.0.0 - - [30/Apr/2020:14:31:19 -0500] \"GET /french/splash_inet.html HTTP/1.0\" 200 3781"}
			
 
				+{"index":{}}
			
 
				+{"timestamp":"2020-04-30T14:31:22-05:00","message":"247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] \"GET /images/hm_nbg.jpg HTTP/1.0\" 304 0"}
			
 
				+{"index":{}}
			
 
				+{"timestamp":"2020-04-30T14:31:27-05:00","message":"252.0.0.0 - - [30/Apr/2020:14:31:27 -0500] \"GET /images/hm_bg.jpg HTTP/1.0\" 200 24736"}
			
 
				+{"index":{}}
			
 
				+{"timestamp":"2020-04-30T14:31:28-05:00","message":"not a valid apache log"}
			
 
				+----
			
 
				+// TEST[continued]
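The bulk body above is newline-delimited JSON: one action line, then one document line, with a trailing newline at the end. As a rough sketch of how raw log lines could be turned into that shape, the following Python uses only the standard library. `to_bulk_body` is a hypothetical helper, and it assumes each line contains a bracketed Apache-style timestamp and that month abbreviations parse under an English locale.

```python
import json
from datetime import datetime


def to_bulk_body(log_lines):
    """Build an NDJSON bulk body from raw Apache log lines (illustrative only)."""
    out = []
    for line in log_lines:
        # Assumption: the timestamp is the text between the first '[' and ']'.
        raw_ts = line[line.index("[") + 1 : line.index("]")]
        ts = datetime.strptime(raw_ts, "%d/%b/%Y:%H:%M:%S %z")
        out.append(json.dumps({"index": {}}, separators=(",", ":")))
        out.append(json.dumps(
            {"timestamp": ts.isoformat(), "message": line},
            separators=(",", ":"),
        ))
    # The bulk API expects the body to end with a newline.
    return "\n".join(out) + "\n"


raw = ['40.135.0.0 - - [30/Apr/2020:14:30:17 -0500] "GET /images/hm_bg.jpg HTTP/1.0" 200 24736']
bulk_body = to_bulk_body(raw)
print(bulk_body, end="")
```

The `\"` sequences in the bulk example are just JSON escaping of the quotes in the log line. The dissect pattern itself still matches on plain double quotes.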
			
 
				+
			
 
				+You can define a simple query to run a search for a specific HTTP response and
			
 
				+return all related fields. Use the `fields` parameter of the search API to
			
 
				+retrieve the `http.response` runtime field.
			
 
				+
			
 
				+[source,console]
			
 
				+----
			
 
				+GET my-index/_search
			
 
				+{
			
 
				+  "query": {
			
 
				+    "match": {
			
 
				+      "http.response": "304"
			
 
				+    }
			
 
				+  },
			
 
				+  "fields" : ["http.response"]
			
 
				+}
			
 
				+----
			
 
				+// TEST[continued]
			
 
				+
			
 
				+Alternatively, you can define the same runtime field but in the context of a
			
 
				+search request. The runtime definition, including its script, is exactly the same as
			
 
				+the one defined previously in the index mapping. Just copy that definition into
			
 
				+the search request under the `runtime_mappings` section and include a query
			
 
				+that matches on the runtime field. This query returns the same results as the
			
 
				+search query previously defined for the `http.response` runtime field in your
			
 
				+index mappings, but only in the context of this specific search:
			
 
				+
			
 
				+[source,console]
			
 
				+----
			
 
				+GET my-index/_search
			
 
				+{
			
 
				+  "runtime_mappings": {
			
 
				+    "http.response": {
			
 
				+      "type": "long",
			
 
				+      "script": """
			
 
				+        String response=dissect('%{clientip} %{ident} %{auth} [%{@timestamp}] "%{verb} %{request} HTTP/%{httpversion}" %{response} %{size}').extract(doc["message"].value)?.response;
			
 
				+        if (response != null) emit(Integer.parseInt(response));
			
 
				+      """
			
 
				+    }
			
 
				+  },
			
 
				+  "query": {
			
 
				+    "match": {
			
 
				+      "http.response": "304"
			
 
				+    }
			
 
				+  },
			
 
				+  "fields" : ["http.response"]
			
 
				+}
			
 
				+----
			
 
				+// TEST[continued]
			
 
				+// TEST[s/_search/_search\?filter_path=hits/]
			
 
				+
			
 
				+[source,console-result]
			
 
				+----
			
 
				+{
			
 
				+  "hits" : {
			
 
				+    "total" : {
			
 
				+      "value" : 1,
			
 
				+      "relation" : "eq"
			
 
				+    },
			
 
				+    "max_score" : 1.0,
			
 
				+    "hits" : [
			
 
				+      {
			
 
				+        "_index" : "my-index",
			
 
				+        "_id" : "D47UqXkBByC8cgZrkbOm",
			
 
				+        "_score" : 1.0,
			
 
				+        "_source" : {
			
 
				+          "timestamp" : "2020-04-30T14:31:22-05:00",
			
 
				+          "message" : "247.37.0.0 - - [30/Apr/2020:14:31:22 -0500] \"GET /images/hm_nbg.jpg HTTP/1.0\" 304 0"
			
 
				+        },
			
 
				+        "fields" : {
			
 
				+          "http.response" : [
			
 
				+            304
			
 
				+          ]
			
 
				+        }
			
 
				+      }
			
 
				+    ]
			
 
				+  }
			
 
				+}
			
 
				+----
			
 
				+// TESTRESPONSE[s/"_id" : "D47UqXkBByC8cgZrkbOm"/"_id": $body.hits.hits.0._id/]
			
--- a/docs/reference/scripting/using.asciidoc
+++ b/docs/reference/scripting/using.asciidoc
@@ -566,4 +566,5 @@ DELETE /_ingest/pipeline/my_test_scores_pipeline
 
				 
			
 
				 ////
			
 
				 
			
 
				+include::dissect-syntax.asciidoc[]
			
 
				 include::grok-syntax.asciidoc[]