[DOCS] Adds latest transform example to transform docs (#70562)

Co-authored-by: Lisa Cawley <lcawley@elastic.co>
István Zoltán Szabó 4 years ago
parent commit c4a33f2413
1 changed file with 184 additions and 37 deletions:

docs/reference/transform/examples.asciidoc (+184 -37)

@@ -6,24 +6,25 @@
 <titleabbrev>Examples</titleabbrev>
 ++++
 
-These examples demonstrate how to use {transforms} to derive useful 
-insights from your data. All the examples use one of the 
+These examples demonstrate how to use {transforms} to derive useful insights 
+from your data. All the examples use one of the 
 {kibana-ref}/add-sample-data.html[{kib} sample datasets]. For a more detailed, 
-step-by-step example, see 
-<<ecommerce-transforms>>.
+step-by-step example, see <<ecommerce-transforms>>.
 
 * <<example-best-customers>>
 * <<example-airline>>
 * <<example-clientips>>
+* <<example-last-log>>
+
 
 [[example-best-customers]]
 == Finding your best customers
 
-In this example, we use the eCommerce orders sample dataset to find the 
-customers who spent the most in our hypothetical webshop. Let's transform the 
-data such that the destination index contains the number of orders, the total 
-price of the orders, the amount of unique products and the average price per 
-order, and the total amount of ordered products for each customer.
+This example uses the eCommerce orders sample data set to find the customers who 
+spent the most in a hypothetical webshop. Let's transform the data such that the 
+destination index contains the number of orders, the total price of the orders, 
+the number of unique products, the average price per order, and the total 
+number of ordered products for each customer.
 
 [source,console]
 ----------------------------------
@@ -52,18 +53,17 @@ POST _transform/_preview
 ----------------------------------
 // TEST[skip:setup kibana sample data]
 
-<1> This is the destination index for the {transform}. It is ignored by
-`_preview`.
-<2> Two `group_by` fields have been selected. This means the {transform} will
-contain a unique row per `user` and `customer_id` combination. Within this
-dataset both these fields are unique. By including both in the {transform} it
-gives more context to the final results.
+<1> The destination index for the {transform}. It is ignored by `_preview`.
+<2> Two `group_by` fields are selected. This means the {transform} contains a 
+unique row per `user` and `customer_id` combination. Within this data set, both 
+these fields are unique. Including both in the {transform} gives more context 
+to the final results.
 
-NOTE: In the example above, condensed JSON formatting has been used for easier 
+NOTE: In the example above, condensed JSON formatting is used for easier 
 readability of the pivot object.
 
-The preview {transforms} API enables you to see the layout of the
-{transform} in advance, populated with some sample values. For example:
+The preview {transforms} API enables you to see the layout of the {transform} in 
+advance, populated with some sample values. For example:
 
 [source,js]
 ----------------------------------
@@ -84,6 +84,7 @@ The preview {transforms} API enables you to see the layout of the
 ----------------------------------
 // NOTCONSOLE
 
+
 This {transform} makes it easier to answer questions such as:
 
 * Which customers spend the most?
@@ -100,14 +101,15 @@ enables us to analyze data at scale and gives more flexibility to explore and
 navigate data from a customer centric perspective. In some cases, it can even 
 make creating visualizations much simpler.
 
+
 [[example-airline]]
 == Finding air carriers with the most delays
 
-In this example, we use the Flights sample dataset to find out which air carrier 
-had the most delays. First, we filter the source data such that it excludes all 
-the cancelled flights by using a query filter. Then we transform the data to 
+This example uses the Flights sample data set to find out which air carrier 
+had the most delays. First, filter the source data such that it excludes all 
+the cancelled flights by using a query filter. Then transform the data to 
 contain the distinct number of flights, the sum of delayed minutes, and the sum 
-of the flight minutes by air carrier. Finally, we use a 
+of the flight minutes by air carrier. Finally, use a 
 <<search-aggregations-pipeline-bucket-script-aggregation,`bucket_script`>>
 to determine what percentage of the flight time actually consisted of delays.
 
@@ -151,9 +153,8 @@ POST _transform/_preview
 ----------------------------------
 // TEST[skip:setup kibana sample data]
 
-<1> Filter the source data to select only flights that were not cancelled.
-<2> This is the destination index for the {transform}. It is ignored by
-`_preview`.
+<1> Filter the source data to select only flights that are not cancelled.
+<2> The destination index for the {transform}. It is ignored by `_preview`.
 <3> The data is grouped by the `Carrier` field, which contains the airline name.
 <4> This `bucket_script` performs calculations on the results that are returned 
 by the aggregation. In this particular example, it calculates what percentage of 
@@ -183,20 +184,20 @@ This {transform} makes it easier to answer questions such as:
 
 * Which air carrier has the most delays as a percentage of flight time?
 
-NOTE: This data is fictional and does not reflect actual delays 
-or flight stats for any of the featured destination or origin airports.
+NOTE: This data is fictional and does not reflect actual delays or flight stats 
+for any of the featured destination or origin airports.
 
 
 [[example-clientips]]
 == Finding suspicious client IPs
 
-In this example, we use the web log sample dataset to identify suspicious client 
-IPs. We transform the data such that the new index contains the sum of bytes and 
-the number of distinct URLs, agents, incoming requests by location, and 
-geographic destinations for each client IP. We also use filter aggregations to 
-count the specific types of HTTP responses that each client IP receives. 
-Ultimately, the example below transforms web log data into an entity centric 
-index where the entity is `clientip`.
+This example uses the web log sample data set to identify suspicious client IPs. 
+It transforms the data such that the new index contains the sum of bytes and the 
+number of distinct URLs, agents, incoming requests by location, and geographic 
+destinations for each client IP. It also uses filter aggregations to count the 
+specific types of HTTP responses that each client IP receives. Ultimately, the 
+example below transforms web log data into an entity centric index where the 
+entity is `clientip`.
 
 [source,console]
 ----------------------------------
@@ -254,9 +255,9 @@ PUT _transform/suspicious_client_ips
 ----------------------------------
 // TEST[skip:setup kibana sample data]
 
-<1> This is the destination index for the {transform}.
-<2> Configures the {transform} to run continuously. It uses the `timestamp` field
-to synchronize the source and destination indices. The worst case
+<1> The destination index for the {transform}.
+<2> Configures the {transform} to run continuously. It uses the `timestamp` 
+field to synchronize the source and destination indices. The worst case 
 ingestion delay is 60 seconds.
 <3> The data is grouped by the `clientip` field.
 <4> Filter aggregation that counts the occurrences of successful (`200`) 
@@ -336,3 +337,149 @@ This {transform} makes it easier to answer questions such as:
 * Which client IPs have high error rates?
 
 * Which client IPs are interacting with a high number of destination countries?
+
+
+[[example-last-log]]
+== Finding the last log event for each IP address
+
+This example uses the web log sample data set to find the last log event from 
+an IP address. Let's use the `latest` type of {transform} in continuous mode. 
+It copies the most recent document for each unique key from the source index 
+to the destination index and updates the destination index as new data comes 
+into the source index. 
+
+Pick the `clientip` field as the unique key; the data is grouped by this field. 
+Select `timestamp` as the date field that sorts the data chronologically. For 
+continuous mode, specify a date field that is used to identify new documents, 
+and an interval between checks for changes in the source index.
+
+Let's assume that we're interested in retaining documents only for IP addresses 
+that appeared recently in the log. You can define a retention policy and 
+specify a date field that is used to calculate the age of a document. This 
+example uses the same date field that is used to sort the data. Then set the 
+maximum age of a document; documents that are older than the value you set are 
+removed from the destination index.
+
+This {transform} creates the destination index that contains the latest log 
+event for each client IP. As the {transform} runs in continuous mode, the 
+destination index is updated as new data comes into the source index. Finally, 
+every document that is older than 30 days is removed from the destination 
+index due to the applied retention policy.
+
+[source,console]
+----------------------------------
+PUT _transform/last-log-from-clientip
+{
+  "source": {
+    "index": [
+      "kibana_sample_data_logs"
+    ]
+  },
+  "latest": {
+    "unique_key": [ <1>
+      "clientip"
+    ],
+    "sort": "timestamp" <2>
+  },
+  "frequency": "1m", <3>
+  "dest": {
+    "index": "last-log-from-clientip"
+  },
+  "sync": { <4>
+    "time": {
+      "field": "timestamp",
+      "delay": "60s"
+    }
+  },
+  "retention_policy": { <5>
+    "time": {
+      "field": "timestamp",
+      "max_age": "30d"
+    }
+  },
+  "settings": {
+    "max_page_search_size": 500
+  }
+}
+----------------------------------
+// TEST[skip:setup kibana sample data]
+
+<1> Specifies the field for grouping the data.
+<2> Specifies the date field that is used for sorting the data.
+<3> Sets the interval for the {transform} to check for changes in the source 
+index.
+<4> Contains the time field and delay settings used to synchronize the source 
+and destination indices.
+<5> Specifies the retention policy for the {transform}. Documents that are 
+older than the configured value are removed from the destination index.
+
+
+After you create the {transform}, start it:
+
+[source,console]
+----------------------------------
+POST _transform/last-log-from-clientip/_start
+----------------------------------
+// TEST[skip:setup kibana sample data]
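+
+
+If you want to verify that the {transform} is running, one way is to check its 
+stats; the response includes counters such as the number of documents that 
+have been processed and indexed so far:
+
+[source,console]
+----------------------------------
+GET _transform/last-log-from-clientip/_stats
+----------------------------------
+// TEST[skip:setup kibana sample data]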
+
+
+After the {transform} processes the data, search the destination index:
+
+[source,console]
+----------------------------------
+GET last-log-from-clientip/_search
+----------------------------------
+// TEST[skip:setup kibana sample data]
+
+
+The search result shows you data like this for each client IP:
+
+[source,js]
+----------------------------------
+{
+  "_index" : "last-log-from-clientip",
+  "_id" : "MOeHH_cUL5urmartKj-b5UQAAAAAAAAA",
+  "_score" : 1.0,
+  "_source" : {
+    "referer" : "http://twitter.com/error/don-lind",
+    "request" : "/elasticsearch",
+    "agent" : "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)",
+    "extension" : "",
+    "memory" : null,
+    "ip" : "0.72.176.46",
+    "index" : "kibana_sample_data_logs",
+    "message" : "0.72.176.46 - - [2018-09-18T06:31:00.572Z] \"GET /elasticsearch HTTP/1.1\" 200 7065 \"-\" \"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)\"",
+    "url" : "https://www.elastic.co/downloads/elasticsearch",
+    "tags" : [
+      "success",
+      "info"
+    ],
+    "geo" : {
+      "srcdest" : "IN:PH",
+      "src" : "IN",
+      "coordinates" : {
+        "lon" : -124.1127917,
+        "lat" : 40.80338889
+      },
+      "dest" : "PH"
+    },
+    "utc_time" : "2021-05-04T06:31:00.572Z",
+    "bytes" : 7065,
+    "machine" : {
+      "os" : "ios",
+      "ram" : 12884901888
+    },
+    "response" : 200,
+    "clientip" : "0.72.176.46",
+    "host" : "www.elastic.co",
+    "event" : {
+      "dataset" : "sample_web_logs"
+    },
+    "phpmemory" : null,
+    "timestamp" : "2021-05-04T06:31:00.572Z"
+  }
+}
+----------------------------------
+// NOTCONSOLE
+
+This {transform} makes it easier to answer questions such as:
+
+* What was the most recent log event associated with a specific IP address?
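+
+For example, to answer that question, you can run a term query against the 
+destination index. The IP address below is taken from the sample response 
+above; substitute any client IP that exists in your data:
+
+[source,console]
+----------------------------------
+GET last-log-from-clientip/_search
+{
+  "query": {
+    "term": {
+      "clientip": "0.72.176.46"
+    }
+  }
+}
+----------------------------------
+// TEST[skip:setup kibana sample data]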