[DOCS] Adds latest transform example to transform docs (#70562)

Co-authored-by: Lisa Cawley <lcawley@elastic.co>
István Zoltán Szabó 4 years ago
parent commit c4a33f2413
1 changed file with 184 additions and 37 deletions:

docs/reference/transform/examples.asciidoc (+184 -37)

@@ -6,24 +6,25 @@
 <titleabbrev>Examples</titleabbrev>
 ++++
 
-These examples demonstrate how to use {transforms} to derive useful 
-insights from your data. All the examples use one of the 
+These examples demonstrate how to use {transforms} to derive useful insights 
+from your data. All the examples use one of the 
 {kibana-ref}/add-sample-data.html[{kib} sample datasets]. For a more detailed, 
-step-by-step example, see 
-<<ecommerce-transforms>>.
+step-by-step example, see <<ecommerce-transforms>>.
 
 * <<example-best-customers>>
 * <<example-airline>>
 * <<example-clientips>>
+* <<example-last-log>>
+
 
 [[example-best-customers]]
 == Finding your best customers
 
-In this example, we use the eCommerce orders sample dataset to find the 
-customers who spent the most in our hypothetical webshop. Let's transform the 
-data such that the destination index contains the number of orders, the total 
-price of the orders, the amount of unique products and the average price per 
-order, and the total amount of ordered products for each customer.
+This example uses the eCommerce orders sample data set to find the customers who 
+spent the most in a hypothetical webshop. Let's transform the data such that the 
+destination index contains the number of orders, the total price of the orders, 
+the number of unique products, the average price per order, and the total 
+number of ordered products for each customer.
 
 [source,console]
 ----------------------------------
@@ -52,18 +53,17 @@ POST _transform/_preview
 ----------------------------------
 // TEST[skip:setup kibana sample data]
 
-<1> This is the destination index for the {transform}. It is ignored by
-`_preview`.
-<2> Two `group_by` fields have been selected. This means the {transform} will
-contain a unique row per `user` and `customer_id` combination. Within this
-dataset both these fields are unique. By including both in the {transform} it
-gives more context to the final results.
+<1> The destination index for the {transform}. It is ignored by `_preview`.
+<2> Two `group_by` fields are selected. This means the {transform} contains a 
+unique row per `user` and `customer_id` combination. Within this data set, both 
+these fields are unique. Including both in the {transform} gives more context 
+to the final results.
 
-NOTE: In the example above, condensed JSON formatting has been used for easier 
+NOTE: In the example above, condensed JSON formatting is used for easier 
 readability of the pivot object.
 
-The preview {transforms} API enables you to see the layout of the
-{transform} in advance, populated with some sample values. For example:
+The preview {transforms} API enables you to see the layout of the {transform} in 
+advance, populated with some sample values. For example:
 
 [source,js]
 ----------------------------------
@@ -84,6 +84,7 @@ The preview {transforms} API enables you to see the layout of the
 ----------------------------------
 // NOTCONSOLE
 
+
 This {transform} makes it easier to answer questions such as:
 
 * Which customers spend the most?
@@ -100,14 +101,15 @@ enables us to analyze data at scale and gives more flexibility to explore and
 navigate data from a customer centric perspective. In some cases, it can even 
 make creating visualizations much simpler.
 
+
 [[example-airline]]
 == Finding air carriers with the most delays
 
-In this example, we use the Flights sample dataset to find out which air carrier 
-had the most delays. First, we filter the source data such that it excludes all 
-the cancelled flights by using a query filter. Then we transform the data to 
+This example uses the Flights sample data set to find out which air carrier 
+had the most delays. First, filter the source data such that it excludes all 
+the cancelled flights by using a query filter. Then transform the data to 
 contain the distinct number of flights, the sum of delayed minutes, and the sum 
-of the flight minutes by air carrier. Finally, we use a 
+of the flight minutes by air carrier. Finally, use a 
 <<search-aggregations-pipeline-bucket-script-aggregation,`bucket_script`>>
 to determine what percentage of the flight time actually consisted of delays.
 
@@ -151,9 +153,8 @@ POST _transform/_preview
 ----------------------------------
 // TEST[skip:setup kibana sample data]
 
-<1> Filter the source data to select only flights that were not cancelled.
-<2> This is the destination index for the {transform}. It is ignored by
-`_preview`.
+<1> Filter the source data to select only flights that are not cancelled.
+<2> The destination index for the {transform}. It is ignored by `_preview`.
 <3> The data is grouped by the `Carrier` field, which contains the airline name.
 <4> This `bucket_script` performs calculations on the results that are returned 
 by the aggregation. In this particular example, it calculates what percentage of 
@@ -183,20 +184,20 @@ This {transform} makes it easier to answer questions such as:
 
 * Which air carrier has the most delays as a percentage of flight time?
 
-NOTE: This data is fictional and does not reflect actual delays 
-or flight stats for any of the featured destination or origin airports.
+NOTE: This data is fictional and does not reflect actual delays or flight stats 
+for any of the featured destination or origin airports.
 
 
 [[example-clientips]]
 == Finding suspicious client IPs
 
-In this example, we use the web log sample dataset to identify suspicious client 
-IPs. We transform the data such that the new index contains the sum of bytes and 
-the number of distinct URLs, agents, incoming requests by location, and 
-geographic destinations for each client IP. We also use filter aggregations to 
-count the specific types of HTTP responses that each client IP receives. 
-Ultimately, the example below transforms web log data into an entity centric 
-index where the entity is `clientip`.
+This example uses the web log sample data set to identify suspicious client IPs. 
+It transforms the data such that the new index contains the sum of bytes and the 
+number of distinct URLs, agents, incoming requests by location, and geographic 
+destinations for each client IP. It also uses filter aggregations to count the 
+specific types of HTTP responses that each client IP receives. Ultimately, the 
+example below transforms web log data into an entity centric index where the 
+entity is `clientip`.
 
 [source,console]
 ----------------------------------
@@ -254,9 +255,9 @@ PUT _transform/suspicious_client_ips
 ----------------------------------
 // TEST[skip:setup kibana sample data]
 
-<1> This is the destination index for the {transform}.
-<2> Configures the {transform} to run continuously. It uses the `timestamp` field
-to synchronize the source and destination indices. The worst case
+<1> The destination index for the {transform}.
+<2> Configures the {transform} to run continuously. It uses the `timestamp` 
+field to synchronize the source and destination indices. The worst case 
 ingestion delay is 60 seconds.
 <3> The data is grouped by the `clientip` field.
 <4> Filter aggregation that counts the occurrences of successful (`200`) 
@@ -336,3 +337,149 @@ This {transform} makes it easier to answer questions such as:
 * Which client IPs have high error rates?
 
 * Which client IPs are interacting with a high number of destination countries?
+
+
+[[example-last-log]]
+== Finding the last log event for each IP address
+
+This example uses the web log sample data set to find the last log event from 
+an IP address. Let's use the `latest` type of {transform} in continuous mode. 
+It copies the most recent document for each unique key from the source index 
+to the destination index and updates the destination index as new data comes 
+into the source index. 
+
+Pick the `clientip` field as the unique key; the data is grouped by this field. 
+Select `timestamp` as the date field that sorts the data chronologically. For 
+continuous mode, specify a date field that is used to identify new documents, 
+and an interval between checks for changes in the source index.
+
+Let's assume that we're interested in retaining documents only for IP addresses 
+that appeared recently in the log. You can define a retention policy and 
+specify a date field that is used to calculate the age of a document. This 
+example uses the same date field that is used to sort the data. Then set the 
+maximum age of a document; documents that are older than the value you set are 
+removed from the destination index.
+
+This {transform} creates the destination index that contains the latest log 
+event for each client IP. As the {transform} runs in continuous mode, the 
+destination index is updated as new data comes into the source index. Finally, 
+every document that is older than 30 days is removed from the destination 
+index due to the applied retention policy.
+
+[source,console]
+----------------------------------
+PUT _transform/last-log-from-clientip
+{
+  "source": {
+    "index": [
+      "kibana_sample_data_logs"
+    ]
+  },
+  "latest": {
+    "unique_key": [ <1>
+      "clientip"
+    ],
+    "sort": "timestamp" <2>
+  },
+  "frequency": "1m", <3>
+  "dest": {
+    "index": "last-log-from-clientip"
+  },
+  "sync": { <4>
+    "time": {
+      "field": "timestamp",
+      "delay": "60s"
+    }
+  },
+  "retention_policy": { <5>
+    "time": {
+      "field": "timestamp",
+      "max_age": "30d"
+    }
+  },
+  "settings": {
+    "max_page_search_size": 500
+  }
+}
+----------------------------------
+// TEST[skip:setup kibana sample data]
+
+<1> Specifies the field for grouping the data.
+<2> Specifies the date field that is used for sorting the data.
+<3> Sets the interval for the {transform} to check for changes in the source 
+index.
+<4> Contains the time field and delay settings used to synchronize the source 
+and destination indices.
+<5> Specifies the retention policy for the {transform}. Documents that are 
+older than the configured value are removed from the destination index.
+
+
+After you create the {transform}, start it:
+
+[source,console]
+----------------------------------
+POST _transform/last-log-from-clientip/_start
+----------------------------------
+// TEST[skip:setup kibana sample data]
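+
+
+If you want to verify that the {transform} is running, one way is to check its 
+stats; the response includes counters such as the number of documents that 
+have been processed and indexed so far:
+
+[source,console]
+----------------------------------
+GET _transform/last-log-from-clientip/_stats
+----------------------------------
+// TEST[skip:setup kibana sample data]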
+
+
+After the {transform} processes the data, search the destination index:
+
+[source,console]
+----------------------------------
+GET last-log-from-clientip/_search
+----------------------------------
+// TEST[skip:setup kibana sample data]
+
+
+The search result shows you data like this for each client IP:
+
+[source,js]
+----------------------------------
+{
+  "_index" : "last-log-from-clientip",
+  "_id" : "MOeHH_cUL5urmartKj-b5UQAAAAAAAAA",
+  "_score" : 1.0,
+  "_source" : {
+    "referer" : "http://twitter.com/error/don-lind",
+    "request" : "/elasticsearch",
+    "agent" : "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)",
+    "extension" : "",
+    "memory" : null,
+    "ip" : "0.72.176.46",
+    "index" : "kibana_sample_data_logs",
+    "message" : "0.72.176.46 - - [2018-09-18T06:31:00.572Z] \"GET /elasticsearch HTTP/1.1\" 200 7065 \"-\" \"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)\"",
+    "url" : "https://www.elastic.co/downloads/elasticsearch",
+    "tags" : [
+      "success",
+      "info"
+    ],
+    "geo" : {
+      "srcdest" : "IN:PH",
+      "src" : "IN",
+      "coordinates" : {
+        "lon" : -124.1127917,
+        "lat" : 40.80338889
+      },
+      "dest" : "PH"
+    },
+    "utc_time" : "2021-05-04T06:31:00.572Z",
+    "bytes" : 7065,
+    "machine" : {
+      "os" : "ios",
+      "ram" : 12884901888
+    },
+    "response" : 200,
+    "clientip" : "0.72.176.46",
+    "host" : "www.elastic.co",
+    "event" : {
+      "dataset" : "sample_web_logs"
+    },
+    "phpmemory" : null,
+    "timestamp" : "2021-05-04T06:31:00.572Z"
+  }
+}
+----------------------------------
+// NOTCONSOLE
+
+This {transform} makes it easier to answer questions such as:
+
+* What was the most recent log event associated with a specific IP address?
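+
+For example, to answer that question, you can run a term query against the 
+destination index. The IP address below is taken from the sample response 
+above; substitute any client IP that exists in your data:
+
+[source,console]
+----------------------------------
+GET last-log-from-clientip/_search
+{
+  "query": {
+    "term": {
+      "clientip": "0.72.176.46"
+    }
+  }
+}
+----------------------------------
+// TEST[skip:setup kibana sample data]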