@@ -6,24 +6,25 @@
<titleabbrev>Examples</titleabbrev>
++++

-These examples demonstrate how to use {transforms} to derive useful
-insights from your data. All the examples use one of the
+These examples demonstrate how to use {transforms} to derive useful insights
+from your data. All the examples use one of the
{kibana-ref}/add-sample-data.html[{kib} sample datasets]. For a more detailed,
-step-by-step example, see
-<<ecommerce-transforms>>.
+step-by-step example, see <<ecommerce-transforms>>.

* <<example-best-customers>>
* <<example-airline>>
* <<example-clientips>>
+* <<example-last-log>>
+

[[example-best-customers]]
== Finding your best customers

-In this example, we use the eCommerce orders sample dataset to find the
-customers who spent the most in our hypothetical webshop. Let's transform the
-data such that the destination index contains the number of orders, the total
-price of the orders, the amount of unique products and the average price per
-order, and the total amount of ordered products for each customer.
+This example uses the eCommerce orders sample data set to find the customers who
+spent the most in a hypothetical webshop. Let's transform the data such that the
+destination index contains the number of orders, the total price of the orders,
+the number of unique products and the average price per order, and the total
+number of ordered products for each customer.

[source,console]
----------------------------------
@@ -52,18 +53,17 @@ POST _transform/_preview
----------------------------------
// TEST[skip:setup kibana sample data]

-<1> This is the destination index for the {transform}. It is ignored by
-`_preview`.
-<2> Two `group_by` fields have been selected. This means the {transform} will
-contain a unique row per `user` and `customer_id` combination. Within this
-dataset both these fields are unique. By including both in the {transform} it
-gives more context to the final results.
+<1> The destination index for the {transform}. It is ignored by `_preview`.
+<2> Two `group_by` fields are selected. This means the {transform} contains a
+unique row per `user` and `customer_id` combination. Within this data set, both
+these fields are unique. Including both in the {transform} gives more context
+to the final results.
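+
+For reference, grouping on two fields in a pivot means one entry per field in
+the `group_by` object. A minimal sketch, assuming `terms` groupings on the
+`user` and `customer_id` fields mentioned above, looks like this:
+
+[source,js]
+----------------------------------
+"group_by": {
+  "user": { "terms": { "field": "user" } },
+  "customer_id": { "terms": { "field": "customer_id" } }
+}
+----------------------------------
+// NOTCONSOLE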

-NOTE: In the example above, condensed JSON formatting has been used for easier
+NOTE: In the example above, condensed JSON formatting is used for easier
readability of the pivot object.

-The preview {transforms} API enables you to see the layout of the
-{transform} in advance, populated with some sample values. For example:
+The preview {transforms} API enables you to see the layout of the {transform} in
+advance, populated with some sample values. For example:

[source,js]
----------------------------------
@@ -84,6 +84,7 @@ The preview {transforms} API enables you to see the layout of the
----------------------------------
// NOTCONSOLE

+
This {transform} makes it easier to answer questions such as:

* Which customers spend the most?
@@ -100,14 +101,15 @@ enables us to analyze data at scale and gives more flexibility to explore and
navigate data from a customer centric perspective. In some cases, it can even
make creating visualizations much simpler.
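+
+Once you create and start a {transform} with this configuration, the question of
+which customers spend the most can be answered straight from the destination
+index. The sketch below assumes a destination index named `ecommerce-customers`
+and a summed price field named `taxless_total_price.sum`; both are illustrative
+placeholders rather than the exact names used in the request above.
+
+[source,console]
+----------------------------------
+GET ecommerce-customers/_search <1>
+{
+  "size": 3,
+  "sort": [
+    { "taxless_total_price.sum": "desc" } <2>
+  ]
+}
+----------------------------------
+// TEST[skip:setup kibana sample data]
+
+<1> An assumed destination index name; use the index you configure in `dest.index`.
+<2> An assumed field name; use one of the aggregated fields created by your pivot.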

+
[[example-airline]]
== Finding air carriers with the most delays

-In this example, we use the Flights sample dataset to find out which air carrier
-had the most delays. First, we filter the source data such that it excludes all
-the cancelled flights by using a query filter. Then we transform the data to
+This example uses the Flights sample data set to find out which air carrier
+had the most delays. First, filter the source data such that it excludes all
+the cancelled flights by using a query filter. Then transform the data to
contain the distinct number of flights, the sum of delayed minutes, and the sum
-of the flight minutes by air carrier. Finally, we use a
+of the flight minutes by air carrier. Finally, use a
<<search-aggregations-pipeline-bucket-script-aggregation,`bucket_script`>>
to determine what percentage of the flight time was actually delay.

@@ -151,9 +153,8 @@ POST _transform/_preview
----------------------------------
// TEST[skip:setup kibana sample data]

-<1> Filter the source data to select only flights that were not cancelled.
-<2> This is the destination index for the {transform}. It is ignored by
-`_preview`.
+<1> Filter the source data to select only flights that are not cancelled.
+<2> The destination index for the {transform}. It is ignored by `_preview`.
<3> The data is grouped by the `Carrier` field which contains the airline name.
<4> This `bucket_script` performs calculations on the results that are returned
by the aggregation. In this particular example, it calculates what percentage of
@@ -183,20 +184,20 @@ This {transform} makes it easier to answer questions such as:

* Which air carrier has the most delays as a percentage of flight time?

-NOTE: This data is fictional and does not reflect actual delays
-or flight stats for any of the featured destination or origin airports.
+NOTE: This data is fictional and does not reflect actual delays or flight stats
+for any of the featured destination or origin airports.
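+
+The percentage of delayed flight time is produced by the `bucket_script`
+described in callout <4>. As a sketch, assuming two sibling `sum` aggregations
+with the placeholder names `flight_delay_mins` and `flight_time_mins`, the
+calculation looks like this:
+
+[source,js]
+----------------------------------
+"flight_time_delayed_percentage": {
+  "bucket_script": {
+    "buckets_path": {
+      "delayed": "flight_delay_mins", <1>
+      "flight_time": "flight_time_mins"
+    },
+    "script": "(params.delayed / params.flight_time) * 100"
+  }
+}
+----------------------------------
+// NOTCONSOLE
+
+<1> Placeholder aggregation names; substitute the names defined in your own
+pivot configuration.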


[[example-clientips]]
== Finding suspicious client IPs

-In this example, we use the web log sample dataset to identify suspicious client
-IPs. We transform the data such that the new index contains the sum of bytes and
-the number of distinct URLs, agents, incoming requests by location, and
-geographic destinations for each client IP. We also use filter aggregations to
-count the specific types of HTTP responses that each client IP receives.
-Ultimately, the example below transforms web log data into an entity centric
-index where the entity is `clientip`.
+This example uses the web log sample data set to identify suspicious client IPs.
+It transforms the data such that the new index contains the sum of bytes and the
+number of distinct URLs, agents, incoming requests by location, and geographic
+destinations for each client IP. It also uses filter aggregations to count the
+specific types of HTTP responses that each client IP receives. Ultimately, the
+example below transforms web log data into an entity centric index where the
+entity is `clientip`.

[source,console]
----------------------------------
@@ -254,9 +255,9 @@ PUT _transform/suspicious_client_ips
----------------------------------
// TEST[skip:setup kibana sample data]

-<1> This is the destination index for the {transform}.
-<2> Configures the {transform} to run continuously. It uses the `timestamp` field
-to synchronize the source and destination indices. The worst case
+<1> The destination index for the {transform}.
+<2> Configures the {transform} to run continuously. It uses the `timestamp`
+field to synchronize the source and destination indices. The worst case
ingestion delay is 60 seconds.
<3> The data is grouped by the `clientip` field.
<4> Filter aggregation that counts the occurrences of successful (`200`)
@@ -336,3 +337,149 @@ This {transform} makes it easier to answer questions such as:
* Which client IPs have high error rates?

* Which client IPs are interacting with a high number of destination countries?
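+
+As with the previous examples, the resulting entity-centric index can be queried
+directly, for example to list the client IPs that transfer the most data. The
+sketch below assumes a destination index named `sample_weblogs_by_clientip` and
+a summed-bytes field named `bytes_sum`; both are illustrative placeholders
+rather than the exact names configured above.
+
+[source,console]
+----------------------------------
+GET sample_weblogs_by_clientip/_search <1>
+{
+  "size": 3,
+  "sort": [
+    { "bytes_sum": "desc" } <2>
+  ]
+}
+----------------------------------
+// TEST[skip:setup kibana sample data]
+
+<1> An assumed destination index name; use the index you configure in `dest.index`.
+<2> An assumed field name; use one of the aggregated fields created by your pivot.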
+
+
+[[example-last-log]]
+== Finding the last log event for each IP address
+
+This example uses the web log sample data set to find the last log entry for
+each IP address. Let's use the `latest` type of {transform} in continuous mode.
+It copies the most recent document for each unique key from the source index to
+the destination index and updates the destination index as new data comes into
+the source index.
+
+Pick the `clientip` field as the unique key; the data is grouped by this field.
+Select `timestamp` as the date field that sorts the data chronologically. For
+continuous mode, specify a date field that is used to identify new documents,
+and an interval between checks for changes in the source index.
+
+Let's assume that we're interested in retaining documents only for IP addresses
+that appeared recently in the log. You can define a retention policy and specify
+a date field that is used to calculate the age of a document. This example uses
+the same date field that is used to sort the data. Then set the maximum age of a
+document; documents that are older than the value you set will be removed from
+the destination index.
+
+This {transform} creates the destination index that contains the latest log
+event for each client IP. As the {transform} runs in continuous mode, the
+destination index is updated as new data comes into the source index. Finally,
+every document that is older than 30 days will be removed from the destination
+index due to the applied retention policy.
+
+[source,console]
+----------------------------------
+PUT _transform/last-log-from-clientip
+{
+  "source": {
+    "index": [
+      "kibana_sample_data_logs"
+    ]
+  },
+  "latest": {
+    "unique_key": [ <1>
+      "clientip"
+    ],
+    "sort": "timestamp" <2>
+  },
+  "frequency": "1m", <3>
+  "dest": {
+    "index": "last-log-from-clientip"
+  },
+  "sync": { <4>
+    "time": {
+      "field": "timestamp",
+      "delay": "60s"
+    }
+  },
+  "retention_policy": { <5>
+    "time": {
+      "field": "timestamp",
+      "max_age": "30d"
+    }
+  },
+  "settings": {
+    "max_page_search_size": 500
+  }
+}
+----------------------------------
+// TEST[skip:setup kibana sample data]
+
+<1> Specifies the field for grouping the data.
+<2> Specifies the date field that is used for sorting the data.
+<3> Sets the interval for the {transform} to check for changes in the source
+index.
+<4> Contains the time field and delay settings used to synchronize the source
+and destination indices.
+<5> Specifies the retention policy for the {transform}. Documents that are older
+than the configured value will be removed from the destination index.
+
+
+After you create the {transform}, start it:
+
+[source,console]
+----------------------------------
+POST _transform/last-log-from-clientip/_start
+----------------------------------
+// TEST[skip:setup kibana sample data]
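+
+If you want to verify that the {transform} is running and check its progress,
+you can retrieve its statistics:
+
+[source,console]
+----------------------------------
+GET _transform/last-log-from-clientip/_stats
+----------------------------------
+// TEST[skip:setup kibana sample data]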
+
+
+After the {transform} processes the data, search the destination index:
+
+[source,console]
+----------------------------------
+GET last-log-from-clientip/_search
+----------------------------------
+// TEST[skip:setup kibana sample data]
+
+
+The search result shows you data like this for each client IP:
+
+[source,js]
+----------------------------------
+{
+  "_index" : "last-log-from-clientip",
+  "_id" : "MOeHH_cUL5urmartKj-b5UQAAAAAAAAA",
+  "_score" : 1.0,
+  "_source" : {
+    "referer" : "http://twitter.com/error/don-lind",
+    "request" : "/elasticsearch",
+    "agent" : "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)",
+    "extension" : "",
+    "memory" : null,
+    "ip" : "0.72.176.46",
+    "index" : "kibana_sample_data_logs",
+    "message" : "0.72.176.46 - - [2018-09-18T06:31:00.572Z] \"GET /elasticsearch HTTP/1.1\" 200 7065 \"-\" \"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)\"",
+    "url" : "https://www.elastic.co/downloads/elasticsearch",
+    "tags" : [
+      "success",
+      "info"
+    ],
+    "geo" : {
+      "srcdest" : "IN:PH",
+      "src" : "IN",
+      "coordinates" : {
+        "lon" : -124.1127917,
+        "lat" : 40.80338889
+      },
+      "dest" : "PH"
+    },
+    "utc_time" : "2021-05-04T06:31:00.572Z",
+    "bytes" : 7065,
+    "machine" : {
+      "os" : "ios",
+      "ram" : 12884901888
+    },
+    "response" : 200,
+    "clientip" : "0.72.176.46",
+    "host" : "www.elastic.co",
+    "event" : {
+      "dataset" : "sample_web_logs"
+    },
+    "phpmemory" : null,
+    "timestamp" : "2021-05-04T06:31:00.572Z"
+  }
+}
+----------------------------------
+// NOTCONSOLE
+
+This {transform} makes it easier to answer questions such as:
+
+* What was the most recent log event associated with a specific IP address?
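+
+For example, to answer that question for the client IP shown in the sample
+result above, filter the destination index on the `clientip` field:
+
+[source,console]
+----------------------------------
+GET last-log-from-clientip/_search
+{
+  "query": {
+    "term": {
+      "clientip": "0.72.176.46"
+    }
+  }
+}
+----------------------------------
+// TEST[skip:setup kibana sample data]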