
[DOCS] Adds top_metrics aggs examples to Transform docs (#72738)

Co-authored-by: Lisa Cawley <lcawley@elastic.co>
István Zoltán Szabó 4 years ago
Parent
Current commit
bbfe962cae
2 files changed, 220 insertions and 14 deletions
  1. docs/reference/transform/examples.asciidoc (+218, −11)
  2. docs/reference/transform/painless-examples.asciidoc (+2, −3)

+ 218 - 11
docs/reference/transform/examples.asciidoc

@@ -11,11 +11,12 @@ from your data. All the examples use one of the
 {kibana-ref}/add-sample-data.html[{kib} sample datasets]. For a more detailed, 
 step-by-step example, see <<ecommerce-transforms>>.
 
-* <<example-best-customers>>
-* <<example-airline>>
-* <<example-clientips>>
-* <<example-last-log>>
-
+* <<example-best-customers>> 
+* <<example-airline>> 
+* <<example-clientips>> 
+* <<example-last-log>> 
+* <<example-bytes>> 
+* <<example-customer-names>>
 
 [[example-best-customers]]
 == Finding your best customers
@@ -344,18 +345,21 @@ This {transform} makes it easier to answer questions such as:
 
 This example uses the web log sample data set to find the last log from an IP 
 address. Let's use the `latest` type of {transform} in continuous mode. It 
-copies the most recent document for each unique key from the source index to the destination index
-and updates the destination index as new data comes into the source index. 
+copies the most recent document for each unique key from the source index to the 
+destination index and updates the destination index as new data comes into the 
+source index. 
 
 Pick the `clientip` field as the unique key; the data is grouped by this field. 
 Select `timestamp` as the date field that sorts the data chronologically. For 
 continuous mode, specify a date field that is used to identify new documents, 
 and an interval between checks for changes in the source index.
 
- Let's assume that we're interested in retaining documents only for IP addresses that appeared recently in the log. You can define a retention policy and specify a date field that is used to calculate 
-the age of a document. This example uses the same date field that is used to 
-sort the data. Then set the maximum age of a document; documents that are older 
-than the value you set will be removed from the destination index.
+Let's assume that we're interested in retaining documents only for IP addresses 
+that appeared recently in the log. You can define a retention policy and specify 
+a date field that is used to calculate the age of a document. This example uses 
+the same date field that is used to sort the data. Then set the maximum age of a 
+document; documents that are older than the value you set will be removed from 
+the destination index.
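+
+Taken together, these settings map onto the {transform} configuration along the 
+following lines (a sketch for orientation; the `delay` and `max_age` values 
+here are assumptions, not part of the sample configuration):
+
+[source,js]
+----------------------------------
+"latest": {
+  "unique_key": ["clientip"],
+  "sort": "timestamp"
+},
+"sync": {
+  "time": {
+    "field": "timestamp",
+    "delay": "60s"
+  }
+},
+"retention_policy": {
+  "time": {
+    "field": "timestamp",
+    "max_age": "30d"
+  }
+}
+----------------------------------
+// NOTCONSOLE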
 
 This {transform} creates the destination index that contains the latest login 
 date for each client IP. As the {transform} runs in continuous mode, the 
@@ -483,3 +487,206 @@ The search result shows you data like this for each client IP:
 This {transform} makes it easier to answer questions such as:
 
 * What was the most recent log event associated with a specific IP address?
+
+
+[[example-bytes]]
+== Finding client IPs that sent the most bytes to the server
+
+This example uses the web log sample data set to find the client IP that sent 
+the most bytes to the server in every hour. The example uses a `pivot` 
+{transform} with a <<search-aggregations-metrics-top-metrics,`top_metrics`>> 
+aggregation.
+
+Group the data by a <<_date_histogram,date histogram>> on the time field with an 
+interval of one hour. Use a 
+<<search-aggregations-metrics-max-aggregation,max aggregation>> on the `bytes` 
+field to get the maximum amount of data that is sent to the server. Without 
+the `max` aggregation, the API call still returns the client IP that sent the 
+most bytes; however, the number of bytes it sent is not returned. In the 
+`top_metrics` property, specify `clientip` and `geo.src`, then sort them by the 
+`bytes` field in descending order. The {transform} returns the client IP that 
+sent the most data and the two-letter ISO code of the corresponding 
+location.
+
+[source,console]
+----------------------------------
+POST _transform/_preview
+{
+  "source": {
+    "index": "kibana_sample_data_logs"
+  },
+  "pivot": {
+    "group_by": { <1>
+      "timestamp": {
+        "date_histogram": {
+          "field": "timestamp",
+          "fixed_interval": "1h"
+        }
+      }
+    },
+    "aggregations": {
+      "bytes.max": { <2>
+        "max": {
+          "field": "bytes"
+        }
+      },
+      "top": {
+        "top_metrics": { <3>
+          "metrics": [
+            {
+              "field": "clientip"
+            },
+            {
+              "field": "geo.src"
+            }
+          ],
+          "sort": {
+            "bytes": "desc"
+          }
+        }
+      }
+    }
+  }
+}
+----------------------------------
+// TEST[skip:setup kibana sample data]
+
+<1> The data is grouped by a date histogram of the time field with a one-hour 
+interval.
+<2> Calculates the maximum value of the `bytes` field. 
+<3> Specifies the fields (`clientip` and `geo.src`) of the top document to 
+return and the sorting method (document with the highest `bytes` value).
+
+The API call above returns a response similar to this:
+
+[source,js]
+----------------------------------
+{
+  "preview" : [
+    {
+      "top" : {
+        "clientip" : "223.87.60.27",
+        "geo.src" : "IN"
+      },
+      "bytes" : {
+        "max" : 6219
+      },
+      "timestamp" : "2021-04-25T00:00:00.000Z"
+    },
+    {
+      "top" : {
+        "clientip" : "99.74.118.237",
+        "geo.src" : "LK"
+      },
+      "bytes" : {
+        "max" : 14113
+      },
+      "timestamp" : "2021-04-25T03:00:00.000Z"
+    },
+    {
+      "top" : {
+        "clientip" : "218.148.135.12",
+        "geo.src" : "BR"
+      },
+      "bytes" : {
+        "max" : 4531
+      },
+      "timestamp" : "2021-04-25T04:00:00.000Z"
+    },
+    ...
+  ]
+}
+----------------------------------
+// NOTCONSOLE
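+
+If the preview output looks right, the {transform} can be created and started 
+with calls along these lines (a sketch; the `top_bytes_per_hour` {transform} ID 
+and destination index name are assumptions, not part of this example):
+
+[source,console]
+----------------------------------
+PUT _transform/top_bytes_per_hour
+{
+  "source": {
+    "index": "kibana_sample_data_logs"
+  },
+  "dest": {
+    "index": "top_bytes_per_hour"
+  },
+  "pivot": {
+    "group_by": {
+      "timestamp": {
+        "date_histogram": {
+          "field": "timestamp",
+          "fixed_interval": "1h"
+        }
+      }
+    },
+    "aggregations": {
+      "bytes.max": {
+        "max": {
+          "field": "bytes"
+        }
+      },
+      "top": {
+        "top_metrics": {
+          "metrics": [
+            { "field": "clientip" },
+            { "field": "geo.src" }
+          ],
+          "sort": {
+            "bytes": "desc"
+          }
+        }
+      }
+    }
+  }
+}
+
+POST _transform/top_bytes_per_hour/_start
+----------------------------------
+// TEST[skip:setup kibana sample data]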
+
+[[example-customer-names]]
+== Getting customer name and email address by customer ID
+
+This example uses the ecommerce sample data set to create an entity-centric 
+index based on customer ID, and to get the customer name and email address by 
+using the `top_metrics` aggregation.
+
+Group the data by `customer_id`, then add a `top_metrics` aggregation where the 
+`metrics` are the `email`, the `customer_first_name.keyword`, and the 
+`customer_last_name.keyword` fields. Sort the `top_metrics` by `order_date` in 
+descending order. The API call looks like this:
+
+[source,console]
+----------------------------------
+POST _transform/_preview 
+{
+  "source": {
+    "index": "kibana_sample_data_ecommerce"
+  },
+  "pivot": {
+    "group_by": { <1>
+      "customer_id": {
+        "terms": {
+          "field": "customer_id"
+        }
+      }
+    },
+    "aggregations": {
+      "last": {
+        "top_metrics": { <2>
+          "metrics": [
+            {
+              "field": "email"
+            },
+            {
+              "field": "customer_first_name.keyword"
+            },
+            {
+              "field": "customer_last_name.keyword"
+            }
+          ],
+          "sort": {
+            "order_date": "desc"
+          }
+        }
+      }
+    }
+  }
+}
+----------------------------------
+// TEST[skip:setup kibana sample data]
+
+<1> The data is grouped by a `terms` aggregation on the `customer_id` field.
+<2> Specifies the fields to return (email and name fields), sorted by the order 
+date in descending order.
+
+The API returns a response that is similar to this:
+
+[source,js]
+----------------------------------
+{
+  "preview" : [
+    {
+      "last" : {
+        "customer_last_name.keyword" : "Long",
+        "customer_first_name.keyword" : "Recip",
+        "email" : "recip@long-family.zzz"
+      },
+      "customer_id" : "10"
+    },
+    {
+      "last" : {
+        "customer_last_name.keyword" : "Jackson",
+        "customer_first_name.keyword" : "Fitzgerald",
+        "email" : "fitzgerald@jackson-family.zzz"
+      },
+      "customer_id" : "11"
+    },
+    {
+      "last" : {
+        "customer_last_name.keyword" : "Cross",
+        "customer_first_name.keyword" : "Brigitte",
+        "email" : "brigitte@cross-family.zzz"
+      },
+      "customer_id" : "12"
+    },
+    ...
+  ]
+}
+----------------------------------
+// NOTCONSOLE
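+
+Once such a {transform} has written to a destination index (for example, a 
+hypothetical `customer_names` index), the entity-centric documents can be 
+queried directly; this sketch looks up a single customer by ID:
+
+[source,console]
+----------------------------------
+GET customer_names/_search
+{
+  "query": {
+    "term": {
+      "customer_id": "10"
+    }
+  }
+}
+----------------------------------
+// TEST[skip:needs destination index]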

+ 2 - 3
docs/reference/transform/painless-examples.asciidoc

@@ -77,9 +77,8 @@ returned by each shard and returns the document with the latest timestamp
 (`last_doc`). In the response, the top hit (in other words, the `latest_doc`) is 
 nested below the `latest_doc` field.
 
-Check the
-<<scripted-metric-aggregation-scope,scope of scripts>>
-for detailed explanation on the respective scripts.
+Check the <<scripted-metric-aggregation-scope,scope of scripts>> for a detailed 
+explanation of the respective scripts.
 
 You can retrieve the last value in a similar way: