
[DOCS] Adds top_metrics aggs examples to Transform docs (#72738)

Co-authored-by: Lisa Cawley <lcawley@elastic.co>
István Zoltán Szabó 4 years ago
Parent
Current commit
bbfe962cae
2 files changed, 220 insertions and 14 deletions
  1. docs/reference/transform/examples.asciidoc (+218 −11)
  2. docs/reference/transform/painless-examples.asciidoc (+2 −3)

+ 218 - 11
docs/reference/transform/examples.asciidoc

@@ -11,11 +11,12 @@ from your data. All the examples use one of the
 {kibana-ref}/add-sample-data.html[{kib} sample datasets]. For a more detailed, 
 step-by-step example, see <<ecommerce-transforms>>.
 
-* <<example-best-customers>>
-* <<example-airline>>
-* <<example-clientips>>
-* <<example-last-log>>
-
+* <<example-best-customers>> 
+* <<example-airline>> 
+* <<example-clientips>> 
+* <<example-last-log>> 
+* <<example-bytes>> 
+* <<example-customer-names>>

 [[example-best-customers]]
 == Finding your best customers
@@ -344,18 +345,21 @@ This {transform} makes it easier to answer questions such as:

 This example uses the web log sample data set to find the last log from an IP 
 address. Let's use the `latest` type of {transform} in continuous mode. It 
-copies the most recent document for each unique key from the source index to the destination index
-and updates the destination index as new data comes into the source index. 
+copies the most recent document for each unique key from the source index to the 
+destination index and updates the destination index as new data comes into the 
+source index. 

 Pick the `clientip` field as the unique key; the data is grouped by this field. 
 Select `timestamp` as the date field that sorts the data chronologically. For 
 continuous mode, specify a date field that is used to identify new documents, 
 and an interval between checks for changes in the source index.

- Let's assume that we're interested in retaining documents only for IP addresses that appeared recently in the log. You can define a retention policy and specify a date field that is used to calculate 
-the age of a document. This example uses the same date field that is used to 
-sort the data. Then set the maximum age of a document; documents that are older 
-than the value you set will be removed from the destination index.
+Let's assume that we're interested in retaining documents only for IP addresses 
+that appeared recently in the log. You can define a retention policy and specify 
+a date field that is used to calculate the age of a document. This example uses 
+the same date field that is used to sort the data. Then set the maximum age of a 
+document; documents that are older than the value you set will be removed from 
+the destination index.

 This {transform} creates the destination index that contains the latest login 
 date for each client IP. As the {transform} runs in continuous mode, the 
@@ -483,3 +487,206 @@ The search result shows you data like this for each client IP:
 This {transform} makes it easier to answer questions such as:

 * What was the most recent log event associated with a specific IP address?
+
+
+[[example-bytes]]
+== Finding client IPs that sent the most bytes to the server
+
+This example uses the web log sample data set to find the client IP that sent 
+the most bytes to the server in every hour. The example uses a `pivot` 
+{transform} with a <<search-aggregations-metrics-top-metrics,`top_metrics`>> 
+aggregation.
+
+Group the data by a <<_date_histogram,date histogram>> on the time field with an 
+interval of one hour. Use a 
+<<search-aggregations-metrics-max-aggregation,max aggregation>> on the `bytes` 
+field to get the maximum amount of data that is sent to the server. Without 
+the `max` aggregation, the API call still returns the client IP that sent the 
+most bytes; however, the number of bytes it sent is not returned. In the 
+`top_metrics` property, specify `clientip` and `geo.src`, then sort them by the 
+`bytes` field in descending order. The {transform} returns the client IP that 
+sent the biggest amount of data and the 2-letter ISO code of the corresponding 
+location.
+
+[source,console]
+----------------------------------
+POST _transform/_preview
+{
+  "source": {
+    "index": "kibana_sample_data_logs"
+  },
+  "pivot": {
+    "group_by": { <1>
+      "timestamp": {
+        "date_histogram": {
+          "field": "timestamp",
+          "fixed_interval": "1h"
+        }
+      }
+    },
+    "aggregations": {
+      "bytes.max": { <2>
+        "max": {
+          "field": "bytes"
+        }
+      },
+      "top": {
+        "top_metrics": { <3>
+          "metrics": [
+            {
+              "field": "clientip"
+            },
+            {
+              "field": "geo.src"
+            }
+          ],
+          "sort": {
+            "bytes": "desc"
+          }
+        }
+      }
+    }
+  }
+}
+----------------------------------
+// TEST[skip:setup kibana sample data]
+
+<1> The data is grouped by a date histogram of the time field with a one hour 
+interval.
+<2> Calculates the maximum value of the `bytes` field. 
+<3> Specifies the fields (`clientip` and `geo.src`) of the top document to 
+return and the sorting method (document with the highest `bytes` value).
+
+The API call above returns a response similar to this:
+
+[source,js]
+----------------------------------
+{
+  "preview" : [
+    {
+      "top" : {
+        "clientip" : "223.87.60.27",
+        "geo.src" : "IN"
+      },
+      "bytes" : {
+        "max" : 6219
+      },
+      "timestamp" : "2021-04-25T00:00:00.000Z"
+    },
+    {
+      "top" : {
+        "clientip" : "99.74.118.237",
+        "geo.src" : "LK"
+      },
+      "bytes" : {
+        "max" : 14113
+      },
+      "timestamp" : "2021-04-25T03:00:00.000Z"
+    },
+    {
+      "top" : {
+        "clientip" : "218.148.135.12",
+        "geo.src" : "BR"
+      },
+      "bytes" : {
+        "max" : 4531
+      },
+      "timestamp" : "2021-04-25T04:00:00.000Z"
+    },
+    ...
+  ]
+}
+----------------------------------
+// NOTCONSOLE
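
The pivot above can also be reasoned about locally. The following is a minimal
Python sketch (not part of the commit) of the same semantics — an hourly date
histogram, a `max` on `bytes`, and a `top_metrics` sorted by `bytes` descending —
run over a few invented documents standing in for `kibana_sample_data_logs`:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample documents (made up for illustration).
docs = [
    {"timestamp": "2021-04-25T00:12:00", "clientip": "223.87.60.27", "geo_src": "IN", "bytes": 6219},
    {"timestamp": "2021-04-25T00:40:00", "clientip": "10.0.0.1", "geo_src": "US", "bytes": 1024},
    {"timestamp": "2021-04-25T03:05:00", "clientip": "99.74.118.237", "geo_src": "LK", "bytes": 14113},
]

# Date histogram with a one-hour fixed interval: truncate timestamps to the hour.
buckets = defaultdict(list)
for doc in docs:
    hour = datetime.fromisoformat(doc["timestamp"]).replace(minute=0, second=0)
    buckets[hour].append(doc)

# Per bucket: the max of `bytes`, plus the clientip/geo.src of the document
# with the highest `bytes` value (top_metrics sorted by bytes, descending).
preview = []
for hour, bucket in sorted(buckets.items()):
    top = max(bucket, key=lambda d: d["bytes"])
    preview.append({
        "timestamp": hour.isoformat(),
        "bytes.max": max(d["bytes"] for d in bucket),
        "top": {"clientip": top["clientip"], "geo.src": top["geo_src"]},
    })
```

Each entry of `preview` then carries the same three pieces of information as a
row of the transform's destination index: the bucket timestamp, the maximum
bytes, and the top document's fields.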
+
+[[example-customer-names]]
+== Getting customer name and email address by customer ID
+
+This example uses the ecommerce sample data set to create an entity-centric 
+index based on customer ID, and to get the customer name and email address by 
+using the `top_metrics` aggregation.
+
+Group the data by `customer_id`, then add a `top_metrics` aggregation where the 
+`metrics` are the `email`, the `customer_first_name.keyword`, and the 
+`customer_last_name.keyword` fields. Sort the `top_metrics` by `order_date` in 
+descending order. The API call looks like this:
+
+[source,console]
+----------------------------------
+POST _transform/_preview 
+{
+  "source": {
+    "index": "kibana_sample_data_ecommerce"
+  },
+  "pivot": {
+    "group_by": { <1>
+      "customer_id": {
+        "terms": {
+          "field": "customer_id"
+        }
+      }
+    },
+    "aggregations": {
+      "last": {
+        "top_metrics": { <2>
+          "metrics": [
+            {
+              "field": "email"
+            },
+            {
+              "field": "customer_first_name.keyword"
+            },
+            {
+              "field": "customer_last_name.keyword"
+            }
+          ],
+          "sort": {
+            "order_date": "desc"
+          }
+        }
+      }
+    }
+  }
+}
+----------------------------------
+// TEST[skip:setup kibana sample data]
+
+<1> The data is grouped by a `terms` aggregation on the `customer_id` field.
+<2> Specifies the fields to return (email and name fields), sorted in 
+descending order by the order date.
+
+The API returns a response that is similar to this:
+
+[source,js]
+----------------------------------
+{
+  "preview" : [
+    {
+      "last" : {
+        "customer_last_name.keyword" : "Long",
+        "customer_first_name.keyword" : "Recip",
+        "email" : "recip@long-family.zzz"
+      },
+      "customer_id" : "10"
+    },
+    {
+      "last" : {
+        "customer_last_name.keyword" : "Jackson",
+        "customer_first_name.keyword" : "Fitzgerald",
+        "email" : "fitzgerald@jackson-family.zzz"
+      },
+      "customer_id" : "11"
+    },
+    {
+      "last" : {
+        "customer_last_name.keyword" : "Cross",
+        "customer_first_name.keyword" : "Brigitte",
+        "email" : "brigitte@cross-family.zzz"
+      },
+      "customer_id" : "12"
+    },
+    ...
+  ]
+}
+----------------------------------
+// NOTCONSOLE
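
In the same spirit, the `terms` + `top_metrics` pivot in this second example
amounts to "keep selected fields from the newest order per customer". A short
Python sketch (again outside the commit, with invented orders standing in for
`kibana_sample_data_ecommerce`):

```python
# Hypothetical order documents (made up for illustration).
orders = [
    {"customer_id": "10", "order_date": "2021-04-20",
     "email": "recip@long-family.zzz", "first": "Recip", "last": "Long"},
    {"customer_id": "10", "order_date": "2021-04-25",
     "email": "recip@long-family.zzz", "first": "Recip", "last": "Long"},
    {"customer_id": "11", "order_date": "2021-04-22",
     "email": "fitzgerald@jackson-family.zzz", "first": "Fitzgerald", "last": "Jackson"},
]

# terms group_by on customer_id + top_metrics sorted by order_date desc:
# for each customer, keep the metric fields of the most recent order.
latest = {}
for order in orders:
    cid = order["customer_id"]
    if cid not in latest or order["order_date"] > latest[cid]["order_date"]:
        latest[cid] = order

preview = [
    {"customer_id": cid,
     "last": {"email": o["email"],
              "customer_first_name.keyword": o["first"],
              "customer_last_name.keyword": o["last"]}}
    for cid, o in sorted(latest.items())
]
```

The resulting `preview` list mirrors the shape of the `_transform/_preview`
response above: one entity-centric entry per `customer_id`, carrying the
fields of that customer's latest order.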

+ 2 - 3
docs/reference/transform/painless-examples.asciidoc

@@ -77,9 +77,8 @@ returned by each shard and returns the document with the latest timestamp
 (`last_doc`). In the response, the top hit (in other words, the `latest_doc`) is 
 nested below the `latest_doc` field.

-Check the
-<<scripted-metric-aggregation-scope,scope of scripts>>
-for detailed explanation on the respective scripts.
+Check the <<scripted-metric-aggregation-scope,scope of scripts>> for detailed 
+explanation on the respective scripts.

 You can retrieve the last value in a similar way: 