
[DOCS] Makes the description clearer on how to use aggregations in an anomaly detection job (#53103)

Co-authored-by: lcawl <lcawley@elastic.co>
István Zoltán Szabó 5 years ago
parent
commit
54b66d3385
1 changed file with 15 additions and 19 deletions

+ 15 - 19
docs/reference/ml/anomaly-detection/aggregations.asciidoc

@@ -45,20 +45,21 @@ PUT _ml/anomaly_detectors/farequote
     "bucket_span": "60m",
     "detectors": [{
       "function": "mean",
-      "field_name": "responsetime",
-      "by_field_name": "airline"
+      "field_name": "responsetime",  <1>
+      "by_field_name": "airline"  <1>
     }],
     "summary_count_field_name": "doc_count"
   },
   "data_description": {
-    "time_field":"time"
+    "time_field":"time"  <1>
   }
 }
 ----------------------------------
 // TEST[skip:setup:farequote_data]
 
-In this example, the `airline`, `responsetime`, and `time` fields are
-aggregations.
+<1> In this example, the `airline`, `responsetime`, and `time` fields are
+aggregations. Only the aggregated fields defined in the `analysis_config` object 
+are analyzed by the {anomaly-job}.
 
 NOTE: When the `summary_count_field_name` property is set to a non-null value,
 the job expects to receive aggregated input. The property must be set to the
@@ -81,16 +82,16 @@ PUT _ml/datafeeds/datafeed-farequote
         "time_zone": "UTC"
       },
       "aggregations": {
-        "time": {
+        "time": {  <1>
           "max": {"field": "time"}
         },
-        "airline": {
+        "airline": {  <1>
           "terms": {
             "field": "airline",
             "size": 100
           },
           "aggregations": {
-            "responsetime": {
+            "responsetime": {  <1>
               "avg": {
                 "field": "responsetime"
               }
@@ -104,18 +105,13 @@ PUT _ml/datafeeds/datafeed-farequote
 ----------------------------------
 // TEST[skip:setup:farequote_job]
 
-In this example, the aggregations have names that match the fields that they
+<1> In this example, the aggregations have names that match the fields that they
 operate on. That is to say, the `max` aggregation is named `time` and its
 field is also `time`. The same is true for the aggregations with the names
-`airline` and `responsetime`. Since you must create the job before you can
-create the {dfeed}, synchronizing your aggregation and field names can simplify
-these configuration steps.
+`airline` and `responsetime`. 
 
-IMPORTANT: If you use a `max` aggregation on a time field, the aggregation name
-in the {dfeed} must match the name of the time field, as in the previous example.
-For all other aggregations, if the aggregation name doesn't match the field name,
-there are limitations in the drill-down functionality within the {ml} page in
-{kib}.
+IMPORTANT: Your {dfeed} can contain multiple aggregations, but only the ones
+with names that match values in the job configuration are fed to the job.
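
The name-matching rule above can be illustrated with a small Python sketch. It is not how {es} implements the check; the helpers `aggregation_names` and `job_field_names`, and the extra `responsetime_max` aggregation, are hypothetical and exist only for this example. The sketch assumes that the relevant job fields are the detector field names plus the `time_field`.

```python
def aggregation_names(aggs):
    """Recursively collect the names of all aggregations in a datafeed body."""
    names = set()
    for name, body in aggs.items():
        names.add(name)
        # Sub-aggregations may appear under either key.
        for key in ("aggregations", "aggs"):
            names |= aggregation_names(body.get(key, {}))
    return names


def job_field_names(job):
    """Collect the field names that the job configuration refers to (assumed set)."""
    fields = {job["data_description"]["time_field"]}
    detector_keys = ("field_name", "by_field_name",
                     "over_field_name", "partition_field_name")
    for detector in job["analysis_config"]["detectors"]:
        for key in detector_keys:
            if key in detector:
                fields.add(detector[key])
    return fields


# Job configuration from the farequote example.
job = {
    "analysis_config": {
        "bucket_span": "60m",
        "detectors": [{
            "function": "mean",
            "field_name": "responsetime",
            "by_field_name": "airline",
        }],
        "summary_count_field_name": "doc_count",
    },
    "data_description": {"time_field": "time"},
}

# Datafeed aggregations, plus one hypothetical aggregation
# ("responsetime_max") that the job configuration does not mention.
datafeed_aggs = {
    "buckets": {
        "date_histogram": {"field": "time", "fixed_interval": "360s",
                           "time_zone": "UTC"},
        "aggregations": {
            "time": {"max": {"field": "time"}},
            "airline": {
                "terms": {"field": "airline", "size": 100},
                "aggregations": {
                    "responsetime": {"avg": {"field": "responsetime"}},
                    "responsetime_max": {"max": {"field": "responsetime"}},
                },
            },
        },
    }
}

matched = aggregation_names(datafeed_aggs) & job_field_names(job)
unmatched = aggregation_names(datafeed_aggs) - job_field_names(job)
print(sorted(matched))
print(sorted(unmatched))
```

In this sketch, `unmatched` also contains `buckets`, which is only the name of the enclosing date histogram and not a field that the job analyzes.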
 
 {dfeeds-cap} support complex nested aggregations; this example uses the `derivative`
 pipeline aggregation to find the first order derivative of the counter
@@ -243,8 +239,8 @@ When you define an aggregation in a {dfeed}, it must have the following form:
 The top level aggregation must be either a
 {ref}/search-aggregations-bucket.html[bucket aggregation] containing a single
 sub-aggregation that is a `date_histogram` or the top level aggregation is the
-required `date_histogram`. There must be exactly 1 `date_histogram` aggregation.
-For more information, see
+required `date_histogram`. There must be exactly one `date_histogram`
+aggregation. For more information, see
 {ref}/search-aggregations-bucket-datehistogram-aggregation.html[Date histogram aggregation].
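
As a rough client-side check of the "exactly one `date_histogram`" requirement, the following Python sketch counts the date histograms in an aggregation tree before you submit it. The helper `count_date_histograms` is hypothetical and is not part of any {es} client; it checks only the count, not the required nesting.

```python
def count_date_histograms(aggs):
    """Count date_histogram aggregations anywhere in an aggregation tree."""
    count = 0
    for body in aggs.values():
        if "date_histogram" in body:
            count += 1
        # Sub-aggregations may appear under either key.
        for key in ("aggregations", "aggs"):
            count += count_date_histograms(body.get(key, {}))
    return count


# A bucket aggregation that contains a single date_histogram sub-aggregation.
valid_aggs = {
    "airline": {
        "terms": {"field": "airline", "size": 100},
        "aggregations": {
            "buckets": {
                "date_histogram": {"field": "time", "fixed_interval": "360s",
                                   "time_zone": "UTC"},
                "aggregations": {
                    "time": {"max": {"field": "time"}},
                },
            }
        },
    }
}

# The same tree with a second date_histogram added, which violates the rule.
invalid_aggs = {
    **valid_aggs,
    "extra": {"date_histogram": {"field": "time", "fixed_interval": "1h",
                                 "time_zone": "UTC"}},
}

valid_count = count_date_histograms(valid_aggs)
invalid_count = count_date_histograms(invalid_aggs)
print(valid_count, invalid_count)
```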
 
 NOTE: The `time_zone` parameter in the date histogram aggregation must be set to