
[DOCS] Update datafeed details in ML docs (#76854)

Lisa Cawley · 4 years ago
commit d36f24fbc3

+ 9 - 7
docs/reference/ml/anomaly-detection/apis/close-job.asciidoc

@@ -7,10 +7,6 @@
 ++++
 
 Closes one or more {anomaly-jobs}.
-A job can be opened and closed multiple times throughout its lifecycle.
-
-A closed job cannot receive data or perform analysis
-operations, but you can still explore and navigate results.
 
 [[ml-close-job-request]]
 == {api-request-title}
@@ -30,13 +26,19 @@ operations, but you can still explore and navigate results.
 [[ml-close-job-desc]]
 == {api-description-title}
 
+A job can be opened and closed multiple times throughout its lifecycle.
+
+A closed job cannot receive data or perform analysis operations, but you can
+still explore and navigate results.
+
 You can close multiple {anomaly-jobs} in a single API request by using a group
 name, a comma-separated list of jobs, or a wildcard expression. You can close
 all jobs by using `_all` or by specifying `*` as the `<job_id>`.
 
-If you close an {anomaly-job} whose {dfeed} is running, the request will first
-attempt to stop the {dfeed}, as though <<ml-stop-datafeed>> was called with
-the same `timeout` and `force` parameters as the close request.
+If you close an {anomaly-job} whose {dfeed} is running, the request first tries
+to stop the {dfeed}. This behavior is equivalent to calling
+<<ml-stop-datafeed,stop {dfeed}>> with the same `timeout` and `force` parameters
+as the close job request.
 
 When you close a job, it runs housekeeping tasks such as pruning the model history,
 flushing buffers, calculating final results and persisting the model snapshots.
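
For illustration, a close request might look like the following sketch; the job
ID `low_request_rate` is a placeholder, and any `timeout` and `force` query
parameters set here are also applied to the implicit stop {dfeed} call:

[source,console]
----
POST _ml/anomaly_detectors/low_request_rate/_close?timeout=30s

POST _ml/anomaly_detectors/*/_close
----
// TEST[skip:hypothetical example]

The second request uses a wildcard to close all jobs at once.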

+ 4 - 3
docs/reference/ml/anomaly-detection/apis/delete-job.asciidoc

@@ -34,9 +34,10 @@ are granted to anyone over the `.ml-*` indices.
 It is not currently possible to delete multiple jobs using wildcards or a
 comma-separated list.
 
-If you delete a job that has a {dfeed}, the request will first attempt to
-delete the {dfeed}, as though <<ml-delete-datafeed>> was called with the same
-`timeout` and `force` parameters as this delete request.
+If you delete a job that has a {dfeed}, the request first tries to delete the
+{dfeed}. This behavior is equivalent to calling
+<<ml-delete-datafeed,delete {dfeed}>> with the same `timeout` and `force`
+parameters as the delete job request.
 
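As a sketch, the following request (with a placeholder job ID) deletes a job
and, because the job has a {dfeed}, first forwards the same `force` value to
the implicit delete {dfeed} call:

[source,console]
----
DELETE _ml/anomaly_detectors/total-requests?force=true
----
// TEST[skip:hypothetical example]
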
 [[ml-delete-job-path-parms]]
 == {api-path-parms-title}

+ 4 - 4
docs/reference/ml/anomaly-detection/apis/put-datafeed.asciidoc

@@ -36,10 +36,10 @@ each interval. See {ml-docs}/ml-delayed-data-detection.html[Handling delayed dat
 
 [IMPORTANT]
 ====
-* You must use {kib} or this API to create a {dfeed}. Do not add a
-{dfeed} directly to the `.ml-config` index using the {es} index API. If {es}
-{security-features} are enabled, do not give users `write` privileges on the
-`.ml-config` index.
+* You must use {kib}, this API, or the <<ml-put-job,create {anomaly-jobs} API>>
+to create a {dfeed}. Do not add a {dfeed} directly to the `.ml-config` index
+using the {es} index API. If {es} {security-features} are enabled, do not give
+users `write` privileges on the `.ml-config` index.
 * When {es} {security-features} are enabled, your {dfeed} remembers which roles
 the user who created it had at the time of creation and runs the query using
 those same roles. If you provide

+ 15 - 2
docs/reference/ml/anomaly-detection/apis/put-job.asciidoc

@@ -19,13 +19,24 @@ Instantiates an {anomaly-job}.
 Requires the `manage_ml` cluster privilege. This privilege is included in the
 `machine_learning_admin` built-in role.
 
+If you include a `datafeed_config`, you must also have `read` index privileges
+on the source index.
+
 [[ml-put-job-desc]]
 == {api-description-title}
 
-IMPORTANT: You must use {kib} or this API to create an {anomaly-job}. Do not put
+[IMPORTANT]
+====
+* You must use {kib} or this API to create an {anomaly-job}. Do not put
 a job directly to the `.ml-config` index using the {es} index API. If {es}
 {security-features} are enabled, do not give users `write` privileges on the
 `.ml-config` index.
+* If you include a `datafeed_config` and {es} {security-features} are enabled,
+your {dfeed} remembers which roles the user who created it had at the time of
+creation and runs the query using those same roles. If you provide
+<<http-clients-secondary-authorization,secondary authorization headers>>, those
+credentials are used instead.
+====
 
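For example, a minimal sketch of a job that bundles its {dfeed} in the request
body; the job ID, index name, and field names are placeholders:

[source,console]
----
PUT _ml/anomaly_detectors/total-requests
{
  "analysis_config": {
    "bucket_span": "10m",
    "detectors": [{"function": "count"}]
  },
  "data_description": {
    "time_field": "timestamp"
  },
  "datafeed_config": {
    "indices": ["server-metrics"]
  }
}
----
// TEST[skip:hypothetical example]

Because the request includes a `datafeed_config`, the user who submits it also
needs `read` privileges on `server-metrics`, and the {dfeed} runs with the
roles that user holds at creation time.
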
 [[ml-put-job-path-parms]]
 == {api-path-parms-title}
@@ -250,7 +261,9 @@ include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=data-description]
 //End data_description
 
 `datafeed_config`::
-(object) The {dfeed} configured for the current {anomaly-job}.
+(Optional, object) The {ml-docs}/ml-dfeeds.html[{dfeed}], which retrieves data
+from {es} for analysis by the job. You can associate only one {dfeed} with each
+{anomaly-job}.
 +
 .Properties of `datafeed`
 [%collapsible%open]

+ 3 - 3
docs/reference/ml/anomaly-detection/functions/ml-geo-functions.asciidoc

@@ -71,8 +71,8 @@ For example, JSON data might contain the following transaction coordinates:
 
 In {es}, location data is likely to be stored in `geo_point` fields. For more
 information, see {ref}/geo-point.html[`geo_point` data type]. This data type is
-supported natively in {ml-features}. Specifically, {dfeed} when pulling data from
-a `geo_point` field, will transform the data into the appropriate `lat,lon` string
-format before sending to the {anomaly-job}.
+supported natively in {ml-features}. Specifically, when pulling data from a
+`geo_point` field, a {dfeed} transforms the data into the appropriate
+`lat,lon` string format before sending it to the {anomaly-job}.
 
 For more information, see <<ml-configuring-transform>>.
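
To sketch how this fits together (all names here are placeholders), a
`lat_long` detector can reference the `geo_point` field directly and rely on
the {dfeed} to perform the conversion:

[source,console]
----
PUT _ml/anomaly_detectors/unusual-locations
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      { "function": "lat_long", "field_name": "coords", "by_field_name": "user" }
    ]
  },
  "data_description": {
    "time_field": "@timestamp"
  },
  "datafeed_config": {
    "indices": ["my-index-000001"]
  }
}
----
// TEST[skip:hypothetical example]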

+ 76 - 98
docs/reference/ml/anomaly-detection/ml-configuring-aggregations.asciidoc

@@ -78,48 +78,30 @@ PUT _ml/anomaly_detectors/farequote
   },
   "data_description": {
     "time_field":"time"  <1>
-  }
-}
-----------------------------------
-// TEST[skip:setup:farequote_data]
-
-<1> The `airline`, `responsetime`, and `time` fields are aggregations. Only the
-aggregated fields defined in the `analysis_config` object are analyzed by the
-{anomaly-job}.
-
-NOTE: When the `summary_count_field_name` property is set to a non-null value,
-the job expects to receive aggregated input. The property must be set to the
-name of the field that contains the count of raw data points that have been
-aggregated. It applies to all detectors in the job.
-
-The aggregations are defined in the {dfeed} as follows:
-
-[source,console]
-----------------------------------
-PUT _ml/datafeeds/datafeed-farequote
-{
-  "job_id":"farequote",
-  "indices": ["farequote"],
-  "aggregations": {
-    "buckets": {
-      "date_histogram": {
-        "field": "time",
-        "fixed_interval": "360s",
-        "time_zone": "UTC"
-      },
-      "aggregations": {
-        "time": {  <1>
-          "max": {"field": "time"}
+  },
+  "datafeed_config":{
+    "indices": ["farequote"],
+    "aggregations": {
+      "buckets": {
+        "date_histogram": {
+          "field": "time",
+          "fixed_interval": "360s",
+          "time_zone": "UTC"
         },
-        "airline": {  <2>
-          "terms": {
-            "field": "airline",
-            "size": 100
+        "aggregations": {
+          "time": {  <2>
+            "max": {"field": "time"}
           },
-          "aggregations": {
-            "responsetime": {  <3>
-              "avg": {
-                "field": "responsetime"
+          "airline": {  <3>
+            "terms": {
+             "field": "airline",
+              "size": 100
+            },
+            "aggregations": {
+              "responsetime": {  <4>
+                "avg": {
+                  "field": "responsetime"
+                }
               }
             }
           }
@@ -129,19 +111,27 @@ PUT _ml/datafeeds/datafeed-farequote
   }
 }
 ----------------------------------
-// TEST[skip:setup:farequote_job]
+// TEST[skip:setup:farequote_data]
 
-<1> The aggregations have names that match the fields that they operate on. The
+<1> The `airline`, `responsetime`, and `time` fields are aggregations. Only the
+aggregated fields defined in the `analysis_config` object are analyzed by the
+{anomaly-job}.
+<2> The aggregations have names that match the fields that they operate on. The
 `max` aggregation is named `time` and its field also needs to be `time`.
-<2> The `term` aggregation is named `airline` and its field is also named
+<3> The `term` aggregation is named `airline` and its field is also named
 `airline`.
-<3> The `avg` aggregation is named `responsetime` and its field is also named
+<4> The `avg` aggregation is named `responsetime` and its field is also named
 `responsetime`.
 
+When the `summary_count_field_name` property is set to a non-null value, the job
+expects to receive aggregated input. The property must be set to the name of the
+field that contains the count of raw data points that have been aggregated. It
+applies to all detectors in the job.
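
As a minimal sketch (the job ID is a placeholder), a job that receives
pre-aggregated input typically sets the property to `doc_count`:

[source,console]
----
PUT _ml/anomaly_detectors/presummarized-counts
{
  "analysis_config": {
    "bucket_span": "60m",
    "summary_count_field_name": "doc_count",
    "detectors": [{ "function": "count" }]
  },
  "data_description": {
    "time_field": "time"
  }
}
----
// TEST[skip:hypothetical example]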
+
 TIP: If you are using a `term` aggregation to gather influencer or partition
 field information, consider using a `composite` aggregation. It performs
-better than a `date_histogram` with a nested `term` aggregation and also includes
-all the values of the field instead of the top values per bucket.
+better than a `date_histogram` with a nested `term` aggregation and also
+includes all the values of the field instead of the top values per bucket.
 
 [discrete]
 [[aggs-using-composite]]
@@ -153,15 +143,17 @@ For `composite` aggregation support, there must be exactly one `date_histogram`
 source. That value source must not be sorted in descending order. Additional
 `composite` aggregation value sources are allowed, such as `terms`.
 
-NOTE: A {dfeed} that uses composite aggregations may not be as performant as datafeeds that use scrolling or
-date histogram aggregations. Composite aggregations are optimized
-for queries that are either `match_all` or `range` filters. Other types of
+NOTE: A {dfeed} that uses composite aggregations may not be as performant as
+{dfeeds} that use scrolling or date histogram aggregations. Composite
+aggregations are optimized for queries that are either `match_all` or `range`
+filters. Other types of
 queries may cause the `composite` aggregation to be inefficient.
 
 Here is an example that uses a `composite` aggregation instead of a
 `date_histogram`.
 
-Assuming the same job configuration as above.
+This is an example of a job with a {dfeed} that uses a `composite` aggregation
+to bucket the metrics based on time and terms:
 
 [source,console]
 ----------------------------------
@@ -178,54 +170,42 @@ PUT _ml/anomaly_detectors/farequote-composite
   },
   "data_description": {
     "time_field":"time"
-  }
-}
-----------------------------------
-// TEST[skip:setup:farequote_data]
-
-This is an example of a datafeed that uses a `composite` aggregation to bucket
-the metrics based on time and terms:
-
-[source,console]
-----------------------------------
-PUT _ml/datafeeds/datafeed-farequote-composite
-{
-  "job_id": "farequote-composite",
-  "indices": [
-    "farequote"
-  ],
-  "aggregations": {
-    "buckets": {
-      "composite": {
-        "size": 1000,  <1>
-        "sources": [
-          {
-            "time_bucket": {  <2>
-              "date_histogram": {
-                "field": "time",
-                "fixed_interval": "360s",
-                "time_zone": "UTC"
+  },
+  "datafeed_config":{
+    "indices": ["farequote"],
+    "aggregations": {
+      "buckets": {
+        "composite": {
+          "size": 1000,  <1>
+          "sources": [
+            {
+              "time_bucket": {  <2>
+                "date_histogram": {
+                  "field": "time",
+                  "fixed_interval": "360s",
+                  "time_zone": "UTC"
+                }
               }
-            }
-          },
-          {
-            "airline": {  <3>
-              "terms": {
-                "field": "airline"
+            },
+            {
+              "airline": {  <3>
+                "terms": {
+                  "field": "airline"
+                }
               }
             }
-          }
-        ]
-      },
-      "aggregations": {
-        "time": {  <4>
-          "max": {
-            "field": "time"
-          }
+          ]
         },
-        "responsetime": { <5>
-          "avg": {
-            "field": "responsetime"
+        "aggregations": {
+          "time": {  <4>
+            "max": {
+              "field": "time"
+            }
+          },
+          "responsetime": { <5>
+            "avg": {
+              "field": "responsetime"
+            }
           }
         }
       }
@@ -233,10 +213,8 @@ PUT _ml/datafeeds/datafeed-farequote-composite
   }
 }
 ----------------------------------
-// TEST[skip:setup:farequote_job]
-
<1> Provide the `size` to the composite aggregation to control how many resources
-are used when aggregating the data. A larger `size` means a faster datafeed but
+are used when aggregating the data. A larger `size` means a faster {dfeed} but
 more cluster resources are used when searching.
 <2> The required `date_histogram` composite aggregation source. Make sure it
 is named differently than your desired time field.
@@ -364,7 +342,7 @@ When using a `date_histogram` aggregation to bucket by time:
     "bucket_agg": {
       ...
     },
-    "aggregations": {]
+    "aggregations": {
       "data_histogram_aggregation": {
         "date_histogram": {
           "field": "time",

+ 41 - 51
docs/reference/ml/anomaly-detection/ml-configuring-transform.asciidoc

@@ -107,20 +107,16 @@ PUT _ml/anomaly_detectors/test1
   },
   "data_description": {
     "time_field":"@timestamp"
-  }
-}
-
-PUT _ml/datafeeds/datafeed-test1
-{
-  "job_id": "test1",
-  "indices": [
-    "my-index-000001"
-  ],
-  "runtime_mappings": {
-    "total_error_count": { <2>
-      "type": "long",
-      "script": {
-        "source": "emit(doc['error_count'].value + doc['aborted_count'].value)"
+  },
+  "datafeed_config":{
+    "datafeed_id": "datafeed-test1",
+    "indices": ["my-index-000001"],
+    "runtime_mappings": {
+      "total_error_count": { <2>
+        "type": "long",
+        "script": {
+          "source": "emit(doc['error_count'].value + doc['aborted_count'].value)"
+        }
       }
     }
   }
@@ -193,18 +189,16 @@ PUT _ml/anomaly_detectors/test2
   },
   "data_description": {
     "time_field":"@timestamp"
-  }
-}
-
-PUT _ml/datafeeds/datafeed-test2
-{
-  "job_id": "test2",
-  "indices": ["my-index-000001"],
-  "runtime_mappings": {
-    "my_runtime_field": {
-      "type": "keyword",
-      "script": {
-        "source": "emit(doc['some_field'].value + '_' + doc['another_field'].value)" <2>
+  },
+  "datafeed_config":{
+    "datafeed_id": "datafeed-test2",
+    "indices": ["my-index-000001"],
+    "runtime_mappings": {
+      "my_runtime_field": {
+        "type": "keyword",
+        "script": {
+          "source": "emit(doc['some_field'].value + '_' + doc['another_field'].value)" <2>
+        }
       }
     }
   }
@@ -438,18 +432,16 @@ PUT _ml/anomaly_detectors/test3
   },
   "data_description": {
     "time_field":"@timestamp"
-  }
-}
-
-PUT _ml/datafeeds/datafeed-test3
-{
-  "job_id": "test3",
-  "indices": ["my-index-000001"],
-  "runtime_mappings": {
-    "my_coordinates": {
-      "type": "keyword",
-      "script": {
-        "source": "emit(doc['coords.lat'].value + ',' + doc['coords.lon'].value)"
+  },
+  "datafeed_config":{
+    "datafeed_id": "datafeed-test3",
+    "indices": ["my-index-000001"],
+    "runtime_mappings": {
+      "my_coordinates": {
+        "type": "keyword",
+        "script": {
+          "source": "emit(doc['coords.lat'].value + ',' + doc['coords.lon'].value)"
+        }
       }
     }
   }
@@ -501,19 +493,17 @@ PUT _ml/anomaly_detectors/test4
   },
   "data_description": {
     "time_field":"@timestamp"
-  }
-}
-
-PUT _ml/datafeeds/datafeed-test4
-{
-  "job_id": "test4",
-  "indices": ["my-index-000001"],
-  "script_fields":{
-    "sub":{
-      "script":"return domainSplit(doc['query'].value).get(0);"
-    },
-    "hrd":{
-      "script":"return domainSplit(doc['query'].value).get(1);"
+  },
+  "datafeed_config":{
+    "datafeed_id": "datafeed-test4",
+    "indices": ["my-index-000001"],
+    "script_fields":{
+      "sub":{
+        "script":"return domainSplit(doc['query'].value).get(0);"
+      },
+      "hrd":{
+        "script":"return domainSplit(doc['query'].value).get(1);"
+      }
     }
   }
 }