
[DOCS] Update datafeed details in ML docs (#76854)

Lisa Cawley · 4 years ago
commit d36f24fbc3

+ 9 - 7
docs/reference/ml/anomaly-detection/apis/close-job.asciidoc

@@ -7,10 +7,6 @@
 ++++
 
 Closes one or more {anomaly-jobs}.
-A job can be opened and closed multiple times throughout its lifecycle.
-
-A closed job cannot receive data or perform analysis
-operations, but you can still explore and navigate results.
 
 [[ml-close-job-request]]
 == {api-request-title}
@@ -30,13 +26,19 @@ operations, but you can still explore and navigate results.
 [[ml-close-job-desc]]
 == {api-description-title}
 
+A job can be opened and closed multiple times throughout its lifecycle.
+
+A closed job cannot receive data or perform analysis operations, but you can
+still explore and navigate results.
+
 You can close multiple {anomaly-jobs} in a single API request by using a group
 name, a comma-separated list of jobs, or a wildcard expression. You can close
 all jobs by using `_all` or by specifying `*` as the `<job_id>`.
 
-If you close an {anomaly-job} whose {dfeed} is running, the request will first
-attempt to stop the {dfeed}, as though <<ml-stop-datafeed>> was called with
-the same `timeout` and `force` parameters as the close request.
+If you close an {anomaly-job} whose {dfeed} is running, the request first tries
+to stop the {dfeed}. This behavior is equivalent to calling
+<<ml-stop-datafeed,stop {dfeed}>> with the same `timeout` and `force` parameters
+as the close job request.
 
 When you close a job, it runs housekeeping tasks such as pruning the model history,
 flushing buffers, calculating final results and persisting the model snapshots.
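
For illustration, a close request might look like the following sketch; the job
ID `low_request_rate` is a placeholder, and any `timeout` and `force` query
parameters set here are also applied to the implicit stop {dfeed} call:

[source,console]
----
POST _ml/anomaly_detectors/low_request_rate/_close?timeout=30s

POST _ml/anomaly_detectors/*/_close
----
// TEST[skip:hypothetical example]

The second request uses a wildcard to close all jobs at once.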

+ 4 - 3
docs/reference/ml/anomaly-detection/apis/delete-job.asciidoc

@@ -34,9 +34,10 @@ are granted to anyone over the `.ml-*` indices.
 It is not currently possible to delete multiple jobs using wildcards or a
 comma-separated list.
 
-If you delete a job that has a {dfeed}, the request will first attempt to
-delete the {dfeed}, as though <<ml-delete-datafeed>> was called with the same
-`timeout` and `force` parameters as this delete request.
+If you delete a job that has a {dfeed}, the request first tries to delete the
+{dfeed}. This behavior is equivalent to calling
+<<ml-delete-datafeed,delete {dfeed}>> with the same `timeout` and `force`
+parameters as the delete job request.
 
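As a sketch, the following request (with a placeholder job ID) deletes a job
and, because the job has a {dfeed}, first forwards the same `force` value to
the implicit delete {dfeed} call:

[source,console]
----
DELETE _ml/anomaly_detectors/total-requests?force=true
----
// TEST[skip:hypothetical example]
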
 [[ml-delete-job-path-parms]]
 == {api-path-parms-title}

+ 4 - 4
docs/reference/ml/anomaly-detection/apis/put-datafeed.asciidoc

@@ -36,10 +36,10 @@ each interval. See {ml-docs}/ml-delayed-data-detection.html[Handling delayed dat
 
 [IMPORTANT]
 ====
-* You must use {kib} or this API to create a {dfeed}. Do not add a
-{dfeed} directly to the `.ml-config` index using the {es} index API. If {es}
-{security-features} are enabled, do not give users `write` privileges on the
-`.ml-config` index.
+* You must use {kib}, this API, or the <<ml-put-job,create {anomaly-jobs} API>>
+to create a {dfeed}. Do not add a {dfeed} directly to the `.ml-config` index
+using the {es} index API. If {es} {security-features} are enabled, do not give
+users `write` privileges on the `.ml-config` index.
 * When {es} {security-features} are enabled, your {dfeed} remembers which roles
 the user who created it had at the time of creation and runs the query using
 those same roles. If you provide

+ 15 - 2
docs/reference/ml/anomaly-detection/apis/put-job.asciidoc

@@ -19,13 +19,24 @@ Instantiates an {anomaly-job}.
 Requires the `manage_ml` cluster privilege. This privilege is included in the
 `machine_learning_admin` built-in role.
 
+If you include a `datafeed_config`, you must also have `read` index privileges
+on the source index.
+
 [[ml-put-job-desc]]
 == {api-description-title}
 
-IMPORTANT: You must use {kib} or this API to create an {anomaly-job}. Do not put
+[IMPORTANT]
+====
+* You must use {kib} or this API to create an {anomaly-job}. Do not put
 a job directly to the `.ml-config` index using the {es} index API. If {es}
 {security-features} are enabled, do not give users `write` privileges on the
 `.ml-config` index.
+* If you include a `datafeed_config` and {es} {security-features} are enabled,
+your {dfeed} remembers which roles the user who created it had at the time of
+creation and runs the query using those same roles. If you provide
+<<http-clients-secondary-authorization,secondary authorization headers>>, those
+credentials are used instead.
+====
 
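For example, a minimal sketch of a job that bundles its {dfeed} in the request
body; the job ID, index name, and field names are placeholders:

[source,console]
----
PUT _ml/anomaly_detectors/total-requests
{
  "analysis_config": {
    "bucket_span": "10m",
    "detectors": [{"function": "count"}]
  },
  "data_description": {
    "time_field": "timestamp"
  },
  "datafeed_config": {
    "indices": ["server-metrics"]
  }
}
----
// TEST[skip:hypothetical example]

Because the request includes a `datafeed_config`, the user who submits it also
needs `read` privileges on `server-metrics`, and the {dfeed} runs with the
roles that user holds at creation time.
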
 [[ml-put-job-path-parms]]
 == {api-path-parms-title}
@@ -250,7 +261,9 @@ include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=data-description]
 //End data_description
 
 `datafeed_config`::
-(object) The {dfeed} configured for the current {anomaly-job}.
+(Optional, object) The {ml-docs}/ml-dfeeds.html[{dfeed}], which retrieves data
+from {es} for analysis by the job. You can associate only one {dfeed} with each
+{anomaly-job}.
 +
 .Properties of `datafeed`
 [%collapsible%open]

+ 3 - 3
docs/reference/ml/anomaly-detection/functions/ml-geo-functions.asciidoc

@@ -71,8 +71,8 @@ For example, JSON data might contain the following transaction coordinates:
 
 In {es}, location data is likely to be stored in `geo_point` fields. For more
 information, see {ref}/geo-point.html[`geo_point` data type]. This data type is
-supported natively in {ml-features}. Specifically, {dfeed} when pulling data from
-a `geo_point` field, will transform the data into the appropriate `lat,lon` string
-format before sending to the {anomaly-job}.
+supported natively in {ml-features}. Specifically, when pulling data from a
+`geo_point` field, a {dfeed} transforms the data into the appropriate
+`lat,lon` string format before sending it to the {anomaly-job}.
 
 For more information, see <<ml-configuring-transform>>.
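
To sketch how this fits together (all names here are placeholders), a
`lat_long` detector can reference the `geo_point` field directly and rely on
the {dfeed} to perform the conversion:

[source,console]
----
PUT _ml/anomaly_detectors/unusual-locations
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      { "function": "lat_long", "field_name": "coords", "by_field_name": "user" }
    ]
  },
  "data_description": {
    "time_field": "@timestamp"
  },
  "datafeed_config": {
    "indices": ["my-index-000001"]
  }
}
----
// TEST[skip:hypothetical example]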

+ 76 - 98
docs/reference/ml/anomaly-detection/ml-configuring-aggregations.asciidoc

@@ -78,48 +78,30 @@ PUT _ml/anomaly_detectors/farequote
   },
   "data_description": {
     "time_field":"time"  <1>
-  }
-}
-----------------------------------
-// TEST[skip:setup:farequote_data]
-
-<1> The `airline`, `responsetime`, and `time` fields are aggregations. Only the
-aggregated fields defined in the `analysis_config` object are analyzed by the
-{anomaly-job}.
-
-NOTE: When the `summary_count_field_name` property is set to a non-null value,
-the job expects to receive aggregated input. The property must be set to the
-name of the field that contains the count of raw data points that have been
-aggregated. It applies to all detectors in the job.
-
-The aggregations are defined in the {dfeed} as follows:
-
-[source,console]
-----------------------------------
-PUT _ml/datafeeds/datafeed-farequote
-{
-  "job_id":"farequote",
-  "indices": ["farequote"],
-  "aggregations": {
-    "buckets": {
-      "date_histogram": {
-        "field": "time",
-        "fixed_interval": "360s",
-        "time_zone": "UTC"
-      },
-      "aggregations": {
-        "time": {  <1>
-          "max": {"field": "time"}
+  },
+  "datafeed_config":{
+    "indices": ["farequote"],
+    "aggregations": {
+      "buckets": {
+        "date_histogram": {
+          "field": "time",
+          "fixed_interval": "360s",
+          "time_zone": "UTC"
         },
-        "airline": {  <2>
-          "terms": {
-            "field": "airline",
-            "size": 100
+        "aggregations": {
+          "time": {  <2>
+            "max": {"field": "time"}
           },
-          "aggregations": {
-            "responsetime": {  <3>
-              "avg": {
-                "field": "responsetime"
+          "airline": {  <3>
+            "terms": {
+             "field": "airline",
+              "size": 100
+            },
+            "aggregations": {
+              "responsetime": {  <4>
+                "avg": {
+                  "field": "responsetime"
+                }
               }
             }
           }
@@ -129,19 +111,27 @@ PUT _ml/datafeeds/datafeed-farequote
   }
 }
 ----------------------------------
-// TEST[skip:setup:farequote_job]
+// TEST[skip:setup:farequote_data]
 
-<1> The aggregations have names that match the fields that they operate on. The
+<1> The `airline`, `responsetime`, and `time` fields are aggregations. Only the
+aggregated fields defined in the `analysis_config` object are analyzed by the
+{anomaly-job}.
+<2> The aggregations have names that match the fields that they operate on. The
 `max` aggregation is named `time` and its field also needs to be `time`.
-<2> The `term` aggregation is named `airline` and its field is also named
+<3> The `term` aggregation is named `airline` and its field is also named
 `airline`.
-<3> The `avg` aggregation is named `responsetime` and its field is also named
+<4> The `avg` aggregation is named `responsetime` and its field is also named
 `responsetime`.
 
+When the `summary_count_field_name` property is set to a non-null value, the job
+expects to receive aggregated input. The property must be set to the name of the
+field that contains the count of raw data points that have been aggregated. It
+applies to all detectors in the job.
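
As a minimal sketch (the job ID is a placeholder), a job that receives
pre-aggregated input typically sets the property to `doc_count`:

[source,console]
----
PUT _ml/anomaly_detectors/presummarized-counts
{
  "analysis_config": {
    "bucket_span": "60m",
    "summary_count_field_name": "doc_count",
    "detectors": [{ "function": "count" }]
  },
  "data_description": {
    "time_field": "time"
  }
}
----
// TEST[skip:hypothetical example]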
+
 TIP: If you are using a `term` aggregation to gather influencer or partition
 field information, consider using a `composite` aggregation. It performs
-better than a `date_histogram` with a nested `term` aggregation and also includes
-all the values of the field instead of the top values per bucket.
+better than a `date_histogram` with a nested `term` aggregation and also
+includes all the values of the field instead of the top values per bucket.
 
 [discrete]
 [[aggs-using-composite]]
@@ -153,15 +143,17 @@ For `composite` aggregation support, there must be exactly one `date_histogram`
 source. That value source must not be sorted in descending order. Additional
 `composite` aggregation value sources are allowed, such as `terms`.
 
-NOTE: A {dfeed} that uses composite aggregations may not be as performant as datafeeds that use scrolling or
-date histogram aggregations. Composite aggregations are optimized
-for queries that are either `match_all` or `range` filters. Other types of
+NOTE: A {dfeed} that uses composite aggregations may not be as performant as
+{dfeeds} that use scrolling or date histogram aggregations. Composite
+aggregations are optimized for queries that are either `match_all` or `range`
+filters. Other types of
 queries may cause the `composite` aggregation to be inefficient.
 
 Here is an example that uses a `composite` aggregation instead of a
 `date_histogram`.
 
-Assuming the same job configuration as above.
+This is an example of a job with a {dfeed} that uses a `composite` aggregation
+to bucket the metrics based on time and terms:
 
 [source,console]
 ----------------------------------
@@ -178,54 +170,42 @@ PUT _ml/anomaly_detectors/farequote-composite
   },
   "data_description": {
     "time_field":"time"
-  }
-}
-----------------------------------
-// TEST[skip:setup:farequote_data]
-
-This is an example of a datafeed that uses a `composite` aggregation to bucket
-the metrics based on time and terms:
-
-[source,console]
-----------------------------------
-PUT _ml/datafeeds/datafeed-farequote-composite
-{
-  "job_id": "farequote-composite",
-  "indices": [
-    "farequote"
-  ],
-  "aggregations": {
-    "buckets": {
-      "composite": {
-        "size": 1000,  <1>
-        "sources": [
-          {
-            "time_bucket": {  <2>
-              "date_histogram": {
-                "field": "time",
-                "fixed_interval": "360s",
-                "time_zone": "UTC"
+  },
+  "datafeed_config":{
+    "indices": ["farequote"],
+    "aggregations": {
+      "buckets": {
+        "composite": {
+          "size": 1000,  <1>
+          "sources": [
+            {
+              "time_bucket": {  <2>
+                "date_histogram": {
+                  "field": "time",
+                  "fixed_interval": "360s",
+                  "time_zone": "UTC"
+                }
               }
-            }
-          },
-          {
-            "airline": {  <3>
-              "terms": {
-                "field": "airline"
+            },
+            {
+              "airline": {  <3>
+                "terms": {
+                  "field": "airline"
+                }
               }
             }
-          }
-        ]
-      },
-      "aggregations": {
-        "time": {  <4>
-          "max": {
-            "field": "time"
-          }
+          ]
         },
-        "responsetime": { <5>
-          "avg": {
-            "field": "responsetime"
+        "aggregations": {
+          "time": {  <4>
+            "max": {
+              "field": "time"
+            }
+          },
+          "responsetime": { <5>
+            "avg": {
+              "field": "responsetime"
+            }
           }
         }
       }
@@ -233,10 +213,8 @@ PUT _ml/datafeeds/datafeed-farequote-composite
   }
 }
 ----------------------------------
-// TEST[skip:setup:farequote_job]
-
<1> Provide the `size` to the composite aggregation to control how many resources
-are used when aggregating the data. A larger `size` means a faster datafeed but
+are used when aggregating the data. A larger `size` means a faster {dfeed} but
 more cluster resources are used when searching.
 <2> The required `date_histogram` composite aggregation source. Make sure it
 is named differently than your desired time field.
@@ -364,7 +342,7 @@ When using a `date_histogram` aggregation to bucket by time:
     "bucket_agg": {
       ...
     },
-    "aggregations": {]
+    "aggregations": {
       "data_histogram_aggregation": {
         "date_histogram": {
           "field": "time",

+ 41 - 51
docs/reference/ml/anomaly-detection/ml-configuring-transform.asciidoc

@@ -107,20 +107,16 @@ PUT _ml/anomaly_detectors/test1
   },
   "data_description": {
     "time_field":"@timestamp"
-  }
-}
-
-PUT _ml/datafeeds/datafeed-test1
-{
-  "job_id": "test1",
-  "indices": [
-    "my-index-000001"
-  ],
-  "runtime_mappings": {
-    "total_error_count": { <2>
-      "type": "long",
-      "script": {
-        "source": "emit(doc['error_count'].value + doc['aborted_count'].value)"
+  },
+  "datafeed_config":{
+    "datafeed_id": "datafeed-test1",
+    "indices": ["my-index-000001"],
+    "runtime_mappings": {
+      "total_error_count": { <2>
+        "type": "long",
+        "script": {
+          "source": "emit(doc['error_count'].value + doc['aborted_count'].value)"
+        }
       }
     }
   }
@@ -193,18 +189,16 @@ PUT _ml/anomaly_detectors/test2
   },
   "data_description": {
     "time_field":"@timestamp"
-  }
-}
-
-PUT _ml/datafeeds/datafeed-test2
-{
-  "job_id": "test2",
-  "indices": ["my-index-000001"],
-  "runtime_mappings": {
-    "my_runtime_field": {
-      "type": "keyword",
-      "script": {
-        "source": "emit(doc['some_field'].value + '_' + doc['another_field'].value)" <2>
+  },
+  "datafeed_config":{
+    "datafeed_id": "datafeed-test2",
+    "indices": ["my-index-000001"],
+    "runtime_mappings": {
+      "my_runtime_field": {
+        "type": "keyword",
+        "script": {
+          "source": "emit(doc['some_field'].value + '_' + doc['another_field'].value)" <2>
+        }
       }
     }
   }
@@ -438,18 +432,16 @@ PUT _ml/anomaly_detectors/test3
   },
   "data_description": {
     "time_field":"@timestamp"
-  }
-}
-
-PUT _ml/datafeeds/datafeed-test3
-{
-  "job_id": "test3",
-  "indices": ["my-index-000001"],
-  "runtime_mappings": {
-    "my_coordinates": {
-      "type": "keyword",
-      "script": {
-        "source": "emit(doc['coords.lat'].value + ',' + doc['coords.lon'].value)"
+  },
+  "datafeed_config":{
+    "datafeed_id": "datafeed-test3",
+    "indices": ["my-index-000001"],
+    "runtime_mappings": {
+      "my_coordinates": {
+        "type": "keyword",
+        "script": {
+          "source": "emit(doc['coords.lat'].value + ',' + doc['coords.lon'].value)"
+        }
       }
     }
   }
@@ -501,19 +493,17 @@ PUT _ml/anomaly_detectors/test4
   },
   "data_description": {
     "time_field":"@timestamp"
-  }
-}
-
-PUT _ml/datafeeds/datafeed-test4
-{
-  "job_id": "test4",
-  "indices": ["my-index-000001"],
-  "script_fields":{
-    "sub":{
-      "script":"return domainSplit(doc['query'].value).get(0);"
-    },
-    "hrd":{
-      "script":"return domainSplit(doc['query'].value).get(1);"
+  },
+  "datafeed_config":{
+    "datafeed_id": "datafeed-test4",
+    "indices": ["my-index-000001"],
+    "script_fields":{
+      "sub":{
+        "script":"return domainSplit(doc['query'].value).get(0);"
+      },
+      "hrd":{
+        "script":"return domainSplit(doc['query'].value).get(1);"
+      }
     }
   }
 }