5 years ago · 3d96e6b68e
--- a/docs/reference/ml/anomaly-detection/apis/datafeedresource.asciidoc
+++ b/docs/reference/ml/anomaly-detection/apis/datafeedresource.asciidoc
@@ -1,161 +0,0 @@
 
				-[role="xpack"]
			
 
				-[testenv="platinum"]
			
 
				-[[ml-datafeed-resource]]
			
 
				-=== {dfeed-cap} resources
			
 
				-
			
 
				-A {dfeed} resource has the following properties:
			
 
				-
			
 
				-`aggregations`::
			
 
				-  (object) If set, the {dfeed} performs aggregation searches.
			
 
				-  Support for aggregations is limited and should only be used with
			
 
				-  low cardinality data. For more information, see
			
 
				-  {stack-ov}/ml-configuring-aggregation.html[Aggregating Data for Faster Performance].
			
 
				-
			
 
				-`chunking_config`::
			
 
				-  (object) Specifies how data searches are split into time chunks.
			
 
				-  See <<ml-datafeed-chunking-config>>.
			
 
				-  For example: `{"mode": "manual", "time_span": "3h"}`
			
 
				-
			
 
				-`datafeed_id`::
			
 
				- (string) A numerical character string that uniquely identifies the {dfeed}.
			
 
				- This property is informational; you cannot change the identifier for existing
			
 
				- {dfeeds}.
			
 
				-
			
 
				-`frequency`::
			
 
				-  (time units) The interval at which scheduled queries are made while the
			
 
				-  {dfeed} runs in real time. The default value is either the bucket span for short
			
 
				-  bucket spans, or, for longer bucket spans, a sensible fraction of the bucket
			
 
				-  span. For example: `150s`.
			
 
				-
			
 
				-`indices`::
			
 
				-  (array) An array of index names. For example: `["it_ops_metrics"]`
			
 
				-
			
 
				-`job_id`::
			
 
				- (string) The unique identifier for the job to which the {dfeed} sends data.
			
 
				-
			
 
				-`query`::
			
 
				-  (object) The {es} query domain-specific language (DSL). This value
			
 
				-  corresponds to the query object in an {es} search POST body. All the
			
 
				-  options that are supported by {es} can be used, as this object is
			
 
				-  passed verbatim to {es}. By default, this property has the following
			
 
				-  value: `{"match_all": {"boost": 1}}`.
			
 
				-
			
 
				-`query_delay`::
			
 
				-  (time units) The number of seconds behind real time that data is queried. For
			
 
				-  example, if data from 10:04 a.m. might not be searchable in {es} until
			
 
				-  10:06 a.m., set this property to 120 seconds. The default value is randomly
			
 
				-  selected between `60s` and `120s`. This randomness improves the query
			
 
				-  performance when there are multiple jobs running on the same node.
			
 
				-
			
 
				-`script_fields`::
			
 
				-  (object) Specifies scripts that evaluate custom expressions and returns
			
 
				-  script fields to the {dfeed}.
			
 
				-  The detector configuration objects in a job can contain
			
 
				-  functions that use these script fields.
			
 
				-  For more information, see
			
 
				-  {stack-ov}/ml-configuring-transform.html[Transforming Data With Script Fields].
			
 
				-
			
 
				-`scroll_size`::
			
 
				-  (unsigned integer) The `size` parameter that is used in {es} searches.
			
 
				-  The default value is `1000`.
			
 
				-
			
 
				-`delayed_data_check_config`::
			
 
				-  (object) Specifies whether the data feed checks for missing data and 
			
 
				-  the size of the window. For example:
			
 
				-  `{"enabled": true, "check_window": "1h"}` See
			
 
				-  <<ml-datafeed-delayed-data-check-config>>.
			
 
				-
			
 
				-`max_empty_searches`::
			
 
				-  (integer) If a real-time {dfeed} has never seen any data (including during
			
 
				-  any initial training period) then it will automatically stop itself and
			
 
				-  close its associated job after this many real-time searches that return no
			
 
				-  documents. In other words, it will stop after `frequency` times
			
 
				-  `max_empty_searches` of real-time operation. If not set
			
 
				-  then a {dfeed} with no end time that sees no data will remain started until
			
 
				-  it is explicitly stopped. By default this setting is not set.
			
 
				-
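As a rough illustration of the `max_empty_searches` description above, the following sketch estimates how long a real-time datafeed that never sees data would keep running before it stops itself, that is, `frequency` times `max_empty_searches`. The helper names `parse_time_units` and `empty_search_runtime_seconds` and the sample values are hypothetical, and the parser handles only the simple `s`, `m`, and `h` suffixes.

```python
# Illustrative only: estimates frequency * max_empty_searches, as described above.
# The function names and the simplified time-unit parser are assumptions,
# not part of any Elasticsearch client.

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_time_units(value: str) -> int:
    """Convert a simple time-unit string such as '150s' or '3h' to seconds."""
    number, unit = value[:-1], value[-1]
    if unit not in _UNIT_SECONDS or not number.isdigit():
        raise ValueError(f"unsupported time value: {value!r}")
    return int(number) * _UNIT_SECONDS[unit]


def empty_search_runtime_seconds(frequency: str, max_empty_searches: int) -> int:
    """Approximate real-time runtime before a datafeed with no data stops."""
    return parse_time_units(frequency) * max_empty_searches


# With a frequency of 150s and max_empty_searches of 10, the datafeed would
# stop after roughly 1500 seconds of real-time operation.
print(empty_search_runtime_seconds("150s", 10))
```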
			
 
				-[[ml-datafeed-chunking-config]]
			
 
				-==== Chunking configuration objects
			
 
				-
			
 
				-{dfeeds-cap} might be required to search over long time periods, for several months
			
 
				-or years. This search is split into time chunks in order to ensure the load
			
 
				-on {es} is managed. Chunking configuration controls how the size of these time
			
 
				-chunks are calculated and is an advanced configuration option.
			
 
				-
			
 
				-A chunking configuration object has the following properties:
			
 
				-
			
 
				-`mode`::
			
 
				-  There are three available modes: +
			
 
				-  `auto`::: The chunk size will be dynamically calculated. This is the default
			
 
				-  and recommended value.
			
 
				-  `manual`::: Chunking will be applied according to the specified `time_span`.
			
 
				-  `off`::: No chunking will be applied.
			
 
				-
			
 
				-`time_span`::
			
 
				-  (time units) The time span that each search will be querying.
			
 
				-  This setting is only applicable when the mode is set to `manual`.
			
 
				-  For example: `3h`.
			
 
				-
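To make the `manual` mode described above more concrete, the sketch below splits a search range into consecutive chunks of a fixed `time_span`. It is only a conceptual illustration, not the {es} implementation; the function name `manual_chunks` and the sample dates are hypothetical.

```python
# Illustrative only: shows how a fixed time_span divides a search range.
# This is not the Elasticsearch implementation; names are hypothetical.
from datetime import datetime, timedelta


def manual_chunks(start: datetime, end: datetime, time_span: timedelta):
    """Yield (chunk_start, chunk_end) pairs that cover [start, end)."""
    if time_span <= timedelta(0):
        raise ValueError("time_span must be positive")
    chunk_start = start
    while chunk_start < end:
        # The last chunk is shortened so that it never runs past `end`.
        chunk_end = min(chunk_start + time_span, end)
        yield chunk_start, chunk_end
        chunk_start = chunk_end


# An 8-hour range with a 3h time_span yields chunks of 3h, 3h, and 2h.
chunks = list(manual_chunks(datetime(2019, 11, 21, 0, 0),
                            datetime(2019, 11, 21, 8, 0),
                            timedelta(hours=3)))
for chunk_start, chunk_end in chunks:
    print(chunk_start.isoformat(), "->", chunk_end.isoformat())
```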
			
 
				-[[ml-datafeed-delayed-data-check-config]]
			
 
				-==== Delayed data check configuration objects
			
 
				-
			
 
				-The {dfeed} can optionally search over indices that have already been read in
			
 
				-an effort to determine whether any data has subsequently been added to the index.
			
 
				-If missing data is found, it is a good indication that the `query_delay` option
			
 
				-is set too low and the data is being indexed after the {dfeed} has passed that
			
 
				-moment in time. See 
			
 
				-{stack-ov}/ml-delayed-data-detection.html[Working with delayed data].
			
 
				-
			
 
				-This check runs only on real-time {dfeeds}.
			
 
				-
			
 
				-The configuration object has the following properties:
			
 
				-
			
 
				-`enabled`::
			
 
				-  (boolean) Specifies whether the {dfeed} periodically checks for delayed data.
			
 
				-  Defaults to `true`.
			
 
				-
			
 
				-`check_window`::
			
 
				-  (time units) The window of time that is searched for late data. This window of
			
 
				-  time ends with the latest finalized bucket. It defaults to `null`, which
			
 
				-  causes an appropriate `check_window` to be calculated when the real-time
			
 
				-  {dfeed} runs. In particular, the default `check_window` span calculation is
			
 
				-  based on the maximum of `2h` or `8 * bucket_span`.
			
 
				-
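The default `check_window` rule stated above (the maximum of `2h` or `8 * bucket_span`) can be sketched as follows. The function name `default_check_window` is hypothetical, and the snippet only restates the documented rule rather than reproducing the server-side code.

```python
# Illustrative only: restates the documented default,
# max(2h, 8 * bucket_span). The function name is an assumption.
from datetime import timedelta


def default_check_window(bucket_span: timedelta) -> timedelta:
    """Return the default delayed-data check window for a bucket span."""
    return max(timedelta(hours=2), 8 * bucket_span)


# A 5-minute bucket span gives 40 minutes, so the 2-hour floor applies.
print(default_check_window(timedelta(minutes=5)))
# A 1-hour bucket span gives 8 hours, which exceeds the 2-hour floor.
print(default_check_window(timedelta(hours=1)))
```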
			
 
				-[float]
			
 
				-[[ml-datafeed-counts]]
			
 
				-==== {dfeed-cap} counts
			
 
				-
			
 
				-The get {dfeed} statistics API provides information about the operational
			
 
				-progress of a {dfeed}. All of these properties are informational; you cannot
			
 
				-update their values:
			
 
				-
			
 
				-`assignment_explanation`::
			
 
				-  (string) For started {dfeeds} only, contains messages relating to the
			
 
				-  selection of a node.
			
 
				-
			
 
				-`datafeed_id`::
			
 
				- (string) A numerical character string that uniquely identifies the {dfeed}.
			
 
				-
			
 
				-`node`::
			
 
				-  (object) The node upon which the {dfeed} is started. The {dfeed} and job will
			
 
				-  be on the same node.
			
 
				-  `id`::: The unique identifier of the node. For example,
			
 
				-  "0-o0tOoRTwKFZifatTWKNw".
			
 
				-  `name`::: The node name. For example, `0-o0tOo`.
			
 
				-  `ephemeral_id`::: The node ephemeral ID.
			
 
				-  `transport_address`::: The host and port where transport HTTP connections are
			
 
				-  accepted. For example, `127.0.0.1:9300`.
			
 
				-  `attributes`::: For example, `{"ml.machine_memory": "17179869184"}`.
			
 
				-
			
 
				-`state`::
			
 
				-  (string) The status of the {dfeed}, which can be one of the following values: +
			
 
				-  `started`::: The {dfeed} is actively receiving data.
			
 
				-  `stopped`::: The {dfeed} is stopped and will not receive data until it is
			
 
				-  re-started.
			
 
				-
			
 
				-`timing_stats`::
			
 
				-  (object) An object that provides statistical information about timing aspect of this datafeed. +
			
 
				-  `job_id`::: A numerical character string that uniquely identifies the job.
			
 
				-  `search_count`::: Number of searches performed by this datafeed.
			
 
				-  `total_search_time_ms`::: Total time the datafeed spent searching in milliseconds.
			
 
				-
			
--- a/docs/reference/ml/anomaly-detection/apis/delete-datafeed.asciidoc
+++ b/docs/reference/ml/anomaly-detection/apis/delete-datafeed.asciidoc
@@ -28,14 +28,15 @@ can delete it.
 
				 ==== {api-path-parms-title}
			
 
				 
			
 
				 `<feed_id>`::
			
 
				-  (Required, string) Identifier for the {dfeed}.
			
 
				+(Required, string)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=datafeed-id]
			
 
				 
			
 
				 [[ml-delete-datafeed-query-parms]]
			
 
				 ==== {api-query-parms-title}
			
 
				 
			
 
				 `force`::
			
 
				-  (Optional, boolean) Use to forcefully delete a started {dfeed}; this method is
			
 
				-  quicker than stopping and deleting the {dfeed}.
			
 
				+(Optional, boolean) Use to forcefully delete a started {dfeed}; this method is
			
 
				+quicker than stopping and deleting the {dfeed}.
			
 
				 
			
 
				 [[ml-delete-datafeed-example]]
			
 
				 ==== {api-examples-title}
			
--- a/docs/reference/ml/anomaly-detection/apis/get-datafeed-stats.asciidoc
+++ b/docs/reference/ml/anomaly-detection/apis/get-datafeed-stats.asciidoc
@@ -45,36 +45,61 @@ IMPORTANT: This API returns a maximum of 10,000 {dfeeds}.
 
				 ==== {api-path-parms-title}
			
 
				 
			
 
				 `<feed_id>`::
			
 
				-  (Optional, string) Identifier for the {dfeed}. It can be a {dfeed} identifier
			
 
				-  or a wildcard expression. If you do not specify one of these options, the API
			
 
				-  returns statistics for all {dfeeds}.
			
 
				+(Optional, string)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=datafeed-id-wildcard]
			
 
				+If you do not specify one of these options, the API returns information about
			
 
				+all {dfeeds}.
			
 
				 
			
 
				 [[ml-get-datafeed-stats-query-parms]]
			
 
				 ==== {api-query-parms-title}
			
 
				 
			
 
				 `allow_no_datafeeds`::
			
 
				-  (Optional, boolean) Specifies what to do when the request:
			
 
				-+
			
 
				---
			
 
				-* Contains wildcard expressions and there are no {datafeeds} that match.
			
 
				-* Contains the `_all` string or no identifiers and there are no matches.
			
 
				-* Contains wildcard expressions and there are only partial matches. 
			
 
				-
			
 
				-The default value is `true`, which returns an empty `datafeeds` array when
			
 
				-there are no matches and the subset of results when there are partial matches.
			
 
				-If this parameter is `false`, the request returns a `404` status code when there
			
 
				-are no matches or only partial matches.
			
 
				---
			
 
				+(Optional, boolean)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=allow-no-datafeeds]
			
 
				 
			
 
				 
			
 
				 [[ml-get-datafeed-stats-results]]
			
 
				 ==== {api-response-body-title}
			
 
				 
			
 
				-The API returns the following information:
			
 
				+The API returns an array of {dfeed} count objects. All of these properties are
			
 
				+informational; you cannot update their values.
			
 
				+
			
 
				+`assignment_explanation`::
			
 
				+(string) For started {dfeeds} only, contains messages relating to the selection of a node.
			
 
				+
			
 
				+`datafeed_id`::
			
 
				+(string)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=datafeed-id]
			
 
				+
			
 
				+`node`::
			
 
				+(object) For started {dfeeds} only, the node upon which the {dfeed} is started. The {dfeed} and job will be on the same node.
			
 
				+`id`::: The unique identifier of the node. For example, "0-o0tOoRTwKFZifatTWKNw".
			
 
				+`name`::: The node name. For example, `0-o0tOo`.
			
 
				+`ephemeral_id`::: The node ephemeral ID.
			
 
				+`transport_address`::: The host and port where transport connections are
			
 
				+accepted. For example, `127.0.0.1:9300`.
			
 
				+`attributes`::: For example, `{"ml.machine_memory": "17179869184"}`.
			
 
				+
			
 
				+`state`::
			
 
				+(string) The status of the {dfeed}, which can be one of the following values:
			
 
				++
			
 
				+--
			
 
				+* `started`: The {dfeed} is actively receiving data.
			
 
				+* `stopped`: The {dfeed} is stopped and will not receive data until it is
			
 
				+re-started.
			
 
				+--
			
 
				+
			
 
				+`timing_stats`::
			
 
				+(object) An object that provides statistical information about the timing aspects of
			
 
				+this {dfeed}.
			
 
				+//average_search_time_per_bucket_ms
			
 
				+//bucket_count
			
 
				+//exponential_average_search_time_per_hour_ms
			
 
				+`job_id`:::
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection]
			
 
				+`search_count`::: Number of searches performed by this {dfeed}.
			
 
				+`total_search_time_ms`::: Total time the {dfeed} spent searching in milliseconds.
			
 
				 
			
 
				-`datafeeds`::
			
 
				-  (array) An array of {dfeed} count objects.
			
 
				-  For more information, see <<ml-datafeed-counts>>.
			
 
				 
			
 
				 [[ml-get-datafeed-stats-response-codes]]
			
 
				 ==== {api-response-codes-title}
			
@@ -86,46 +111,46 @@ The API returns the following information:
 
				 [[ml-get-datafeed-stats-example]]
			
 
				 ==== {api-examples-title}
			
 
				 
			
 
				-The following example gets usage information for the
			
 
				-`datafeed-total-requests` {dfeed}:
			
 
				-
			
 
				 [source,console]
			
 
				 --------------------------------------------------
			
 
				-GET _ml/datafeeds/datafeed-total-requests/_stats
			
 
				+GET _ml/datafeeds/datafeed-high_sum_total_sales/_stats
			
 
				 --------------------------------------------------
			
 
				-// TEST[skip:setup:server_metrics_startdf]
			
 
				+// TEST[skip:Kibana sample data started datafeed]
			
 
				 
			
 
				 The API returns the following results:
			
 
				 
			
 
				 [source,console-result]
			
 
				 ----
			
 
				 {
			
 
				-  "count": 1,
			
 
				-  "datafeeds": [
			
 
				+  "count" : 1,
			
 
				+  "datafeeds" : [
			
 
				     {
			
 
				-      "datafeed_id": "datafeed-total-requests",
			
 
				-      "state": "started",
			
 
				-      "node": {
			
 
				-        "id": "2spCyo1pRi2Ajo-j-_dnPX",
			
 
				-        "name": "node-0",
			
 
				-        "ephemeral_id": "hoXMLZB0RWKfR9UPPUCxXX",
			
 
				-        "transport_address": "127.0.0.1:9300",
			
 
				-        "attributes": {
			
 
				-          "ml.machine_memory": "17179869184",
			
 
				-          "ml.max_open_jobs": "20"
			
 
				+      "datafeed_id" : "datafeed-high_sum_total_sales",
			
 
				+      "state" : "started",
			
 
				+      "node" : {
			
 
				+        "id" : "7bmMXyWCRs-TuPfGJJ_yMw",
			
 
				+        "name" : "node-0",
			
 
				+        "ephemeral_id" : "hoXMLZB0RWKfR9UPPUCxXX",
			
 
				+        "transport_address" : "127.0.0.1:9300",
			
 
				+        "attributes" : {
			
 
				+          "ml.machine_memory" : "17179869184",
			
 
				+          "ml.max_open_jobs" : "20"
			
 
				         }
			
 
				       },
			
 
				-      "assignment_explanation": "",
			
 
				-      "timing_stats": {
			
 
				-        "job_id": "job-total-requests",
			
 
				-        "search_count": 20,
			
 
				-        "total_search_time_ms": 120.5
			
 
				+      "assignment_explanation" : "",
			
 
				+      "timing_stats" : {
			
 
				+        "job_id" : "high_sum_total_sales",
			
 
				+        "search_count" : 7,
			
 
				+        "bucket_count" : 743,
			
 
				+        "total_search_time_ms" : 134.0,
			
 
				+        "average_search_time_per_bucket_ms" : 0.180349932705249,
			
 
				+        "exponential_average_search_time_per_hour_ms" : 11.514712961628677
			
 
				       }
			
 
				     }
			
 
				   ]
			
 
				 }
			
 
				 ----
			
 
				-// TESTRESPONSE[s/"2spCyo1pRi2Ajo-j-_dnPX"/$body.$_path/]
			
 
				+// TESTRESPONSE[s/"7bmMXyWCRs-TuPfGJJ_yMw"/$body.$_path/]
			
 
				 // TESTRESPONSE[s/"node-0"/$body.$_path/]
			
 
				 // TESTRESPONSE[s/"hoXMLZB0RWKfR9UPPUCxXX"/$body.$_path/]
			
 
				 // TESTRESPONSE[s/"127.0.0.1:9300"/$body.$_path/]
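As a small sketch of how the new `timing_stats` fields in the example response relate to each other, the snippet below parses a trimmed copy of that response and recomputes the per-bucket average. It assumes that `average_search_time_per_bucket_ms` is `total_search_time_ms` divided by `bucket_count`; the sample values are consistent with that reading, but the diff itself does not state the formula.

```python
# Illustrative only: parses a trimmed copy of the example response above.
# Assumption: average_search_time_per_bucket_ms is
# total_search_time_ms / bucket_count (the sample values agree with this).
import json

response = json.loads("""
{
  "count": 1,
  "datafeeds": [
    {
      "datafeed_id": "datafeed-high_sum_total_sales",
      "state": "started",
      "timing_stats": {
        "job_id": "high_sum_total_sales",
        "search_count": 7,
        "bucket_count": 743,
        "total_search_time_ms": 134.0,
        "average_search_time_per_bucket_ms": 0.180349932705249
      }
    }
  ]
}
""")

stats = response["datafeeds"][0]["timing_stats"]
recomputed = stats["total_search_time_ms"] / stats["bucket_count"]

print("reported:  ", stats["average_search_time_per_bucket_ms"])
print("recomputed:", recomputed)
```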
			
--- a/docs/reference/ml/anomaly-detection/apis/get-datafeed.asciidoc
+++ b/docs/reference/ml/anomaly-detection/apis/get-datafeed.asciidoc
@@ -42,35 +42,71 @@ IMPORTANT: This API returns a maximum of 10,000 {dfeeds}.
 
				 ==== {api-path-parms-title}
			
 
				 
			
 
				 `<feed_id>`::
			
 
				-  (Optional, string) Identifier for the {dfeed}. It can be a {dfeed} identifier
			
 
				-  or a wildcard expression. If you do not specify one of these options, the API
			
 
				-  returns information about all {dfeeds}.
			
 
				+(Optional, string)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=datafeed-id-wildcard]
			
 
				+If you do not specify one of these options, the API returns information about
			
 
				+all {dfeeds}.
			
 
				 
			
 
				 [[ml-get-datafeed-query-parms]]
			
 
				 ==== {api-query-parms-title}
			
 
				 
			
 
				 `allow_no_datafeeds`::
			
 
				-  (Optional, boolean) Specifies what to do when the request:
			
 
				-+
			
 
				---
			
 
				-* Contains wildcard expressions and there are no {datafeeds} that match.
			
 
				-* Contains the `_all` string or no identifiers and there are no matches.
			
 
				-* Contains wildcard expressions and there are only partial matches. 
			
 
				-
			
 
				-The default value is `true`, which returns an empty `datafeeds` array when
			
 
				-there are no matches and the subset of results when there are partial matches.
			
 
				-If this parameter is `false`, the request returns a `404` status code when there
			
 
				-are no matches or only partial matches.
			
 
				---
			
 
				+(Optional, boolean)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=allow-no-datafeeds]
			
 
				 
			
 
				 [[ml-get-datafeed-results]]
			
 
				 ==== {api-response-body-title}
			
 
				 
			
 
				-The API returns the following information:
			
 
				+The API returns an array of {dfeed} resources, which have the following
			
 
				+properties:
			
 
				 
			
 
				-`datafeeds`::
			
 
				-  (array) An array of {dfeed} objects.
			
 
				-  For more information, see <<ml-datafeed-resource>>.
			
 
				+`aggregations`::
			
 
				+(object)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=aggregations]
			
 
				+
			
 
				+`chunking_config`::
			
 
				+(object)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=chunking-config]
			
 
				+
			
 
				+`datafeed_id`::
			
 
				+(string)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=datafeed-id]
			
 
				+
			
 
				+`delayed_data_check_config`::
			
 
				+(object)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=delayed-data-check-config]
			
 
				+
			
 
				+`frequency`::
			
 
				+(<<time-units,time units>>)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=frequency]
			
 
				+
			
 
				+`indices`::
			
 
				+(array)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=indices]
			
 
				+
			
 
				+`job_id`::
			
 
				+(string)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-datafeed]
			
 
				+
			
 
				+`max_empty_searches`::
			
 
				+(integer)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=max-empty-searches]
			
 
				+
			
 
				+`query`::
			
 
				+(object)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=query]
			
 
				+
			
 
				+`query_delay`::
			
 
				+(<<time-units,time units>>)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=query-delay]
			
 
				+
			
 
				+`script_fields`::
			
 
				+(object)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=script-fields]
			
 
				+
			
 
				+`scroll_size`::
			
 
				+(unsigned integer)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=scroll-size]
			
 
				 
			
 
				 [[ml-get-datafeed-response-codes]]
			
 
				 ==== {api-response-codes-title}
			
@@ -83,39 +119,48 @@ The API returns the following information:
 
				 ==== {api-examples-title}
			
 
				 
			
 
				 The following example gets configuration information for the
			
 
				-`datafeed-total-requests` {dfeed}:
			
 
				+`datafeed-high_sum_total_sales` {dfeed}:
			
 
				 
			
 
				 [source,console]
			
 
				 --------------------------------------------------
			
 
				-GET _ml/datafeeds/datafeed-total-requests
			
 
				+GET _ml/datafeeds/datafeed-high_sum_total_sales
			
 
				 --------------------------------------------------
			
 
				-// TEST[skip:setup:server_metrics_datafeed]
			
 
				+// TEST[skip:kibana sample data]
			
 
				 
			
 
				 The API returns the following results:
			
 
				 
			
 
				 [source,console-result]
			
 
				 ----
			
 
				 {
			
 
				-  "count": 1,
			
 
				-  "datafeeds": [
			
 
				+  "count" : 1,
			
 
				+  "datafeeds" : [
			
 
				     {
			
 
				-      "datafeed_id": "datafeed-total-requests",
			
 
				-      "job_id": "total-requests",
			
 
				-      "query_delay": "83474ms",
			
 
				-      "indices": [
			
 
				-        "server-metrics"
			
 
				+      "datafeed_id" : "datafeed-high_sum_total_sales",
			
 
				+      "job_id" : "high_sum_total_sales",
			
 
				+      "query_delay" : "93169ms",
			
 
				+      "indices" : [
			
 
				+        "kibana_sample_data_ecommerce"
			
 
				       ],
			
 
				-      "query": {
			
 
				-        "match_all": {
			
 
				-          "boost": 1.0
			
 
				+      "query" : {
			
 
				+        "bool" : {
			
 
				+          "filter" : [
			
 
				+            {
			
 
				+              "term" : {
			
 
				+                "_index" : "kibana_sample_data_ecommerce"
			
 
				+              }
			
 
				+            }
			
 
				+          ]
			
 
				         }
			
 
				       },
			
 
				-      "scroll_size": 1000,
			
 
				-      "chunking_config": {
			
 
				-        "mode": "auto"
			
 
				+      "scroll_size" : 1000,
			
 
				+      "chunking_config" : {
			
 
				+        "mode" : "auto"
			
 
				+      },
			
 
				+      "delayed_data_check_config" : {
			
 
				+        "enabled" : true
			
 
				       }
			
 
				     }
			
 
				   ]
			
 
				 }
			
 
				 ----
			
 
				-// TESTRESPONSE[s/"query.boost": "1.0"/"query.boost": $body.query.boost/]
			
 
				+// TESTRESPONSE[s/"query.boost": "93169ms"/"query.boost": $body.query.boost/]
			
--- a/docs/reference/ml/anomaly-detection/apis/preview-datafeed.asciidoc
+++ b/docs/reference/ml/anomaly-detection/apis/preview-datafeed.asciidoc
@@ -41,18 +41,17 @@ it to ensure it is returning the expected data.
 
				 ==== {api-path-parms-title}
			
 
				 
			
 
				 `<datafeed_id>`::
			
 
				-  (Required, string) Identifier for the {dfeed}.
			
 
				+(Required, string)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=datafeed-id]
			
 
				 
			
 
				 [[ml-preview-datafeed-example]]
			
 
				 ==== {api-examples-title}
			
 
				 
			
 
				-The following example obtains a preview of the `datafeed-farequote` {dfeed}:
			
 
				-
			
 
				 [source,console]
			
 
				 --------------------------------------------------
			
 
				-GET _ml/datafeeds/datafeed-farequote/_preview
			
 
				+GET _ml/datafeeds/datafeed-high_sum_total_sales/_preview
			
 
				 --------------------------------------------------
			
 
				-// TEST[skip:setup:farequote_datafeed]
			
 
				+// TEST[skip:set up Kibana sample data]
			
 
				 
			
 
				 The data that is returned for this example is as follows:
			
 
				 
			
@@ -60,22 +59,28 @@ The data that is returned for this example is as follows:
 
				 ----
			
 
				 [
			
 
				   {
			
 
				-    "time": 1454803200000,
			
 
				-    "airline": "JZA",
			
 
				-    "doc_count": 5,
			
 
				-    "responsetime": 990.4628295898438
			
 
				+    "order_date" : 1574294659000,
			
 
				+    "category.keyword" : "Men's Clothing",
			
 
				+    "customer_full_name.keyword" : "Sultan Al Benson",
			
 
				+    "taxful_total_price" : 35.96875
			
 
				   },
			
 
				   {
			
 
				-    "time": 1454803200000,
			
 
				-    "airline": "JBU",
			
 
				-    "doc_count": 23,
			
 
				-    "responsetime": 877.5927124023438
			
 
				+    "order_date" : 1574294918000,
			
 
				+    "category.keyword" : [
			
 
				+      "Women's Accessories",
			
 
				+      "Women's Clothing"
			
 
				+    ],
			
 
				+    "customer_full_name.keyword" : "Pia Webb",
			
 
				+    "taxful_total_price" : 83.0
			
 
				   },
			
 
				   {
			
 
				-    "time": 1454803200000,
			
 
				-    "airline": "KLM",
			
 
				-    "doc_count": 42,
			
 
				-    "responsetime": 1355.481201171875
			
 
				+    "order_date" : 1574295782000,
			
 
				+    "category.keyword" : [
			
 
				+      "Women's Accessories",
			
 
				+      "Women's Shoes"
			
 
				+    ],
			
 
				+    "customer_full_name.keyword" : "Brigitte Graham",
			
 
				+    "taxful_total_price" : 72.0
			
 
				   }
			
 
				 ]
			
 
				 ----
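In the preview output above, `category.keyword` is a plain string in the first document and an array in the others. A client that consumes the preview may want to normalize both shapes; the sketch below shows one way to do so. The helper name `as_list` is hypothetical, and the data is copied from the example response.

```python
# Illustrative only: normalizes fields that may be a scalar or an array,
# as seen for "category.keyword" in the preview example above.
# The helper name as_list is an assumption.


def as_list(value):
    """Wrap a scalar in a list; return lists unchanged; map None to []."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


preview = [
    {"order_date": 1574294659000,
     "category.keyword": "Men's Clothing",
     "customer_full_name.keyword": "Sultan Al Benson",
     "taxful_total_price": 35.96875},
    {"order_date": 1574294918000,
     "category.keyword": ["Women's Accessories", "Women's Clothing"],
     "customer_full_name.keyword": "Pia Webb",
     "taxful_total_price": 83.0},
    {"order_date": 1574295782000,
     "category.keyword": ["Women's Accessories", "Women's Shoes"],
     "customer_full_name.keyword": "Brigitte Graham",
     "taxful_total_price": 72.0},
]

# Collect the distinct categories across all previewed documents.
categories = sorted(
    {category
     for doc in preview
     for category in as_list(doc.get("category.keyword"))}
)
print(categories)
```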
			
--- a/docs/reference/ml/anomaly-detection/apis/put-datafeed.asciidoc
+++ b/docs/reference/ml/anomaly-detection/apis/put-datafeed.asciidoc
@@ -43,70 +43,52 @@ those same roles.
 
				 ==== {api-path-parms-title}
			
 
				 
			
 
				 `<feed_id>`::
			
 
				-  (Required, string) A numerical character string that uniquely identifies the
			
 
				-  {dfeed}. This identifier can contain lowercase alphanumeric characters (a-z
			
 
				-  and 0-9), hyphens, and underscores. It must start and end with alphanumeric
			
 
				-  characters.
			
 
				+(Required, string)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=datafeed-id]
			
 
				 
			
 
				 [[ml-put-datafeed-request-body]]
			
 
				 ==== {api-request-body-title}
			
 
				 
			
 
				 `aggregations`::
			
 
				-  (Optional, object) If set, the {dfeed} performs aggregation searches. For more
			
 
				-  information, see <<ml-datafeed-resource>>.
			
 
				+(Optional, object)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=aggregations]
			
 
				 
			
 
				 `chunking_config`::
			
 
				-  (Optional, object) Specifies how data searches are split into time chunks. See
			
 
				-  <<ml-datafeed-chunking-config>>.
			
 
				+(Optional, object)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=chunking-config]
			
 
				 
			
 
				 `delayed_data_check_config`::
			
 
				-  (Optional, object) Specifies whether the data feed checks for missing data and 
			
 
				-  the size of the window. See <<ml-datafeed-delayed-data-check-config>>.
			
 
				+(Optional, object)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=delayed-data-check-config]
			
 
				 
			
 
				 `frequency`::
			
 
				-  (Optional, <<time-units, time units>>) The interval at which scheduled queries 
			
 
				-  are made while the {dfeed} runs in real time. The default value is either the 
			
 
				-  bucket span for short bucket spans, or, for longer bucket spans, a sensible 
			
 
				-  fraction of the bucket span. For example: `150s`.
			
 
				+(Optional, <<time-units, time units>>)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=frequency]
			
 
				 
			
 
				 `indices`::
			
 
				-  (Required, array) An array of index names. Wildcards are supported. For
			
 
				-  example: `["it_ops_metrics", "server*"]`.
			
 
				-+
			
 
				---
			
 
				-NOTE: If any indices are in remote clusters then `cluster.remote.connect` must
			
 
				-not be set to `false` on any ML node.
			
 
				---
			
 
				+(Required, array)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=indices]
			
 
				 
			
 
				-`job_id`::
			
 
				- (Required, string) A numerical character string that uniquely identifies the
			
 
				- {anomaly-job}.
			
 
				 
			
 
				+`job_id`::
			
 
				+(Required, string)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection]
			
 
				+ 
			
 
				 `query`::
			
 
				-  (Optional, object) The {es} query domain-specific language (DSL). This value
			
 
				-  corresponds to the query object in an {es} search POST body. All the options
			
 
				-  that are supported by {Es} can be used, as this object is passed verbatim to
			
 
				-  {es}. By default, this property has the following value:
			
 
				-  `{"match_all": {"boost": 1}}`.
			
 
				+(Optional, object)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=query]
			
 
				 
			
 
				 `query_delay`::
			
 
				-  (Optional, <<time-units, time units>>) The number of seconds behind real time 
			
 
				-  that data is queried. For example, if data from 10:04 a.m. might not be 
			
 
				-  searchable in {es} until 10:06 a.m., set this property to 120 seconds. The 
			
 
				-  default value is `60s`.
			
 
				+(Optional, <<time-units, time units>>)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=query-delay]
			
 
				 
			
 
				 `script_fields`::
			
 
				-  (Optional, object) Specifies scripts that evaluate custom expressions and
			
 
				-  returns script fields to the {dfeed}. The detector configuration objects in a
			
 
				-  job can contain functions that use these script fields. For more information,
			
 
				-  see <<request-body-search-script-fields,Script fields>>.
			
 
				+(Optional, object)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=script-fields]
			
 
				 
			
 
				 `scroll_size`::
			
 
				-  (Optional, unsigned integer) The `size` parameter that is used in {es}
			
 
				-  searches. The default value is `1000`.
			
 
				-
			
 
				-For more information about these properties,
			
 
				-see <<ml-datafeed-resource>>.
			
 
				+(Optional, unsigned integer)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=scroll-size]
			
 
				 
			
 
				 [[ml-put-datafeed-example]]
			
 
				 ==== {api-examples-title}
			
--- a/docs/reference/ml/anomaly-detection/apis/start-datafeed.asciidoc
+++ b/docs/reference/ml/anomaly-detection/apis/start-datafeed.asciidoc
@@ -74,7 +74,8 @@ creation/update and runs the query using those same roles.
 
				 ==== {api-path-parms-title}
			
 
				 
			
 
				 `<feed_id>`::
			
 
				-  (Required, string) Identifier for the {dfeed}.
			
 
				+(Required, string)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=datafeed-id]
			
 
				 
			
 
				 [[ml-start-datafeed-request-body]]
			
 
				 ==== {api-request-body-title}
			
--- a/docs/reference/ml/anomaly-detection/apis/stop-datafeed.asciidoc
+++ b/docs/reference/ml/anomaly-detection/apis/stop-datafeed.asciidoc
@@ -40,25 +40,15 @@ comma-separated list of {dfeeds} or a wildcard expression. You can close all
 
				 ==== {api-path-parms-title}
			
 
				 
			
 
				 `<feed_id>`::
			
 
				-  (Required, string) Identifier for the {dfeed}. It can be a {dfeed} identifier
			
 
				-  or a wildcard expression.
			
 
				+(Required, string)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=datafeed-id-wildcard]
			
 
				 
			
 
				 [[ml-stop-datafeed-query-parms]]
			
 
				 ==== {api-query-parms-title}
			
 
				 
			
 
				 `allow_no_datafeeds`::
			
 
				-  (Optional, boolean) Specifies what to do when the request:
			
 
				-+
			
 
				---
			
 
				-* Contains wildcard expressions and there are no {datafeeds} that match.
			
 
				-* Contains the `_all` string or no identifiers and there are no matches.
			
 
				-* Contains wildcard expressions and there are only partial matches. 
			
 
				-
			
 
				-The default value is `true`, which returns an empty `datafeeds` array when
			
 
				-there are no matches and the subset of results when there are partial matches.
			
 
				-If this parameter is `false`, the request returns a `404` status code when there
			
 
				-are no matches or only partial matches.
			
 
				---
			
 
				+(Optional, boolean)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=allow-no-datafeeds]
			
 
				 
			
 
				 [[ml-stop-datafeed-request-body]]
			
 
				 ==== {api-request-body-title}
			
--- a/docs/reference/ml/anomaly-detection/apis/update-datafeed.asciidoc
+++ b/docs/reference/ml/anomaly-detection/apis/update-datafeed.asciidoc
@@ -39,7 +39,8 @@ using those same roles.
 
				 ==== {api-path-parms-title}
			
 
				 
			
 
				 `<feed_id>`::
			
 
				-  (Required, string) Identifier for the {dfeed}.
			
 
				+(Required, string)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=datafeed-id]
			
 
				 
			
 
				 [[ml-update-datafeed-request-body]]
			
 
				 ==== {api-request-body-title}
			
@@ -47,70 +48,58 @@ using those same roles.
 
				 The following properties can be updated after the {dfeed} is created:
			
 
				 
			
 
				 `aggregations`::
			
 
				-  (Optional, object) If set, the {dfeed} performs aggregation searches. For more
			
 
				-  information, see <<ml-datafeed-resource>>.
			
 
				+(Optional, object)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=aggregations]
			
 
				 
			
 
				 `chunking_config`::
			
 
				-  (Optional, object) Specifies how data searches are split into time chunks. See
			
 
				-  <<ml-datafeed-chunking-config>>.
			
 
				-  
			
 
				+(Optional, object)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=chunking-config]
			
 
				+
			
 
				 `delayed_data_check_config`::
			
 
				-  (Optional, object) Specifies whether the data feed checks for missing data and 
			
 
				-  the size of the window. See <<ml-datafeed-delayed-data-check-config>>.  
			
 
				+(Optional, object)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=delayed-data-check-config]
			
 
				 
			
 
				 `frequency`::
			
 
				-  (Optional, <<time-units, time units>>) The interval at which scheduled queries 
			
 
				-  are made while the {dfeed} runs in real time. The default value is either the 
			
 
				-  bucket span for short bucket spans, or, for longer bucket spans, a sensible 
			
 
				-  fraction of the bucket span. For example: `150s`.
			
 
				+(Optional, <<time-units, time units>>)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=frequency]
			
 
				 
			
 
				 `indices`::
			
 
				-  (Optional, array) An array of index names. Wildcards are supported. For
			
 
				-  example: `["it_ops_metrics", "server*"]`.
			
 
				+(Optional, array)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=indices]
			
 
				+
			
 
				+`max_empty_searches`::
			
 
				+(Optional, integer)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=max-empty-searches]
			
 
				++
			
 
				+--
			
 
				+The special value `-1` unsets this setting.
			
 
				+--
			
 
				 
			
 
				 `query`::
			
 
				-  (Optional, object) The {es} query domain-specific language (DSL). This value
			
 
				-  corresponds to the query object in an {es} search POST body. All the options
			
 
				-  that are supported by {es} can be used, as this object is passed verbatim to
			
 
				-  {es}. By default, this property has the following value:
			
 
				-  `{"match_all": {"boost": 1}}`.
			
 
				+(Optional, object)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=query]
			
 
				 +
			
 
				 --
			
 
				-WARNING: If you change the query, then the analyzed data will also be changed, 
			
 
				-therefore the required time to learn might be long and the understandability of 
			
 
				-the results is unpredictable.
			
 
				-If you want to make significant changes to the source data, we would recommend 
			
 
				-you clone it and create a second job containing the amendments. Let both run in 
			
 
				-parallel and close one when you are satisfied with the results of the other job.
			
 
				+WARNING: If you change the query, the analyzed data is also changed. Therefore,
			
 
				+the required time to learn might be long and the understandability of the
			
 
				+results is unpredictable. If you want to make significant changes to the source
			
 
				+data, we would recommend you clone it and create a second job containing the
			
 
				+amendments. Let both run in parallel and close one when you are satisfied with
			
 
				+the results of the other job.
			
 
				+
			
 
				 --
			
 
				 
			
 
				 `query_delay`::
			
 
				-  (Optional, <<time-units, time units>>) The number of seconds behind real-time 
			
 
				-  that data is queried. For example, if data from 10:04 a.m. might not be 
			
 
				-  searchable in {es} until 10:06 a.m., set this property to 120 seconds. The 
			
 
				-  default value is `60s`.
			
 
				+(Optional, <<time-units, time units>>)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=query-delay]
			
 
				 
			
 
				 `script_fields`::
			
 
				-  (Optional, object) Specifies scripts that evaluate custom expressions and
			
 
				-  returns script fields to the {dfeed}. The detector configuration objects in a
			
 
				-  job can contain functions that use these script fields. For more information,
			
 
				-  see <<request-body-search-script-fields,Script fields>>.
			
 
				+(Optional, object)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=script-fields]
			
 
				 
			
 
				 `scroll_size`::
			
 
				-  (Optional, unsigned integer) The `size` parameter that is used in {es}
			
 
				-  searches. The default value is `1000`.
			
 
				-
			
 
				-`max_empty_searches`::
			
 
				-  (Optional, integer) If a real-time {dfeed} has never seen any data (including
			
 
				-  during any initial training period) then it will automatically stop itself
			
 
				-  and close its associated job after this many real-time searches that return
			
 
				-  no documents. In other words, it will stop after `frequency` times
			
 
				-  `max_empty_searches` of real-time operation. If not set
			
 
				-  then a {dfeed} with no end time that sees no data will remain started until
			
 
				-  it is explicitly stopped. The special value `-1` unsets this setting.
			
 
				-
			
 
				-For more information about these properties, see <<ml-datafeed-resource>>.
			
 
				-
			
 
				+(Optional, unsigned integer)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=scroll-size]
			
 
				 
			
 
				 [[ml-update-datafeed-example]]
			
 
				 ==== {api-examples-title}
			
--- a/docs/reference/ml/anomaly-detection/delayed-data-detection.asciidoc
+++ b/docs/reference/ml/anomaly-detection/delayed-data-detection.asciidoc
@@ -5,14 +5,15 @@
 
				 Delayed data are documents that are indexed late. That is to say, it is data
			
 
				 related to a time that the {dfeed} has already processed.
			
 
				 
			
 
				-When you create a datafeed, you can specify a
			
 
				-{ref}/ml-datafeed-resource.html[`query_delay`] setting. This setting enables the
			
 
				-datafeed to wait for some time past real-time, which means any "late" data in
			
 
				-this period is fully indexed before the datafeed tries to gather it. However, if
			
 
				-the setting is set too low, the datafeed may query for data before it has been
			
 
				-indexed and consequently miss that document. Conversely, if it is set too high,
			
 
				-analysis drifts farther away from real-time. The balance that is struck depends
			
 
				-upon each use case and the environmental factors of the cluster.
			
 
				+When you create a {dfeed}, you can specify a
			
 
				+{ref}/ml-put-datafeed.html#ml-put-datafeed-request-body[`query_delay`] setting.
			
 
				+This setting enables the {dfeed} to wait for some time past real-time, which
			
 
				+means any "late" data in this period is fully indexed before the {dfeed} tries
			
 
				+to gather it. However, if the setting is set too low, the {dfeed} may query for
			
 
				+data before it has been indexed and consequently miss that document. Conversely,
			
 
				+if it is set too high, analysis drifts farther away from real-time. The balance
			
 
				+that is struck depends upon each use case and the environmental factors of the
			
 
				+cluster.
			
 
				 
			
 
				 ==== Why worry about delayed data?
			
 
				 
			
@@ -28,8 +29,7 @@ recorded so that you can determine a next course of action.
 
				 
			
 
				 ==== How do we detect delayed data?
			
 
				 
			
 
				-In addition to the `query_delay` field, there is a
			
 
				-{ref}/ml-datafeed-resource.html#ml-datafeed-delayed-data-check-config[delayed data check config],
			
 
				+In addition to the `query_delay` field, there is a delayed data check config,
			
 
				 which enables you to configure the datafeed to look in the past for delayed data.
			
 
				 Every 15 minutes or every `check_window`, whichever is smaller, the datafeed
			
 
				 triggers a document search over the configured indices. This search looks over a
			
--- a/docs/reference/ml/ml-shared.asciidoc
+++ b/docs/reference/ml/ml-shared.asciidoc
@@ -1,3 +1,10 @@
 
				+tag::aggregations[]
			
 
				+If set, the {dfeed} performs aggregation searches. Support for aggregations is
			
 
				+limited and should only be used with low cardinality data. For more information,
			
 
				+see
			
 
				+{stack-ov}/ml-configuring-aggregation.html[Aggregating data for faster performance].
			
 
				+end::aggregations[]
			
 
				+
			
 
				 tag::allow-lazy-open[]
			
 
				 Advanced configuration option. Specifies whether this job can open when there is 
			
 
				 insufficient {ml} node capacity for it to be immediately assigned to a node. The
			
@@ -9,6 +16,21 @@ return an error and the job waits in the `opening` state until sufficient {ml}
 
				 node capacity is available.
			
 
				 end::allow-lazy-open[]
			
 
				 
			
 
				+tag::allow-no-datafeeds[]
			
 
				+Specifies what to do when the request:
			
 
				++
			
 
				+--
			
 
				+* Contains wildcard expressions and there are no {dfeeds} that match.
			
 
				+* Contains the `_all` string or no identifiers and there are no matches.
			
 
				+* Contains wildcard expressions and there are only partial matches. 
			
 
				+
			
 
				+The default value is `true`, which returns an empty `datafeeds` array when
			
 
				+there are no matches and the subset of results when there are partial matches.
			
 
				+If this parameter is `false`, the request returns a `404` status code when there
			
 
				+are no matches or only partial matches.
			
 
				+--
			
 
				+end::allow-no-datafeeds[]
			
 
				+
			
 
				 tag::allow-no-jobs[]
			
 
				 Specifies what to do when the request:
			
 
				 +
			
@@ -207,6 +229,22 @@ add them here as
 
				 <<analysis-pattern-replace-charfilter,pattern replace character filters>>.
			
 
				 end::char-filter[]
			
 
				 
			
 
				+tag::chunking-config[]
			
 
				+{dfeeds-cap} might be required to search over long time periods, for several months
			
 
				+or years. This search is split into time chunks in order to ensure the load
			
 
				+on {es} is managed. Chunking configuration controls how the size of these time
			
 
				+chunks is calculated and is an advanced configuration option.
			
 
				+A chunking configuration object has the following properties:
			
 
				+
			
 
				+`mode`:::
			
 
				+(string)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=mode]
			
 
				+
			
 
				+`time_span`:::
			
 
				+(<<time-units,time units>>)
			
 
				+include::{docdir}/ml/ml-shared.asciidoc[tag=time-span]
			
 
				+end::chunking-config[]
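The `manual` chunking mode described above can be sketched as follows. This is only an illustration of splitting one long search window into `time_span`-sized pieces; the function name `manual_chunks` is hypothetical, and the real {dfeed} implementation may align or size chunks differently.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def manual_chunks(
    start: datetime, end: datetime, time_span: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive [chunk_start, chunk_end) windows covering start..end.

    Hypothetical helper: models `mode: manual` with a fixed `time_span`.
    """
    if time_span <= timedelta(0):
        raise ValueError("time_span must be positive")
    chunk_start = start
    while chunk_start < end:
        # The last chunk is shortened so it never runs past the end time.
        chunk_end = min(chunk_start + time_span, end)
        yield chunk_start, chunk_end
        chunk_start = chunk_end


# Example: an 8-hour window split with time_span = 3h.
chunks = list(
    manual_chunks(
        datetime(2020, 1, 1, 0, 0),
        datetime(2020, 1, 1, 8, 0),
        timedelta(hours=3),
    )
)
```

With these inputs the window is covered by three searches, the last of which spans only two hours.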
			
 
				+
			
 
				 tag::custom-rules[]
			
 
				 An array of custom rule objects, which enable you to customize the way detectors
			
 
				 operate. For example, a rule may dictate to the detector conditions under which
			
@@ -301,6 +339,47 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=time-format]
 
				 --
			
 
				 end::data-description[]
			
 
				 
			
 
				+tag::datafeed-id[]
			
 
				+A character string that uniquely identifies the
			
 
				+{dfeed}. This identifier can contain lowercase alphanumeric characters (a-z
			
 
				+and 0-9), hyphens, and underscores. It must start and end with alphanumeric
			
 
				+characters.
			
 
				+end::datafeed-id[]
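The identifier rule in the `datafeed-id` tag can be checked client-side with a small pattern. The sketch below is derived only from the wording above (lowercase alphanumerics, hyphens, and underscores, starting and ending with an alphanumeric character); the helper name `is_valid_datafeed_id` is hypothetical, and the server may enforce additional constraints that are not stated here.

```python
import re

# Pattern built from the prose rule only: first and last characters must be
# a-z or 0-9; characters in between may also be hyphens or underscores.
_DATAFEED_ID = re.compile(r"[a-z0-9](?:[a-z0-9_-]*[a-z0-9])?")


def is_valid_datafeed_id(candidate: str) -> bool:
    """Return True if the candidate satisfies the documented character rule."""
    return _DATAFEED_ID.fullmatch(candidate) is not None
```

For example, `datafeed-total-requests` passes this check, while `_feed`, `feed-`, and `Feed` do not.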
			
 
				+
			
 
				+tag::datafeed-id-wildcard[]
			
 
				+Identifier for the {dfeed}. It can be a {dfeed} identifier or a wildcard
			
 
				+expression.
			
 
				+end::datafeed-id-wildcard[]
			
 
				+
			
 
				+tag::delayed-data-check-config[]
			
 
				+Specifies whether the {dfeed} checks for missing data and the size of the
			
 
				+window. For example: `{"enabled": true, "check_window": "1h"}`.
			
 
				++
			
 
				+--
			
 
				+The {dfeed} can optionally search over indices that have already been read in
			
 
				+an effort to determine whether any data has subsequently been added to the index.
			
 
				+If missing data is found, it is a good indication that the `query_delay` option
			
 
				+is set too low and the data is being indexed after the {dfeed} has passed that
			
 
				+moment in time. See 
			
 
				+{stack-ov}/ml-delayed-data-detection.html[Working with delayed data].
			
 
				+
			
 
				+This check runs only on real-time {dfeeds}.
			
 
				+
			
 
				+The configuration object has the following properties:
			
 
				+
			
 
				+`enabled`::
			
 
				+(boolean) Specifies whether the {dfeed} periodically checks for delayed data.
			
 
				+Defaults to `true`.
			
 
				+
			
 
				+`check_window`::
			
 
				+(<<time-units,time units>>) The window of time that is searched for late data.
			
 
				+This window of time ends with the latest finalized bucket. It defaults to
			
 
				+`null`, which causes an appropriate `check_window` to be calculated when the
			
 
				+real-time {dfeed} runs. In particular, the default `check_window` span
			
 
				+calculation is based on the maximum of `2h` or `8 * bucket_span`.
			
 
				+--
			
 
				+end::delayed-data-check-config[]
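The default `check_window` rule quoted above ("the maximum of `2h` or `8 * bucket_span`") works out as in the following sketch. The function name `default_check_window` is hypothetical, and because the text says the calculation is only "based on" this maximum, the value chosen by a running {dfeed} may differ.

```python
from datetime import timedelta


def default_check_window(bucket_span: timedelta) -> timedelta:
    """Approximate the documented default: max(2h, 8 * bucket_span)."""
    return max(timedelta(hours=2), 8 * bucket_span)


# A 5-minute bucket span gives 8 * 5m = 40m, so the 2h floor applies.
short_span_window = default_check_window(timedelta(minutes=5))

# A 30-minute bucket span gives 8 * 30m = 4h, which exceeds the 2h floor.
long_span_window = default_check_window(timedelta(minutes=30))
```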
			
 
				+
			
 
				 tag::dependent_variable[]
			
 
				 `dependent_variable`::
			
 
				 (Required, string) Defines which field of the document is to be predicted. 
			
@@ -405,6 +484,13 @@ optional. If it is not specified, no token filters are applied prior to
 
				 categorization.
			
 
				 end::filter[]
			
 
				 
			
 
				+tag::frequency[]
			
 
				+The interval at which scheduled queries are made while the {dfeed} runs in real
			
 
				+time. The default value is either the bucket span for short bucket spans, or,
			
 
				+for longer bucket spans, a sensible fraction of the bucket span. For example:
			
 
				+`150s`.
			
 
				+end::frequency[]
			
 
				+
			
 
				 tag::function[]
			
 
				 The analysis function that is used. For example, `count`, `rare`, `mean`, `min`, 
			
 
				 `max`, and `sum`. For more information, see
			
@@ -424,6 +510,17 @@ tag::groups[]
 
				 A list of job groups. A job can belong to no groups or many.
			
 
				 end::groups[]
			
 
				 
			
 
				+tag::indices[]
			
 
				+An array of index names. Wildcards are supported. For example:
			
 
				+`["it_ops_metrics", "server*"]`.
			
 
				++
			
 
				+--
			
 
				+NOTE: If any indices are in remote clusters then `cluster.remote.connect` must
			
 
				+not be set to `false` on any {ml} nodes.
			
 
				+
			
 
				+--
			
 
				+end::indices[]
			
 
				+
			
 
				 tag::influencers[]
			
 
				 A comma separated list of influencer field names. Typically these can be the by, 
			
 
				 over, or partition fields that are used in the detector configuration. You might 
			
@@ -475,6 +572,10 @@ alphanumeric characters (a-z and 0-9), hyphens, and underscores. It must start
 
				 and end with alphanumeric characters.
			
 
				 end::job-id-data-frame-analytics-define[]
			
 
				 
			
 
				+tag::job-id-datafeed[]
			
 
				+The unique identifier for the job to which the {dfeed} sends data.
			
 
				+end::job-id-datafeed[]
			
 
				+
			
 
				 tag::jobs-stats-anomaly-detection[]
			
 
				 An array of {anomaly-job} statistics objects.
			
 
				 For more information, see <<ml-jobstats>>.
			
@@ -502,12 +603,32 @@ the <<ml-post-data,post data>> API.
 
				 --
			
 
				 end::latency[]
			
 
				 
			
 
				+tag::max-empty-searches[]
			
 
				+If a real-time {dfeed} has never seen any data (including during any initial
			
 
				+training period) then it will automatically stop itself and close its associated
			
 
				+job after this many real-time searches that return no documents. In other words,
			
 
				+it will stop after `frequency` times `max_empty_searches` of real-time operation.
			
 
				+If not set then a {dfeed} with no end time that sees no data will remain started
			
 
				+until it is explicitly stopped. By default this setting is not set.
			
 
				+end::max-empty-searches[]
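The "`frequency` times `max_empty_searches`" relationship above is simple arithmetic. The sketch below computes it for illustration; the function name `idle_time_before_auto_stop` is hypothetical and the values are examples only.

```python
from datetime import timedelta


def idle_time_before_auto_stop(
    frequency: timedelta, max_empty_searches: int
) -> timedelta:
    """Real-time duration after which a datafeed that never saw data stops."""
    if max_empty_searches < 1:
        raise ValueError("max_empty_searches must be a positive integer")
    return frequency * max_empty_searches


# Example: a frequency of 150s and max_empty_searches of 10.
idle_time = idle_time_before_auto_stop(timedelta(seconds=150), 10)
```

With a `frequency` of `150s` and `max_empty_searches` of `10`, the result is 1500 seconds, or 25 minutes of real-time operation.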
			
 
				+
			
 
				 tag::maximum_number_trees[]
			
 
				 `maximum_number_trees`::
			
 
				 (Optional, integer) Defines the maximum number of trees the forest is allowed 
			
 
				 to contain. The maximum value is 2000.
			
 
				 end::maximum_number_trees[]
			
 
				 
			
 
				+tag::mode[]
			
 
				+There are three available modes: 
			
 
				++
			
 
				+--
			
 
				+* `auto`: The chunk size is dynamically calculated. This is the default and
			
 
				+recommended value.
			
 
				+* `manual`: Chunking is applied according to the specified `time_span`.
			
 
				+* `off`: No chunking is applied.
			
 
				+--
			
 
				+end::mode[]
			
 
				+
			
 
				 tag::model-memory-limit[]
			
 
				 The approximate maximum amount of memory resources that are required for 
			
 
				 analytical processing. Once this limit is approached, data pruning becomes
			
@@ -615,6 +736,21 @@ tag::prediction_field_name[]
 
				 Defaults to `<dependent_variable>_prediction`.
			
 
				 end::prediction_field_name[]
			
 
				 
			
 
				+tag::query[]
			
 
				+The {es} query domain-specific language (DSL). This value corresponds to the
			
 
				+query object in an {es} search POST body. All the options that are supported by
			
 
				+{es} can be used, as this object is passed verbatim to {es}. By default, this
			
 
				+property has the following value: `{"match_all": {"boost": 1}}`.
			
 
				+end::query[]
			
 
				+
			
 
				+tag::query-delay[]
			
 
				+The number of seconds behind real time that data is queried. For example, if
			
 
				+data from 10:04 a.m. might not be searchable in {es} until 10:06 a.m., set this
			
 
				+property to 120 seconds. The default value is randomly selected between `60s`
			
 
				+and `120s`. This randomness improves the query performance when there are
			
 
				+multiple jobs running on the same node.
			
 
				+end::query-delay[]
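The 10:04 a.m. and 10:06 a.m. example above can be reproduced with a short calculation. This is a sketch only: the function name `minimum_query_delay` is hypothetical, and it simply measures the gap between a document's timestamp and the moment it becomes searchable.

```python
from datetime import datetime, timedelta


def minimum_query_delay(event_time: datetime, searchable_at: datetime) -> timedelta:
    """Smallest query_delay that lets a search see this document."""
    if searchable_at < event_time:
        raise ValueError("searchable_at must not precede event_time")
    return searchable_at - event_time


# Data from 10:04 a.m. that is not searchable until 10:06 a.m.
delay = minimum_query_delay(
    datetime(2020, 1, 1, 10, 4),
    datetime(2020, 1, 1, 10, 6),
)
delay_seconds = int(delay.total_seconds())
```

Here `delay_seconds` is `120`, which matches the 120 seconds suggested in the text.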
			
 
				+
			
 
				 tag::renormalization-window-days[]
			
 
				 Advanced configuration option. The period over which adjustments to the score
			
 
				 are applied, as new data is seen. The default value is the longer of 30 days or
			
@@ -633,6 +769,18 @@ are deleted from {es}. The default value is null, which means results are
 
				 retained.
			
 
				 end::results-retention-days[]
			
 
				 
			
 
				+tag::script-fields[]
			
 
				+Specifies scripts that evaluate custom expressions and return script fields to
			
 
				+the {dfeed}. The detector configuration objects in a job can contain functions
			
 
				+that use these script fields. For more information, see
			
 
				+{stack-ov}/ml-configuring-transform.html[Transforming data with script fields]
			
 
				+and <<request-body-search-script-fields,Script fields>>.
			
 
				+end::script-fields[]
			
 
				+
			
 
				+tag::scroll-size[]
			
 
				+The `size` parameter that is used in {es} searches. The default value is `1000`.
			
 
				+end::scroll-size[]
			
 
				+
			
 
				 tag::summary-count-field-name[]
			
 
				 If this property is specified, the data that is fed to the job is expected to be
			
 
				 pre-summarized. This property value is the name of the field that contains the 
			
@@ -663,6 +811,11 @@ job creation fails.
 
				 --
			
 
				 end::time-format[]
			
 
				 
			
 
				+tag::time-span[]
			
 
				+The time span that each search will be querying. This setting is only applicable
			
 
				+when the mode is set to `manual`. For example: `3h`.
			
 
				+end::time-span[]
			
 
				+
			
 
				 tag::tokenizer[]
			
 
				 The name or definition of the <<analysis-tokenizers,tokenizer>> to use after 
			
 
				 character filters are applied. This property is compulsory if 
			
--- a/docs/reference/redirects.asciidoc
+++ b/docs/reference/redirects.asciidoc
@@ -1054,4 +1054,15 @@ This page was deleted.
 
				 [[ml-analysisconfig]]
			
 
				 See the details in
			
 
				 [[ml-apimodelplotconfig]]
			
 
				-<<ml-put-job>>, <<ml-update-job>>, and <<ml-get-job>>.
			
 
				+<<ml-put-job>>, <<ml-update-job>>, and <<ml-get-job>>.
			
 
				+
			
 
				+[role="exclude",id="ml-datafeed-resource"]
			
 
				+=== {dfeed-cap} resources
			
 
				+
			
 
				+This page was deleted.
			
 
				+[[ml-datafeed-chunking-config]]
			
 
				+See the details in <<ml-put-datafeed>>, <<ml-update-datafeed>>, 
			
 
				+[[ml-datafeed-delayed-data-check-config]]
			
 
				+<<ml-get-datafeed>>, 
			
 
				+[[ml-datafeed-counts]]
			
 
				+<<ml-get-datafeed-stats>>.
			
--- a/docs/reference/rest-api/defs.asciidoc
+++ b/docs/reference/rest-api/defs.asciidoc
@@ -5,8 +5,6 @@
 
				 These resource definitions are used in APIs related to {ml-features} and
			
 
				 {security-features} and in {kib} advanced {ml} job configuration options.
			
 
				 
			
 
				-* <<ml-datafeed-resource,{dfeeds-cap}>>
			
 
				-* <<ml-datafeed-counts,{dfeed-cap} counts>>
			
 
				 * <<ml-dfanalytics-resources,{dfanalytics-cap}>>
			
 
				 * <<ml-evaluate-dfanalytics-resources,Evaluate {dfanalytics}>>
			
 
				 * <<ml-jobstats,{anomaly-jobs-cap} statistics>>
			
@@ -15,7 +13,6 @@ These resource definitions are used in APIs related to {ml-features} and
 
				 * <<role-mapping-resources,Role mappings>>
			
 
				 * <<transform-resource,{transforms-cap}>>
			
 
				 
			
 
				-include::{es-repo-dir}/ml/anomaly-detection/apis/datafeedresource.asciidoc[]
			
 
				 include::{es-repo-dir}/ml/df-analytics/apis/dfanalyticsresources.asciidoc[]
			
 
				 include::{es-repo-dir}/ml/df-analytics/apis/evaluateresources.asciidoc[]
			
 
				 include::{es-repo-dir}/ml/anomaly-detection/apis/jobcounts.asciidoc[]