
[DOCS] Move datafeed resource definitions into APIs (#50005)

Co-Authored-By: István Zoltán Szabó <istvan.szabo@elastic.co>
Lisa Cawley 5 years ago
Parent
Commit
3d96e6b68e

+ 0 - 161
docs/reference/ml/anomaly-detection/apis/datafeedresource.asciidoc

@@ -1,161 +0,0 @@
-[role="xpack"]
-[testenv="platinum"]
-[[ml-datafeed-resource]]
-=== {dfeed-cap} resources
-
-A {dfeed} resource has the following properties:
-
-`aggregations`::
-  (object) If set, the {dfeed} performs aggregation searches.
-  Support for aggregations is limited and should only be used with
-  low cardinality data. For more information, see
-  {stack-ov}/ml-configuring-aggregation.html[Aggregating Data for Faster Performance].
-
-`chunking_config`::
-  (object) Specifies how data searches are split into time chunks.
-  See <<ml-datafeed-chunking-config>>.
-  For example: `{"mode": "manual", "time_span": "3h"}`
-
-`datafeed_id`::
- (string) A numerical character string that uniquely identifies the {dfeed}.
- This property is informational; you cannot change the identifier for existing
- {dfeeds}.
-
-`frequency`::
-  (time units) The interval at which scheduled queries are made while the
-  {dfeed} runs in real time. The default value is either the bucket span for short
-  bucket spans, or, for longer bucket spans, a sensible fraction of the bucket
-  span. For example: `150s`.
-
-`indices`::
-  (array) An array of index names. For example: `["it_ops_metrics"]`
-
-`job_id`::
- (string) The unique identifier for the job to which the {dfeed} sends data.
-
-`query`::
-  (object) The {es} query domain-specific language (DSL). This value
-  corresponds to the query object in an {es} search POST body. All the
-  options that are supported by {es} can be used, as this object is
-  passed verbatim to {es}. By default, this property has the following
-  value: `{"match_all": {"boost": 1}}`.
-
-`query_delay`::
-  (time units) The number of seconds behind real time that data is queried. For
-  example, if data from 10:04 a.m. might not be searchable in {es} until
-  10:06 a.m., set this property to 120 seconds. The default value is randomly
-  selected between `60s` and `120s`. This randomness improves the query
-  performance when there are multiple jobs running on the same node.
-
-`script_fields`::
-  (object) Specifies scripts that evaluate custom expressions and returns
-  script fields to the {dfeed}.
-  The detector configuration objects in a job can contain
-  functions that use these script fields.
-  For more information, see
-  {stack-ov}/ml-configuring-transform.html[Transforming Data With Script Fields].
-
-`scroll_size`::
-  (unsigned integer) The `size` parameter that is used in {es} searches.
-  The default value is `1000`.
-
-`delayed_data_check_config`::
-  (object) Specifies whether the data feed checks for missing data and 
-  the size of the window. For example:
-  `{"enabled": true, "check_window": "1h"}` See
-  <<ml-datafeed-delayed-data-check-config>>.
-
-`max_empty_searches`::
-  (integer) If a real-time {dfeed} has never seen any data (including during
-  any initial training period) then it will automatically stop itself and
-  close its associated job after this many real-time searches that return no
-  documents. In other words, it will stop after `frequency` times
-  `max_empty_searches` of real-time operation. If not set
-  then a {dfeed} with no end time that sees no data will remain started until
-  it is explicitly stopped. By default this setting is not set.
-
-[[ml-datafeed-chunking-config]]
-==== Chunking configuration objects
-
-{dfeeds-cap} might be required to search over long time periods, for several months
-or years. This search is split into time chunks in order to ensure the load
-on {es} is managed. Chunking configuration controls how the size of these time
-chunks are calculated and is an advanced configuration option.
-
-A chunking configuration object has the following properties:
-
-`mode`::
-  There are three available modes: +
-  `auto`::: The chunk size will be dynamically calculated. This is the default
-  and recommended value.
-  `manual`::: Chunking will be applied according to the specified `time_span`.
-  `off`::: No chunking will be applied.
-
-`time_span`::
-  (time units) The time span that each search will be querying.
-  This setting is only applicable when the mode is set to `manual`.
-  For example: `3h`.
-
-[[ml-datafeed-delayed-data-check-config]]
-==== Delayed data check configuration objects
-
-The {dfeed} can optionally search over indices that have already been read in
-an effort to determine whether any data has subsequently been added to the index.
-If missing data is found, it is a good indication that the `query_delay` option
-is set too low and the data is being indexed after the {dfeed} has passed that
-moment in time. See 
-{stack-ov}/ml-delayed-data-detection.html[Working with delayed data].
-
-This check runs only on real-time {dfeeds}.
-
-The configuration object has the following properties:
-
-`enabled`::
-  (boolean) Specifies whether the {dfeed} periodically checks for delayed data.
-  Defaults to `true`.
-
-`check_window`::
-  (time units) The window of time that is searched for late data. This window of
-  time ends with the latest finalized bucket. It defaults to `null`, which
-  causes an appropriate `check_window` to be calculated when the real-time
-  {dfeed} runs. In particular, the default `check_window` span calculation is
-  based on the maximum of `2h` or `8 * bucket_span`.
-
-[float]
-[[ml-datafeed-counts]]
-==== {dfeed-cap} counts
-
-The get {dfeed} statistics API provides information about the operational
-progress of a {dfeed}. All of these properties are informational; you cannot
-update their values:
-
-`assignment_explanation`::
-  (string) For started {dfeeds} only, contains messages relating to the
-  selection of a node.
-
-`datafeed_id`::
- (string) A numerical character string that uniquely identifies the {dfeed}.
-
-`node`::
-  (object) The node upon which the {dfeed} is started. The {dfeed} and job will
-  be on the same node.
-  `id`::: The unique identifier of the node. For example,
-  "0-o0tOoRTwKFZifatTWKNw".
-  `name`::: The node name. For example, `0-o0tOo`.
-  `ephemeral_id`::: The node ephemeral ID.
-  `transport_address`::: The host and port where transport HTTP connections are
-  accepted. For example, `127.0.0.1:9300`.
-  `attributes`::: For example, `{"ml.machine_memory": "17179869184"}`.
-
-`state`::
-  (string) The status of the {dfeed}, which can be one of the following values: +
-  `started`::: The {dfeed} is actively receiving data.
-  `stopped`::: The {dfeed} is stopped and will not receive data until it is
-  re-started.
-
-`timing_stats`::
-  (object) An object that provides statistical information about timing aspect of this datafeed. +
-  `job_id`::: A numerical character string that uniquely identifies the job.
-  `search_count`::: Number of searches performed by this datafeed.
-  `total_search_time_ms`::: Total time the datafeed spent searching in milliseconds.
-

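The removed chunking section says that in `manual` mode each search covers one `time_span`, for example `{"mode": "manual", "time_span": "3h"}`. The following is a minimal Python sketch of that splitting. It assumes epoch-millisecond bounds, and `split_into_chunks` is a hypothetical helper. It does not model the `auto` mode, whose chunk size Elasticsearch calculates internally.

```python
from __future__ import annotations

from datetime import timedelta


def split_into_chunks(start_ms: int, end_ms: int, time_span: timedelta) -> list[tuple[int, int]]:
    """Split the half-open range [start_ms, end_ms) into consecutive chunks of
    at most `time_span`, approximating `manual` chunking mode (hypothetical helper)."""
    span_ms = int(time_span.total_seconds() * 1000)
    if span_ms <= 0:
        raise ValueError("time_span must be positive")
    chunks = []
    chunk_start = start_ms
    while chunk_start < end_ms:
        chunk_end = min(chunk_start + span_ms, end_ms)  # last chunk may be shorter
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end
    return chunks


# Eight hours of data with a 3h time_span results in three searches.
hour_ms = 3_600_000
for chunk in split_into_chunks(0, 8 * hour_ms, timedelta(hours=3)):
    print(chunk)
```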
+ 4 - 3
docs/reference/ml/anomaly-detection/apis/delete-datafeed.asciidoc

@@ -28,14 +28,15 @@ can delete it.
 ==== {api-path-parms-title}
 
 `<feed_id>`::
-  (Required, string) Identifier for the {dfeed}.
+(Required, string)
+include::{docdir}/ml/ml-shared.asciidoc[tag=datafeed-id]
 
 [[ml-delete-datafeed-query-parms]]
 ==== {api-query-parms-title}
 
 `force`::
-  (Optional, boolean) Use to forcefully delete a started {dfeed}; this method is
-  quicker than stopping and deleting the {dfeed}.
+(Optional, boolean) Use to forcefully delete a started {dfeed}; this method is
+quicker than stopping and deleting the {dfeed}.
 
 [[ml-delete-datafeed-example]]
 ==== {api-examples-title}

+ 67 - 42
docs/reference/ml/anomaly-detection/apis/get-datafeed-stats.asciidoc

@@ -45,36 +45,61 @@ IMPORTANT: This API returns a maximum of 10,000 {dfeeds}.
 ==== {api-path-parms-title}
 
 `<feed_id>`::
-  (Optional, string) Identifier for the {dfeed}. It can be a {dfeed} identifier
-  or a wildcard expression. If you do not specify one of these options, the API
-  returns statistics for all {dfeeds}.
+(Optional, string)
+include::{docdir}/ml/ml-shared.asciidoc[tag=datafeed-id-wildcard]
+If you do not specify one of these options, the API returns statistics for all
+{dfeeds}.
 
 [[ml-get-datafeed-stats-query-parms]]
 ==== {api-query-parms-title}
 
 `allow_no_datafeeds`::
-  (Optional, boolean) Specifies what to do when the request:
-+
---
-* Contains wildcard expressions and there are no {datafeeds} that match.
-* Contains the `_all` string or no identifiers and there are no matches.
-* Contains wildcard expressions and there are only partial matches. 
-
-The default value is `true`, which returns an empty `datafeeds` array when
-there are no matches and the subset of results when there are partial matches.
-If this parameter is `false`, the request returns a `404` status code when there
-are no matches or only partial matches.
---
+(Optional, boolean)
+include::{docdir}/ml/ml-shared.asciidoc[tag=allow-no-datafeeds]
 
 
 [[ml-get-datafeed-stats-results]]
 ==== {api-response-body-title}
 
-The API returns the following information:
+The API returns an array of {dfeed} count objects. All of these properties are
+informational; you cannot update their values.
+
+`assignment_explanation`::
+(string) For started {dfeeds} only, contains messages relating to the selection of a node.
+
+`datafeed_id`::
+(string)
+include::{docdir}/ml/ml-shared.asciidoc[tag=datafeed-id]
+
+`node`::
+(object) For started {dfeeds} only, the node upon which the {dfeed} is started. The {dfeed} and job will be on the same node.
+`id`::: The unique identifier of the node. For example, "0-o0tOoRTwKFZifatTWKNw".
+`name`::: The node name. For example, `0-o0tOo`.
+`ephemeral_id`::: The node ephemeral ID.
+`transport_address`::: The host and port where transport HTTP connections are
+accepted. For example, `127.0.0.1:9300`.
+`attributes`::: For example, `{"ml.machine_memory": "17179869184"}`.
+
+`state`::
+(string) The status of the {dfeed}, which can be one of the following values:
++
+--
+* `started`: The {dfeed} is actively receiving data.
+* `stopped`: The {dfeed} is stopped and will not receive data until it is
+re-started.
+--
+
+`timing_stats`::
+(object) An object that provides statistical information about timing aspects of
+this {dfeed}.
+//average_search_time_per_bucket_ms
+//bucket_count
+//exponential_average_search_time_per_hour_ms
+`job_id`:::
+include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection]
+`search_count`::: Number of searches performed by this {dfeed}.
+`total_search_time_ms`::: Total time the {dfeed} spent searching in milliseconds.
 
-`datafeeds`::
-  (array) An array of {dfeed} count objects.
-  For more information, see <<ml-datafeed-counts>>.
 
 [[ml-get-datafeed-stats-response-codes]]
 ==== {api-response-codes-title}
@@ -86,46 +111,46 @@ The API returns the following information:
 [[ml-get-datafeed-stats-example]]
 ==== {api-examples-title}
 
-The following example gets usage information for the
-`datafeed-total-requests` {dfeed}:
-
 [source,console]
 --------------------------------------------------
-GET _ml/datafeeds/datafeed-total-requests/_stats
+GET _ml/datafeeds/datafeed-high_sum_total_sales/_stats
 --------------------------------------------------
-// TEST[skip:setup:server_metrics_startdf]
+// TEST[skip:Kibana sample data started datafeed]
 
 The API returns the following results:
 
 [source,console-result]
 ----
 {
-  "count": 1,
-  "datafeeds": [
+  "count" : 1,
+  "datafeeds" : [
     {
-      "datafeed_id": "datafeed-total-requests",
-      "state": "started",
-      "node": {
-        "id": "2spCyo1pRi2Ajo-j-_dnPX",
-        "name": "node-0",
-        "ephemeral_id": "hoXMLZB0RWKfR9UPPUCxXX",
-        "transport_address": "127.0.0.1:9300",
-        "attributes": {
-          "ml.machine_memory": "17179869184",
-          "ml.max_open_jobs": "20"
+      "datafeed_id" : "datafeed-high_sum_total_sales",
+      "state" : "started",
+      "node" : {
+        "id" : "7bmMXyWCRs-TuPfGJJ_yMw",
+        "name" : "node-0",
+        "ephemeral_id" : "hoXMLZB0RWKfR9UPPUCxXX",
+        "transport_address" : "127.0.0.1:9300",
+        "attributes" : {
+          "ml.machine_memory" : "17179869184",
+          "ml.max_open_jobs" : "20"
         }
       },
-      "assignment_explanation": "",
-      "timing_stats": {
-        "job_id": "job-total-requests",
-        "search_count": 20,
-        "total_search_time_ms": 120.5
+      "assignment_explanation" : "",
+      "timing_stats" : {
+        "job_id" : "high_sum_total_sales",
+        "search_count" : 7,
+        "bucket_count" : 743,
+        "total_search_time_ms" : 134.0,
+        "average_search_time_per_bucket_ms" : 0.180349932705249,
+        "exponential_average_search_time_per_hour_ms" : 11.514712961628677
       }
     }
   ]
 }
 ----
-// TESTRESPONSE[s/"2spCyo1pRi2Ajo-j-_dnPX"/$body.$_path/]
+// TESTRESPONSE[s/"7bmMXyWCRs-TuPfGJJ_yMw"/$body.$_path/]
 // TESTRESPONSE[s/"node-0"/$body.$_path/]
 // TESTRESPONSE[s/"hoXMLZB0RWKfR9UPPUCxXX"/$body.$_path/]
 // TESTRESPONSE[s/"127.0.0.1:9300"/$body.$_path/]

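In the example response, `average_search_time_per_bucket_ms` (0.180349932705249) equals `total_search_time_ms` divided by `bucket_count` (134.0 / 743). The following Python sketch derives per-bucket and per-search averages from an excerpt of that response. This relationship is an observation from the sample values, not a documented formula, and `per_bucket_ms` and `per_search_ms` are hypothetical helpers.

```python
import json

# Excerpt of the example response above (only the fields used here).
response = json.loads("""
{
  "count" : 1,
  "datafeeds" : [
    {
      "datafeed_id" : "datafeed-high_sum_total_sales",
      "timing_stats" : {
        "job_id" : "high_sum_total_sales",
        "search_count" : 7,
        "bucket_count" : 743,
        "total_search_time_ms" : 134.0,
        "average_search_time_per_bucket_ms" : 0.180349932705249
      }
    }
  ]
}
""")


def per_bucket_ms(timing_stats: dict) -> float:
    """Total search time divided by the number of buckets processed."""
    buckets = timing_stats["bucket_count"]
    return timing_stats["total_search_time_ms"] / buckets if buckets else 0.0


def per_search_ms(timing_stats: dict) -> float:
    """Total search time divided by the number of searches performed."""
    searches = timing_stats["search_count"]
    return timing_stats["total_search_time_ms"] / searches if searches else 0.0


for feed in response["datafeeds"]:
    stats = feed["timing_stats"]
    print(feed["datafeed_id"], per_bucket_ms(stats), per_search_ms(stats))
```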
+ 81 - 36
docs/reference/ml/anomaly-detection/apis/get-datafeed.asciidoc

@@ -42,35 +42,71 @@ IMPORTANT: This API returns a maximum of 10,000 {dfeeds}.
 ==== {api-path-parms-title}
 
 `<feed_id>`::
-  (Optional, string) Identifier for the {dfeed}. It can be a {dfeed} identifier
-  or a wildcard expression. If you do not specify one of these options, the API
-  returns information about all {dfeeds}.
+(Optional, string)
+include::{docdir}/ml/ml-shared.asciidoc[tag=datafeed-id-wildcard]
+If you do not specify one of these options, the API returns information about
+all {dfeeds}.
 
 [[ml-get-datafeed-query-parms]]
 ==== {api-query-parms-title}
 
 `allow_no_datafeeds`::
-  (Optional, boolean) Specifies what to do when the request:
-+
---
-* Contains wildcard expressions and there are no {datafeeds} that match.
-* Contains the `_all` string or no identifiers and there are no matches.
-* Contains wildcard expressions and there are only partial matches. 
-
-The default value is `true`, which returns an empty `datafeeds` array when
-there are no matches and the subset of results when there are partial matches.
-If this parameter is `false`, the request returns a `404` status code when there
-are no matches or only partial matches.
---
+(Optional, boolean)
+include::{docdir}/ml/ml-shared.asciidoc[tag=allow-no-datafeeds]
 
 [[ml-get-datafeed-results]]
 ==== {api-response-body-title}
 
-The API returns the following information:
+The API returns an array of {dfeed} resources, which have the following
+properties:
 
-`datafeeds`::
-  (array) An array of {dfeed} objects.
-  For more information, see <<ml-datafeed-resource>>.
+`aggregations`::
+(object)
+include::{docdir}/ml/ml-shared.asciidoc[tag=aggregations]
+
+`chunking_config`::
+(object)
+include::{docdir}/ml/ml-shared.asciidoc[tag=chunking-config]
+
+`datafeed_id`::
+(string)
+include::{docdir}/ml/ml-shared.asciidoc[tag=datafeed-id]
+
+`delayed_data_check_config`::
+(object)
+include::{docdir}/ml/ml-shared.asciidoc[tag=delayed-data-check-config]
+
+`frequency`::
+(<<time-units,time units>>)
+include::{docdir}/ml/ml-shared.asciidoc[tag=frequency]
+
+`indices`::
+(array)
+include::{docdir}/ml/ml-shared.asciidoc[tag=indices]
+
+`job_id`::
+(string)
+include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-datafeed]
+
+`max_empty_searches`::
+(integer)
+include::{docdir}/ml/ml-shared.asciidoc[tag=max-empty-searches]
+
+`query`::
+(object)
+include::{docdir}/ml/ml-shared.asciidoc[tag=query]
+
+`query_delay`::
+(<<time-units,time units>>)
+include::{docdir}/ml/ml-shared.asciidoc[tag=query-delay]
+
+`script_fields`::
+(object)
+include::{docdir}/ml/ml-shared.asciidoc[tag=script-fields]
+
+`scroll_size`::
+(unsigned integer)
+include::{docdir}/ml/ml-shared.asciidoc[tag=scroll-size]
 
 [[ml-get-datafeed-response-codes]]
 ==== {api-response-codes-title}
@@ -83,39 +119,48 @@ The API returns the following information:
 ==== {api-examples-title}
 
 The following example gets configuration information for the
-`datafeed-total-requests` {dfeed}:
+`datafeed-high_sum_total_sales` {dfeed}:
 
 [source,console]
 --------------------------------------------------
-GET _ml/datafeeds/datafeed-total-requests
+GET _ml/datafeeds/datafeed-high_sum_total_sales
 --------------------------------------------------
-// TEST[skip:setup:server_metrics_datafeed]
+// TEST[skip:kibana sample data]
 
 The API returns the following results:
 
 [source,console-result]
 ----
 {
-  "count": 1,
-  "datafeeds": [
+  "count" : 1,
+  "datafeeds" : [
     {
-      "datafeed_id": "datafeed-total-requests",
-      "job_id": "total-requests",
-      "query_delay": "83474ms",
-      "indices": [
-        "server-metrics"
+      "datafeed_id" : "datafeed-high_sum_total_sales",
+      "job_id" : "high_sum_total_sales",
+      "query_delay" : "93169ms",
+      "indices" : [
+        "kibana_sample_data_ecommerce"
       ],
-      "query": {
-        "match_all": {
-          "boost": 1.0
+      "query" : {
+        "bool" : {
+          "filter" : [
+            {
+              "term" : {
+                "_index" : "kibana_sample_data_ecommerce"
+              }
+            }
+          ]
         }
       },
-      "scroll_size": 1000,
-      "chunking_config": {
-        "mode": "auto"
+      "scroll_size" : 1000,
+      "chunking_config" : {
+        "mode" : "auto"
+      },
+      "delayed_data_check_config" : {
+        "enabled" : true
       }
     }
   ]
 }
 ----
-// TESTRESPONSE[s/"query.boost": "1.0"/"query.boost": $body.query.boost/]
+// TESTRESPONSE[s/"query.boost": "93169ms"/"query.boost": $body.query.boost/]

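The response reports `query_delay` as `93169ms`, while other properties are shown in examples such as `150s` or `3h`. The following hypothetical Python helper normalizes such values to a `timedelta`. It assumes integer amounts and the unit suffixes from the Elasticsearch time units reference (`d`, `h`, `m`, `s`, `ms`, `micros`, `nanos`). It is a sketch, not the server's parser.

```python
import re
from datetime import timedelta

# Hypothetical helper: integer amounts only. timedelta cannot hold
# nanoseconds, so `nanos` values are rounded to whole microseconds.
_FACTORS_IN_MICROS = {
    "d": 86_400_000_000,
    "h": 3_600_000_000,
    "m": 60_000_000,
    "s": 1_000_000,
    "ms": 1_000,
    "micros": 1,
}
_PATTERN = re.compile(r"(\d+)(d|h|m|s|ms|micros|nanos)")


def parse_time_units(value: str) -> timedelta:
    match = _PATTERN.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unsupported time value: {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    if unit == "nanos":
        return timedelta(microseconds=amount / 1000)
    return timedelta(microseconds=amount * _FACTORS_IN_MICROS[unit])


# query_delay from the example response, expressed in seconds.
print(parse_time_units("93169ms").total_seconds())
```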
+ 22 - 17
docs/reference/ml/anomaly-detection/apis/preview-datafeed.asciidoc

@@ -41,18 +41,17 @@ it to ensure it is returning the expected data.
 ==== {api-path-parms-title}
 
 `<datafeed_id>`::
-  (Required, string) Identifier for the {dfeed}.
+(Required, string)
+include::{docdir}/ml/ml-shared.asciidoc[tag=datafeed-id]
 
 [[ml-preview-datafeed-example]]
 ==== {api-examples-title}
 
-The following example obtains a preview of the `datafeed-farequote` {dfeed}:
-
 [source,console]
 --------------------------------------------------
-GET _ml/datafeeds/datafeed-farequote/_preview
+GET _ml/datafeeds/datafeed-high_sum_total_sales/_preview
 --------------------------------------------------
-// TEST[skip:setup:farequote_datafeed]
+// TEST[skip:set up Kibana sample data]
 
 The data that is returned for this example is as follows:
 
@@ -60,22 +59,28 @@ The data that is returned for this example is as follows:
 ----
 [
   {
-    "time": 1454803200000,
-    "airline": "JZA",
-    "doc_count": 5,
-    "responsetime": 990.4628295898438
+    "order_date" : 1574294659000,
+    "category.keyword" : "Men's Clothing",
+    "customer_full_name.keyword" : "Sultan Al Benson",
+    "taxful_total_price" : 35.96875
   },
   {
-    "time": 1454803200000,
-    "airline": "JBU",
-    "doc_count": 23,
-    "responsetime": 877.5927124023438
+    "order_date" : 1574294918000,
+    "category.keyword" : [
+      "Women's Accessories",
+      "Women's Clothing"
+    ],
+    "customer_full_name.keyword" : "Pia Webb",
+    "taxful_total_price" : 83.0
   },
   {
-    "time": 1454803200000,
-    "airline": "KLM",
-    "doc_count": 42,
-    "responsetime": 1355.481201171875
+    "order_date" : 1574295782000,
+    "category.keyword" : [
+      "Women's Accessories",
+      "Women's Shoes"
+    ],
+    "customer_full_name.keyword" : "Brigitte Graham",
+    "taxful_total_price" : 72.0
   }
 ]
 ----

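A preview is useful for confirming that the {dfeed} returns the expected data. The following Python sketch checks an excerpt of the preview output above. It looks for missing fields and normalizes values that can be either a single value or an array, as `category.keyword` is in the example. The `required` field list and both helper functions are hypothetical.

```python
import json

# Excerpt of the preview output above.
preview = json.loads("""
[
  {"order_date": 1574294659000, "category.keyword": "Men's Clothing",
   "customer_full_name.keyword": "Sultan Al Benson", "taxful_total_price": 35.96875},
  {"order_date": 1574294918000,
   "category.keyword": ["Women's Accessories", "Women's Clothing"],
   "customer_full_name.keyword": "Pia Webb", "taxful_total_price": 83.0}
]
""")


def missing_fields(docs, required):
    """Return (document index, field) pairs for documents that lack a required field."""
    return [(i, field) for i, doc in enumerate(docs) for field in required if field not in doc]


def as_list(value):
    """Preview values can be a single value or an array; normalize to a list."""
    return value if isinstance(value, list) else [value]


required = ["order_date", "taxful_total_price"]  # hypothetical fields the job relies on
print(missing_fields(preview, required))
print(sorted({c for doc in preview for c in as_list(doc["category.keyword"])}))
```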
+ 24 - 42
docs/reference/ml/anomaly-detection/apis/put-datafeed.asciidoc

@@ -43,70 +43,52 @@ those same roles.
 ==== {api-path-parms-title}
 
 `<feed_id>`::
-  (Required, string) A numerical character string that uniquely identifies the
-  {dfeed}. This identifier can contain lowercase alphanumeric characters (a-z
-  and 0-9), hyphens, and underscores. It must start and end with alphanumeric
-  characters.
+(Required, string)
+include::{docdir}/ml/ml-shared.asciidoc[tag=datafeed-id]
 
 [[ml-put-datafeed-request-body]]
 ==== {api-request-body-title}
 
 `aggregations`::
-  (Optional, object) If set, the {dfeed} performs aggregation searches. For more
-  information, see <<ml-datafeed-resource>>.
+(Optional, object)
+include::{docdir}/ml/ml-shared.asciidoc[tag=aggregations]
 
 `chunking_config`::
-  (Optional, object) Specifies how data searches are split into time chunks. See
-  <<ml-datafeed-chunking-config>>.
+(Optional, object)
+include::{docdir}/ml/ml-shared.asciidoc[tag=chunking-config]
 
 `delayed_data_check_config`::
-  (Optional, object) Specifies whether the data feed checks for missing data and 
-  the size of the window. See <<ml-datafeed-delayed-data-check-config>>.
+(Optional, object)
+include::{docdir}/ml/ml-shared.asciidoc[tag=delayed-data-check-config]
 
 `frequency`::
-  (Optional, <<time-units, time units>>) The interval at which scheduled queries 
-  are made while the {dfeed} runs in real time. The default value is either the 
-  bucket span for short bucket spans, or, for longer bucket spans, a sensible 
-  fraction of the bucket span. For example: `150s`.
+(Optional, <<time-units, time units>>)
+include::{docdir}/ml/ml-shared.asciidoc[tag=frequency]
 
 `indices`::
-  (Required, array) An array of index names. Wildcards are supported. For
-  example: `["it_ops_metrics", "server*"]`.
-+
---
-NOTE: If any indices are in remote clusters then `cluster.remote.connect` must
-not be set to `false` on any ML node.
---
+(Required, array)
+include::{docdir}/ml/ml-shared.asciidoc[tag=indices]
 
-`job_id`::
- (Required, string) A numerical character string that uniquely identifies the
- {anomaly-job}.
 
+`job_id`::
+(Required, string)
+include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-anomaly-detection]
+ 
 `query`::
-  (Optional, object) The {es} query domain-specific language (DSL). This value
-  corresponds to the query object in an {es} search POST body. All the options
-  that are supported by {Es} can be used, as this object is passed verbatim to
-  {es}. By default, this property has the following value:
-  `{"match_all": {"boost": 1}}`.
+(Optional, object)
+include::{docdir}/ml/ml-shared.asciidoc[tag=query]
 
 `query_delay`::
-  (Optional, <<time-units, time units>>) The number of seconds behind real time 
-  that data is queried. For example, if data from 10:04 a.m. might not be 
-  searchable in {es} until 10:06 a.m., set this property to 120 seconds. The 
-  default value is `60s`.
+(Optional, <<time-units, time units>>)
+include::{docdir}/ml/ml-shared.asciidoc[tag=query-delay]
 
 `script_fields`::
-  (Optional, object) Specifies scripts that evaluate custom expressions and
-  returns script fields to the {dfeed}. The detector configuration objects in a
-  job can contain functions that use these script fields. For more information,
-  see <<request-body-search-script-fields,Script fields>>.
+(Optional, object)
+include::{docdir}/ml/ml-shared.asciidoc[tag=script-fields]
 
 `scroll_size`::
-  (Optional, unsigned integer) The `size` parameter that is used in {es}
-  searches. The default value is `1000`.
-
-For more information about these properties,
-see <<ml-datafeed-resource>>.
+(Optional, unsigned integer)
+include::{docdir}/ml/ml-shared.asciidoc[tag=scroll-size]
 
 [[ml-put-datafeed-example]]
 ==== {api-examples-title}

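The identifier rule removed from `put-datafeed.asciidoc` allows lowercase alphanumeric characters, hyphens, and underscores, and requires the identifier to start and end with an alphanumeric character. The request body requires `job_id` and `indices`. The following Python sketch validates an identifier against that rule and assembles a body. `is_valid_feed_id` and `build_put_datafeed_body` are hypothetical helpers. The sketch does not send the request, and it leaves out optional properties so that the server defaults apply.

```python
import json
import re

# Rule from the removed identifier description: a-z, 0-9, hyphens and
# underscores, starting and ending with an alphanumeric character.
_FEED_ID = re.compile(r"[a-z0-9](?:[a-z0-9_-]*[a-z0-9])?")


def is_valid_feed_id(feed_id: str) -> bool:
    return _FEED_ID.fullmatch(feed_id) is not None


def build_put_datafeed_body(job_id: str, indices: list, **optional) -> dict:
    """Assemble a request body: job_id and indices are required, and optional
    properties that are None are left out so that server defaults apply."""
    if not indices:
        raise ValueError("indices must contain at least one index name or pattern")
    body = {"job_id": job_id, "indices": list(indices)}
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


feed_id = "datafeed-high_sum_total_sales"
body = build_put_datafeed_body(
    "high_sum_total_sales",
    ["kibana_sample_data_ecommerce"],
    query={"match_all": {}},
    scroll_size=1000,
    frequency=None,  # omitted from the body
)
print(f"PUT _ml/datafeeds/{feed_id}" if is_valid_feed_id(feed_id) else "invalid id")
print(json.dumps(body, indent=2))
```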
+ 2 - 1
docs/reference/ml/anomaly-detection/apis/start-datafeed.asciidoc

@@ -74,7 +74,8 @@ creation/update and runs the query using those same roles.
 ==== {api-path-parms-title}
 
 `<feed_id>`::
-  (Required, string) Identifier for the {dfeed}.
+(Required, string)
+include::{docdir}/ml/ml-shared.asciidoc[tag=datafeed-id]
 
 [[ml-start-datafeed-request-body]]
 ==== {api-request-body-title}

+ 4 - 14
docs/reference/ml/anomaly-detection/apis/stop-datafeed.asciidoc

@@ -40,25 +40,15 @@ comma-separated list of {dfeeds} or a wildcard expression. You can close all
 ==== {api-path-parms-title}
 
 `<feed_id>`::
-  (Required, string) Identifier for the {dfeed}. It can be a {dfeed} identifier
-  or a wildcard expression.
+(Required, string)
+include::{docdir}/ml/ml-shared.asciidoc[tag=datafeed-id-wildcard]
 
 [[ml-stop-datafeed-query-parms]]
 ==== {api-query-parms-title}
 
 `allow_no_datafeeds`::
-  (Optional, boolean) Specifies what to do when the request:
-+
---
-* Contains wildcard expressions and there are no {datafeeds} that match.
-* Contains the `_all` string or no identifiers and there are no matches.
-* Contains wildcard expressions and there are only partial matches. 
-
-The default value is `true`, which returns an empty `datafeeds` array when
-there are no matches and the subset of results when there are partial matches.
-If this parameter is `false`, the request returns a `404` status code when there
-are no matches or only partial matches.
---
+(Optional, boolean)
+include::{docdir}/ml/ml-shared.asciidoc[tag=allow-no-datafeeds]
 
 [[ml-stop-datafeed-request-body]]
 ==== {api-request-body-title}

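The shared `allow_no_datafeeds` text describes three cases: wildcards with no matches, `_all` or no identifiers with no matches, and wildcards with only partial matches. The following Python sketch approximates these cases with `fnmatch`. Assumptions: `resolve_datafeeds` is hypothetical, every comma-separated entry is treated as a wildcard pattern, and a "partial match" means that at least one entry matches nothing. `LookupError` stands in for the `404` response.

```python
from fnmatch import fnmatchcase


def resolve_datafeeds(expression: str, existing: list, allow_no_datafeeds: bool = True) -> list:
    """Resolve comma-separated wildcard expressions against existing datafeed IDs."""
    if expression in ("", "_all"):
        patterns = ["*"]  # no identifiers or `_all` means every datafeed
    else:
        patterns = [p.strip() for p in expression.split(",") if p.strip()]
    matched, unmatched = [], []
    for pattern in patterns:
        hits = [feed for feed in existing if fnmatchcase(feed, pattern)]
        if not hits:
            unmatched.append(pattern)
        for hit in hits:
            if hit not in matched:
                matched.append(hit)
    if unmatched and not allow_no_datafeeds:
        # No matches or only partial matches: stand-in for a 404 status code.
        raise LookupError(f"no datafeeds match {unmatched}")
    return matched  # may be empty when allow_no_datafeeds is true
```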
+ 36 - 47
docs/reference/ml/anomaly-detection/apis/update-datafeed.asciidoc

@@ -39,7 +39,8 @@ using those same roles.
 ==== {api-path-parms-title}
 
 `<feed_id>`::
-  (Required, string) Identifier for the {dfeed}.
+(Required, string)
+include::{docdir}/ml/ml-shared.asciidoc[tag=datafeed-id]
 
 [[ml-update-datafeed-request-body]]
 ==== {api-request-body-title}
@@ -47,70 +48,58 @@ using those same roles.
 The following properties can be updated after the {dfeed} is created:
 
 `aggregations`::
-  (Optional, object) If set, the {dfeed} performs aggregation searches. For more
-  information, see <<ml-datafeed-resource>>.
+(Optional, object)
+include::{docdir}/ml/ml-shared.asciidoc[tag=aggregations]
 
 `chunking_config`::
-  (Optional, object) Specifies how data searches are split into time chunks. See
-  <<ml-datafeed-chunking-config>>.
-  
+(Optional, object)
+include::{docdir}/ml/ml-shared.asciidoc[tag=chunking-config]
+
 `delayed_data_check_config`::
-  (Optional, object) Specifies whether the data feed checks for missing data and 
-  the size of the window. See <<ml-datafeed-delayed-data-check-config>>.  
+(Optional, object)
+include::{docdir}/ml/ml-shared.asciidoc[tag=delayed-data-check-config]
 
 `frequency`::
-  (Optional, <<time-units, time units>>) The interval at which scheduled queries 
-  are made while the {dfeed} runs in real time. The default value is either the 
-  bucket span for short bucket spans, or, for longer bucket spans, a sensible 
-  fraction of the bucket span. For example: `150s`.
+(Optional, <<time-units, time units>>)
+include::{docdir}/ml/ml-shared.asciidoc[tag=frequency]
 
 `indices`::
-  (Optional, array) An array of index names. Wildcards are supported. For
-  example: `["it_ops_metrics", "server*"]`.
+(Optional, array)
+include::{docdir}/ml/ml-shared.asciidoc[tag=indices]
+
+`max_empty_searches`::
+(Optional, integer)
+include::{docdir}/ml/ml-shared.asciidoc[tag=max-empty-searches]
++
+--
+The special value `-1` unsets this setting.
+--
 
 `query`::
-  (Optional, object) The {es} query domain-specific language (DSL). This value
-  corresponds to the query object in an {es} search POST body. All the options
-  that are supported by {es} can be used, as this object is passed verbatim to
-  {es}. By default, this property has the following value:
-  `{"match_all": {"boost": 1}}`.
+(Optional, object)
+include::{docdir}/ml/ml-shared.asciidoc[tag=query]
 +
 --
-WARNING: If you change the query, then the analyzed data will also be changed, 
-therefore the required time to learn might be long and the understandability of 
-the results is unpredictable.
-If you want to make significant changes to the source data, we would recommend 
-you clone it and create a second job containing the amendments. Let both run in 
-parallel and close one when you are satisfied with the results of the other job.
+WARNING: If you change the query, the analyzed data also changes. Learning
+might therefore take a long time, and it is unpredictable how understandable
+the results will be. If you want to make significant changes to the source
+data, we recommend that you clone the job to create a second job that contains
+the amendments. Let both jobs run in parallel and close one when you are
+satisfied with the results of the other.
+
 --
 
 `query_delay`::
-  (Optional, <<time-units, time units>>) The number of seconds behind real-time 
-  that data is queried. For example, if data from 10:04 a.m. might not be 
-  searchable in {es} until 10:06 a.m., set this property to 120 seconds. The 
-  default value is `60s`.
+(Optional, <<time-units, time units>>)
+include::{docdir}/ml/ml-shared.asciidoc[tag=query-delay]
 
 `script_fields`::
-  (Optional, object) Specifies scripts that evaluate custom expressions and
-  returns script fields to the {dfeed}. The detector configuration objects in a
-  job can contain functions that use these script fields. For more information,
-  see <<request-body-search-script-fields,Script fields>>.
+(Optional, object)
+include::{docdir}/ml/ml-shared.asciidoc[tag=script-fields]
 
 `scroll_size`::
-  (Optional, unsigned integer) The `size` parameter that is used in {es}
-  searches. The default value is `1000`.
-
-`max_empty_searches`::
-  (Optional, integer) If a real-time {dfeed} has never seen any data (including
-  during any initial training period) then it will automatically stop itself
-  and close its associated job after this many real-time searches that return
-  no documents. In other words, it will stop after `frequency` times
-  `max_empty_searches` of real-time operation. If not set
-  then a {dfeed} with no end time that sees no data will remain started until
-  it is explicitly stopped. The special value `-1` unsets this setting.
-
-For more information about these properties, see <<ml-datafeed-resource>>.
-
+(Optional, unsigned integer)
+include::{docdir}/ml/ml-shared.asciidoc[tag=scroll-size]
 
 [[ml-update-datafeed-example]]
 ==== {api-examples-title}

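The removed `max_empty_searches` text says that a real-time {dfeed} that never sees data stops after `frequency` times `max_empty_searches` of real-time operation, and that the special value `-1` unsets the setting. The following Python sketch expresses that arithmetic. `auto_stop_after` is hypothetical. Rejecting other non-positive values is an assumption of this sketch.

```python
from __future__ import annotations

from datetime import timedelta


def auto_stop_after(frequency: timedelta, max_empty_searches: int | None) -> timedelta | None:
    """Real-time duration after which a datafeed that has never seen data stops.

    None means the setting is not set (or was unset with -1 in an update), so a
    datafeed with no end time keeps running until it is explicitly stopped.
    """
    if max_empty_searches is None or max_empty_searches == -1:
        return None
    if max_empty_searches <= 0:
        # Assumption: this sketch treats other non-positive values as invalid.
        raise ValueError("max_empty_searches must be a positive integer or -1")
    return frequency * max_empty_searches


print(auto_stop_after(timedelta(seconds=150), 10))  # 0:25:00
print(auto_stop_after(timedelta(seconds=150), -1))  # None
```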
+ 10 - 10
docs/reference/ml/anomaly-detection/delayed-data-detection.asciidoc

@@ -5,14 +5,15 @@
 Delayed data are documents that are indexed late. That is to say, it is data
 related to a time that the {dfeed} has already processed.
 
-When you create a datafeed, you can specify a
-{ref}/ml-datafeed-resource.html[`query_delay`] setting. This setting enables the
-datafeed to wait for some time past real-time, which means any "late" data in
-this period is fully indexed before the datafeed tries to gather it. However, if
-the setting is set too low, the datafeed may query for data before it has been
-indexed and consequently miss that document. Conversely, if it is set too high,
-analysis drifts farther away from real-time. The balance that is struck depends
-upon each use case and the environmental factors of the cluster.
+When you create a {dfeed}, you can specify a
+{ref}/ml-put-datafeed.html#ml-put-datafeed-request-body[`query_delay`] setting.
+This setting enables the {dfeed} to wait for some time past real-time, which
+means any "late" data in this period is fully indexed before the {dfeed} tries
+to gather it. However, if the setting is set too low, the {dfeed} may query for
+data before it has been indexed and consequently miss that document. Conversely,
+if it is set too high, analysis drifts farther away from real-time. The balance
+that is struck depends upon each use case and the environmental factors of the
+cluster.
 
 ==== Why worry about delayed data?
 
@@ -28,8 +29,7 @@ recorded so that you can determine a next course of action.
 
 ==== How do we detect delayed data?
 
-In addition to the `query_delay` field, there is a
-{ref}/ml-datafeed-resource.html#ml-datafeed-delayed-data-check-config[delayed data check config],
+In addition to the `query_delay` field, there is a delayed data check config,
 which enables you to configure the datafeed to look in the past for delayed data.
 Every 15 minutes or every `check_window`, whichever is smaller, the datafeed
 triggers a document search over the configured indices. This search looks over a

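The delayed data text above says the check runs every 15 minutes or every `check_window`, whichever is smaller. The shared `delayed-data-check-config` description in this commit says the default `check_window` is based on the maximum of `2h` or `8 * bucket_span`. The Python sketch below turns those two statements into arithmetic. It assumes the default is exactly that maximum, which the docs only describe as "based on", so treat it as an approximation and not the Elasticsearch implementation. The function names are hypothetical.

```python
from datetime import timedelta
from typing import Optional

FIFTEEN_MINUTES = timedelta(minutes=15)
TWO_HOURS = timedelta(hours=2)


def default_check_window(bucket_span: timedelta) -> timedelta:
    """Assumed default window: the larger of 2h and 8 * bucket_span."""
    return max(TWO_HOURS, 8 * bucket_span)


def check_interval(bucket_span: timedelta,
                   check_window: Optional[timedelta] = None) -> timedelta:
    """How often the delayed-data search runs: the smaller of 15 minutes and
    the check window (falling back to the assumed default when unset)."""
    if check_window is None:
        check_window = default_check_window(bucket_span)
    return min(FIFTEEN_MINUTES, check_window)


if __name__ == "__main__":
    span = timedelta(minutes=30)
    print(default_check_window(span))                   # 4:00:00 (8 * 30m > 2h)
    print(check_interval(span, timedelta(minutes=10)))  # 0:10:00
```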
+ 153 - 0
docs/reference/ml/ml-shared.asciidoc

@@ -1,3 +1,10 @@
+tag::aggregations[]
+If set, the {dfeed} performs aggregation searches. Support for aggregations is
+limited and should only be used with low cardinality data. For more information,
+see
+{stack-ov}/ml-configuring-aggregation.html[Aggregating data for faster performance].
+end::aggregations[]
+
 tag::allow-lazy-open[]
 Advanced configuration option. Specifies whether this job can open when there is 
 insufficient {ml} node capacity for it to be immediately assigned to a node. The
@@ -9,6 +16,21 @@ return an error and the job waits in the `opening` state until sufficient {ml}
 node capacity is available.
 end::allow-lazy-open[]
 
+tag::allow-no-datafeeds[]
+Specifies what to do when the request:
++
+--
+* Contains wildcard expressions and there are no {dfeeds} that match.
+* Contains the `_all` string or no identifiers and there are no matches.
+* Contains wildcard expressions and there are only partial matches. 
+
+The default value is `true`, which returns an empty `datafeeds` array when
+there are no matches and the subset of results when there are partial matches.
+If this parameter is `false`, the request returns a `404` status code when there
+are no matches or only partial matches.
+--
+end::allow-no-datafeeds[]
+
 tag::allow-no-jobs[]
 Specifies what to do when the request:
 +
@@ -207,6 +229,22 @@ add them here as
 <<analysis-pattern-replace-charfilter,pattern replace character filters>>.
 end::char-filter[]
 
+tag::chunking-config[]
+{dfeeds-cap} might be required to search over long time periods, such as several
+months or years. This search is split into time chunks to ensure that the load
+on {es} is managed. Chunking configuration controls how the size of these time
+chunks is calculated; it is an advanced configuration option.
+A chunking configuration object has the following properties:
+
+`mode`:::
+(string)
+include::{docdir}/ml/ml-shared.asciidoc[tag=mode]
+
+`time_span`:::
+(<<time-units,time units>>)
+include::{docdir}/ml/ml-shared.asciidoc[tag=time-span]
+end::chunking-config[]
+
 tag::custom-rules[]
 An array of custom rule objects, which enable you to customize the way detectors
 operate. For example, a rule may dictate to the detector conditions under which
@@ -301,6 +339,47 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=time-format]
 --
 end::data-description[]
 
+tag::datafeed-id[]
+A character string that uniquely identifies the
+{dfeed}. This identifier can contain lowercase alphanumeric characters (a-z
+and 0-9), hyphens, and underscores. It must start and end with alphanumeric
+characters.
+end::datafeed-id[]
+
+tag::datafeed-id-wildcard[]
+Identifier for the {dfeed}. It can be a {dfeed} identifier or a wildcard
+expression.
+end::datafeed-id-wildcard[]
+
+tag::delayed-data-check-config[]
+Specifies whether the {dfeed} checks for missing data and, if so, the size of
+the window to check. For example: `{"enabled": true, "check_window": "1h"}`.
++
+--
+The {dfeed} can optionally search over indices that have already been read in
+an effort to determine whether any data has subsequently been added to the index.
+If missing data is found, it is a good indication that the `query_delay` option
+is set too low and the data is being indexed after the {dfeed} has passed that
+moment in time. See 
+{stack-ov}/ml-delayed-data-detection.html[Working with delayed data].
+
+This check runs only on real-time {dfeeds}.
+
+The configuration object has the following properties:
+
+`enabled`::
+(boolean) Specifies whether the {dfeed} periodically checks for delayed data.
+Defaults to `true`.
+
+`check_window`::
+(<<time-units,time units>>) The window of time that is searched for late data.
+This window of time ends with the latest finalized bucket. It defaults to
+`null`, which causes an appropriate `check_window` to be calculated when the
+real-time {dfeed} runs. In particular, the default `check_window` span
+calculation is based on the maximum of `2h` or `8 * bucket_span`.
+--
+end::delayed-data-check-config[]
+
 tag::dependent_variable[]
 `dependent_variable`::
 (Required, string) Defines which field of the document is to be predicted. 
@@ -405,6 +484,13 @@ optional. If it is not specified, no token filters are applied prior to
 categorization.
 end::filter[]
 
+tag::frequency[]
+The interval at which scheduled queries are made while the {dfeed} runs in real
+time. The default value is either the bucket span for short bucket spans, or,
+for longer bucket spans, a sensible fraction of the bucket span. For example:
+`150s`.
+end::frequency[]
+
 tag::function[]
 The analysis function that is used. For example, `count`, `rare`, `mean`, `min`, 
 `max`, and `sum`. For more information, see
@@ -424,6 +510,17 @@ tag::groups[]
 A list of job groups. A job can belong to no groups or many.
 end::groups[]
 
+tag::indices[]
+An array of index names. Wildcards are supported. For example:
+`["it_ops_metrics", "server*"]`.
++
+--
+NOTE: If any indices are in remote clusters, then `cluster.remote.connect` must
+not be set to `false` on any {ml} nodes.
+
+--
+end::indices[]
+
 tag::influencers[]
 A comma separated list of influencer field names. Typically these can be the by, 
 over, or partition fields that are used in the detector configuration. You might 
@@ -475,6 +572,10 @@ alphanumeric characters (a-z and 0-9), hyphens, and underscores. It must start
 and end with alphanumeric characters.
 end::job-id-data-frame-analytics-define[]
 
+tag::job-id-datafeed[]
+The unique identifier for the job to which the {dfeed} sends data.
+end::job-id-datafeed[]
+
 tag::jobs-stats-anomaly-detection[]
 An array of {anomaly-job} statistics objects.
 For more information, see <<ml-jobstats>>.
@@ -502,12 +603,32 @@ the <<ml-post-data,post data>> API.
 --
 end::latency[]
 
+tag::max-empty-searches[]
+If a real-time {dfeed} has never seen any data (including during any initial
+training period), it automatically stops itself and closes its associated job
+after this many real-time searches that return no documents. In other words,
+it stops after `frequency` times `max_empty_searches` of real-time operation.
+If not set, a {dfeed} with no end time that sees no data remains started until
+it is explicitly stopped. By default, this setting is not set.
+end::max-empty-searches[]
+
 tag::maximum_number_trees[]
 `maximum_number_trees`::
 (Optional, integer) Defines the maximum number of trees the forest is allowed 
 to contain. The maximum value is 2000.
 end::maximum_number_trees[]
 
+tag::mode[]
+There are three available modes: 
++
+--
+* `auto`: The chunk size is dynamically calculated. This is the default and
+recommended value.
+* `manual`: Chunking is applied according to the specified `time_span`.
+* `off`: No chunking is applied.
+--
+end::mode[]
+
 tag::model-memory-limit[]
 The approximate maximum amount of memory resources that are required for 
 analytical processing. Once this limit is approached, data pruning becomes
@@ -615,6 +736,21 @@ tag::prediction_field_name[]
 Defaults to `<dependent_variable>_prediction`.
 end::prediction_field_name[]
 
+tag::query[]
+The {es} query domain-specific language (DSL). This value corresponds to the
+query object in an {es} search POST body. All the options that are supported by
+{es} can be used, as this object is passed verbatim to {es}. By default, this
+property has the following value: `{"match_all": {"boost": 1}}`.
+end::query[]
+
+tag::query-delay[]
+The number of seconds behind real time that data is queried. For example, if
+data from 10:04 a.m. might not be searchable in {es} until 10:06 a.m., set this
+property to 120 seconds. The default value is randomly selected between `60s`
+and `120s`. This randomness improves the query performance when there are
+multiple jobs running on the same node.
+end::query-delay[]
+
 tag::renormalization-window-days[]
 Advanced configuration option. The period over which adjustments to the score
 are applied, as new data is seen. The default value is the longer of 30 days or
@@ -633,6 +769,18 @@ are deleted from {es}. The default value is null, which means results are
 retained.
 end::results-retention-days[]
 
+tag::script-fields[]
+Specifies scripts that evaluate custom expressions and return script fields to
+the {dfeed}. The detector configuration objects in a job can contain functions
+that use these script fields. For more information, see
+{stack-ov}/ml-configuring-transform.html[Transforming data with script fields]
+and <<request-body-search-script-fields,Script fields>>.
+end::script-fields[]
+
+tag::scroll-size[]
+The `size` parameter that is used in {es} searches. The default value is `1000`.
+end::scroll-size[]
+
 tag::summary-count-field-name[]
 If this property is specified, the data that is fed to the job is expected to be
 pre-summarized. This property value is the name of the field that contains the 
@@ -663,6 +811,11 @@ job creation fails.
 --
 end::time-format[]
 
+tag::time-span[]
+The time span that each search queries. This setting is only applicable
+when the mode is set to `manual`. For example: `3h`.
+end::time-span[]
+
 tag::tokenizer[]
 The name or definition of the <<analysis-tokenizers,tokenizer>> to use after 
 character filters are applied. This property is compulsory if 

+ 12 - 1
docs/reference/redirects.asciidoc

@@ -1054,4 +1054,15 @@ This page was deleted.
 [[ml-analysisconfig]]
 See the details in
 [[ml-apimodelplotconfig]]
-<<ml-put-job>>, <<ml-update-job>>, and <<ml-get-job>>.
+<<ml-put-job>>, <<ml-update-job>>, and <<ml-get-job>>.
+
+[role="exclude",id="ml-datafeed-resource"]
+=== {dfeed-cap} resources
+
+This page was deleted.
+[[ml-datafeed-chunking-config]]
+See the details in <<ml-put-datafeed>>, <<ml-update-datafeed>>, 
+[[ml-datafeed-delayed-data-check-config]]
+<<ml-get-datafeed>>, 
+[[ml-datafeed-counts]]
+<<ml-get-datafeed-stats>>.

+ 0 - 3
docs/reference/rest-api/defs.asciidoc

@@ -5,8 +5,6 @@
 These resource definitions are used in APIs related to {ml-features} and
 {security-features} and in {kib} advanced {ml} job configuration options.
 
-* <<ml-datafeed-resource,{dfeeds-cap}>>
-* <<ml-datafeed-counts,{dfeed-cap} counts>>
 * <<ml-dfanalytics-resources,{dfanalytics-cap}>>
 * <<ml-evaluate-dfanalytics-resources,Evaluate {dfanalytics}>>
 * <<ml-jobstats,{anomaly-jobs-cap} statistics>>
@@ -15,7 +13,6 @@ These resource definitions are used in APIs related to {ml-features} and
 * <<role-mapping-resources,Role mappings>>
 * <<transform-resource,{transforms-cap}>>
 
-include::{es-repo-dir}/ml/anomaly-detection/apis/datafeedresource.asciidoc[]
 include::{es-repo-dir}/ml/df-analytics/apis/dfanalyticsresources.asciidoc[]
 include::{es-repo-dir}/ml/df-analytics/apis/evaluateresources.asciidoc[]
 include::{es-repo-dir}/ml/anomaly-detection/apis/jobcounts.asciidoc[]