| 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143 | [role="xpack"][testenv="platinum"][[ml-datafeed-resource]]=== {dfeed-cap} resourcesA {dfeed} resource has the following properties:`aggregations`::  (object) If set, the {dfeed} performs aggregation searches.  Support for aggregations is limited and should only be used with  low cardinality data. For more information, see  {xpack-ref}/ml-configuring-aggregation.html[Aggregating Data for Faster Performance].`chunking_config`::  (object) Specifies how data searches are split into time chunks.  See <<ml-datafeed-chunking-config>>.  For example: `{"mode": "manual", "time_span": "3h"}``datafeed_id`:: (string) A numerical character string that uniquely identifies the {dfeed}. This property is informational; you cannot change the identifier for existing {dfeeds}.`frequency`::  (time units) The interval at which scheduled queries are made while the  {dfeed} runs in real time. The default value is either the bucket span for short  bucket spans, or, for longer bucket spans, a sensible fraction of the bucket  span. For example: `150s`.`indices`::  (array) An array of index names. For example: `["it_ops_metrics"]``job_id`:: (string) The unique identifier for the job to which the {dfeed} sends data.`query`::  (object) The {es} query domain-specific language (DSL). This value  corresponds to the query object in an {es} search POST body. All the  options that are supported by {es} can be used, as this object is  passed verbatim to {es}. By default, this property has the following  value: `{"match_all": {"boost": 1}}`.`query_delay`::  (time units) The number of seconds behind real time that data is queried. For  example, if data from 10:04 a.m. might not be searchable in {es} until  10:06 a.m., set this property to 120 seconds. The default value is randomly  selected between `60s` and `120s`. This randomness improves the query  performance when there are multiple jobs running on the same node.`script_fields`::  (object) Specifies scripts that evaluate custom expressions and returns  script fields to the {dfeed}.  The <<ml-detectorconfig,detector configuration objects>> in a job can contain  functions that use these script fields.  For more information, see  {xpack-ref}/ml-configuring-transform.html[Transforming Data With Script Fields].`scroll_size`::  (unsigned integer) The `size` parameter that is used in {es} searches.  The default value is `1000`.`delayed_data_check_config`::  (object) Specifies whether the data feed checks for missing data and   and the size of the window. For example:  `{"enabled": true, "check_window": "1h"}` See  <<ml-datafeed-delayed-data-check-config>>.[[ml-datafeed-chunking-config]]==== Chunking Configuration Objects{dfeeds-cap} might be required to search over long time periods, for several monthsor years. This search is split into time chunks in order to ensure the loadon {es} is managed. Chunking configuration controls how the size of these timechunks are calculated and is an advanced configuration option.A chunking configuration object has the following properties:`mode`::  There are three available modes: +  `auto`::: The chunk size will be dynamically calculated. This is the default  and recommended value.  `manual`::: Chunking will be applied according to the specified `time_span`.  `off`::: No chunking will be applied.`time_span`::  (time units) The time span that each search will be querying.  This setting is only applicable when the mode is set to `manual`.  For example: `3h`.[[ml-datafeed-delayed-data-check-config]]==== Delayed Data Check Configuration ObjectsThe {dfeed} can optionally search over indices that have already been read inan effort to find if any data has since been added to the index. If missing datais found, it is a good indication that the `query_delay` option is set too low andthe data is being indexed after the {dfeed} has passed that moment in time. See {stack-ov}/ml-delayed-data-detection.html[Working with delayed data].This check only runs on real-time {dfeeds}The configuration object has the following properties:`enabled`::  (boolean) Should the {dfeed} periodically check for data being indexed after reading.  Defaults to `true``check_window`::  (time units) The window of time before the latest finalized bucket that should be searched  for late data. Defaults to `null` which causes an appropriate `check_window` to be calculated  when the real-time {dfeed} runs.  The default `check_window` span calculation is the max between `2h` or `8 * bucket_span`.[float][[ml-datafeed-counts]]==== {dfeed-cap} CountsThe get {dfeed} statistics API provides information about the operationalprogress of a {dfeed}. All of these properties are informational; you cannotupdate their values:`assignment_explanation`::  (string) For started {dfeeds} only, contains messages relating to the  selection of a node.`datafeed_id`:: (string) A numerical character string that uniquely identifies the {dfeed}.`node`::  (object) The node upon which the {dfeed} is started. The {dfeed} and job will  be on the same node.  `id`::: The unique identifier of the node. For example,  "0-o0tOoRTwKFZifatTWKNw".  `name`::: The node name. For example, `0-o0tOo`.  `ephemeral_id`::: The node ephemeral ID.  `transport_address`::: The host and port where transport HTTP connections are  accepted. For example, `127.0.0.1:9300`.  `attributes`::: For example, `{"ml.machine_memory": "17179869184"}`.`state`::  (string) The status of the {dfeed}, which can be one of the following values: +  `started`::: The {dfeed} is actively receiving data.  `stopped`::: The {dfeed} is stopped and will not receive data until it is  re-started.
 |