123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146 |
- [role="xpack"]
- [testenv="platinum"]
- [[ml-datafeed-resource]]
- === {dfeed-cap} Resources
- A {dfeed} resource has the following properties:
- `aggregations`::
- (object) If set, the {dfeed} performs aggregation searches.
- Support for aggregations is limited and should only be used with
- low cardinality data. For more information, see
- {xpack-ref}/ml-configuring-aggregation.html[Aggregating Data for Faster Performance].
- `chunking_config`::
- (object) Specifies how data searches are split into time chunks.
- See <<ml-datafeed-chunking-config>>.
- For example: `{"mode": "manual", "time_span": "3h"}`
- `datafeed_id`::
- (string) A numerical character string that uniquely identifies the {dfeed}.
- This property is informational; you cannot change the identifier for existing
- {dfeeds}.
- `frequency`::
- (time units) The interval at which scheduled queries are made while the
- {dfeed} runs in real time. The default value is either the bucket span for short
- bucket spans, or, for longer bucket spans, a sensible fraction of the bucket
- span. For example: `150s`.
- `indices`::
- (array) An array of index names. For example: `["it_ops_metrics"]`
- `job_id`::
- (string) The unique identifier for the job to which the {dfeed} sends data.
- `query`::
- (object) The {es} query domain-specific language (DSL). This value
- corresponds to the query object in an {es} search POST body. All the
- options that are supported by {es} can be used, as this object is
- passed verbatim to {es}. By default, this property has the following
- value: `{"match_all": {"boost": 1}}`.
- `query_delay`::
- (time units) The number of seconds behind real time that data is queried. For
- example, if data from 10:04 a.m. might not be searchable in {es} until
- 10:06 a.m., set this property to 120 seconds. The default value is randomly
- selected between `60s` and `120s`. This randomness improves the query
- performance when there are multiple jobs running on the same node.
- `script_fields`::
- (object) Specifies scripts that evaluate custom expressions and returns
- script fields to the {dfeed}.
- The <<ml-detectorconfig,detector configuration objects>> in a job can contain
- functions that use these script fields.
- For more information, see
- {xpack-ref}/ml-configuring-transform.html[Transforming Data With Script Fields].
- `scroll_size`::
- (unsigned integer) The `size` parameter that is used in {es} searches.
- The default value is `1000`.
- `types`::
- (array) A list of types to search for within the specified indices. For
- example: `[]`. This property is provided for backwards compatibility with
- releases earlier than 6.0.0. For more information, see <<removal-of-types>>.
- `delayed_data_check_config`::
- (object) Specifies if and with how large a window should the data feed check
- for missing data. See <<ml-datafeed-delayed-data-check-config>>.
- For example: `{"enabled": true, "check_window": "1h"}`
- [[ml-datafeed-chunking-config]]
- ==== Chunking Configuration Objects
- {dfeeds-cap} might be required to search over long time periods, for several months
- or years. This search is split into time chunks in order to ensure the load
- on {es} is managed. Chunking configuration controls how the size of these time
- chunks are calculated and is an advanced configuration option.
- A chunking configuration object has the following properties:
- `mode`::
- There are three available modes: +
- `auto`::: The chunk size will be dynamically calculated. This is the default
- and recommended value.
- `manual`::: Chunking will be applied according to the specified `time_span`.
- `off`::: No chunking will be applied.
- `time_span`::
- (time units) The time span that each search will be querying.
- This setting is only applicable when the mode is set to `manual`.
- For example: `3h`.
- [[ml-datafeed-delayed-data-check-config]]
- ==== Delayed Data Check Configuration Objects
- The {dfeed} can optionally search over indices that have already been read in
- an effort to find if any data has since been added to the index. If missing data
- is found, it is a good indication that the `query_delay` option is set too low and
- the data is being indexed after the {dfeed} has passed that moment in time.
- This check only runs on real-time {dfeeds}
- The configuration object has the following properties:
- `enabled`::
- (boolean) Should the {dfeed} periodically check for data being indexed after reading.
- Defaults to `true`
- `check_window`::
- (time units) The window of time before the latest finalized bucket that should be searched
- for late data. Defaults to `null` which causes an appropriate `check_window` to be calculated
- when the real-time {dfeed} runs.
- The default `check_window` span calculation is the max between `2h` or `8 * bucket_span`.
- [float]
- [[ml-datafeed-counts]]
- ==== {dfeed-cap} Counts
- The get {dfeed} statistics API provides information about the operational
- progress of a {dfeed}. All of these properties are informational; you cannot
- update their values:
- `assignment_explanation`::
- (string) For started {dfeeds} only, contains messages relating to the
- selection of a node.
- `datafeed_id`::
- (string) A numerical character string that uniquely identifies the {dfeed}.
- `node`::
- (object) The node upon which the {dfeed} is started. The {dfeed} and job will
- be on the same node.
- `id`::: The unique identifier of the node. For example,
- "0-o0tOoRTwKFZifatTWKNw".
- `name`::: The node name. For example, `0-o0tOo`.
- `ephemeral_id`::: The node ephemeral ID.
- `transport_address`::: The host and port where transport HTTP connections are
- accepted. For example, `127.0.0.1:9300`.
- `attributes`::: For example, `{"ml.max_open_jobs": "10"}`.
- `state`::
- (string) The status of the {dfeed}, which can be one of the following values: +
- `started`::: The {dfeed} is actively receiving data.
- `stopped`::: The {dfeed} is stopped and will not receive data until it is
- re-started.
|