123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152 |
- [role="xpack"]
- [testenv="platinum"]
- [[ml-datafeed-resource]]
- === {dfeed-cap} resources
- A {dfeed} resource has the following properties:
- `aggregations`::
- (object) If set, the {dfeed} performs aggregation searches.
- Support for aggregations is limited and should only be used with
- low cardinality data. For more information, see
- {stack-ov}/ml-configuring-aggregation.html[Aggregating Data for Faster Performance].
- `chunking_config`::
- (object) Specifies how data searches are split into time chunks.
- See <<ml-datafeed-chunking-config>>.
- For example: `{"mode": "manual", "time_span": "3h"}`
- `datafeed_id`::
- (string) A numerical character string that uniquely identifies the {dfeed}.
- This property is informational; you cannot change the identifier for existing
- {dfeeds}.
- `frequency`::
- (time units) The interval at which scheduled queries are made while the
- {dfeed} runs in real time. The default value is either the bucket span for short
- bucket spans, or, for longer bucket spans, a sensible fraction of the bucket
- span. For example: `150s`.
- `indices`::
- (array) An array of index names. For example: `["it_ops_metrics"]`
- `job_id`::
- (string) The unique identifier for the job to which the {dfeed} sends data.
- `query`::
- (object) The {es} query domain-specific language (DSL). This value
- corresponds to the query object in an {es} search POST body. All the
- options that are supported by {es} can be used, as this object is
- passed verbatim to {es}. By default, this property has the following
- value: `{"match_all": {"boost": 1}}`.
- `query_delay`::
- (time units) The number of seconds behind real time that data is queried. For
- example, if data from 10:04 a.m. might not be searchable in {es} until
- 10:06 a.m., set this property to 120 seconds. The default value is randomly
- selected between `60s` and `120s`. This randomness improves the query
- performance when there are multiple jobs running on the same node.
- `script_fields`::
- (object) Specifies scripts that evaluate custom expressions and returns
- script fields to the {dfeed}.
- The <<ml-detectorconfig,detector configuration objects>> in a job can contain
- functions that use these script fields.
- For more information, see
- {stack-ov}/ml-configuring-transform.html[Transforming Data With Script Fields].
- `scroll_size`::
- (unsigned integer) The `size` parameter that is used in {es} searches.
- The default value is `1000`.
- `delayed_data_check_config`::
- (object) Specifies whether the data feed checks for missing data and
- the size of the window. For example:
- `{"enabled": true, "check_window": "1h"}` See
- <<ml-datafeed-delayed-data-check-config>>.
- [[ml-datafeed-chunking-config]]
- ==== Chunking configuration objects
- {dfeeds-cap} might be required to search over long time periods, for several months
- or years. This search is split into time chunks in order to ensure the load
- on {es} is managed. Chunking configuration controls how the size of these time
- chunks are calculated and is an advanced configuration option.
- A chunking configuration object has the following properties:
- `mode`::
- There are three available modes: +
- `auto`::: The chunk size will be dynamically calculated. This is the default
- and recommended value.
- `manual`::: Chunking will be applied according to the specified `time_span`.
- `off`::: No chunking will be applied.
- `time_span`::
- (time units) The time span that each search will be querying.
- This setting is only applicable when the mode is set to `manual`.
- For example: `3h`.
- [[ml-datafeed-delayed-data-check-config]]
- ==== Delayed data check configuration objects
- The {dfeed} can optionally search over indices that have already been read in
- an effort to determine whether any data has subsequently been added to the index.
- If missing data is found, it is a good indication that the `query_delay` option
- is set too low and the data is being indexed after the {dfeed} has passed that
- moment in time. See
- {stack-ov}/ml-delayed-data-detection.html[Working with delayed data].
- This check runs only on real-time {dfeeds}.
- The configuration object has the following properties:
- `enabled`::
- (boolean) Specifies whether the {dfeed} periodically checks for delayed data.
- Defaults to `true`.
- `check_window`::
- (time units) The window of time that is searched for late data. This window of
- time ends with the latest finalized bucket. It defaults to `null`, which
- causes an appropriate `check_window` to be calculated when the real-time
- {dfeed} runs. In particular, the default `check_window` span calculation is
- based on the maximum of `2h` or `8 * bucket_span`.
- [float]
- [[ml-datafeed-counts]]
- ==== {dfeed-cap} counts
- The get {dfeed} statistics API provides information about the operational
- progress of a {dfeed}. All of these properties are informational; you cannot
- update their values:
- `assignment_explanation`::
- (string) For started {dfeeds} only, contains messages relating to the
- selection of a node.
- `datafeed_id`::
- (string) A numerical character string that uniquely identifies the {dfeed}.
- `node`::
- (object) The node upon which the {dfeed} is started. The {dfeed} and job will
- be on the same node.
- `id`::: The unique identifier of the node. For example,
- "0-o0tOoRTwKFZifatTWKNw".
- `name`::: The node name. For example, `0-o0tOo`.
- `ephemeral_id`::: The node ephemeral ID.
- `transport_address`::: The host and port where transport HTTP connections are
- accepted. For example, `127.0.0.1:9300`.
- `attributes`::: For example, `{"ml.machine_memory": "17179869184"}`.
- `state`::
- (string) The status of the {dfeed}, which can be one of the following values: +
- `started`::: The {dfeed} is actively receiving data.
- `stopped`::: The {dfeed} is stopped and will not receive data until it is
- re-started.
- `timing_stats`::
- (object) An object that provides statistical information about timing aspect of this datafeed. +
- `job_id`::: A numerical character string that uniquely identifies the job.
- `search_count`::: Number of searches performed by this datafeed.
- `total_search_time_ms`::: Total time the datafeed spent searching in milliseconds.
|