[role="xpack"]
[testenv="platinum"]
[[ml-datafeed-resource]]
=== {dfeed-cap} Resources

A {dfeed} resource has the following properties:

`aggregations`::
(object) If set, the {dfeed} performs aggregation searches.
Support for aggregations is limited and should only be used with
low cardinality data. For more information, see
{xpack-ref}/ml-configuring-aggregation.html[Aggregating Data for Faster Performance].

`chunking_config`::
(object) Specifies how data searches are split into time chunks.
See <<ml-datafeed-chunking-config>>.
For example: `{"mode": "manual", "time_span": "3h"}`

`datafeed_id`::
(string) A string that uniquely identifies the {dfeed}.
This property is informational; you cannot change the identifier for existing
{dfeeds}.

`frequency`::
(time units) The interval at which scheduled queries are made while the
{dfeed} runs in real time. The default value is either the bucket span for short
bucket spans, or, for longer bucket spans, a sensible fraction of the bucket
span. For example: `150s`.

`indices`::
(array) An array of index names. For example: `["it_ops_metrics"]`

`job_id`::
(string) The unique identifier for the job to which the {dfeed} sends data.

`query`::
(object) The {es} query domain-specific language (DSL). This value
corresponds to the query object in an {es} search POST body. All the
options that are supported by {es} can be used, as this object is
passed verbatim to {es}. By default, this property has the following
value: `{"match_all": {"boost": 1}}`.

`query_delay`::
(time units) The amount of time behind real time that data is queried. For
example, if data from 10:04 a.m. might not be searchable in {es} until
10:06 a.m., set this property to 120 seconds. The default value is randomly
selected between `60s` and `120s`. This randomness improves query
performance when multiple jobs are running on the same node.

`script_fields`::
(object) Specifies scripts that evaluate custom expressions and return
script fields to the {dfeed}.
The <<ml-detectorconfig,detector configuration objects>> in a job can contain
functions that use these script fields.
For more information, see
{xpack-ref}/ml-configuring-transform.html[Transforming Data With Script Fields].

`scroll_size`::
(unsigned integer) The `size` parameter that is used in {es} searches.
The default value is `1000`.

`types`::
(array) A list of types to search for within the specified indices. For
example: `[]`. This property is provided for backwards compatibility with
releases earlier than 6.0.0. For more information, see <<removal-of-types>>.

`delayed_data_check_config`::
(object) Specifies whether the {dfeed} checks for missing data and
the size of the window. For example:
`{"enabled": true, "check_window": "1h"}`. See
<<ml-datafeed-delayed-data-check-config>>.

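
The properties above combine into a single JSON object when a {dfeed} is created or updated. The following Python sketch assembles such a body and runs a few client-side checks derived only from the descriptions on this page: the chunking `mode` values, `time_span` applying only to `manual` mode, and `scroll_size` being an unsigned integer. The job ID and the `check_datafeed_body` helper are hypothetical, and {es} performs its own, more complete validation when the request is sent.

```python
import json

# Chunking modes listed under "Chunking Configuration Objects".
VALID_CHUNK_MODES = {"auto", "manual", "off"}


def check_datafeed_body(body: dict) -> list:
    """Return problems found by a few client-side checks (hypothetical helper).

    The server performs its own validation; this only mirrors rules stated
    in the property descriptions.
    """
    problems = []
    if not isinstance(body.get("indices", []), list):
        problems.append("indices must be an array")
    scroll_size = body.get("scroll_size", 1000)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(scroll_size, bool) or not isinstance(scroll_size, int) or scroll_size < 0:
        problems.append("scroll_size must be an unsigned integer")
    chunking = body.get("chunking_config")
    if chunking is not None:
        mode = chunking.get("mode", "auto")  # auto is the documented default
        if mode not in VALID_CHUNK_MODES:
            problems.append(f"unknown chunking mode: {mode!r}")
        if "time_span" in chunking and mode != "manual":
            problems.append("time_span only applies when mode is 'manual'")
    return problems


# Hypothetical job ID; the index name and option values are the examples
# used in the property descriptions.
datafeed = {
    "job_id": "it-ops-kpi",
    "indices": ["it_ops_metrics"],
    "query": {"match_all": {"boost": 1}},
    "frequency": "150s",
    "query_delay": "120s",
    "scroll_size": 1000,
    "chunking_config": {"mode": "manual", "time_span": "3h"},
    "delayed_data_check_config": {"enabled": True, "check_window": "1h"},
}

problems = check_datafeed_body(datafeed)
if problems:
    raise ValueError("; ".join(problems))

body = json.dumps(datafeed, indent=2)  # Python True is serialized as JSON true
print(body)
```

The checks are deliberately narrow; anything they do not cover, such as the contents of `query` or `aggregations`, is left for {es} to validate.
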

[[ml-datafeed-chunking-config]]
==== Chunking Configuration Objects

{dfeeds-cap} might be required to search over long time periods, such as several
months or years. This search is split into time chunks in order to ensure the
load on {es} is managed. Chunking configuration controls how the size of these
time chunks is calculated and is an advanced configuration option.

A chunking configuration object has the following properties:

`mode`::
(string) There are three available modes: +
`auto`::: The chunk size is calculated dynamically. This is the default
and recommended value.
`manual`::: Chunking is applied according to the specified `time_span`.
`off`::: No chunking is applied.

`time_span`::
(time units) The time span that each search queries.
This setting is only applicable when the mode is set to `manual`.
For example: `3h`.

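
In `manual` mode, each search covers at most `time_span`. The following Python sketch shows how an 8-hour search range would be divided with `"time_span": "3h"`, assuming consecutive, non-overlapping chunks where the last one is cut short at the end of the range. It illustrates the concept only and is not the {dfeed}'s internal chunking code; `parse_time_units` and `manual_chunks` are hypothetical helpers, and the parser accepts only a subset of {es} time units (`ms`, `s`, `m`, `h`, `d`).

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical helper: maps a subset of Elasticsearch time units to timedelta arguments.
_UNIT_KWARGS = {"ms": "milliseconds", "s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_time_units(value: str) -> timedelta:
    """Parse values such as "150s", "15m", or "3h" into a timedelta."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d)", value)
    if match is None:
        raise ValueError(f"unsupported time value: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNIT_KWARGS[unit]: int(amount)})


def manual_chunks(start: datetime, end: datetime, time_span: str) -> list:
    """Split [start, end) into consecutive chunks no longer than time_span."""
    span = parse_time_units(time_span)
    if span <= timedelta(0):
        raise ValueError("time_span must be positive")
    chunks = []
    chunk_start = start
    while chunk_start < end:
        chunk_end = min(chunk_start + span, end)  # the last chunk may be shorter
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end
    return chunks


start = datetime(2019, 1, 1, tzinfo=timezone.utc)
end = start + timedelta(hours=8)
chunks = manual_chunks(start, end, "3h")
for chunk_start, chunk_end in chunks:
    print(chunk_start.isoformat(), "->", chunk_end.isoformat())
```

Under this model the 8-hour range becomes three searches: two of three hours and a final one of two hours.
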

[[ml-datafeed-delayed-data-check-config]]
==== Delayed Data Check Configuration Objects

The {dfeed} can optionally search over indices that have already been read in
an effort to determine whether any data has since been added to the index. If
missing data is found, it is a good indication that the `query_delay` option is
set too low and the data is being indexed after the {dfeed} has passed that
moment in time. See
{stack-ov}/ml-delayed-data-detection.html[Working with delayed data].

This check runs only on real-time {dfeeds}.

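
The reasoning above can be made concrete with the `query_delay` example from the property list. Assuming a real-time {dfeed} searches roughly up to the current time minus `query_delay`, a document timestamped 10:04 a.m. is first covered by a search at 10:04 a.m. plus `query_delay`. If the document only becomes searchable after that point, the search can miss it, which is the situation the delayed data check looks for. The Python sketch below uses this simplified model and ignores `frequency` (scheduled searches can only run later than this earliest point, which gives late data more time to arrive); `may_be_missed` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def may_be_missed(event_time: datetime, searchable_at: datetime, query_delay: timedelta) -> bool:
    """True if a document may not yet be searchable when a real-time search first covers it.

    Simplified model (an assumption): a search that runs at time `now` covers
    data up to `now - query_delay`, so `event_time` is first covered at
    `event_time + query_delay`.
    """
    first_covered_at = event_time + query_delay
    return searchable_at > first_covered_at


# Data from 10:04 a.m. that is not searchable until 10:06 a.m.
event_time = datetime(2019, 1, 1, 10, 4)
searchable_at = datetime(2019, 1, 1, 10, 6)

for seconds in (60, 120):
    delay = timedelta(seconds=seconds)
    print(f"query_delay={seconds}s may miss data: {may_be_missed(event_time, searchable_at, delay)}")
# query_delay=60s may miss data: True
# query_delay=120s may miss data: False
```
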
The configuration object has the following properties:

`enabled`::
(boolean) Specifies whether the {dfeed} periodically checks for data that is
indexed after reading. Defaults to `true`.

`check_window`::
(time units) The window of time before the latest finalized bucket that should
be searched for late data. Defaults to `null`, which causes an appropriate
`check_window` to be calculated when the real-time {dfeed} runs.
The default `check_window` span calculation is the maximum of `2h` and
`8 * bucket_span`.

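
The default for `check_window` is a simple formula. The Python sketch below restates it, together with the fallback used when `check_window` is left as `null` (`None` in Python). The names `default_check_window` and `effective_check_window` are hypothetical, and `bucket_span` stands for the job's bucket span.

```python
from datetime import timedelta
from typing import Optional


def default_check_window(bucket_span: timedelta) -> timedelta:
    """Default check_window: the larger of 2h and 8 * bucket_span."""
    return max(timedelta(hours=2), 8 * bucket_span)


def effective_check_window(check_window: Optional[timedelta], bucket_span: timedelta) -> timedelta:
    """Use the configured window, or calculate the default when it is null."""
    return check_window if check_window is not None else default_check_window(bucket_span)


for minutes in (5, 15, 60):
    bucket_span = timedelta(minutes=minutes)
    print(bucket_span, "->", effective_check_window(None, bucket_span))
# 0:05:00 -> 2:00:00   (8 * 5m = 40m, so 2h is larger)
# 0:15:00 -> 2:00:00   (8 * 15m = 2h)
# 1:00:00 -> 8:00:00   (8 * 1h = 8h)
```
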

[float]
[[ml-datafeed-counts]]
==== {dfeed-cap} Counts

The get {dfeed} statistics API provides information about the operational
progress of a {dfeed}. All of these properties are informational; you cannot
update their values:

`assignment_explanation`::
(string) For started {dfeeds} only, contains messages relating to the
selection of a node.

`datafeed_id`::
(string) A string that uniquely identifies the {dfeed}.

`node`::
(object) The node upon which the {dfeed} is started. The {dfeed} and job run
on the same node.
`id`::: The unique identifier of the node. For example,
`0-o0tOoRTwKFZifatTWKNw`.
`name`::: The node name. For example, `0-o0tOo`.
`ephemeral_id`::: The node ephemeral ID.
`transport_address`::: The host and port where transport connections are
accepted. For example, `127.0.0.1:9300`.
`attributes`::: For example, `{"ml.max_open_jobs": "10"}`.

`state`::
(string) The status of the {dfeed}, which can be one of the following values: +
`started`::: The {dfeed} is actively receiving data.
`stopped`::: The {dfeed} is stopped and will not receive data until it is
restarted.

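
The following Python sketch summarizes one {dfeed} entry from a get {dfeed} statistics response using the fields described above. The sample entry is illustrative: the `datafeed_id` is hypothetical, the node values are the examples from these property descriptions, and the surrounding response structure is not shown. `describe_datafeed_stats` is a hypothetical helper.

```python
def describe_datafeed_stats(stats: dict) -> str:
    """Summarize one datafeed entry from a stats response (hypothetical helper)."""
    datafeed_id = stats["datafeed_id"]
    state = stats.get("state")
    if state == "started":
        node = stats.get("node", {})
        summary = (
            f"{datafeed_id} is started on node {node.get('name')} "
            f"({node.get('transport_address')})"
        )
        # assignment_explanation is only reported for started datafeeds.
        explanation = stats.get("assignment_explanation")
        if explanation:
            summary += f"; assignment: {explanation}"
        return summary
    if state == "stopped":
        return f"{datafeed_id} is stopped and will not receive data until it is restarted"
    return f"{datafeed_id} is in state {state!r}"


# Illustrative entry; the datafeed ID is hypothetical and the node values
# are the examples given in the property descriptions.
stats = {
    "datafeed_id": "datafeed-it-ops-kpi",
    "state": "started",
    "node": {
        "id": "0-o0tOoRTwKFZifatTWKNw",
        "name": "0-o0tOo",
        "transport_address": "127.0.0.1:9300",
        "attributes": {"ml.max_open_jobs": "10"},
    },
}

print(describe_datafeed_stats(stats))
# datafeed-it-ops-kpi is started on node 0-o0tOo (127.0.0.1:9300)
```
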