|
@@ -18,15 +18,15 @@ cluster.
|
|
|
|
|
|
== Why worry about delayed data?
|
|
|
|
|
|
-This is a particularly prescient question. If data are delayed randomly (and
|
|
|
-consequently are missing from analysis), the results of certain types of
|
|
|
-functions are not really affected. In these situations, it all comes out okay in
|
|
|
-the end as the delayed data is distributed randomly. An example would be a `mean`
|
|
|
-metric for a field in a large collection of data. In this case, checking for
|
|
|
-delayed data may not provide much benefit. If data are consistently delayed,
|
|
|
-however, {anomaly-jobs} with a `low_count` function may provide false positives.
|
|
|
-In this situation, it would be useful to see if data comes in after an anomaly is
|
|
|
-recorded so that you can determine a next course of action.
|
|
|
+If data are delayed randomly (and consequently are missing from analysis), the
|
|
|
+results of certain types of functions are largely unaffected. In these
|
|
|
+situations, the results even out in the end because the delayed data are distributed
|
|
|
+randomly. An example would be a `mean` metric for a field in a large collection
|
|
|
+of data. In this case, checking for delayed data may not provide much benefit.
|
|
|
+If data are consistently delayed, however, {anomaly-jobs} with a `low_count`
|
|
|
+function may provide false positives. In this situation, it would be useful to
|
|
|
+check whether data arrive after an anomaly is recorded so that you can determine a
|
|
|
+next course of action.
|
|
|
|
|
|
== How do we detect delayed data?
|
|
|
|
|
@@ -40,7 +40,16 @@ of the associated {anomaly-job}. The `doc_count` of those buckets are then
|
|
|
compared with the job's finalized analysis buckets to see whether any data has
|
|
|
arrived since the analysis. If there is indeed missing data due to their ingest
|
|
|
delay, the end user is notified. For example, you can see annotations in {kib}
|
|
|
-for the periods where these delays occur.
|
|
|
+for the periods where these delays occur:
|
|
|
+
|
|
|
+[role="screenshot"]
|
|
|
+image::images/ml-annotations.png["Delayed data annotations in the Single Metric Viewer"]
|
|
|
+
|
|
|
+There is another tool for visualizing the delayed data on the *Annotations* tab
|
|
|
+in the {anomaly-detect} job management page:
|
|
|
+
|
|
|
+[role="screenshot"]
|
|
|
+image::images/ml-datafeed-chart.png["Delayed data in the {dfeed} chart"]
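
The bucket comparison described in this section can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual {ml} implementation: the `find_delayed_buckets` function, the `DelayedBucket` type, and the sample counts are hypothetical, and it assumes both sides have already been reduced to a mapping from bucket start time to document count.

```python
from dataclasses import dataclass


@dataclass
class DelayedBucket:
    timestamp: int       # bucket start, epoch milliseconds (assumed unit)
    analyzed_count: int  # documents the job saw when it finalized the bucket
    current_count: int   # documents found when the index is queried again

    @property
    def missing(self) -> int:
        # Documents that arrived after the bucket was analyzed.
        return self.current_count - self.analyzed_count


def find_delayed_buckets(analyzed, current):
    """Return buckets whose current doc count exceeds the analyzed count.

    Both arguments map a bucket start timestamp to a document count.
    """
    delayed = []
    for timestamp, analyzed_count in sorted(analyzed.items()):
        current_count = current.get(timestamp, 0)
        if current_count > analyzed_count:
            delayed.append(DelayedBucket(timestamp, analyzed_count, current_count))
    return delayed


# Hypothetical sample data: one bucket received 17 late documents.
analyzed = {0: 100, 60_000: 80, 120_000: 95}
current = {0: 100, 60_000: 97, 120_000: 95}

delayed = find_delayed_buckets(analyzed, current)
for bucket in delayed:
    print(f"bucket {bucket.timestamp}: {bucket.missing} documents arrived after analysis")
```

Only buckets whose count has grown since the analysis are reported, which corresponds to the periods that would be annotated.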
|
|
|
|
|
|
== What to do about delayed data?
|
|
|
|
|
@@ -50,4 +59,4 @@ delayed data is too great or the situation calls for it, the next course of
|
|
|
action to consider is to increase the `query_delay` of the datafeed. This
|
|
|
increased delay allows more time for data to be indexed. If you have real-time
|
|
|
constraints, however, an increased delay might not be desirable. In which case,
|
|
|
-you would have to {ref}/tune-for-indexing-speed.html[tune for better indexing speed].
|
|
|
+you would have to {ref}/tune-for-indexing-speed.html[tune for better indexing speed].
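
As a rough sketch of the `query_delay` change mentioned above, the following Python builds the request body for the update {dfeed} API. The {dfeed} ID `my-datafeed` and the `120s` value are hypothetical placeholders; choose a delay that matches the ingest latency you actually observe.

```python
import json


def build_query_delay_update(query_delay):
    """Build the body for an update datafeed request that changes query_delay.

    query_delay is a time-unit string such as "120s".
    """
    return {"query_delay": query_delay}


# Hypothetical datafeed ID and delay value, for illustration only.
datafeed_id = "my-datafeed"
update = build_query_delay_update("120s")

print(f"POST _ml/datafeeds/{datafeed_id}/_update")
print(json.dumps(update, indent=2))
```

A longer delay gives late documents more time to be indexed, at the cost of results that lag further behind real time.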
|