@@ -7,18 +7,18 @@ Entities or events in your data can be considered anomalous when:
* Their behavior changes over time, relative to their own previous behavior, or
* Their behavior is different than other entities in a specified population.

-The latter method of detecting outliers is known as _population analysis_. The
-{ml} analytics build a profile of what a "typical" user, machine, or other entity
-does over a specified time period and then identify when one is behaving
+The latter method of detecting anomalies is known as _population analysis_. The
+{ml} analytics build a profile of what a "typical" user, machine, or other
+entity does over a specified time period and then identify when one is behaving
abnormally compared to the population.

This type of analysis is most useful when the behavior of the population as a
-whole is mostly homogeneous and you want to identify outliers. In general,
-population analysis is not useful when members of the population inherently
-have vastly different behavior. You can, however, segment your data into groups
-that behave similarly and run these as separate jobs. For example, you can use a
-query filter in the {dfeed} to segment your data or you can use the
-`partition_field_name` to split the analysis for the different groups.
+whole is mostly homogeneous and you want to identify unusual behavior. In
+general, population analysis is not useful when members of the population
+inherently have vastly different behavior. You can, however, segment your data
+into groups that behave similarly and run these as separate jobs. For example,
+you can use a query filter in the {dfeed} to segment your data or you can use
+the `partition_field_name` to split the analysis for the different groups.

Population analysis scales well and has a lower resource footprint than
individual analysis of each series. For example, you can analyze populations
@@ -52,8 +52,8 @@ PUT _ml/anomaly_detectors/population
----------------------------------
// TEST[skip:needs-licence]

-<1> This `over_field_name` property indicates that the metrics for each client (
- as identified by their IP address) are analyzed relative to other clients
+<1> This `over_field_name` property indicates that the metrics for each client
+ (as identified by their IP address) are analyzed relative to other clients
in each bucket.

If your data is stored in {es}, you can use the population job wizard in {kib}
@@ -73,8 +73,8 @@ image::images/ml-population-results.png["Population analysis results in the Anom

As in this case, the results are often quite sparse. There might be just a few
data points for the selected time period. Population analysis is particularly
-useful when you have many entities and the data for specific entitles is sporadic
-or sparse.
+useful when you have many entities and the data for specific entities is
+sporadic or sparse.

If you click on a section in the timeline or swim lanes, you can see more
details about the anomalies: