- [role="xpack"]
- [[ml-configuring-pop]]
- === Performing population analysis
- Entities or events in your data can be considered anomalous when:
- * Their behavior changes over time, relative to their own previous behavior, or
- * Their behavior differs from that of other entities in a specified population.
- The latter method of detecting outliers is known as _population analysis_. The
- {ml} analytics build a profile of what a "typical" user, machine, or other entity
- does over a specified time period and then identify when one is behaving
- abnormally compared to the population.
- This type of analysis is most useful when the behavior of the population as a
- whole is mostly homogeneous and you want to identify outliers. In general,
- population analysis is not useful when members of the population inherently
- have vastly different behavior. You can, however, segment your data into groups
- that behave similarly and run these as separate jobs. For example, you can use a
- query filter in the {dfeed} to segment your data or you can use the
- `partition_field_name` to split the analysis for the different groups.
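- As a minimal sketch of the query filter approach, the following {dfeed} passes only one group of documents to its job. The {dfeed} ID, job ID, index name, and `service` field are hypothetical; substitute whatever identifies a group that behaves similarly in your data:
- [source,js]
- ----------------------------------
- PUT _xpack/ml/datafeeds/datafeed-population-web
- {
-   "job_id": "population-web",
-   "indices": ["network-logs"],
-   "query": {
-     "term": { "service": "web" }
-   }
- }
- ----------------------------------
- //CONSOLE
- // TEST[skip:needs-licence]
- Each remaining group would then get its own job and a {dfeed} with a different filter.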
- Population analysis scales well and has a lower resource footprint than
- individual analysis of each series. For example, you can analyze populations
- of hundreds of thousands or millions of entities.
- To specify the population, use the `over_field_name` property. For example:
- [source,js]
- ----------------------------------
- PUT _xpack/ml/anomaly_detectors/population
- {
-   "description" : "Population analysis",
-   "analysis_config" : {
-     "bucket_span":"10m",
-     "influencers": [
-       "username"
-     ],
-     "detectors": [
-       {
-         "function": "mean",
-         "field_name": "bytesSent",
-         "over_field_name": "username" <1>
-       }
-     ]
-   },
-   "data_description" : {
-     "time_field":"@timestamp",
-     "time_format": "epoch_ms"
-   }
- }
- ----------------------------------
- //CONSOLE
- // TEST[skip:needs-licence]
- <1> This `over_field_name` property indicates that the metrics for each user
- (as identified by their `username` value) are analyzed relative to other users
- in each bucket.
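- If the population contains groups that behave differently, the same detector can also be split with `partition_field_name`. The following sketch assumes a hypothetical `department` field in the data, and the job name is illustrative:
- [source,js]
- ----------------------------------
- PUT _xpack/ml/anomaly_detectors/population-by-department
- {
-   "description" : "Population analysis split by department",
-   "analysis_config" : {
-     "bucket_span":"10m",
-     "influencers": [
-       "username",
-       "department"
-     ],
-     "detectors": [
-       {
-         "function": "mean",
-         "field_name": "bytesSent",
-         "over_field_name": "username",
-         "partition_field_name": "department"
-       }
-     ]
-   },
-   "data_description" : {
-     "time_field":"@timestamp",
-     "time_format": "epoch_ms"
-   }
- }
- ----------------------------------
- //CONSOLE
- // TEST[skip:needs-licence]
- With this configuration, each user is compared only with other users that share the same `department` value.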
- If your data is stored in {es}, you can use the population job wizard in {kib}
- to create a job with these same properties. For example, the population job
- wizard provides the following job settings:
- [role="screenshot"]
- image::images/ml-population-job.jpg["Job settings in the population job wizard"]
- After you open the job and start the {dfeed} or supply data to the job, you can
- view the results in {kib}. For example, you can view the results in the
- **Anomaly Explorer**:
- [role="screenshot"]
- image::images/ml-population-results.jpg["Population analysis results in the Anomaly Explorer"]
- As in this case, the results are often quite sparse. There might be just a few
- data points for the selected time period. Population analysis is particularly
- useful when you have many entities and the data for specific entities is sporadic
- or sparse.
- If you click on a section in the timeline or swimlanes, you can see more
- details about the anomalies:
- [role="screenshot"]
- image::images/ml-population-anomaly.jpg["Anomaly details for a specific user"]
- In this example, the user identified as `antonette` sent a high volume of bytes
- on the date and time shown. This event is anomalous because the mean number of
- bytes this user sent is two times higher than the value expected for the
- population.
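- The same anomaly records can also be retrieved outside of {kib}. As a sketch, assuming the job is named `population` as in the earlier example, the get records API can return the highest-scoring records first. The score threshold of `75` is an arbitrary illustrative value:
- [source,js]
- ----------------------------------
- GET _xpack/ml/anomaly_detectors/population/results/records
- {
-   "sort": "record_score",
-   "desc": true,
-   "record_score": 75
- }
- ----------------------------------
- //CONSOLE
- // TEST[skip:needs-licence]
- For a population job, each returned record should identify the anomalous entity in its `over_field_value` property.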