[role="xpack"]
[[ml-count-functions]]
= Count functions

Count functions detect anomalies when the number of events in a bucket is
anomalous.

Use `non_zero_count` functions if your data is sparse and you want to ignore
cases where the bucket count is zero.

Use `distinct_count` functions to determine when the number of distinct values
in one field is unusual, as opposed to the total count.

Use high-sided functions if you want to monitor unusually high event rates.

Use low-sided functions if you want to look at drops in event rate.

The {ml-features} include the following count functions:

* xref:ml-count[`count`, `high_count`, `low_count`]
* xref:ml-nonzero-count[`non_zero_count`, `high_non_zero_count`, `low_non_zero_count`]
* xref:ml-distinct-count[`distinct_count`, `high_distinct_count`, `low_distinct_count`]
[float]
[[ml-count]]
== Count, high_count, low_count

The `count` function detects anomalies when the number of events in a bucket is
anomalous.

The `high_count` function detects anomalies when the count of events in a
bucket is unusually high.

The `low_count` function detects anomalies when the count of events in a
bucket is unusually low.

These functions support the following properties:

* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)

For more information about those properties, see the
{ref}/ml-put-job.html#ml-put-job-request-body[create {anomaly-jobs} API].
.Example 1: Analyzing events with the count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example1
{
  "analysis_config": {
    "detectors": [{
      "function" : "count"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

This example is probably the simplest possible analysis. It identifies
time buckets during which the overall count of events is higher or lower than
usual.

When you use this function in a detector in your {anomaly-job}, it models the
event rate and detects when the event rate is unusual compared to its past
behavior.
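The quantity being modeled can be sketched loosely in Python. This is an illustration of per-bucket event counting, not the actual {ml} implementation; the timestamps and bucket span are made-up values:

```python
from collections import Counter

def bucket_counts(timestamps_ms, bucket_span_ms):
    """Count events per fixed-width time bucket -- a rough sketch of the
    per-bucket quantity that the `count` function models over time."""
    return dict(Counter(ts // bucket_span_ms for ts in timestamps_ms))

# Hypothetical event timestamps (epoch milliseconds), 60-second buckets.
events = [1_000, 2_000, 5_000, 61_000, 62_000, 185_000]
print(bucket_counts(events, 60_000))  # {0: 3, 1: 2, 3: 1}
```

The detector then flags buckets whose count deviates strongly from the count distribution learned so far.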
.Example 2: Analyzing errors with the high_count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example2
{
  "analysis_config": {
    "detectors": [{
      "function" : "high_count",
      "by_field_name" : "error_code",
      "over_field_name": "user"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

If you use this `high_count` function in a detector in your {anomaly-job}, it
models the event rate for each error code. It detects users that generate an
unusually high count of error codes compared to other users.
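The population framing can be sketched in Python: counts are collected per (user, error code) pair within a bucket, and each user is compared against the other users rather than against its own history. The data and names below are invented for illustration:

```python
from collections import Counter

# Hypothetical error events observed in one bucket: (user, error_code) pairs.
events = [("alice", "404"), ("bob", "404"), ("carol", "404"),
          ("mallory", "500")] + [("mallory", "404")] * 50

# Count of each error code generated by each user in the bucket.
per_user = Counter(events)
print(per_user[("mallory", "404")])  # 50, far above the other users' counts
```

In the real job, this per-user count is scored against the population of all users' counts for the same error code, which is what makes `mallory` stand out.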
.Example 3: Analyzing status codes with the low_count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example3
{
  "analysis_config": {
    "detectors": [{
      "function" : "low_count",
      "by_field_name" : "status_code"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

In this example, the function detects when the count of events for a
status code is lower than usual.

When you use this function in a detector in your {anomaly-job}, it models the
event rate for each status code and detects when a status code has an unusually
low count compared to its past behavior.
.Example 4: Analyzing aggregated data with the count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example4
{
  "analysis_config": {
    "summary_count_field_name" : "events_per_min",
    "detectors": [{
      "function" : "count"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

If you are analyzing an aggregated `events_per_min` field, do not use a sum
function (for example, `sum(events_per_min)`). Instead, use the count function
and the `summary_count_field_name` property. For more information, see
<<ml-configuring-aggregation>>.
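The distinction can be sketched loosely in Python (an illustration of the semantics, not the actual implementation): with `summary_count_field_name` set, each input document contributes its pre-aggregated count to the bucket total instead of counting as a single event. The function and variable names here are made up:

```python
def bucket_event_count(docs, summary_count_field=None):
    """Effective event count for one bucket: the number of documents, or the
    sum of a pre-aggregated per-document count field if one is configured."""
    if summary_count_field is None:
        return len(docs)
    return sum(doc[summary_count_field] for doc in docs)

# Two aggregated documents covering one bucket.
docs = [{"events_per_min": 50}, {"events_per_min": 45}]
print(bucket_event_count(docs))                    # 2 (raw document count)
print(bucket_event_count(docs, "events_per_min"))  # 95 (summarized count)
```

This is why `sum(events_per_min)` would be wrong: the quantity of interest is still an event count, and `summary_count_field_name` tells the count function how many original events each document represents.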
[float]
[[ml-nonzero-count]]
== Non_zero_count, high_non_zero_count, low_non_zero_count

The `non_zero_count` function detects anomalies when the number of events in a
bucket is anomalous, but it ignores cases where the bucket count is zero. Use
this function if you know your data is sparse or has gaps and the gaps are not
important.

The `high_non_zero_count` function detects anomalies when the number of events
in a bucket is unusually high and it ignores cases where the bucket count is
zero.

The `low_non_zero_count` function detects anomalies when the number of events in
a bucket is unusually low and it ignores cases where the bucket count is zero.

These functions support the following properties:

* `by_field_name` (optional)
* `partition_field_name` (optional)

For more information about those properties, see the
{ref}/ml-put-job.html#ml-put-job-request-body[create {anomaly-jobs} API].

For example, if you have the following number of events per bucket:

========================================
1,22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,43,31,0,0,0,0,0,0,0,0,0,0,0,0,2,1
========================================

The `non_zero_count` function models only the following data:

========================================
1,22,2,43,31,2,1
========================================
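The filtering step above amounts to dropping the zero-count buckets before modeling, which can be shown with a one-line Python sketch over the same numbers:

```python
counts = [1, 22, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 43,
          31, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1]

# Drop zero buckets; only the remaining values are modeled.
modeled = [c for c in counts if c != 0]
print(modeled)  # [1, 22, 2, 43, 31, 2, 1]
```

Because the zeros never reach the model, long gaps in sparse data do not drag the learned typical count toward zero.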
.Example 5: Analyzing signatures with the high_non_zero_count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example5
{
  "analysis_config": {
    "detectors": [{
      "function" : "high_non_zero_count",
      "by_field_name" : "signaturename"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

If you use this `high_non_zero_count` function in a detector in your
{anomaly-job}, it models the count of events for the `signaturename` field. It
ignores any buckets where the count is zero and detects when a `signaturename`
value has an unusually high count of events compared to its past behavior.

NOTE: Population analysis (using an `over_field_name` property value) is not
supported for the `non_zero_count`, `high_non_zero_count`, and
`low_non_zero_count` functions. If you want to do population analysis and your
data is sparse, use the `count` functions, which are optimized for that scenario.
[float]
[[ml-distinct-count]]
== Distinct_count, high_distinct_count, low_distinct_count

The `distinct_count` function detects anomalies when the number of distinct
values in one field is unusual.

The `high_distinct_count` function detects unusually high numbers of distinct
values in one field.

The `low_distinct_count` function detects unusually low numbers of distinct
values in one field.

These functions support the following properties:

* `field_name` (required)
* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)

For more information about those properties, see the
{ref}/ml-put-job.html#ml-put-job-request-body[create {anomaly-jobs} API].

.Example 6: Analyzing users with the distinct_count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example6
{
  "analysis_config": {
    "detectors": [{
      "function" : "distinct_count",
      "field_name" : "user"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

This `distinct_count` function detects when a system has an unusual number
of logged-in users. When you use this function in a detector in your
{anomaly-job}, it models the distinct count of users. It also detects when the
distinct number of users is unusual compared to the past.
.Example 7: Analyzing ports with the high_distinct_count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example7
{
  "analysis_config": {
    "detectors": [{
      "function" : "high_distinct_count",
      "field_name" : "dst_port",
      "over_field_name": "src_ip"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

This example detects instances of port scanning. When you use this function in a
detector in your {anomaly-job}, it models the distinct count of ports. It also
detects the `src_ip` values that connect to an unusually high number of different
`dst_port` values compared to other `src_ip` values.
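What the detector measures per entity can be sketched in Python. The flow records and field names below are hypothetical, mirroring the example; the real job computes an equivalent distinct count per bucket and then scores each source IP against the population:

```python
from collections import defaultdict

def distinct_ports_per_ip(flows):
    """Distinct destination ports contacted by each source IP in one bucket --
    roughly the per-`src_ip` quantity modeled in the example above."""
    seen = defaultdict(set)
    for flow in flows:
        seen[flow["src_ip"]].add(flow["dst_port"])
    return {ip: len(ports) for ip, ports in seen.items()}

# Normal traffic plus one host sweeping 100 ports (a port scan).
flows = [{"src_ip": "10.0.0.1", "dst_port": 80},
         {"src_ip": "10.0.0.1", "dst_port": 443}]
flows += [{"src_ip": "10.0.0.2", "dst_port": p} for p in range(1, 101)]
print(distinct_ports_per_ip(flows))  # {'10.0.0.1': 2, '10.0.0.2': 100}
```

A scanning host touches far more distinct ports than its peers in the same bucket, which is exactly the high-sided, population-compared signal `high_distinct_count` with `over_field_name: src_ip` looks for.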