[role="xpack"]
[[ml-configuring-detector-custom-rules]]
=== Customizing detectors with custom rules

<<ml-rules,Custom rules>> enable you to change the behavior of anomaly
detectors based on domain-specific knowledge.

Custom rules describe _when_ a detector should take a certain _action_ instead
of following its default behavior. To specify the _when_, a rule uses a
`scope` and `conditions`. You can think of `scope` as the categorical
specification of a rule, while `conditions` are the numerical part. A rule can
have a scope, one or more conditions, or a combination of scope and
conditions.

Let us see how these can be configured with some examples.

==== Specifying custom rule scope

Let us assume we are configuring a job in order to detect DNS data
exfiltration. Our data contain the fields `subdomain` and
`highest_registered_domain`. We can use a detector that looks like
`high_info_content(subdomain) over highest_registered_domain`. If we run such
a job, it is possible that we discover a lot of anomalies on frequently used
domains that we have reason to trust. As security analysts, we are not
interested in such anomalies. Ideally, we could instruct the detector to skip
results for domains that we consider safe. Using a rule with a scope allows us
to achieve this.

First, we need to create a list of our safe domains. Such lists are called
_filters_ in {ml}. Filters can be shared across jobs.

We create our filter using the {ref}/ml-put-filter.html[put filter API]:

[source,js]
----------------------------------
PUT _ml/filters/safe_domains
{
  "description": "Our list of safe domains",
  "items": ["safe.com", "trusted.com"]
}
----------------------------------
// CONSOLE
// TEST[skip:needs-licence]
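
If we want to double-check what a filter currently contains, we can retrieve
it with the {ref}/ml-get-filter.html[get filters API]:

[source,js]
----------------------------------
GET _ml/filters/safe_domains
----------------------------------
// CONSOLE
// TEST[skip:setup:ml_filter_safe_domains]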

Now, we can create our job specifying a scope that uses the `safe_domains`
filter for the `highest_registered_domain` field:

[source,js]
----------------------------------
PUT _ml/anomaly_detectors/dns_exfiltration_with_rule
{
  "analysis_config" : {
    "bucket_span" : "5m",
    "detectors" : [{
      "function" : "high_info_content",
      "field_name" : "subdomain",
      "over_field_name" : "highest_registered_domain",
      "custom_rules": [{
        "actions": ["skip_result"],
        "scope": {
          "highest_registered_domain": {
            "filter_id": "safe_domains",
            "filter_type": "include"
          }
        }
      }]
    }]
  },
  "data_description" : {
    "time_field" : "timestamp"
  }
}
----------------------------------
// CONSOLE
// TEST[skip:needs-licence]

As time advances and we see more data and more results, we might encounter new
domains that we want to add to the filter. We can do that by using the
{ref}/ml-update-filter.html[update filter API]:

[source,js]
----------------------------------
POST _ml/filters/safe_domains/_update
{
  "add_items": ["another-safe.com"]
}
----------------------------------
// CONSOLE
// TEST[skip:setup:ml_filter_safe_domains]
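
Similarly, should a domain stop being safe, we can remove it with the
`remove_items` field of the same API:

[source,js]
----------------------------------
POST _ml/filters/safe_domains/_update
{
  "remove_items": ["another-safe.com"]
}
----------------------------------
// CONSOLE
// TEST[skip:setup:ml_filter_safe_domains]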

Note that we can use any of the `partition_field_name`, `over_field_name`, or
`by_field_name` fields in the `scope`.

In the following example we scope multiple fields:

[source,js]
----------------------------------
PUT _ml/anomaly_detectors/scoping_multiple_fields
{
  "analysis_config" : {
    "bucket_span" : "5m",
    "detectors" : [{
      "function" : "count",
      "partition_field_name" : "my_partition",
      "over_field_name" : "my_over",
      "by_field_name" : "my_by",
      "custom_rules": [{
        "actions": ["skip_result"],
        "scope": {
          "my_partition": {
            "filter_id": "filter_1"
          },
          "my_over": {
            "filter_id": "filter_2"
          },
          "my_by": {
            "filter_id": "filter_3"
          }
        }
      }]
    }]
  },
  "data_description" : {
    "time_field" : "timestamp"
  }
}
----------------------------------
// CONSOLE
// TEST[skip:needs-licence]

Such a detector skips results when the values of all three scoped fields are
included in the referenced filters.
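
Note that `filter_type` defaults to `include` when it is omitted, as in the
example above. Setting it to `exclude` inverts the match: the rule applies
when the value is _not_ in the filter. As a sketch, the following detector
(with a hypothetical filter named `expected_partitions`) skips results for any
partition that is not in that filter:

[source,js]
----------------------------------
PUT _ml/anomaly_detectors/scoping_with_exclude
{
  "analysis_config" : {
    "bucket_span" : "5m",
    "detectors" : [{
      "function" : "count",
      "partition_field_name" : "my_partition",
      "custom_rules": [{
        "actions": ["skip_result"],
        "scope": {
          "my_partition": {
            "filter_id": "expected_partitions",
            "filter_type": "exclude"
          }
        }
      }]
    }]
  },
  "data_description" : {
    "time_field" : "timestamp"
  }
}
----------------------------------
// CONSOLE
// TEST[skip:needs-licence]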

==== Specifying custom rule conditions

Imagine a detector that looks for anomalies in CPU utilization. Given a
machine that is idle for long enough, small movements in CPU utilization could
result in anomalous results where the `actual` value is quite small, for
example, 0.02. Given our knowledge of how CPU utilization behaves, we might
decide that anomalies with such small actual values are not interesting for
investigation.

Let us now configure a job with a rule that skips results where CPU
utilization is less than 0.20.

[source,js]
----------------------------------
PUT _ml/anomaly_detectors/cpu_with_rule
{
  "analysis_config" : {
    "bucket_span" : "5m",
    "detectors" : [{
      "function" : "high_mean",
      "field_name" : "cpu_utilization",
      "custom_rules": [{
        "actions": ["skip_result"],
        "conditions": [
          {
            "applies_to": "actual",
            "operator": "lt",
            "value": 0.20
          }
        ]
      }]
    }]
  },
  "data_description" : {
    "time_field" : "timestamp"
  }
}
----------------------------------
// CONSOLE
// TEST[skip:needs-licence]

When there are multiple conditions, they are combined with a logical `and`.
This is useful when we want the rule to apply to a range. We simply create a
rule with two conditions, one for each end of the desired range.

Here is an example where a count detector skips results when the count is
greater than 30 and less than 50:

[source,js]
----------------------------------
PUT _ml/anomaly_detectors/rule_with_range
{
  "analysis_config" : {
    "bucket_span" : "5m",
    "detectors" : [{
      "function" : "count",
      "custom_rules": [{
        "actions": ["skip_result"],
        "conditions": [
          {
            "applies_to": "actual",
            "operator": "gt",
            "value": 30
          },
          {
            "applies_to": "actual",
            "operator": "lt",
            "value": 50
          }
        ]
      }]
    }]
  },
  "data_description" : {
    "time_field" : "timestamp"
  }
}
----------------------------------
// CONSOLE
// TEST[skip:needs-licence]

==== Custom rules in the lifecycle of a job

Custom rules only affect results created after the rules were applied. Let us
imagine that we have configured a job and it has been running for some time.
After observing its results, we decide that we can employ rules to get rid of
some uninteresting results. We can use the
{ref}/ml-update-job.html[update job API] to do so. However, the rule we added
will only be in effect for results created from the moment we added the rule
onwards. Past results remain unaffected.
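
As a sketch of what such an update might look like, here we replace the custom
rules of the first detector (`detector_index` 0) of the `cpu_with_rule` job
from the earlier example, raising the threshold to 0.25:

[source,js]
----------------------------------
POST _ml/anomaly_detectors/cpu_with_rule/_update
{
  "detectors": [{
    "detector_index": 0,
    "custom_rules": [{
      "actions": ["skip_result"],
      "conditions": [
        {
          "applies_to": "actual",
          "operator": "lt",
          "value": 0.25
        }
      ]
    }]
  }]
}
----------------------------------
// CONSOLE
// TEST[skip:needs-licence]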

==== Using custom rules vs. filtering data

It might appear that using rules is just another way of filtering the data
that feeds into a job. For example, a rule that skips results when the
partition field value is in a filter sounds equivalent to having a query that
filters out such documents. But it is not. There is a fundamental difference.
When the data is filtered before reaching a job, it is as if the data never
existed for the job. With rules, the data still reaches the job and affects
its behavior (depending on the rule actions).

For example, a rule with the `skip_result` action means all data is still
modeled. On the other hand, a rule with the `skip_model_update` action means
results are still created even though the model is not updated by data matched
by a rule.
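
As an illustrative sketch, the CPU job from earlier could use
`skip_model_update` instead (under a hypothetical job name). Results would
still be created for the near-zero values of an idle machine, but those values
would not influence the model:

[source,js]
----------------------------------
PUT _ml/anomaly_detectors/cpu_with_skip_model_update
{
  "analysis_config" : {
    "bucket_span" : "5m",
    "detectors" : [{
      "function" : "high_mean",
      "field_name" : "cpu_utilization",
      "custom_rules": [{
        "actions": ["skip_model_update"],
        "conditions": [
          {
            "applies_to": "actual",
            "operator": "lt",
            "value": 0.20
          }
        ]
      }]
    }]
  },
  "data_description" : {
    "time_field" : "timestamp"
  }
}
----------------------------------
// CONSOLE
// TEST[skip:needs-licence]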