| [role="xpack"]
[[ml-configuring-detector-custom-rules]]
= Customizing detectors with custom rules

<<ml-rules,Custom rules>> enable you to change the behavior of anomaly
detectors based on domain-specific knowledge.

Custom rules describe _when_ a detector should take a certain _action_ instead
of following its default behavior. To specify the _when_, a rule uses
a `scope` and `conditions`. You can think of `scope` as the categorical
specification of a rule, while `conditions` are the numerical part.
A rule can have a scope, one or more conditions, or a combination of
scope and conditions. For the full list of specification details, see the
{ref}/ml-put-job.html#put-customrules[`custom_rules` object] in the create
{anomaly-jobs} API.

[[ml-custom-rules-scope]]
== Specifying custom rule scope

Let us assume we are configuring an {anomaly-job} in order to detect DNS data
exfiltration. Our data contains the fields "subdomain" and
"highest_registered_domain". We can use a detector that looks like
`high_info_content(subdomain) over highest_registered_domain`. If we run such a
job, it is possible that we discover a lot of anomalies on frequently used
domains that we have reasons to trust. As security analysts, we are not
interested in such anomalies. Ideally, we could instruct the detector to skip
results for domains that we consider safe.
Using a rule with a scope allows us
to achieve this.

First, we need to create a list of our safe domains. Those lists are called
_filters_ in {ml}. Filters can be shared across {anomaly-jobs}.

You can create a filter in **Anomaly Detection > Settings > Filter Lists** in
{kib} or by using the {ref}/ml-put-filter.html[put filter API]:

[source,console]
----------------------------------
PUT _ml/filters/safe_domains
{
  "description": "Our list of safe domains",
  "items": ["safe.com", "trusted.com"]
}
----------------------------------
// TEST[skip:needs-licence]

Now, we can create our {anomaly-job} specifying a scope that uses the
`safe_domains` filter for the `highest_registered_domain` field:

[source,console]
----------------------------------
PUT _ml/anomaly_detectors/dns_exfiltration_with_rule
{
  "analysis_config" : {
    "bucket_span":"5m",
    "detectors" :[{
      "function":"high_info_content",
      "field_name": "subdomain",
      "over_field_name": "highest_registered_domain",
      "custom_rules": [{
        "actions": ["skip_result"],
        "scope": {
          "highest_registered_domain": {
            "filter_id": "safe_domains",
            "filter_type": "include"
          }
        }
      }]
    }]
  },
  "data_description" : {
    "time_field":"timestamp"
  }
}
----------------------------------
// TEST[skip:needs-licence]

As time advances and we see more data and more results, we might encounter new
domains that we want to add to the filter.
We can do that in
**Anomaly Detection > Settings > Filter Lists** in {kib} or by using the
{ref}/ml-update-filter.html[update filter API]:

[source,console]
----------------------------------
POST _ml/filters/safe_domains/_update
{
  "add_items": ["another-safe.com"]
}
----------------------------------
// TEST[skip:setup:ml_filter_safe_domains]

Note that we can use any of the `partition_field_name`, `over_field_name`, or
`by_field_name` fields in the `scope`.

In the following example we scope multiple fields:

[source,console]
----------------------------------
PUT _ml/anomaly_detectors/scoping_multiple_fields
{
  "analysis_config" : {
    "bucket_span":"5m",
    "detectors" :[{
      "function":"count",
      "partition_field_name": "my_partition",
      "over_field_name": "my_over",
      "by_field_name": "my_by",
      "custom_rules": [{
        "actions": ["skip_result"],
        "scope": {
          "my_partition": {
            "filter_id": "filter_1"
          },
          "my_over": {
            "filter_id": "filter_2"
          },
          "my_by": {
            "filter_id": "filter_3"
          }
        }
      }]
    }]
  },
  "data_description" : {
    "time_field":"timestamp"
  }
}
----------------------------------
// TEST[skip:needs-licence]

Such a detector will skip results when the values of all three scoped fields
are included in the referenced filters.

[[ml-custom-rules-conditions]]
== Specifying custom rule conditions

Imagine a detector that looks for anomalies in CPU utilization.
Given a machine that is idle for long enough, small movement in CPU could
result in anomalous results where the `actual` value is quite small, for
example, 0.02.
Given our knowledge about how CPU utilization behaves, we might
determine that anomalies with such small actual values are not interesting for
investigation.

Let us now configure an {anomaly-job} with a rule that will skip results where
CPU utilization is less than 0.20.

[source,console]
----------------------------------
PUT _ml/anomaly_detectors/cpu_with_rule
{
  "analysis_config" : {
    "bucket_span":"5m",
    "detectors" :[{
      "function":"high_mean",
      "field_name": "cpu_utilization",
      "custom_rules": [{
        "actions": ["skip_result"],
        "conditions": [
          {
            "applies_to": "actual",
            "operator": "lt",
            "value": 0.20
          }
        ]
      }]
    }]
  },
  "data_description" : {
    "time_field":"timestamp"
  }
}
----------------------------------
// TEST[skip:needs-licence]

When there are multiple conditions they are combined with a logical `and`.
This is useful when we want the rule to apply to a range. We simply create
a rule with two conditions, one for each end of the desired range.

Here is an example where a count detector will skip results when the count
is greater than 30 and less than 50:

[source,console]
----------------------------------
PUT _ml/anomaly_detectors/rule_with_range
{
  "analysis_config" : {
    "bucket_span":"5m",
    "detectors" :[{
      "function":"count",
      "custom_rules": [{
        "actions": ["skip_result"],
        "conditions": [
          {
            "applies_to": "actual",
            "operator": "gt",
            "value": 30
          },
          {
            "applies_to": "actual",
            "operator": "lt",
            "value": 50
          }
        ]
      }]
    }]
  },
  "data_description" : {
    "time_field":"timestamp"
  }
}
----------------------------------
// TEST[skip:needs-licence]

[[ml-custom-rules-lifecycle]]
== Custom rules in the lifecycle of a job

Custom rules only affect results created after the rules were applied.
Let us imagine that we have configured an {anomaly-job} and it has been
running for some time. After observing its results, we decide that we can employ
rules in order to get rid of some uninteresting results. We can use
the {ref}/ml-update-job.html[update {anomaly-job} API] to do so. However, the
rule we added will only be in effect for any results created from the moment we
added the rule onwards. Past results will remain unaffected.

[[ml-custom-rules-filtering]]
== Using custom rules vs. filtering data

It might appear that using rules is just another way of filtering the data
that feeds into an {anomaly-job}. For example, a rule that skips results when
the partition field value is in a filter sounds equivalent to having a query
that filters out such documents. But it is not. There is a fundamental
difference. When the data is filtered before reaching a job, it is as if it
never existed for the job. With rules, the data still reaches the job and
affects its behavior (depending on the rule actions).

For example, a rule with the `skip_result` action means all data will still
be modeled. On the other hand, a rule with the `skip_model_update` action means
results will still be created even though the model will not be updated by
data matched by a rule.
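
To make the difference between the two actions concrete, the following sketch
uses the {ref}/ml-update-job.html[update {anomaly-job} API] to add a rule to
the `dns_exfiltration_with_rule` job created earlier. It is an illustration
under stated assumptions rather than a recommended configuration: it assumes
the job has a single detector at `detector_index` 0 and that the `safe_domains`
filter still exists. The rule lists both actions, so data for domains in the
filter neither produces results nor updates the model:

[source,console]
----------------------------------
POST _ml/anomaly_detectors/dns_exfiltration_with_rule/_update
{
  "detectors": [{
    "detector_index": 0,
    "custom_rules": [{
      "actions": ["skip_result", "skip_model_update"],
      "scope": {
        "highest_registered_domain": {
          "filter_id": "safe_domains",
          "filter_type": "include"
        }
      }
    }]
  }]
}
----------------------------------
// TEST[skip:needs-licence]

Listing only `skip_result` would keep modeling the safe domains while hiding
their results. Listing only `skip_model_update` would keep reporting their
results while leaving the model unchanged by their data. As with any rule added
through an update, only results created after the update are affected.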
 |