[[ml-info-functions]]
=== Information Content Functions

The information content functions detect anomalies in the amount of information
that is contained in strings within a bucket. These functions can be used as
a more sophisticated method to identify instances of data exfiltration or
command and control (C2) activity, when analyzing the size in bytes of the data
might not be sufficient.

The {ml-features} include the following information content functions:

* `info_content`, `high_info_content`, `low_info_content`

[float]
[[ml-info-content]]
==== Info_content, High_info_content, Low_info_content

The `info_content` function detects anomalies in the amount of information that
is contained in strings in a bucket.

If you want to monitor for unusually high amounts of information,
use `high_info_content`.
If you want to detect drops in information content, use `low_info_content`.

These functions support the following properties:

* `field_name` (required)
* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)

For more information about those properties, see
{ref}/ml-job-resource.html#ml-detectorconfig[Detector Configuration Objects].

.Example 1: Analyzing subdomain strings with the info_content function
[source,js]
--------------------------------------------------
{
  "function" : "info_content",
  "field_name" : "subdomain",
  "over_field_name" : "highest_registered_domain"
}
--------------------------------------------------
// NOTCONSOLE

If you use this `info_content` function in a detector in your job, it models
information that is present in the `subdomain` string. It detects anomalies
where the information content is unusual compared to the other
`highest_registered_domain` values. An anomaly could indicate an abuse of the
DNS protocol, such as malicious command and control activity.

NOTE: In this example, both high and low values are considered anomalous.
In many use cases, the `high_info_content` function is a more appropriate
choice.

.Example 2: Analyzing query strings with the high_info_content function
[source,js]
--------------------------------------------------
{
  "function" : "high_info_content",
  "field_name" : "query",
  "over_field_name" : "src_ip"
}
--------------------------------------------------
// NOTCONSOLE

If you use this `high_info_content` function in a detector in your job, it
models information content that is held in the DNS query string. It detects
`src_ip` values where the information content is unusually high compared to
other `src_ip` values. This example is similar to the example for the
`info_content` function, but it reports anomalies only where the amount of
information content is higher than expected.

.Example 3: Analyzing message strings with the low_info_content function
[source,js]
--------------------------------------------------
{
  "function" : "low_info_content",
  "field_name" : "message",
  "by_field_name" : "logfilename"
}
--------------------------------------------------
// NOTCONSOLE

If you use this `low_info_content` function in a detector in your job, it models
information content that is present in the message string for each
`logfilename`. It detects anomalies where the information content is low
compared to its past behavior. For example, this function detects unusually low
amounts of information in a collection of rolling log files. Low information
might indicate that a process has entered an infinite loop or that logging
features have been disabled.
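
The detector examples above rely on the idea that some strings carry more
information than others. The following Python sketch illustrates that intuition
only. It is not how the {ml-features} compute information content, which this
page does not describe. It assumes that the compressed size of the distinct
strings in a bucket is a reasonable stand-in. The function name
`info_content_proxy` and the sample subdomain values are hypothetical.

```python
import zlib


def info_content_proxy(strings):
    """Return the zlib-compressed size, in bytes, of the distinct strings.

    Assumption: compressed size approximates information content. This is an
    illustrative stand-in, not the calculation used by the product.
    """
    # Deduplicate and sort so repeats and ordering do not change the result.
    distinct = sorted(set(strings))
    payload = "\n".join(distinct).encode("utf-8")
    return len(zlib.compress(payload))


# Hypothetical subdomain values seen in one bucket for two registered domains.
ordinary_subdomains = ["www", "mail", "www", "www", "mail"]
encoded_subdomains = [
    "a9f3k2x7q1zq8d7h2m4p",
    "u5t1w8e6r3y0b2n9c4v7",
    "l0o3i6j9g2f5s8d1h4k7",
]

print("ordinary:", info_content_proxy(ordinary_subdomains))
print("encoded:", info_content_proxy(encoded_subdomains))
```

Long, random-looking subdomains compress poorly, so the second bucket scores
higher than the first. A contrast of this kind is what the `info_content` and
`high_info_content` detectors in Examples 1 and 2 are looking for when data
might be hidden in DNS traffic.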