[[ml-info-functions]]
= Information Content Functions

The information content functions detect anomalies in the amount of information
that is contained in strings within a bucket. These functions offer a more
sophisticated way to identify instances of data exfiltration or command and
control (C2) activity when analyzing the size of the data in bytes might not be
sufficient.

The {ml-features} include the following information content functions:

* `info_content`, `high_info_content`, `low_info_content`

[discrete]
[[ml-info-content]]
== Info_content, High_info_content, Low_info_content

The `info_content` function detects anomalies in the amount of information that
is contained in strings in a bucket.

If you want to monitor for unusually high amounts of information,
use `high_info_content`.
If you want to look at drops in information content, use `low_info_content`.

These functions support the following properties:

* `field_name` (required)
* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)

For more information about those properties, see the
{ref}/ml-put-job.html#ml-put-job-request-body[create {anomaly-jobs} API].

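As a rough sketch of where these properties fit, the following request body
fragment places one detector inside the `analysis_config` of an {anomaly-job}.
The `bucket_span` value, the `dns_server` partition field, and the `timestamp`
time field are illustrative assumptions and do not come from a real job. The
API documentation referenced above remains the authority on the full request
body.

[source,js]
--------------------------------------------------
{
  "analysis_config" : {
    "bucket_span" : "15m",
    "detectors" : [
      {
        "function" : "high_info_content",
        "field_name" : "subdomain",
        "over_field_name" : "highest_registered_domain",
        "partition_field_name" : "dns_server"
      }
    ]
  },
  "data_description" : {
    "time_field" : "timestamp"
  }
}
--------------------------------------------------
// NOTCONSOLE

In this sketch, the hypothetical `dns_server` partition field would split the
analysis so that each DNS server is modeled independently.
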
.Example 1: Analyzing subdomain strings with the info_content function
[source,js]
--------------------------------------------------
{
  "function" : "info_content",
  "field_name" : "subdomain",
  "over_field_name" : "highest_registered_domain"
}
--------------------------------------------------
// NOTCONSOLE

If you use this `info_content` function in a detector in your {anomaly-job}, it
models information that is present in the `subdomain` string. It detects
anomalies where the information content is unusual compared to the other
`highest_registered_domain` values. An anomaly could indicate an abuse of the
DNS protocol, such as malicious command and control activity.

NOTE: In this example, both high and low values are considered anomalous.
In many use cases, the `high_info_content` function is a more appropriate
choice.

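To illustrate the preceding note, a one-sided variant of the detector from
Example 1 might look like the following sketch. Only the function name changes.
The field names are carried over from Example 1 unchanged.

[source,js]
--------------------------------------------------
{
  "function" : "high_info_content",
  "field_name" : "subdomain",
  "over_field_name" : "highest_registered_domain"
}
--------------------------------------------------
// NOTCONSOLE

With this variant, only unusually high information content in the `subdomain`
strings would be reported. Unusually low values would be ignored.
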
.Example 2: Analyzing query strings with the high_info_content function
[source,js]
--------------------------------------------------
{
  "function" : "high_info_content",
  "field_name" : "query",
  "over_field_name" : "src_ip"
}
--------------------------------------------------
// NOTCONSOLE

If you use this `high_info_content` function in a detector in your {anomaly-job},
it models information content that is held in the DNS query string. It detects
`src_ip` values where the information content is unusually high compared to
other `src_ip` values. This example is similar to the example for the
`info_content` function, but it reports anomalies only where the amount of
information content is higher than expected.

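For intuition only, the following made-up input documents sketch the kind of
contrast this detector is meant to surface. The IP addresses, domain names, and
query strings are hypothetical, and the two documents are shown as an array
purely for compactness. The first query is short and repetitive. The second
contains a long, varied label of the sort that might carry encoded data, so it
would be expected to contribute more information content.

[source,js]
--------------------------------------------------
[
  {
    "src_ip" : "192.0.2.10",
    "query" : "www.example.com"
  },
  {
    "src_ip" : "192.0.2.77",
    "query" : "q7f3kz9x2bv81mtr5ydw0hc6.example.org"
  }
]
--------------------------------------------------
// NOTCONSOLE

Whether a given `src_ip` is actually flagged depends on how its queries compare
with those of other `src_ip` values over each bucket, not on a single document.
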
.Example 3: Analyzing message strings with the low_info_content function
[source,js]
--------------------------------------------------
{
  "function" : "low_info_content",
  "field_name" : "message",
  "by_field_name" : "logfilename"
}
--------------------------------------------------
// NOTCONSOLE

If you use this `low_info_content` function in a detector in your {anomaly-job},
it models information content that is present in the message string for each
`logfilename`. It detects anomalies where the information content is low
compared to its past behavior. For example, this function detects unusually low
amounts of information in a collection of rolling log files. Low information
might indicate that a process has entered an infinite loop or that logging
features have been disabled.