[role="xpack"]
[[ml-rare-functions]]
=== Rare functions

The rare functions detect values that occur rarely in time or rarely for a
population.

The `rare` analysis detects anomalies according to the number of distinct rare
values. This differs from `freq_rare`, which detects anomalies according to the
number of times (frequency) rare values occur.

[NOTE]
====
* The `rare` and `freq_rare` functions should not be used in conjunction with
`exclude_frequent`.
* You cannot create forecasts for {anomaly-jobs} that contain `rare` or
`freq_rare` functions.
* You cannot add rules with conditions to detectors that use `rare` or
`freq_rare` functions.
* Shorter bucket spans (less than 1 hour, for example) are recommended when
looking for rare events. The functions model whether something happens in a
bucket at least once. With longer bucket spans, it is more likely that
entities will be seen in a bucket and therefore they appear less rare.
The ideal bucket span depends on the characteristics of the data; short
bucket spans are typically measured in minutes, not hours.
* To model rare data, a learning period of at least 20 buckets is required
for typical data.
====
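
The effect of bucket span on rarity can be illustrated with a small Python
sketch. It is not how the {ml-features} model data; it only counts the fraction
of buckets in which an entity is seen at least once. The function name
`bucket_occupancy` and the sample timestamps are hypothetical.

[source,python]
--------------------------------------------------
def bucket_occupancy(event_times, bucket_span, total_duration):
    """Fraction of buckets in which at least one event occurs.

    All arguments are in seconds; event times are offsets from the start
    of the observation window.
    """
    n_buckets = total_duration // bucket_span
    window = n_buckets * bucket_span
    occupied = {int(t // bucket_span) for t in event_times if 0 <= t < window}
    return len(occupied) / n_buckets


# Hypothetical entity that is seen once every two hours for one day.
DAY = 24 * 60 * 60
events = [h * 3600 + 60 for h in range(0, 24, 2)]

five_minutes = bucket_occupancy(events, 5 * 60, DAY)     # 12 of 288 buckets
four_hours = bucket_occupancy(events, 4 * 60 * 60, DAY)  # 6 of 6 buckets
print(f"5m buckets: {five_minutes:.3f}, 4h buckets: {four_hours:.3f}")
--------------------------------------------------

With 5-minute buckets the entity appears in about 4% of buckets. With 4-hour
buckets it appears in every bucket, so it no longer looks rare.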

The {ml-features} include the following rare functions:

* <<ml-rare,`rare`>>
* <<ml-freq-rare,`freq_rare`>>

[float]
[[ml-rare]]
==== Rare

The `rare` function detects values that occur rarely in time or rarely for a
population. It detects anomalies according to the number of distinct rare values.

This function supports the following properties:

* `by_field_name` (required)
* `over_field_name` (optional)
* `partition_field_name` (optional)

For more information about those properties, see the
{ref}/ml-put-job.html#ml-put-job-request-body[create {anomaly-jobs} API].

.Example 1: Analyzing status codes with the rare function
[source,js]
--------------------------------------------------
{
  "function" : "rare",
  "by_field_name" : "status"
}
--------------------------------------------------
// NOTCONSOLE

If you use this `rare` function in a detector in your {anomaly-job}, it detects
values that are rare in time. It models status codes that occur over time and
detects when rare status codes occur compared to the past. For example, you can
detect status codes in a web access log that have never (or rarely) occurred
before.
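
As a rough illustration of "rare in time", the following Python sketch flags
values in a new bucket that appeared in few or no earlier buckets. The function
name `rare_in_time`, the `max_fraction` threshold, and the sample data are
hypothetical; the actual analysis is probabilistic and does not use a fixed
cutoff.

[source,python]
--------------------------------------------------
from collections import Counter


def rare_in_time(history, new_bucket, max_fraction=0.05):
    """Return values in new_bucket seen in at most max_fraction of past buckets.

    history is a list of sets, one per past bucket, holding the distinct
    values seen in that bucket. The hard threshold is an illustrative
    assumption.
    """
    # Each bucket is a set, so a value counts at most once per bucket.
    seen_in = Counter(value for bucket in history for value in bucket)
    return {
        value: seen_in[value] / len(history)
        for value in new_bucket
        if seen_in[value] / len(history) <= max_fraction
    }


# Hypothetical history of 40 buckets: 200 occurs in every bucket, 404 in
# half of them, and 503 in exactly one.
history = [{"200", "404"} if i % 2 == 0 else {"200"} for i in range(40)]
history[7] = history[7] | {"503"}

flagged = rare_in_time(history, new_bucket={"200", "503", "418"})
print(flagged)
--------------------------------------------------

Here `503` (seen in 1 of 40 buckets) and `418` (never seen before) are flagged,
while `200` is not.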

.Example 2: Analyzing status codes in a population with the rare function
[source,js]
--------------------------------------------------
{
  "function" : "rare",
  "by_field_name" : "status",
  "over_field_name" : "clientip"
}
--------------------------------------------------
// NOTCONSOLE

If you use this `rare` function in a detector in your {anomaly-job}, it detects
values that are rare in a population. It models status code and client IP
interactions that occur. It defines a rare status code as one that occurs for
few client IP values compared to the population. It detects client IP values
that experience one or more distinct rare status codes compared to the
population. For example, in a web access log, a `clientip` that experiences the
highest number of different rare status codes compared to the population is
regarded as highly anomalous. This analysis is based on the number of different
status code values, not the count of occurrences.

NOTE: To define a status code as rare, the {ml-features} look at the number
of distinct status codes that occur, not the number of times the status code
occurs. If a single client IP experiences a single unique status code, this
is rare, even if it occurs for that client IP in every bucket.
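
The population behavior described above can be sketched in Python as a toy
model. It assumes a hard cutoff (`max_share`) for deciding which status codes
are rare, which the {ml-features} do not use; the function name
`distinct_rare_values` and the sample log are hypothetical.

[source,python]
--------------------------------------------------
from collections import defaultdict


def distinct_rare_values(events, max_share=0.1):
    """Count distinct rare by-field values per over-field value.

    events is an iterable of (over_value, by_value) pairs, for example
    (clientip, status). A by-field value is treated as rare when at most
    max_share of the population has produced it (illustrative assumption).
    """
    clients_by_status = defaultdict(set)
    statuses_by_client = defaultdict(set)
    for client, status in events:
        clients_by_status[status].add(client)
        statuses_by_client[client].add(status)

    population = len(statuses_by_client)
    rare = {status for status, clients in clients_by_status.items()
            if len(clients) / population <= max_share}
    return {client: len(statuses & rare)
            for client, statuses in statuses_by_client.items()}


# Hypothetical log: 20 clients all receive 200; one client also receives
# 500 a hundred times, and another receives 403 and 418 once each.
events = [(f"10.0.0.{i}", "200") for i in range(20)]
events += [("10.0.0.1", "500")] * 100
events += [("10.0.0.2", "403"), ("10.0.0.2", "418")]

scores = distinct_rare_values(events)
print(scores["10.0.0.1"], scores["10.0.0.2"], scores["10.0.0.3"])
--------------------------------------------------

The client with one hundred `500` responses scores 1, because only one distinct
rare status code is involved. The client with two different rare status codes
scores 2.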

[float]
[[ml-freq-rare]]
==== Freq_rare

The `freq_rare` function detects values that occur rarely for a population.
It detects anomalies according to the number of times (frequency) that rare
values occur.

This function supports the following properties:

* `by_field_name` (required)
* `over_field_name` (required)
* `partition_field_name` (optional)

For more information about those properties, see the
{ref}/ml-put-job.html#ml-put-job-request-body[create {anomaly-jobs} API].

.Example 3: Analyzing URI values in a population with the freq_rare function
[source,js]
--------------------------------------------------
{
  "function" : "freq_rare",
  "by_field_name" : "uri",
  "over_field_name" : "clientip"
}
--------------------------------------------------
// NOTCONSOLE

If you use this `freq_rare` function in a detector in your {anomaly-job}, it
detects values that are frequently rare in a population. It models URI paths and
client IP interactions that occur. It defines a rare URI path as one that is
visited by few client IP values compared to the population. It detects the
client IP values that experience many interactions with rare URI paths compared
to the population. For example, in a web access log, a client IP that visits
one or more rare URI paths many times compared to the population is regarded as
highly anomalous. This analysis is based on the count of interactions with rare
URI paths, not the number of different URI path values.

NOTE: A URI path is defined as rare in the same way as the status codes in the
`rare` example: the analysis considers the number of distinct values that
occur, not the number of times the URI path occurs. If a single client IP
visits a single unique URI path, this is rare, even if it occurs for that
client IP in every bucket.
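
The frequency-based behavior of `freq_rare` can be sketched in Python as a toy
model. It assumes a hard cutoff (`max_share`) for deciding which URI paths are
rare, which the {ml-features} do not use; the function name
`rare_value_frequency` and the sample log are hypothetical.

[source,python]
--------------------------------------------------
from collections import Counter, defaultdict


def rare_value_frequency(events, max_share=0.1):
    """Count interactions with rare by-field values per over-field value.

    events is an iterable of (over_value, by_value) pairs, for example
    (clientip, uri). A by-field value is treated as rare when at most
    max_share of the population has produced it (illustrative assumption).
    """
    events = list(events)
    clients_by_uri = defaultdict(set)
    for client, uri in events:
        clients_by_uri[uri].add(client)

    population = len({client for client, _ in events})
    rare = {uri for uri, clients in clients_by_uri.items()
            if len(clients) / population <= max_share}
    # Every event that touches a rare value counts, including repeats.
    return Counter(client for client, uri in events if uri in rare)


# Hypothetical log: 20 clients visit /index once; one client also requests
# /admin a hundred times, and another requests /a and /b once each.
events = [(f"10.0.0.{i}", "/index") for i in range(20)]
events += [("10.0.0.1", "/admin")] * 100
events += [("10.0.0.2", "/a"), ("10.0.0.2", "/b")]

counts = rare_value_frequency(events)
print(counts["10.0.0.1"], counts["10.0.0.2"], counts["10.0.0.3"])
--------------------------------------------------

The client that requests a single rare path one hundred times scores 100,
whereas the client that touches two different rare paths once each scores 2.
Counting distinct rare values instead would rank the two clients the other way
around.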