[role="xpack"]
[[ml-count-functions]]
=== Count functions

Count functions detect anomalies when the number of events in a bucket is
anomalous.

Use `non_zero_count` functions if your data is sparse and you want to ignore
cases where the bucket count is zero.

Use `distinct_count` functions to determine when the number of distinct values
in one field is unusual, as opposed to the total count.

Use high-sided functions if you want to monitor unusually high event rates.
Use low-sided functions if you want to look at drops in event rate.

The {ml-features} include the following count functions:

* xref:ml-count[`count`, `high_count`, `low_count`]
* xref:ml-nonzero-count[`non_zero_count`, `high_non_zero_count`, `low_non_zero_count`]
* xref:ml-distinct-count[`distinct_count`, `high_distinct_count`, `low_distinct_count`]

[float]
[[ml-count]]
===== Count, high_count, low_count

The `count` function detects anomalies when the number of events in a bucket is
anomalous.

The `high_count` function detects anomalies when the count of events in a
bucket is unusually high.

The `low_count` function detects anomalies when the count of events in a
bucket is unusually low.

These functions support the following properties:

* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)

For more information about those properties,
see {ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].

.Example 1: Analyzing events with the count function
[source,js]
--------------------------------------------------
PUT _ml/anomaly_detectors/example1
{
  "analysis_config": {
    "detectors": [{
      "function" : "count"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// CONSOLE
// TEST[skip:needs-licence]

This example is probably the simplest possible analysis. It identifies
time buckets during which the overall count of events is higher or lower than
usual.

When you use this function in a detector in your {anomaly-job}, it models the
event rate and detects when the event rate is unusual compared to its past
behavior.

.Example 2: Analyzing errors with the high_count function
[source,js]
--------------------------------------------------
PUT _ml/anomaly_detectors/example2
{
  "analysis_config": {
    "detectors": [{
      "function" : "high_count",
      "by_field_name" : "error_code",
      "over_field_name": "user"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// CONSOLE
// TEST[skip:needs-licence]

If you use this `high_count` function in a detector in your {anomaly-job}, it
models the event rate for each error code. It detects users that generate an
unusually high count of error codes compared to other users.

.Example 3: Analyzing status codes with the low_count function
[source,js]
--------------------------------------------------
PUT _ml/anomaly_detectors/example3
{
  "analysis_config": {
    "detectors": [{
      "function" : "low_count",
      "by_field_name" : "status_code"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// CONSOLE
// TEST[skip:needs-licence]

In this example, the function detects when the count of events for a
status code is lower than usual.

When you use this function in a detector in your {anomaly-job}, it models the
event rate for each status code and detects when a status code has an unusually
low count compared to its past behavior.

.Example 4: Analyzing aggregated data with the count function
[source,js]
--------------------------------------------------
PUT _ml/anomaly_detectors/example4
{
  "analysis_config": {
    "summary_count_field_name" : "events_per_min",
    "detectors": [{
      "function" : "count"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// CONSOLE
// TEST[skip:needs-licence]

If you are analyzing an aggregated `events_per_min` field, do not use a sum
function (for example, `sum(events_per_min)`). Instead, use the `count` function
and the `summary_count_field_name` property. For more information, see
<<ml-configuring-aggregation>>.

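As a point of reference, the following is a minimal sketch of the kind of
pre-summarized input document that example 4 expects. The field values shown
here are hypothetical; the point is that each document represents an aggregated
bucket of raw events and the field named by `summary_count_field_name` (here,
`events_per_min`) carries the number of raw events it summarizes.

[source,js]
--------------------------------------------------
{
  "timestamp": 1518670800000,
  "events_per_min": 42
}
--------------------------------------------------
// NOTCONSOLE
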
[float]
[[ml-nonzero-count]]
===== Non_zero_count, high_non_zero_count, low_non_zero_count

The `non_zero_count` function detects anomalies when the number of events in a
bucket is anomalous, but it ignores cases where the bucket count is zero. Use
this function if you know your data is sparse or has gaps and the gaps are not
important.

The `high_non_zero_count` function detects anomalies when the number of events
in a bucket is unusually high, and it ignores cases where the bucket count is
zero.

The `low_non_zero_count` function detects anomalies when the number of events in
a bucket is unusually low, and it ignores cases where the bucket count is zero.

These functions support the following properties:

* `by_field_name` (optional)
* `partition_field_name` (optional)

For more information about those properties,
see {ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].

For example, if you have the following number of events per bucket:

========================================
1,22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,43,31,0,0,0,0,0,0,0,0,0,0,0,0,2,1
========================================

The `non_zero_count` function models only the following data:

========================================
1,22,2,43,31,2,1
========================================

.Example 5: Analyzing signatures with the high_non_zero_count function
[source,js]
--------------------------------------------------
PUT _ml/anomaly_detectors/example5
{
  "analysis_config": {
    "detectors": [{
      "function" : "high_non_zero_count",
      "by_field_name" : "signaturename"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// CONSOLE
// TEST[skip:needs-licence]

If you use this `high_non_zero_count` function in a detector in your
{anomaly-job}, it models the count of events for the `signaturename` field. It
ignores any buckets where the count is zero and detects when a `signaturename`
value has an unusually high count of events compared to its past behavior.

NOTE: Population analysis (using an `over_field_name` property value) is not
supported for the `non_zero_count`, `high_non_zero_count`, and
`low_non_zero_count` functions. If you want to do population analysis and your
data is sparse, use the `count` functions, which are optimized for that scenario.

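To make that alternative concrete, here is a minimal sketch of a population
analysis over sparse data that uses the `count` function with an
`over_field_name`, as the note suggests, instead of a `non_zero_count` function.
The job name `example-sparse-population` and the `user` field are illustrative
choices, not part of the examples above.

[source,js]
--------------------------------------------------
PUT _ml/anomaly_detectors/example-sparse-population
{
  "analysis_config": {
    "detectors": [{
      "function" : "count",
      "over_field_name" : "user"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// CONSOLE
// TEST[skip:needs-licence]
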
[float]
[[ml-distinct-count]]
===== Distinct_count, high_distinct_count, low_distinct_count

The `distinct_count` function detects anomalies where the number of distinct
values in one field is unusual.

The `high_distinct_count` function detects unusually high numbers of distinct
values in one field.

The `low_distinct_count` function detects unusually low numbers of distinct
values in one field.

These functions support the following properties:

* `field_name` (required)
* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)

For more information about those properties,
see {ref}/ml-job-resource.html#ml-detectorconfig[Detector configuration objects].

.Example 6: Analyzing users with the distinct_count function
[source,js]
--------------------------------------------------
PUT _ml/anomaly_detectors/example6
{
  "analysis_config": {
    "detectors": [{
      "function" : "distinct_count",
      "field_name" : "user"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// CONSOLE
// TEST[skip:needs-licence]

This `distinct_count` function detects when a system has an unusual number
of logged-in users. When you use this function in a detector in your
{anomaly-job}, it models the distinct count of users. It also detects when the
distinct number of users is unusual compared to the past.

.Example 7: Analyzing ports with the high_distinct_count function
[source,js]
--------------------------------------------------
PUT _ml/anomaly_detectors/example7
{
  "analysis_config": {
    "detectors": [{
      "function" : "high_distinct_count",
      "field_name" : "dst_port",
      "over_field_name": "src_ip"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// CONSOLE
// TEST[skip:needs-licence]

This example detects instances of port scanning. When you use this function in a
detector in your {anomaly-job}, it models the distinct count of ports. It also
detects the `src_ip` values that connect to an unusually high number of different
`dst_port` values compared to other `src_ip` values.