
[role="xpack"]
[[ml-count-functions]]
=== Count functions

Count functions detect anomalies when the number of events in a bucket is
anomalous.

Use `non_zero_count` functions if your data is sparse and you want to ignore
cases where the bucket count is zero.

Use `distinct_count` functions to determine when the number of distinct values
in one field is unusual, as opposed to the total count.

Use high-sided functions if you want to monitor unusually high event rates.
Use low-sided functions if you want to look at drops in event rate.

The {ml-features} include the following count functions:

* xref:ml-count[`count`, `high_count`, `low_count`]
* xref:ml-nonzero-count[`non_zero_count`, `high_non_zero_count`, `low_non_zero_count`]
* xref:ml-distinct-count[`distinct_count`, `high_distinct_count`, `low_distinct_count`]
[float]
[[ml-count]]
===== Count, high_count, low_count

The `count` function detects anomalies when the number of events in a bucket is
anomalous.

The `high_count` function detects anomalies when the count of events in a
bucket is unusually high.

The `low_count` function detects anomalies when the count of events in a
bucket is unusually low.

These functions support the following properties:

* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)

For more information about those properties, see the
{ref}/ml-put-job.html#ml-put-job-request-body[create {anomaly-jobs} API].
.Example 1: Analyzing events with the count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example1
{
  "analysis_config": {
    "detectors": [{
      "function" : "count"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

This example is probably the simplest possible analysis. It identifies
time buckets during which the overall count of events is higher or lower than
usual.

When you use this function in a detector in your {anomaly-job}, it models the
event rate and detects when the event rate is unusual compared to its past
behavior.
.Example 2: Analyzing errors with the high_count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example2
{
  "analysis_config": {
    "detectors": [{
      "function" : "high_count",
      "by_field_name" : "error_code",
      "over_field_name": "user"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

If you use this `high_count` function in a detector in your {anomaly-job}, it
models the event rate for each error code. It detects users that generate an
unusually high count of error codes compared to other users.
.Example 3: Analyzing status codes with the low_count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example3
{
  "analysis_config": {
    "detectors": [{
      "function" : "low_count",
      "by_field_name" : "status_code"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

In this example, the function detects when the count of events for a
status code is lower than usual.

When you use this function in a detector in your {anomaly-job}, it models the
event rate for each status code and detects when a status code has an unusually
low count compared to its past behavior.
.Example 4: Analyzing aggregated data with the count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example4
{
  "analysis_config": {
    "summary_count_field_name" : "events_per_min",
    "detectors": [{
      "function" : "count"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

If you are analyzing an aggregated `events_per_min` field, do not use a sum
function (for example, `sum(events_per_min)`). Instead, use the count function
and the `summary_count_field_name` property. For more information, see
<<ml-configuring-aggregation>>.
[float]
[[ml-nonzero-count]]
===== Non_zero_count, high_non_zero_count, low_non_zero_count

The `non_zero_count` function detects anomalies when the number of events in a
bucket is anomalous, but it ignores cases where the bucket count is zero. Use
this function if you know your data is sparse or has gaps and the gaps are not
important.

The `high_non_zero_count` function detects anomalies when the number of events
in a bucket is unusually high and it ignores cases where the bucket count is
zero.

The `low_non_zero_count` function detects anomalies when the number of events in
a bucket is unusually low and it ignores cases where the bucket count is zero.

These functions support the following properties:

* `by_field_name` (optional)
* `partition_field_name` (optional)

For more information about those properties, see the
{ref}/ml-put-job.html#ml-put-job-request-body[create {anomaly-jobs} API].

For example, if you have the following number of events per bucket:

========================================
1,22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,43,31,0,0,0,0,0,0,0,0,0,0,0,0,2,1
========================================

The `non_zero_count` function models only the following data:

========================================
1,22,2,43,31,2,1
========================================
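A minimal job that models only these non-zero buckets could be configured as
follows; the job ID `example_non_zero` is hypothetical, and the data description
simply mirrors the other examples on this page:

[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example_non_zero
{
  "analysis_config": {
    "detectors": [{
      "function" : "non_zero_count"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]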
.Example 5: Analyzing signatures with the high_non_zero_count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example5
{
  "analysis_config": {
    "detectors": [{
      "function" : "high_non_zero_count",
      "by_field_name" : "signaturename"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

If you use this `high_non_zero_count` function in a detector in your
{anomaly-job}, it models the count of events for the `signaturename` field. It
ignores any buckets where the count is zero and detects when a `signaturename`
value has an unusually high count of events compared to its past behavior.

NOTE: Population analysis (using an `over_field_name` property value) is not
supported for the `non_zero_count`, `high_non_zero_count`, and
`low_non_zero_count` functions. If you want to do population analysis and your
data is sparse, use the `count` functions, which are optimized for that scenario.
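As a sketch of that alternative (the job ID `example_sparse_population` and the
`user` field are hypothetical), a population analysis over sparse data would use
`count` with an `over_field_name` rather than a `non_zero_count` function:

[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example_sparse_population
{
  "analysis_config": {
    "detectors": [{
      "function" : "count",
      "over_field_name" : "user"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]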
[float]
[[ml-distinct-count]]
===== Distinct_count, high_distinct_count, low_distinct_count

The `distinct_count` function detects anomalies where the number of distinct
values in one field is unusual.

The `high_distinct_count` function detects unusually high numbers of distinct
values in one field.

The `low_distinct_count` function detects unusually low numbers of distinct
values in one field.

These functions support the following properties:

* `field_name` (required)
* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)

For more information about those properties, see the
{ref}/ml-put-job.html#ml-put-job-request-body[create {anomaly-jobs} API].
.Example 6: Analyzing users with the distinct_count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example6
{
  "analysis_config": {
    "detectors": [{
      "function" : "distinct_count",
      "field_name" : "user"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

This `distinct_count` function detects when a system has an unusual number
of logged-in users. When you use this function in a detector in your
{anomaly-job}, it models the distinct count of users. It also detects when the
distinct number of users is unusual compared to the past.
.Example 7: Analyzing ports with the high_distinct_count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example7
{
  "analysis_config": {
    "detectors": [{
      "function" : "high_distinct_count",
      "field_name" : "dst_port",
      "over_field_name": "src_ip"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

This example detects instances of port scanning. When you use this function in a
detector in your {anomaly-job}, it models the distinct count of ports. It also
detects the `src_ip` values that connect to an unusually high number of different
`dst_port` values compared to other `src_ip` values.
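
The low-sided variant follows the same pattern. As a hypothetical sketch (the
job ID `example_low_distinct` and the `uri` field are illustrative only), a job
that flags a drop in the variety of URIs requested from a server could look
like this:

[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example_low_distinct
{
  "analysis_config": {
    "detectors": [{
      "function" : "low_distinct_count",
      "field_name" : "uri"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]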