ml-count-functions.asciidoc 8.8 KB

["appendix",role="exclude",id="ml-count-functions"]
= Count functions

Count functions detect anomalies when the number of events in a bucket is
anomalous.

Use `non_zero_count` functions if your data is sparse and you want to ignore
cases where the bucket count is zero.

Use `distinct_count` functions to determine when the number of distinct values
in one field is unusual, as opposed to the total count.

Use high-sided functions if you want to monitor unusually high event rates.
Use low-sided functions if you want to look at drops in event rate.

The {ml-features} include the following count functions:

* xref:ml-count[`count`, `high_count`, `low_count`]
* xref:ml-nonzero-count[`non_zero_count`, `high_non_zero_count`, `low_non_zero_count`]
* xref:ml-distinct-count[`distinct_count`, `high_distinct_count`, `low_distinct_count`]

[discrete]
[[ml-count]]
== Count, high_count, low_count

The `count` function detects anomalies when the number of events in a bucket is
anomalous.

The `high_count` function detects anomalies when the count of events in a
bucket is unusually high.

The `low_count` function detects anomalies when the count of events in a
bucket is unusually low.

These functions support the following properties:

* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)

For more information about those properties, see the
{ref}/ml-put-job.html#ml-put-job-request-body[create {anomaly-jobs} API].

.Example 1: Analyzing events with the count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example1
{
  "analysis_config": {
    "detectors": [{
      "function" : "count"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

This example is probably the simplest possible analysis. It identifies
time buckets during which the overall count of events is higher or lower than
usual.

When you use this function in a detector in your {anomaly-job}, it models the
event rate and detects when the event rate is unusual compared to its past
behavior.

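
To make the modeled quantity concrete, the following Python sketch computes the
per-bucket event count that a `count` detector works from. It is only an
illustration: the 60-second bucket span and the sample timestamps are
assumptions, and the statistical modeling that the {anomaly-job} applies to
these counts is not reproduced here.

```python
from collections import Counter

BUCKET_SPAN_MS = 60_000  # assumed bucket span of 60 seconds (hypothetical value)


def count_per_bucket(timestamps_ms):
    """Return {bucket_start_ms: number_of_events} for epoch_ms timestamps."""
    counts = Counter()
    for ts in timestamps_ms:
        # Align each timestamp to the start of the bucket that contains it.
        bucket_start = ts - (ts % BUCKET_SPAN_MS)
        counts[bucket_start] += 1
    return dict(counts)


# Hypothetical events: three in the first minute, one in the second.
events = [1_000, 20_000, 59_999, 60_000]
print(count_per_bucket(events))  # {0: 3, 60000: 1}
```

Buckets that contain no events do not appear in this result. For the `count`
functions, such a bucket corresponds to a count of zero.
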
.Example 2: Analyzing errors with the high_count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example2
{
  "analysis_config": {
    "detectors": [{
      "function" : "high_count",
      "by_field_name" : "error_code",
      "over_field_name": "user"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

If you use this `high_count` function in a detector in your {anomaly-job}, it
models the event rate for each error code. It detects users that generate an
unusually high count of error codes compared to other users.

.Example 3: Analyzing status codes with the low_count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example3
{
  "analysis_config": {
    "detectors": [{
      "function" : "low_count",
      "by_field_name" : "status_code"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

In this example, the function detects when the count of events for a
status code is lower than usual.

When you use this function in a detector in your {anomaly-job}, it models the
event rate for each status code and detects when a status code has an unusually
low count compared to its past behavior.

.Example 4: Analyzing aggregated data with the count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example4
{
  "analysis_config": {
    "summary_count_field_name" : "events_per_min",
    "detectors": [{
      "function" : "count"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

If you are analyzing an aggregated `events_per_min` field, do not use a sum
function (for example, `sum(events_per_min)`). Instead, use the count function
and the `summary_count_field_name` property. For more information, see
<<ml-configuring-aggregation>>.

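
As a rough illustration of what `summary_count_field_name` conveys, the
following Python sketch treats each input document as a summary of many raw
events. The event count for a bucket is then the total of the summary field,
not the number of documents. The document layout, the 5-minute bucket span, and
the sample values are assumptions made for this sketch.

```python
BUCKET_SPAN_MS = 300_000  # assumed bucket span of 5 minutes (hypothetical value)


def bucket_event_counts(docs, summary_field="events_per_min"):
    """Sum the summary count field per bucket for pre-aggregated documents."""
    totals = {}
    for doc in docs:
        bucket_start = doc["timestamp"] - (doc["timestamp"] % BUCKET_SPAN_MS)
        # Each document represents doc[summary_field] raw events.
        totals[bucket_start] = totals.get(bucket_start, 0) + doc[summary_field]
    return totals


# Hypothetical pre-aggregated documents, one per minute.
docs = [
    {"timestamp": 0, "events_per_min": 12},
    {"timestamp": 60_000, "events_per_min": 8},
    {"timestamp": 300_000, "events_per_min": 5},
]
print(bucket_event_counts(docs))  # {0: 20, 300000: 5}
```

The first bucket holds two documents but represents 20 events, which is the
quantity the `count` function should see.
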
[discrete]
[[ml-nonzero-count]]
== Non_zero_count, high_non_zero_count, low_non_zero_count

The `non_zero_count` function detects anomalies when the number of events in a
bucket is anomalous, but it ignores cases where the bucket count is zero. Use
this function if you know your data is sparse or has gaps and the gaps are not
important.

The `high_non_zero_count` function detects anomalies when the number of events
in a bucket is unusually high and it ignores cases where the bucket count is
zero.

The `low_non_zero_count` function detects anomalies when the number of events in
a bucket is unusually low and it ignores cases where the bucket count is zero.

These functions support the following properties:

* `by_field_name` (optional)
* `partition_field_name` (optional)

For more information about those properties, see the
{ref}/ml-put-job.html#ml-put-job-request-body[create {anomaly-jobs} API].

For example, if you have the following number of events per bucket:

========================================
1,22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,43,31,0,0,0,0,0,0,0,0,0,0,0,0,2,1
========================================

The `non_zero_count` function models only the following data:

========================================
1,22,2,43,31,2,1
========================================

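
The reduction shown above amounts to dropping the empty buckets before
modeling. A minimal Python sketch of that step, using the same per-bucket
counts, might look like this (the variable names are illustrative only):

```python
# Per-bucket event counts, copied from the example above.
raw = "1,22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,43,31,0,0,0,0,0,0,0,0,0,0,0,0,2,1"
bucket_counts = [int(value) for value in raw.split(",")]

# Keep only the buckets that contain at least one event.
non_zero_counts = [count for count in bucket_counts if count != 0]
print(non_zero_counts)  # [1, 22, 2, 43, 31, 2, 1]
```
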
.Example 5: Analyzing signatures with the high_non_zero_count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example5
{
  "analysis_config": {
    "detectors": [{
      "function" : "high_non_zero_count",
      "by_field_name" : "signaturename"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

If you use this `high_non_zero_count` function in a detector in your
{anomaly-job}, it models the count of events for the `signaturename` field. It
ignores any buckets where the count is zero and detects when a `signaturename`
value has an unusually high count of events compared to its past behavior.

NOTE: Population analysis (using an `over_field_name` property value) is not
supported for the `non_zero_count`, `high_non_zero_count`, and
`low_non_zero_count` functions. If you want to do population analysis and your
data is sparse, use the `count` functions, which are optimized for that scenario.

[discrete]
[[ml-distinct-count]]
== Distinct_count, high_distinct_count, low_distinct_count

The `distinct_count` function detects anomalies where the number of distinct
values in one field is unusual.

The `high_distinct_count` function detects unusually high numbers of distinct
values in one field.

The `low_distinct_count` function detects unusually low numbers of distinct
values in one field.

These functions support the following properties:

* `field_name` (required)
* `by_field_name` (optional)
* `over_field_name` (optional)
* `partition_field_name` (optional)

For more information about those properties, see the
{ref}/ml-put-job.html#ml-put-job-request-body[create {anomaly-jobs} API].

.Example 6: Analyzing users with the distinct_count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example6
{
  "analysis_config": {
    "detectors": [{
      "function" : "distinct_count",
      "field_name" : "user"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

This `distinct_count` function detects when a system has an unusual number
of logged-in users. When you use this function in a detector in your
{anomaly-job}, it models the distinct count of users. It also detects when the
distinct number of users is unusual compared to the past.

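
The difference between a distinct count and a total count can be sketched in
Python. The example below counts the distinct `user` values per bucket. The
60-second bucket span and the sample events are assumptions, and the sketch
covers only the input quantity, not the modeling done by the {anomaly-job}.

```python
from collections import defaultdict

BUCKET_SPAN_MS = 60_000  # assumed bucket span of 60 seconds (hypothetical value)


def distinct_per_bucket(events, field="user"):
    """Return {bucket_start_ms: number of distinct values of `field`}."""
    seen = defaultdict(set)
    for event in events:
        bucket_start = event["timestamp"] - (event["timestamp"] % BUCKET_SPAN_MS)
        seen[bucket_start].add(event[field])
    return {bucket: len(values) for bucket, values in seen.items()}


# Hypothetical login events: the first bucket has three events but two users.
events = [
    {"timestamp": 0, "user": "alice"},
    {"timestamp": 10_000, "user": "alice"},
    {"timestamp": 20_000, "user": "bob"},
    {"timestamp": 70_000, "user": "carol"},
]
print(distinct_per_bucket(events))  # {0: 2, 60000: 1}
```
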
.Example 7: Analyzing ports with the high_distinct_count function
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/example7
{
  "analysis_config": {
    "detectors": [{
      "function" : "high_distinct_count",
      "field_name" : "dst_port",
      "over_field_name": "src_ip"
    }]
  },
  "data_description": {
    "time_field":"timestamp",
    "time_format": "epoch_ms"
  }
}
--------------------------------------------------
// TEST[skip:needs-licence]

This example detects instances of port scanning. When you use this function in a
detector in your {anomaly-job}, it models the distinct count of ports. It also
detects the `src_ip` values that connect to an unusually high number of different
`dst_port` values compared to other `src_ip` values.
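
The population comparison in this example can be pictured with a small Python
sketch that counts the distinct `dst_port` values per `src_ip` within a single
bucket. The addresses and ports are invented for illustration. The
{anomaly-job} compares members of the population with a statistical model,
which this sketch replaces with a plain maximum.

```python
from collections import defaultdict


def distinct_ports_by_source(connections):
    """Return {src_ip: number of distinct dst_port values} for one bucket."""
    ports = defaultdict(set)
    for conn in connections:
        ports[conn["src_ip"]].add(conn["dst_port"])
    return {ip: len(seen) for ip, seen in ports.items()}


# Hypothetical connections in one bucket: the third source touches 100 ports.
connections = (
    [{"src_ip": "10.0.0.1", "dst_port": 443}] * 5
    + [{"src_ip": "10.0.0.2", "dst_port": port} for port in (80, 443)]
    + [{"src_ip": "10.0.0.3", "dst_port": port} for port in range(1, 101)]
)

counts = distinct_ports_by_source(connections)
print(counts)  # {'10.0.0.1': 1, '10.0.0.2': 2, '10.0.0.3': 100}

# Simplified stand-in for the population comparison: pick the largest count.
print(max(counts, key=counts.get))  # 10.0.0.3
```
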