[role="xpack"]
[testenv="platinum"]
[[ml-results-resource]]
=== Results resources

Several different result types are created for each job. You can query anomaly
results for _buckets_, _influencers_, and _records_ by using the results API.
Summarized bucket results over multiple jobs can be queried as well; those
results are called _overall buckets_.

Results are written for each `bucket_span`. The timestamp for the results is the
start of the bucket time interval.

The results include scores, which are calculated for each anomaly result type and
each bucket interval. These scores are aggregated in order to reduce noise, and
normalized in order to identify and rank the most mathematically significant
anomalies.

Bucket results provide the top-level, overall view of the job and are ideal for
alerts. For example, the bucket results might indicate that at 16:05 the system
was unusual. This information is a summary of all the anomalies, pinpointing
when they occurred.

Influencer results show which entities were anomalous and when. For example,
the influencer results might indicate that at 16:05 `user_name: Bob` was unusual.
This information is a summary of all the anomalies for each entity, so there
can be a lot of these results. Once you have identified a notable bucket time,
you can look to see which entities were significant.

Record results provide details about what the individual anomaly was, when it
occurred, and which entity was involved. For example, the record results might
indicate that at 16:05 Bob sent 837262434 bytes, when the typical value was
1067 bytes. Once you have identified a bucket time and perhaps a significant
entity too, you can drill through to the record results in order to investigate
the anomalous behavior.

Categorization results contain the definitions of _categories_ that have been
identified. These are only applicable for jobs that are configured to analyze
unstructured log data using categorization. These results do not contain a
timestamp or any calculated scores. For more information, see
{xpack-ref}/ml-configuring-categories.html[Categorizing Log Messages].

* <<ml-results-buckets,Buckets>>
* <<ml-results-influencers,Influencers>>
* <<ml-results-records,Records>>
* <<ml-results-categories,Categories>>
* <<ml-results-overall-buckets,Overall Buckets>>

NOTE: All of these resources and properties are informational; you cannot
change their values.
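
Each of these result types can be retrieved with its corresponding results API.
As a minimal sketch, the following request fetches bucket results for a
hypothetical job named `it-ops`; in older versions the endpoint prefix is
`_xpack/ml` rather than `_ml`:

[source,js]
--------------------------------------------------
GET _ml/anomaly_detectors/it-ops/results/buckets
{
  "start": "2019-01-01T00:00:00Z"
}
--------------------------------------------------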

[float]
[[ml-results-buckets]]
==== Buckets

Bucket results provide the top-level, overall view of the job and are best for
alerting.

Each bucket has an `anomaly_score`, which is a statistically aggregated and
normalized view of the combined anomalousness of all the record results within
each bucket.

One bucket result is written for each `bucket_span` for each job, even if it is
not considered to be anomalous. If the bucket is not anomalous, it has an
`anomaly_score` of zero.

When you identify an anomalous bucket, you can investigate further by expanding
the bucket resource to show the records as nested objects, as in the sketch
below. Alternatively, you can access the records resource directly and filter
by the date range.
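
For example, this hedged sketch retrieves the buckets of the hypothetical
`it-ops` job that have an `anomaly_score` of at least 75 and expands them to
include their nested records:

[source,js]
--------------------------------------------------
GET _ml/anomaly_detectors/it-ops/results/buckets
{
  "anomaly_score": 75.0,
  "expand": true
}
--------------------------------------------------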

A bucket resource has the following properties:

`anomaly_score`::
(number) The maximum anomaly score, between 0-100, for any of the bucket
influencers. This is an overall, rate-limited score for the job. All the
anomaly records in the bucket contribute to this score. This value might be
updated as new data is analyzed.

`bucket_influencers`::
(array) An array of bucket influencer objects.
For more information, see <<ml-results-bucket-influencers,Bucket Influencers>>.

`bucket_span`::
(number) The length of the bucket in seconds.
This value matches the `bucket_span` that is specified in the job.

`event_count`::
(number) The number of input data records processed in this bucket.

`initial_anomaly_score`::
(number) The maximum `anomaly_score` for any of the bucket influencers.
This is the initial value that was calculated at the time the bucket was
processed.

`is_interim`::
(boolean) If true, this is an interim result. In other words, the bucket
results are calculated based on partial input data.

`job_id`::
(string) The unique identifier for the job that these results belong to.

`processing_time_ms`::
(number) The amount of time, in milliseconds, that it took to analyze the
bucket contents and calculate results.

`result_type`::
(string) Internal. This value is always set to `bucket`.

`timestamp`::
(date) The start time of the bucket. This timestamp uniquely identifies the
bucket. +
NOTE: Events that occur exactly at the timestamp of the bucket are included in
the results for the bucket.

[float]
[[ml-results-bucket-influencers]]
==== Bucket Influencers

Bucket influencer results are available as nested objects contained within
bucket results. These results are an aggregation for each type of influencer.
For example, if both `client_ip` and `user_name` were specified as influencers,
then you would be able to determine when the `client_ip` or `user_name` values
were collectively anomalous.

There is a built-in bucket influencer called `bucket_time` which is always
available. This bucket influencer is the aggregation of all records in the
bucket; it is not just limited to a type of influencer.

NOTE: A bucket influencer is a type of influencer. For example, `client_ip` or
`user_name` can be bucket influencers, whereas `192.168.88.2` and `Bob` are
influencers.

A bucket influencer object has the following properties (see the illustrative
sketch after this list):

`anomaly_score`::
(number) A normalized score between 0-100, which is calculated for each bucket
influencer. This score might be updated as newer data is analyzed.

`bucket_span`::
(number) The length of the bucket in seconds. This value matches the `bucket_span`
that is specified in the job.

`initial_anomaly_score`::
(number) The score between 0-100 for each bucket influencer. This score is
the initial value that was calculated at the time the bucket was processed.

`influencer_field_name`::
(string) The field name of the influencer. For example `client_ip` or
`user_name`.

`influencer_field_value`::
(string) The field value of the influencer. For example `192.168.88.2` or
`Bob`.

`is_interim`::
(boolean) If true, this is an interim result. In other words, the bucket
influencer results are calculated based on partial input data.

`job_id`::
(string) The unique identifier for the job that these results belong to.

`probability`::
(number) The probability that the bucket has this behavior, in the range 0
to 1. For example, 0.0000109783. This value can be held to a high precision
of over 300 decimal places, so the `anomaly_score` is provided as a
human-readable and friendly interpretation of this.

`raw_anomaly_score`::
(number) Internal.

`result_type`::
(string) Internal. This value is always set to `bucket_influencer`.

`timestamp`::
(date) The start time of the bucket for which these results were calculated.
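
For illustration only, a bucket influencer object might look like the
following; all field values here are invented:

[source,js]
--------------------------------------------------
{
  "job_id": "it-ops",
  "result_type": "bucket_influencer",
  "influencer_field_name": "user_name",
  "initial_anomaly_score": 83.3,
  "anomaly_score": 83.3,
  "raw_anomaly_score": 2.03,
  "probability": 0.0000109783,
  "timestamp": 1454943900000,
  "bucket_span": 300,
  "is_interim": false
}
--------------------------------------------------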

[float]
[[ml-results-influencers]]
==== Influencers

Influencers are the entities that have contributed to, or are to blame for,
the anomalies. Influencer results are available only if an
`influencer_field_name` is specified in the job configuration.

Influencers are given an `influencer_score`, which is calculated based on the
anomalies that have occurred in each bucket interval. For jobs with more than
one detector, this gives a powerful view of the most anomalous entities.

For example, if you are analyzing unusual bytes sent and unusual domains
visited and you specified `user_name` as the influencer, then an
`influencer_score` for each anomalous user name is written per bucket. For
example, if `user_name: Bob` had an `influencer_score` greater than 75, then
`Bob` would be considered very anomalous during this time interval in one or
both of those areas (unusual bytes sent or unusual domains visited).

One influencer result is written per bucket for each influencer that is
considered anomalous.

When you identify an influencer with a high score, you can investigate further
by accessing the records resource for that bucket and enumerating the anomaly
records that contain the influencer, as in the sketch below.
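
As a minimal sketch (again with a hypothetical `it-ops` job), the following
request retrieves the influencers with an `influencer_score` of at least 75,
sorted by score:

[source,js]
--------------------------------------------------
GET _ml/anomaly_detectors/it-ops/results/influencers
{
  "influencer_score": 75.0,
  "sort": "influencer_score",
  "desc": true
}
--------------------------------------------------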

An influencer object has the following properties:

`bucket_span`::
(number) The length of the bucket in seconds. This value matches the `bucket_span`
that is specified in the job.

`influencer_score`::
(number) A normalized score between 0-100, which is based on the probability
of the influencer in this bucket aggregated across detectors. Unlike
`initial_influencer_score`, this value will be updated by a re-normalization
process as new data is analyzed.

`initial_influencer_score`::
(number) A normalized score between 0-100, which is based on the probability
of the influencer aggregated across detectors. This is the initial value that
was calculated at the time the bucket was processed.

`influencer_field_name`::
(string) The field name of the influencer.

`influencer_field_value`::
(string) The entity that influenced, contributed to, or was to blame for the
anomaly.

`is_interim`::
(boolean) If true, this is an interim result. In other words, the influencer
results are calculated based on partial input data.

`job_id`::
(string) The unique identifier for the job that these results belong to.

`probability`::
(number) The probability that the influencer has this behavior, in the range
0 to 1. For example, 0.0000109783. This value can be held to a high precision
of over 300 decimal places, so the `influencer_score` is provided as a
human-readable and friendly interpretation of this.
// For example, 0.03 means 3%. This value is held to a high precision of over
// 300 decimal places. In scientific notation, a value of 3.24E-300 is highly
// unlikely and therefore highly anomalous.

`result_type`::
(string) Internal. This value is always set to `influencer`.

`timestamp`::
(date) The start time of the bucket for which these results were calculated.

NOTE: Additional influencer properties are added, depending on the fields being
analyzed. For example, if `user_name` is analyzed as an influencer, then a
field `user_name` is added to the result document. This information enables you
to filter the anomaly results more easily.

[float]
[[ml-results-records]]
==== Records

Records contain the detailed analytical results. They describe the anomalous
activity that has been identified in the input data based on the detector
configuration.

For example, if you are looking for unusually large data transfers, an anomaly
record can identify the source IP address, the destination, the time window
during which it occurred, the expected and actual size of the transfer, and the
probability of this occurrence.

There can be many anomaly records depending on the characteristics and size of
the input data. In practice, there are often too many to process manually. The
{ml-features} therefore perform a sophisticated aggregation of the anomaly
records into buckets.

The number of record results depends on the number of anomalies found in each
bucket, which relates to the number of time series being modeled and the number
of detectors.
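
A similar hedged sketch retrieves the highest-scoring records for the
hypothetical `it-ops` job:

[source,js]
--------------------------------------------------
GET _ml/anomaly_detectors/it-ops/results/records
{
  "record_score": 75.0,
  "sort": "record_score",
  "desc": true
}
--------------------------------------------------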

A record object has the following properties:

`actual`::
(array) The actual value for the bucket.

`bucket_span`::
(number) The length of the bucket in seconds.
This value matches the `bucket_span` that is specified in the job.

`by_field_name`::
(string) The name of the analyzed field. This value is present only if
it is specified in the detector. For example, `client_ip`.

`by_field_value`::
(string) The value of `by_field_name`. This value is present only if
it is specified in the detector. For example, `192.168.66.2`.

`causes`::
(array) For population analysis, an over field must be specified in the
detector. This property contains an array of anomaly records that are the
causes for the anomaly that has been identified for the over field. If no
over fields exist, this field is not present. This sub-resource contains
the most anomalous records for the `over_field_name`. For scalability reasons,
a maximum of the 10 most significant causes of the anomaly are returned. As
part of the core analytical modeling, these low-level anomaly records are
aggregated for their parent over field record. The causes resource contains
similar elements to the record resource, namely `actual`, `typical`,
`*_field_name` and `*_field_value`. Probability and scores are not applicable
to causes.

`detector_index`::
(number) A unique identifier for the detector.

`field_name`::
(string) Certain functions require a field to operate on, for example, `sum()`.
For those functions, this value is the name of the field to be analyzed.

`function`::
(string) The function in which the anomaly occurs, as specified in the
detector configuration. For example, `max`.

`function_description`::
(string) The description of the function in which the anomaly occurs, as
specified in the detector configuration.

`influencers`::
(array) If `influencers` was specified in the detector configuration, then
this array contains influencers that contributed to or were to blame for an
anomaly.

`initial_record_score`::
(number) A normalized score between 0-100, which is based on the
probability of the anomalousness of this record. This is the initial value
that was calculated at the time the bucket was processed.

`is_interim`::
(boolean) If true, this is an interim result. In other words, the anomaly
record is calculated based on partial input data.

`job_id`::
(string) The unique identifier for the job that these results belong to.

`over_field_name`::
(string) The name of the over field that was used in the analysis. This value
is present only if it was specified in the detector. Over fields are used
in population analysis. For example, `user`.

`over_field_value`::
(string) The value of `over_field_name`. This value is present only if it
was specified in the detector. For example, `Bob`.

`partition_field_name`::
(string) The name of the partition field that was used in the analysis. This
value is present only if it was specified in the detector. For example,
`region`.

`partition_field_value`::
(string) The value of `partition_field_name`. This value is present only if
it was specified in the detector. For example, `us-east-1`.

`probability`::
(number) The probability of the individual anomaly occurring, in the range
0 to 1. For example, 0.0000772031. This value can be held to a high precision
of over 300 decimal places, so the `record_score` is provided as a
human-readable and friendly interpretation of this.
//In scientific notation, a value of 3.24E-300 is highly unlikely and therefore
//highly anomalous.

`multi_bucket_impact`::
(number) An indication of how strongly an anomaly is multi bucket or single
bucket. The value is on a scale of -5 to +5, where -5 means the anomaly is
purely single bucket and +5 means the anomaly is purely multi bucket.

`record_score`::
(number) A normalized score between 0-100, which is based on the probability
of the anomalousness of this record. Unlike `initial_record_score`, this
value will be updated by a re-normalization process as new data is analyzed.

`result_type`::
(string) Internal. This is always set to `record`.

`timestamp`::
(date) The start time of the bucket for which these results were calculated.

`typical`::
(array) The typical value for the bucket, according to analytical modeling.

NOTE: Additional record properties are added, depending on the fields being
analyzed. For example, if `hostname` is analyzed as a _by field_, then a field
`hostname` is added to the result document. This information enables you to
filter the anomaly results more easily, as in the sketch below.
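
Because these additional fields are indexed with the results, you can filter on
them by searching the results indices directly. The following sketch assumes
that `hostname` was used as a by field; the index pattern `.ml-anomalies-*` and
the value `server_1` are illustrative:

[source,js]
--------------------------------------------------
GET .ml-anomalies-*/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "result_type": "record" } },
        { "term": { "job_id": "it-ops" } },
        { "term": { "hostname": "server_1" } }
      ]
    }
  }
}
--------------------------------------------------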

[float]
[[ml-results-categories]]
==== Categories

When `categorization_field_name` is specified in the job configuration, it is
possible to view the definitions of the resulting categories. A category
definition describes the common terms matched and contains examples of matched
values.

The anomaly results from a categorization analysis are available as bucket,
influencer, and record results. For example, the results might indicate that
at 16:45 there was an unusual count of log message category 11. You can then
examine the description and examples of that category, as in the sketch below.
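
For example, a hedged sketch of retrieving the definition of category 11 for a
hypothetical `it-ops-logs` job:

[source,js]
--------------------------------------------------
GET _ml/anomaly_detectors/it-ops-logs/results/categories/11
--------------------------------------------------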

A category resource has the following properties:

`category_id`::
(unsigned integer) A unique identifier for the category.

`examples`::
(array) A list of examples of actual values that matched the category.

`grok_pattern`::
experimental[] (string) A Grok pattern that could be used in Logstash or an
ingest pipeline to extract fields from messages that match the category; see
the sketch after this list. This field is experimental and may be changed or
removed in a future release. The Grok patterns that are found are not optimal,
but are often a good starting point for manual tweaking.

`job_id`::
(string) The unique identifier for the job that these results belong to.

`max_matching_length`::
(unsigned integer) The maximum length of the fields that matched the category.
The value is increased by 10% to enable matching for similar fields that have
not been analyzed.

`regex`::
(string) A regular expression that is used to search for values that match the
category.

`terms`::
(string) A space separated list of the common tokens that are matched in
values of the category.
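
A hypothetical `grok_pattern` can be tried out against a sample message with
the ingest simulate API; the pattern and document below are invented for this
sketch:

[source,js]
--------------------------------------------------
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": [".*?%{IP:client}.*"]
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "Connection accepted from 192.168.88.2"
      }
    }
  ]
}
--------------------------------------------------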

[float]
[[ml-results-overall-buckets]]
==== Overall Buckets

Overall buckets provide a summary of bucket results over multiple jobs.
Their `bucket_span` equals the longest `bucket_span` of the jobs in question.
The `overall_score` is the `top_n` average of the max `anomaly_score` per job
within the overall bucket time interval.
This means that you can fine-tune the `overall_score` so that it is more
or less sensitive to the number of jobs that detect an anomaly at the same
time, as in the sketch below.
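
As a minimal sketch, assuming two hypothetical jobs `job-1` and `job-2`, the
following request averages the top two per-job maximum anomaly scores and
returns only overall buckets with an `overall_score` of at least 50:

[source,js]
--------------------------------------------------
GET _ml/anomaly_detectors/job-1,job-2/results/overall_buckets
{
  "top_n": 2,
  "overall_score": 50.0
}
--------------------------------------------------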

An overall bucket resource has the following properties (an illustrative
example follows the list):

`timestamp`::
(date) The start time of the overall bucket.

`bucket_span`::
(number) The length of the bucket in seconds. Matches the `bucket_span`
of the job with the longest one.

`overall_score`::
(number) The `top_n` average of the max bucket `anomaly_score` per job.

`jobs`::
(array) An array of objects that contain the `max_anomaly_score` per `job_id`.

`is_interim`::
(boolean) If true, this is an interim result. In other words, the overall
bucket results are calculated based on partial input data.

`result_type`::
(string) Internal. This is always set to `overall_bucket`.
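
For illustration only, an overall bucket for those two hypothetical jobs might
look like the following; with `top_n` set to 2, the `overall_score` of 55.0 is
the average of the two `max_anomaly_score` values:

[source,js]
--------------------------------------------------
{
  "timestamp": 1454943900000,
  "bucket_span": 3600,
  "overall_score": 55.0,
  "jobs": [
    { "job_id": "job-1", "max_anomaly_score": 30.0 },
    { "job_id": "job-2", "max_anomaly_score": 80.0 }
  ],
  "is_interim": false,
  "result_type": "overall_bucket"
}
--------------------------------------------------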