|
@@ -1,479 +0,0 @@
|
|
|
-[role="xpack"]
|
|
|
-[testenv="platinum"]
|
|
|
-[[ml-results-resource]]
|
|
|
-=== Results resources
|
|
|
-
|
|
|
-Several different result types are created for each job. You can query anomaly
|
|
|
-results for _buckets_, _influencers_, and _records_ by using the results API.
|
|
|
-Summarized bucket results over multiple jobs can be queried as well; those
|
|
|
-results are called _overall buckets_.
|
|
|
-
|
|
|
-Results are written for each `bucket_span`. The timestamp for the results is the
|
|
|
-start of the bucket time interval.
|
|
|
-
|
|
|
-The results include scores, which are calculated for each anomaly result type and
|
|
|
-each bucket interval. These scores are aggregated in order to reduce noise, and
|
|
|
-normalized in order to identify and rank the most mathematically significant
|
|
|
-anomalies.
|
|
|
-
|
|
|
-Bucket results provide the top level, overall view of the job and are ideal for
|
|
|
-alerts. For example, the bucket results might indicate that at 16:05 the system
|
|
|
-was unusual. This information is a summary of all the anomalies, pinpointing
|
|
|
-when they occurred.
|
|
|
-
|
|
|
-Influencer results show which entities were anomalous and when. For example,
|
|
|
-the influencer results might indicate that at 16:05 `user_name: Bob` was unusual.
|
|
|
-This information is a summary of all the anomalies for each entity, so there
|
|
|
-can be a lot of these results. Once you have identified a notable bucket time,
|
|
|
-you can look to see which entities were significant.
|
|
|
-
|
|
|
-Record results provide details about what the individual anomaly was, when it
|
|
|
-occurred and which entity was involved. For example, the record results might
|
|
|
-indicate that at 16:05 Bob sent 837262434 bytes, when the typical value was
|
|
|
-1067 bytes. Once you have identified a bucket time and perhaps a significant
|
|
|
-entity too, you can drill through to the record results in order to investigate
|
|
|
-the anomalous behavior.
|
|
|
-
|
|
|
-Categorization results contain the definitions of _categories_ that have been
|
|
|
-identified. These are only applicable for jobs that are configured to analyze
|
|
|
-unstructured log data using categorization. These results do not contain a
|
|
|
-timestamp or any calculated scores. For more information, see
|
|
|
-{stack-ov}/ml-configuring-categories.html[Categorizing log messages].
|
|
|
-
|
|
|
-* <<ml-results-buckets,Buckets>>
|
|
|
-* <<ml-results-influencers,Influencers>>
|
|
|
-* <<ml-results-records,Records>>
|
|
|
-* <<ml-results-categories,Categories>>
|
|
|
-* <<ml-results-overall-buckets,Overall Buckets>>
|
|
|
-
|
|
|
-NOTE: All of these resources and properties are informational; you cannot
|
|
|
-change their values.
|
|
|
-
|
|
|
-[float]
|
|
|
-[[ml-results-buckets]]
|
|
|
-==== Buckets
|
|
|
-
|
|
|
-Bucket results provide the top level, overall view of the job and are best for
|
|
|
-alerting.
|
|
|
-
|
|
|
-Each bucket has an `anomaly_score`, which is a statistically aggregated and
|
|
|
-normalized view of the combined anomalousness of all the record results within
|
|
|
-each bucket.
|
|
|
-
|
|
|
-One bucket result is written for each `bucket_span` for each job, even if it is
|
|
|
-not considered to be anomalous. If the bucket is not anomalous, it has an
|
|
|
-`anomaly_score` of zero.
|
|
|
-
|
|
|
-When you identify an anomalous bucket, you can investigate further by expanding
|
|
|
-the bucket resource to show the records as nested objects. Alternatively, you
|
|
|
-can access the records resource directly and filter by the date range.
|
|
|
-
|
|
|
-A bucket resource has the following properties:
|
|
|
-
|
|
|
-`anomaly_score`::
|
|
|
- (number) The maximum anomaly score, between 0-100, for any of the bucket
|
|
|
- influencers. This is an overall, rate-limited score for the job. All the
|
|
|
- anomaly records in the bucket contribute to this score. This value might be
|
|
|
- updated as new data is analyzed.
|
|
|
-
|
|
|
-`bucket_influencers`::
|
|
|
- (array) An array of bucket influencer objects.
|
|
|
- For more information, see <<ml-results-bucket-influencers,Bucket Influencers>>.
|
|
|
-
|
|
|
-`bucket_span`::
|
|
|
- (number) The length of the bucket in seconds.
|
|
|
- This value matches the `bucket_span` that is specified in the job.
|
|
|
-
|
|
|
-`event_count`::
|
|
|
- (number) The number of input data records processed in this bucket.
|
|
|
-
|
|
|
-`initial_anomaly_score`::
|
|
|
- (number) The maximum `anomaly_score` for any of the bucket influencers.
|
|
|
- This is the initial value that was calculated at the time the bucket was
|
|
|
- processed.
|
|
|
-
|
|
|
-`is_interim`::
|
|
|
- (boolean) If true, this is an interim result. In other words, the bucket
|
|
|
- results are calculated based on partial input data.
|
|
|
-
|
|
|
-`job_id`::
|
|
|
- (string) The unique identifier for the job that these results belong to.
|
|
|
-
|
|
|
-`processing_time_ms`::
|
|
|
- (number) The amount of time, in milliseconds, that it took to analyze the
|
|
|
- bucket contents and calculate results.
|
|
|
-
|
|
|
-`result_type`::
|
|
|
- (string) Internal. This value is always set to `bucket`.
|
|
|
-
|
|
|
-`timestamp`::
|
|
|
- (date) The start time of the bucket. This timestamp uniquely identifies the
|
|
|
- bucket. +
|
|
|
-
|
|
|
-NOTE: Events that occur exactly at the timestamp of the bucket are included in
|
|
|
-the results for the bucket.
|
|
|
-
|
|
|
-
|
|
|
-[float]
|
|
|
-[[ml-results-bucket-influencers]]
|
|
|
-==== Bucket Influencers
|
|
|
-
|
|
|
-Bucket influencer results are available as nested objects contained within
|
|
|
-bucket results. These results are an aggregation for each type of influencer.
|
|
|
-For example, if both `client_ip` and `user_name` were specified as influencers,
|
|
|
-then you would be able to determine when the `client_ip` or `user_name` values
|
|
|
-were collectively anomalous.
|
|
|
-
|
|
|
-There is a built-in bucket influencer called `bucket_time` which is always
|
|
|
-available. This bucket influencer is the aggregation of all records in the
|
|
|
-bucket; it is not just limited to a type of influencer.
|
|
|
-
|
|
|
-NOTE: A bucket influencer is a type of influencer. For example, `client_ip` or
|
|
|
-`user_name` can be bucket influencers, whereas `192.168.88.2` and `Bob` are
|
|
|
-influencers.
|
|
|
-
|
|
|
-An bucket influencer object has the following properties:
|
|
|
-
|
|
|
-`anomaly_score`::
|
|
|
- (number) A normalized score between 0-100, which is calculated for each bucket
|
|
|
- influencer. This score might be updated as newer data is analyzed.
|
|
|
-
|
|
|
-`bucket_span`::
|
|
|
- (number) The length of the bucket in seconds. This value matches the `bucket_span`
|
|
|
- that is specified in the job.
|
|
|
-
|
|
|
-`initial_anomaly_score`::
|
|
|
- (number) The score between 0-100 for each bucket influencer. This score is
|
|
|
- the initial value that was calculated at the time the bucket was processed.
|
|
|
-
|
|
|
-`influencer_field_name`::
|
|
|
- (string) The field name of the influencer. For example `client_ip` or
|
|
|
- `user_name`.
|
|
|
-
|
|
|
-`influencer_field_value`::
|
|
|
- (string) The field value of the influencer. For example `192.168.88.2` or
|
|
|
- `Bob`.
|
|
|
-
|
|
|
-`is_interim`::
|
|
|
- (boolean) If true, this is an interim result. In other words, the bucket
|
|
|
- influencer results are calculated based on partial input data.
|
|
|
-
|
|
|
-`job_id`::
|
|
|
- (string) The unique identifier for the job that these results belong to.
|
|
|
-
|
|
|
-`probability`::
|
|
|
- (number) The probability that the bucket has this behavior, in the range 0
|
|
|
- to 1. For example, 0.0000109783. This value can be held to a high precision
|
|
|
- of over 300 decimal places, so the `anomaly_score` is provided as a
|
|
|
- human-readable and friendly interpretation of this.
|
|
|
-
|
|
|
-`raw_anomaly_score`::
|
|
|
- (number) Internal.
|
|
|
-
|
|
|
-`result_type`::
|
|
|
- (string) Internal. This value is always set to `bucket_influencer`.
|
|
|
-
|
|
|
-`timestamp`::
|
|
|
- (date) The start time of the bucket for which these results were calculated.
|
|
|
-
|
|
|
-[float]
|
|
|
-[[ml-results-influencers]]
|
|
|
-==== Influencers
|
|
|
-
|
|
|
-Influencers are the entities that have contributed to, or are to blame for,
|
|
|
-the anomalies. Influencer results are available only if an
|
|
|
-`influencer_field_name` is specified in the job configuration.
|
|
|
-
|
|
|
-Influencers are given an `influencer_score`, which is calculated based on the
|
|
|
-anomalies that have occurred in each bucket interval. For jobs with more than
|
|
|
-one detector, this gives a powerful view of the most anomalous entities.
|
|
|
-
|
|
|
-For example, if you are analyzing unusual bytes sent and unusual domains
|
|
|
-visited and you specified `user_name` as the influencer, then an
|
|
|
-`influencer_score` for each anomalous user name is written per bucket. For
|
|
|
-example, if `user_name: Bob` had an `influencer_score` greater than 75, then
|
|
|
-`Bob` would be considered very anomalous during this time interval in one or
|
|
|
-both of those areas (unusual bytes sent or unusual domains visited).
|
|
|
-
|
|
|
-One influencer result is written per bucket for each influencer that is
|
|
|
-considered anomalous.
|
|
|
-
|
|
|
-When you identify an influencer with a high score, you can investigate further
|
|
|
-by accessing the records resource for that bucket and enumerating the anomaly
|
|
|
-records that contain the influencer.
|
|
|
-
|
|
|
-An influencer object has the following properties:
|
|
|
-
|
|
|
-`bucket_span`::
|
|
|
- (number) The length of the bucket in seconds. This value matches the `bucket_span`
|
|
|
- that is specified in the job.
|
|
|
-
|
|
|
-`influencer_score`::
|
|
|
- (number) A normalized score between 0-100, which is based on the probability
|
|
|
- of the influencer in this bucket aggregated across detectors. Unlike
|
|
|
- `initial_influencer_score`, this value will be updated by a re-normalization
|
|
|
- process as new data is analyzed.
|
|
|
-
|
|
|
-`initial_influencer_score`::
|
|
|
- (number) A normalized score between 0-100, which is based on the probability
|
|
|
- of the influencer aggregated across detectors. This is the initial value that
|
|
|
- was calculated at the time the bucket was processed.
|
|
|
-
|
|
|
-`influencer_field_name`::
|
|
|
- (string) The field name of the influencer.
|
|
|
-
|
|
|
-`influencer_field_value`::
|
|
|
- (string) The entity that influenced, contributed to, or was to blame for the
|
|
|
- anomaly.
|
|
|
-
|
|
|
-`is_interim`::
|
|
|
- (boolean) If true, this is an interim result. In other words, the influencer
|
|
|
- results are calculated based on partial input data.
|
|
|
-
|
|
|
-`job_id`::
|
|
|
- (string) The unique identifier for the job that these results belong to.
|
|
|
-
|
|
|
-`probability`::
|
|
|
- (number) The probability that the influencer has this behavior, in the range
|
|
|
- 0 to 1. For example, 0.0000109783. This value can be held to a high precision
|
|
|
- of over 300 decimal places, so the `influencer_score` is provided as a
|
|
|
- human-readable and friendly interpretation of this.
|
|
|
-// For example, 0.03 means 3%. This value is held to a high precision of over
|
|
|
-//300 decimal places. In scientific notation, a value of 3.24E-300 is highly
|
|
|
-//unlikely and therefore highly anomalous.
|
|
|
-
|
|
|
-`result_type`::
|
|
|
- (string) Internal. This value is always set to `influencer`.
|
|
|
-
|
|
|
-`timestamp`::
|
|
|
- (date) The start time of the bucket for which these results were calculated.
|
|
|
-
|
|
|
-NOTE: Additional influencer properties are added, depending on the fields being
|
|
|
-analyzed. For example, if it's analyzing `user_name` as an influencer, then a
|
|
|
-field `user_name` is added to the result document. This information enables you to
|
|
|
-filter the anomaly results more easily.
|
|
|
-
|
|
|
-
|
|
|
-[float]
|
|
|
-[[ml-results-records]]
|
|
|
-==== Records
|
|
|
-
|
|
|
-Records contain the detailed analytical results. They describe the anomalous
|
|
|
-activity that has been identified in the input data based on the detector
|
|
|
-configuration.
|
|
|
-
|
|
|
-For example, if you are looking for unusually large data transfers, an anomaly
|
|
|
-record can identify the source IP address, the destination, the time window
|
|
|
-during which it occurred, the expected and actual size of the transfer, and the
|
|
|
-probability of this occurrence.
|
|
|
-
|
|
|
-There can be many anomaly records depending on the characteristics and size of
|
|
|
-the input data. In practice, there are often too many to be able to manually
|
|
|
-process them. The {ml-features} therefore perform a sophisticated
|
|
|
-aggregation of the anomaly records into buckets.
|
|
|
-
|
|
|
-The number of record results depends on the number of anomalies found in each
|
|
|
-bucket, which relates to the number of time series being modeled and the number of
|
|
|
-detectors.
|
|
|
-
|
|
|
-A record object has the following properties:
|
|
|
-
|
|
|
-`actual`::
|
|
|
- (array) The actual value for the bucket.
|
|
|
-
|
|
|
-`bucket_span`::
|
|
|
- (number) The length of the bucket in seconds.
|
|
|
- This value matches the `bucket_span` that is specified in the job.
|
|
|
-
|
|
|
-`by_field_name`::
|
|
|
- (string) The name of the analyzed field. This value is present only if
|
|
|
- it is specified in the detector. For example, `client_ip`.
|
|
|
-
|
|
|
-`by_field_value`::
|
|
|
- (string) The value of `by_field_name`. This value is present only if
|
|
|
- it is specified in the detector. For example, `192.168.66.2`.
|
|
|
-
|
|
|
-`causes`::
|
|
|
- (array) For population analysis, an over field must be specified in the
|
|
|
- detector. This property contains an array of anomaly records that are the
|
|
|
- causes for the anomaly that has been identified for the over field. If no
|
|
|
- over fields exist, this field is not present. This sub-resource contains
|
|
|
- the most anomalous records for the `over_field_name`. For scalability reasons,
|
|
|
- a maximum of the 10 most significant causes of the anomaly are returned. As
|
|
|
- part of the core analytical modeling, these low-level anomaly records are
|
|
|
- aggregated for their parent over field record. The causes resource contains
|
|
|
- similar elements to the record resource, namely `actual`, `typical`,
|
|
|
- `geo_results.actual_point`, `geo_results.typical_point`,
|
|
|
- `*_field_name` and `*_field_value`.
|
|
|
- Probability and scores are not applicable to causes.
|
|
|
-
|
|
|
-`detector_index`::
|
|
|
- (number) A unique identifier for the detector.
|
|
|
-
|
|
|
-`field_name`::
|
|
|
- (string) Certain functions require a field to operate on, for example, `sum()`.
|
|
|
- For those functions, this value is the name of the field to be analyzed.
|
|
|
-
|
|
|
-`function`::
|
|
|
- (string) The function in which the anomaly occurs, as specified in the
|
|
|
- detector configuration. For example, `max`.
|
|
|
-
|
|
|
-`function_description`::
|
|
|
- (string) The description of the function in which the anomaly occurs, as
|
|
|
- specified in the detector configuration.
|
|
|
-
|
|
|
-`influencers`::
|
|
|
- (array) If `influencers` was specified in the detector configuration, then
|
|
|
- this array contains influencers that contributed to or were to blame for an
|
|
|
- anomaly.
|
|
|
-
|
|
|
-`initial_record_score`::
|
|
|
- (number) A normalized score between 0-100, which is based on the
|
|
|
- probability of the anomalousness of this record. This is the initial value
|
|
|
- that was calculated at the time the bucket was processed.
|
|
|
-
|
|
|
-`is_interim`::
|
|
|
- (boolean) If true, this is an interim result. In other words, the anomaly
|
|
|
- record is calculated based on partial input data.
|
|
|
-
|
|
|
-`job_id`::
|
|
|
- (string) The unique identifier for the job that these results belong to.
|
|
|
-
|
|
|
-`over_field_name`::
|
|
|
- (string) The name of the over field that was used in the analysis. This value
|
|
|
- is present only if it was specified in the detector. Over fields are used
|
|
|
- in population analysis. For example, `user`.
|
|
|
-
|
|
|
-`over_field_value`::
|
|
|
- (string) The value of `over_field_name`. This value is present only if it
|
|
|
- was specified in the detector. For example, `Bob`.
|
|
|
-
|
|
|
-`partition_field_name`::
|
|
|
- (string) The name of the partition field that was used in the analysis. This
|
|
|
- value is present only if it was specified in the detector. For example,
|
|
|
- `region`.
|
|
|
-
|
|
|
-`partition_field_value`::
|
|
|
- (string) The value of `partition_field_name`. This value is present only if
|
|
|
- it was specified in the detector. For example, `us-east-1`.
|
|
|
-
|
|
|
-`probability`::
|
|
|
- (number) The probability of the individual anomaly occurring, in the range
|
|
|
- 0 to 1. For example, 0.0000772031. This value can be held to a high precision
|
|
|
- of over 300 decimal places, so the `record_score` is provided as a
|
|
|
- human-readable and friendly interpretation of this.
|
|
|
-//In scientific notation, a value of 3.24E-300 is highly unlikely and therefore
|
|
|
-//highly anomalous.
|
|
|
-
|
|
|
-`multi_bucket_impact`::
|
|
|
- (number) an indication of how strongly an anomaly is multi bucket or single bucket.
|
|
|
- The value is on a scale of -5 to +5 where -5 means the anomaly is purely single
|
|
|
- bucket and +5 means the anomaly is purely multi bucket.
|
|
|
-
|
|
|
-`record_score`::
|
|
|
- (number) A normalized score between 0-100, which is based on the probability
|
|
|
- of the anomalousness of this record. Unlike `initial_record_score`, this
|
|
|
- value will be updated by a re-normalization process as new data is analyzed.
|
|
|
-
|
|
|
-`result_type`::
|
|
|
- (string) Internal. This is always set to `record`.
|
|
|
-
|
|
|
-`timestamp`::
|
|
|
- (date) The start time of the bucket for which these results were calculated.
|
|
|
-
|
|
|
-`typical`::
|
|
|
- (array) The typical value for the bucket, according to analytical modeling.
|
|
|
-
|
|
|
-`geo_results.actual_point`::
|
|
|
- (string) The actual value for the bucket formatted as a `geo_point`.
|
|
|
- If the detector function is `lat_long`, this is a comma delimited string
|
|
|
- of the latitude and longitude.
|
|
|
-
|
|
|
-`geo_results.typical_point`::
|
|
|
- (string) The typical value for the bucket formatted as a `geo_point`.
|
|
|
- If the detector function is `lat_long`, this is a comma delimited string
|
|
|
- of the latitude and longitude.
|
|
|
-
|
|
|
-NOTE: Additional record properties are added, depending on the fields being
|
|
|
-analyzed. For example, if it's analyzing `hostname` as a _by field_, then a field
|
|
|
-`hostname` is added to the result document. This information enables you to
|
|
|
-filter the anomaly results more easily.
|
|
|
-
|
|
|
-
|
|
|
-[float]
|
|
|
-[[ml-results-categories]]
|
|
|
-==== Categories
|
|
|
-
|
|
|
-When `categorization_field_name` is specified in the job configuration, it is
|
|
|
-possible to view the definitions of the resulting categories. A category
|
|
|
-definition describes the common terms matched and contains examples of matched
|
|
|
-values.
|
|
|
-
|
|
|
-The anomaly results from a categorization analysis are available as bucket,
|
|
|
-influencer, and record results. For example, the results might indicate that
|
|
|
-at 16:45 there was an unusual count of log message category 11. You can then
|
|
|
-examine the description and examples of that category.
|
|
|
-
|
|
|
-A category resource has the following properties:
|
|
|
-
|
|
|
-`category_id`::
|
|
|
- (unsigned integer) A unique identifier for the category.
|
|
|
-
|
|
|
-`examples`::
|
|
|
- (array) A list of examples of actual values that matched the category.
|
|
|
-
|
|
|
-`grok_pattern`::
|
|
|
- experimental[] (string) A Grok pattern that could be used in Logstash or an
|
|
|
- Ingest Pipeline to extract fields from messages that match the category. This
|
|
|
- field is experimental and may be changed or removed in a future release. The
|
|
|
- Grok patterns that are found are not optimal, but are often a good starting
|
|
|
- point for manual tweaking.
|
|
|
-
|
|
|
-`job_id`::
|
|
|
- (string) The unique identifier for the job that these results belong to.
|
|
|
-
|
|
|
-`max_matching_length`::
|
|
|
- (unsigned integer) The maximum length of the fields that matched the category.
|
|
|
- The value is increased by 10% to enable matching for similar fields that have
|
|
|
- not been analyzed.
|
|
|
-
|
|
|
-`regex`::
|
|
|
- (string) A regular expression that is used to search for values that match the
|
|
|
- category.
|
|
|
-
|
|
|
-`terms`::
|
|
|
- (string) A space separated list of the common tokens that are matched in
|
|
|
- values of the category.
|
|
|
-
|
|
|
-[float]
|
|
|
-[[ml-results-overall-buckets]]
|
|
|
-==== Overall Buckets
|
|
|
-
|
|
|
-Overall buckets provide a summary of bucket results over multiple jobs.
|
|
|
-Their `bucket_span` equals the longest `bucket_span` of the jobs in question.
|
|
|
-The `overall_score` is the `top_n` average of the max `anomaly_score` per job
|
|
|
-within the overall bucket time interval.
|
|
|
-This means that you can fine-tune the `overall_score` so that it is more
|
|
|
-or less sensitive to the number of jobs that detect an anomaly at the same time.
|
|
|
-
|
|
|
-An overall bucket resource has the following properties:
|
|
|
-
|
|
|
-`timestamp`::
|
|
|
- (date) The start time of the overall bucket.
|
|
|
-
|
|
|
-`bucket_span`::
|
|
|
- (number) The length of the bucket in seconds. Matches the `bucket_span`
|
|
|
- of the job with the longest one.
|
|
|
-
|
|
|
-`overall_score`::
|
|
|
- (number) The `top_n` average of the max bucket `anomaly_score` per job.
|
|
|
-
|
|
|
-`jobs`::
|
|
|
- (array) An array of objects that contain the `max_anomaly_score` per `job_id`.
|
|
|
-
|
|
|
-`is_interim`::
|
|
|
- (boolean) If true, this is an interim result. In other words, the anomaly
|
|
|
- record is calculated based on partial input data.
|
|
|
-
|
|
|
-`result_type`::
|
|
|
- (string) Internal. This is always set to `overall_bucket`.
|