- [role="xpack"]
- [testenv="platinum"]
- [[ml-job-resource]]
- === Job resources
- A job resource has the following properties:
- `analysis_config`::
- (object) The analysis configuration, which specifies how to analyze the data.
- See <<ml-analysisconfig, analysis configuration objects>>.
- `analysis_limits`::
- (object) Defines approximate limits on the memory resource requirements for the job.
- See <<ml-apilimits,analysis limits>>.
- `background_persist_interval`::
- (time units) Advanced configuration option.
- The time between each periodic persistence of the model.
- The default value is a randomized value between 3 and 4 hours, which avoids
- all jobs persisting at exactly the same time. The smallest allowed value is
- 1 hour.
- +
- --
- TIP: For very large models (several GB), persistence could take 10-20 minutes,
- so do not set the `background_persist_interval` value too low.
- --
- `create_time`::
- (string) The time the job was created. For example, `1491007356077`. This
- property is informational; you cannot change its value.
- `custom_settings`::
- (object) Advanced configuration option. Contains custom meta data about the
- job. For example, it can contain custom URL information as shown in
- {xpack-ref}/ml-configuring-url.html[Adding Custom URLs to Machine Learning Results].
- `data_description`::
- (object) Describes the data format and how APIs parse timestamp fields.
- See <<ml-datadescription,data description objects>>.
- `description`::
- (string) An optional description of the job.
- `finished_time`::
- (string) If the job closed or failed, this is the time the job finished;
- otherwise, it is `null`. This property is informational; you cannot change its
- value.
- `groups`::
- (array of strings) A list of job groups. A job can belong to no groups or
- many. For example, `["group1", "group2"]`.
- `job_id`::
- (string) The unique identifier for the job. This identifier can contain
- lowercase alphanumeric characters (a-z and 0-9), hyphens, and underscores. It
- must start and end with alphanumeric characters. This property is
- informational; you cannot change the identifier for existing jobs.
- `job_type`::
- (string) Reserved for future use, currently set to `anomaly_detector`.
- `job_version`::
- (string) The version of {es} that existed on the node when the job was created.
- `model_plot_config`::
- (object) Configuration properties for storing additional model information.
- See <<ml-apimodelplotconfig, model plot configuration>>.
- `model_snapshot_id`::
- (string) A numerical character string that uniquely identifies the model
- snapshot. For example, `1491007364`. This property is informational; you
- cannot change its value. For more information about model snapshots, see
- <<ml-snapshot-resource>>.
- `model_snapshot_retention_days`::
- (long) The time in days that model snapshots are retained for the job.
- Older snapshots are deleted. The default value is `1`, which means snapshots
- are retained for one day (twenty-four hours).
- `renormalization_window_days`::
- (long) Advanced configuration option.
- The period over which adjustments to the score are applied, as new data is seen.
- The default value is the longer of 30 days or 100 `bucket_spans`.
- `results_index_name`::
- (string) The name of the index in which to store the {ml} results.
- The default value is `shared`,
- which corresponds to the index name `.ml-anomalies-shared`.
- `results_retention_days`::
- (long) Advanced configuration option.
- The number of days for which job results are retained.
- Once per day at 00:30 (server time), results older than this period are
- deleted from Elasticsearch. The default value is null, which means results
- are retained.
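- For example, the following create job request uses several of these properties.
- This is an illustrative sketch only; the job ID `example-job`, the group name,
- the results index name, and the field names are placeholders:
- [source,js]
- --------------------------------------------------
- PUT _ml/anomaly_detectors/example-job
- {
-   "description" : "Example job that counts events per error code",
-   "groups" : ["example-group"],
-   "analysis_config" : {
-     "bucket_span" : "15m",
-     "detectors" : [
-       { "function" : "count", "partition_field_name" : "error_code" }
-     ],
-     "influencers" : ["error_code"]
-   },
-   "data_description" : {
-     "time_field" : "timestamp",
-     "time_format" : "epoch_ms"
-   },
-   "analysis_limits" : {
-     "model_memory_limit" : "512mb"
-   },
-   "model_snapshot_retention_days" : 2,
-   "results_index_name" : "example-job-results",
-   "custom_settings" : {
-     "created_by" : "job-resource-example"
-   }
- }
- --------------------------------------------------
- // CONSOLE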
- [[ml-analysisconfig]]
- ==== Analysis Configuration Objects
- An analysis configuration object has the following properties:
- `bucket_span`::
- (time units) The size of the interval that the analysis is aggregated into,
- typically between `5m` and `1h`. The default value is `5m`. For more
- information about time units, see <<time-units,Common options>>.
- `categorization_field_name`::
- (string) If this property is specified, the values of the specified field will
- be categorized. The resulting categories must be used in a detector by setting
- `by_field_name`, `over_field_name`, or `partition_field_name` to the keyword
- `mlcategory`. For more information, see
- {xpack-ref}/ml-configuring-categories.html[Categorizing Log Messages].
- `categorization_filters`::
- (array of strings) If `categorization_field_name` is specified,
- you can also define optional filters. This property expects an array of
- regular expressions. The expressions are used to filter out matching sequences
- from the categorization field values. You can use this functionality to fine
- tune the categorization by excluding sequences from consideration when
- categories are defined. For example, you can exclude SQL statements that
- appear in your log files. For more information, see
- {xpack-ref}/ml-configuring-categories.html[Categorizing Log Messages].
- This property cannot be used at the same time as `categorization_analyzer`.
- If you only want to define simple regular expression filters that are applied
- prior to tokenization, setting this property is the easiest method.
- If you also want to customize the tokenizer or post-tokenization filtering,
- use the `categorization_analyzer` property instead and include the filters as
- `pattern_replace` character filters. The effect is exactly the same.
- `categorization_analyzer`::
- (object or string) If `categorization_field_name` is specified, you can also
- define the analyzer that is used to interpret the categorization field. This
- property cannot be used at the same time as `categorization_filters`. See
- <<ml-categorizationanalyzer,categorization analyzer>>.
- `detectors`::
- (array) An array of detector configuration objects,
- which describe the anomaly detectors that are used in the job.
- See <<ml-detectorconfig,detector configuration objects>>. +
- +
- --
- NOTE: If the `detectors` array does not contain at least one detector,
- no analysis can occur and an error is returned.
- --
- `influencers`::
- (array of strings) A list of influencer field names.
- Typically these can be the by, over, or partition fields that are used in the
- detector configuration. You might also want to use a field name that is not
- specifically named in a detector, but is available as part of the input data.
- When you use multiple detectors, the use of influencers is recommended as it
- aggregates results for each influencer entity.
- `latency`::
- (time units) The size of the window in which to expect data that is out of
- time order. The default value is 0 (no latency). If you specify a non-zero
- value, it must be greater than or equal to one second. For more information
- about time units, see <<time-units,Common options>>.
- +
- --
- NOTE: Latency is only applicable when you send data by using
- the <<ml-post-data,post data>> API.
- --
- `multivariate_by_fields`::
- (boolean) This functionality is reserved for internal use. It is not supported
- for use in customer environments and is not subject to the support SLA of
- official GA features.
- +
- --
- If set to `true`, the analysis will automatically find correlations
- between metrics for a given `by` field value and report anomalies when those
- correlations cease to hold. For example, suppose CPU and memory usage on host A
- is usually highly correlated with the same metrics on host B. Perhaps this
- correlation occurs because they are running a load-balanced application.
- If you enable this property, then anomalies will be reported when, for example,
- CPU usage on host A is high and the value of CPU usage on host B is low.
- That is to say, you'll see an anomaly when the CPU of host A is unusual given
- the CPU of host B.
- NOTE: To use the `multivariate_by_fields` property, you must also specify
- `by_field_name` in your detector.
- --
- `summary_count_field_name`::
- (string) If this property is specified, the data that is fed to the job is
- expected to be pre-summarized. This property value is the name of the field
- that contains the count of raw data points that have been summarized. The same
- `summary_count_field_name` applies to all detectors in the job.
- +
- --
- NOTE: The `summary_count_field_name` property cannot be used with the `metric`
- function.
- --
- After you create a job, you cannot change the analysis configuration object; all
- the properties are informational.
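- For example, the following request validates an illustrative `analysis_config`
- that combines several of these properties; the field names `bytes` and
- `clientip` are hypothetical:
- [source,js]
- --------------------------------------------------
- POST _ml/anomaly_detectors/_validate
- {
-   "analysis_config" : {
-     "bucket_span" : "10m",
-     "latency" : "30s",
-     "detectors" : [
-       { "function" : "sum", "field_name" : "bytes", "by_field_name" : "clientip" }
-     ],
-     "influencers" : ["clientip"]
-   },
-   "data_description" : {
-     "time_field" : "timestamp"
-   }
- }
- --------------------------------------------------
- // CONSOLE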
- [float]
- [[ml-detectorconfig]]
- ==== Detector Configuration Objects
- Detector configuration objects specify which data fields a job analyzes.
- They also specify which analytical functions are used.
- You can specify multiple detectors for a job.
- Each detector has the following properties:
- `by_field_name`::
- (string) The field used to split the data.
- In particular, this property is used for analyzing the splits with respect to their own history.
- It is used for finding unusual values in the context of the split.
- `detector_description`::
- (string) A description of the detector. For example, `Low event rate`.
- `detector_index`::
- (integer) A unique identifier for the detector. This identifier is based on
- the order of the detectors in the `analysis_config`, starting at zero. You can
- use this identifier when you want to update a specific detector.
- `exclude_frequent`::
- (string) Contains one of the following values: `all`, `none`, `by`, or `over`.
- If set, frequent entities are excluded from influencing the anomaly results.
- Entities can be considered frequent over time or frequent in a population.
- If you are working with both over and by fields, then you can set `exclude_frequent`
- to `all` for both fields, or to `by` or `over` for those specific fields.
- `field_name`::
- (string) The field that the detector uses in the function. If you use an event rate
- function such as `count` or `rare`, do not specify this field. +
- +
- --
- NOTE: The `field_name` cannot contain double quotes or backslashes.
- --
- `function`::
- (string) The analysis function that is used.
- For example, `count`, `rare`, `mean`, `min`, `max`, and `sum`. For more
- information, see {xpack-ref}/ml-functions.html[Function Reference].
- `over_field_name`::
- (string) The field used to split the data.
- In particular, this property is used for analyzing the splits with respect to
- the history of all splits. It is used for finding unusual values in the
- population of all splits. For more information, see
- {xpack-ref}/ml-configuring-pop.html[Performing Population Analysis].
- `partition_field_name`::
- (string) The field used to segment the analysis.
- When you use this property, you have completely independent baselines for each value of this field.
- `use_null`::
- (boolean) Defines whether a new series is used as the null series
- when there is no value for the by or partition fields. The default value is `false`.
- `custom_rules`::
- (array) An array of custom rule objects, which enable customizing how the detector works.
- For example, a rule may dictate to the detector conditions under which results should be skipped.
- For more information see <<ml-detector-custom-rule,detector custom rule objects>>. +
- +
- --
- IMPORTANT: Field names are case sensitive. For example, a field named `Bytes`
- is different from one named `bytes`.
- --
- After you create a job, the only properties you can change in the detector
- configuration object are the `detector_description` and the `custom_rules`;
- all other properties are informational.
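- For example, the following request validates a single illustrative detector;
- the field names `responsetime` and `airline` are hypothetical:
- [source,js]
- --------------------------------------------------
- POST _ml/anomaly_detectors/_validate/detector
- {
-   "detector_description" : "Mean response time per airline",
-   "function" : "mean",
-   "field_name" : "responsetime",
-   "by_field_name" : "airline",
-   "exclude_frequent" : "by"
- }
- --------------------------------------------------
- // CONSOLE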
- [float]
- [[ml-datadescription]]
- ==== Data Description Objects
- The data description defines the format of the input data when you send data to
- the job by using the <<ml-post-data,post data>> API. Note that when you
- configure a {dfeed}, these properties are automatically set.
- When data is received via the <<ml-post-data,post data>> API, it is not stored
- in {es}. Only the results for anomaly detection are retained.
- A data description object has the following properties:
- `format`::
- (string) Only `JSON` format is supported at this time.
- `time_field`::
- (string) The name of the field that contains the timestamp.
- The default value is `time`.
- `time_format`::
- (string) The time format, which can be `epoch`, `epoch_ms`, or a custom pattern.
- The default value is `epoch`, which refers to UNIX or Epoch time (the number of seconds
- since 1 Jan 1970).
- The value `epoch_ms` indicates that time is measured in milliseconds since the epoch.
- The `epoch` and `epoch_ms` time formats accept either integer or real values. +
- +
- --
- NOTE: Custom patterns must conform to the Java `DateTimeFormatter` class.
- When you use date-time formatting patterns, it is recommended that you provide
- the full date, time and time zone. For example: `yyyy-MM-dd'T'HH:mm:ssX`.
- If the pattern that you specify is not sufficient to produce a complete timestamp,
- job creation fails.
- --
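- For example, a `data_description` object for hypothetical documents whose
- `@timestamp` field uses an ISO 8601 style pattern might look like this:
- [source,js]
- --------------------------------------------------
- "data_description" : {
-   "time_field" : "@timestamp",
-   "time_format" : "yyyy-MM-dd'T'HH:mm:ssX"
- }
- --------------------------------------------------
- // NOTCONSOLE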
- [float]
- [[ml-categorizationanalyzer]]
- ==== Categorization Analyzer
- The categorization analyzer specifies how the `categorization_field` is
- interpreted by the categorization process. The syntax is very similar to that
- used to define the `analyzer` in the <<indices-analyze,Analyze endpoint>>.
- The `categorization_analyzer` field can be specified either as a string or as
- an object.
- If it is a string it must refer to a <<analysis-analyzers,built-in analyzer>> or
- one added by another plugin.
- If it is an object it has the following properties:
- `char_filter`::
- (array of strings or objects) One or more
- <<analysis-charfilters,character filters>>. In addition to the built-in
- character filters, other plugins can provide more character filters. This
- property is optional. If it is not specified, no character filters are applied
- prior to categorization. If you are customizing some other aspect of the
- analyzer and you need to achieve the equivalent of `categorization_filters`
- (which are not permitted when some other aspect of the analyzer is customized),
- add them here as
- <<analysis-pattern-replace-charfilter,pattern replace character filters>>.
- `tokenizer`::
- (string or object) The name or definition of the
- <<analysis-tokenizers,tokenizer>> to use after character filters are applied.
- This property is compulsory if `categorization_analyzer` is specified as an
- object. Machine learning provides a tokenizer called `ml_classic` that
- tokenizes in the same way as the non-customizable tokenizer in older versions
- of the product. If you want to use that tokenizer but change the character or
- token filters, specify `"tokenizer": "ml_classic"` in your
- `categorization_analyzer`.
- `filter`::
- (array of strings or objects) One or more
- <<analysis-tokenfilters,token filters>>. In addition to the built-in token
- filters, other plugins can provide more token filters. This property is
- optional. If it is not specified, no token filters are applied prior to
- categorization.
- If you omit the `categorization_analyzer`, the following default values are used:
- [source,js]
- --------------------------------------------------
- POST _ml/anomaly_detectors/_validate
- {
- "analysis_config" : {
- "categorization_analyzer" : {
- "tokenizer" : "ml_classic",
- "filter" : [
- { "type" : "stop", "stopwords": [
- "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday",
- "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun",
- "January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December",
- "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
- "GMT", "UTC"
- ] }
- ]
- },
- "categorization_field_name": "message",
- "detectors" :[{
- "function":"count",
- "by_field_name": "mlcategory"
- }]
- },
- "data_description" : {
- }
- }
- --------------------------------------------------
- // CONSOLE
- If you specify any part of the `categorization_analyzer`, however, any omitted
- sub-properties are _not_ set to default values.
- If you are categorizing non-English messages in a language where words are
- separated by spaces, you might get better results if you change the day or month
- words in the stop token filter to the appropriate words in your language. If you
- are categorizing messages in a language where words are not separated by spaces,
- you must use a different tokenizer as well in order to get sensible
- categorization results.
- It is important to be aware that analyzing machine-generated log messages for
- categorization is a little different from tokenizing for search.
- Features that work well for search, such as stemming, synonym substitution, and
- lowercasing, are likely to make the results of categorization worse. However, in
- order for drill down from {ml} results to work correctly, the tokens that the
- categorization analyzer produces must be similar to those produced by the search
- analyzer. If they are sufficiently similar, when you search for the tokens that
- the categorization analyzer produces then you find the original document that
- the categorization field value came from.
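- For example, the following request sketches a custom `categorization_analyzer`
- that uses a `pattern_replace` character filter (the equivalent of a
- `categorization_filters` entry) before tokenizing with `ml_classic`; the regular
- expression, the stop words, and the `message` field are illustrative only:
- [source,js]
- --------------------------------------------------
- POST _ml/anomaly_detectors/_validate
- {
-   "analysis_config" : {
-     "categorization_field_name" : "message",
-     "categorization_analyzer" : {
-       "char_filter" : [
-         { "type" : "pattern_replace", "pattern" : "\\[[0-9]+\\]", "replacement" : "" }
-       ],
-       "tokenizer" : "ml_classic",
-       "filter" : [
-         { "type" : "stop", "stopwords" : [ "GMT", "UTC" ] }
-       ]
-     },
-     "detectors" : [
-       { "function" : "count", "by_field_name" : "mlcategory" }
-     ]
-   },
-   "data_description" : { }
- }
- --------------------------------------------------
- // CONSOLE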
- For more information, see
- {xpack-ref}/ml-configuring-categories.html[Categorizing Log Messages].
- [float]
- [[ml-detector-custom-rule]]
- ==== Detector Custom Rule
- {stack-ov}/ml-rules.html[Custom rules] enable you to customize the way detectors
- operate.
- A custom rule has the following properties:
- `actions`::
- (array) The set of actions to be triggered when the rule applies.
- If more than one action is specified the effects of all actions are combined.
- The available actions include: +
- `skip_result`::: The result will not be created. This is the default value.
- Unless you also specify `skip_model_update`, the model will be updated as
- usual with the corresponding series value.
- `skip_model_update`::: The value for that series will not be used to update
- the model. Unless you also specify `skip_result`, the results will be created
- as usual. This action is suitable when certain values are expected to be
- consistently anomalous and they affect the model in a way that negatively
- impacts the rest of the results.
-
- `scope`::
- (object) An optional scope of series where the rule applies. By default, the
- scope includes all series. Scoping is allowed for any of the fields that are
- also specified in `by_field_name`, `over_field_name`, or `partition_field_name`.
- To add a scope for a field, add the field name as a key in the scope object and
- set its value to an object with the following properties:
- `filter_id`:::
- (string) The id of the filter to be used.
- `filter_type`:::
- (string) Either `include` (the rule applies for values in the filter)
- or `exclude` (the rule applies for values not in the filter). Defaults
- to `include`.
- `conditions`::
- (array) An optional array of numeric conditions when the rule applies.
- Multiple conditions are combined together with a logical `AND`.
- +
- --
- NOTE: If your detector uses `lat_long`, `metric`, `rare`, or `freq_rare`
- functions, you can only specify `conditions` that apply to `time`.
-
- A condition has the following properties:
- `applies_to`:::
- (string) Specifies the result property to which the condition applies.
- The available options are `actual`, `typical`, `diff_from_typical`, `time`.
- `operator`:::
- (string) Specifies the condition operator. The available options are
- `gt` (greater than), `gte` (greater than or equals), `lt` (less than) and `lte` (less than or equals).
- `value`:::
- (double) The value that is compared against the `applies_to` field using the `operator`.
- --
- A rule must have either a non-empty scope or at least one condition.
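- For example, the following illustrative rule skips results for values of a
- hypothetical `clientip` field (which would have to be used as a `by`, `over`,
- or `partition` field in the detector) that are in the filter `safe_ips`, but
- only when the actual value is below 500:
- [source,js]
- --------------------------------------------------
- "custom_rules" : [
-   {
-     "actions" : [ "skip_result" ],
-     "scope" : {
-       "clientip" : {
-         "filter_id" : "safe_ips",
-         "filter_type" : "include"
-       }
-     },
-     "conditions" : [
-       {
-         "applies_to" : "actual",
-         "operator" : "lt",
-         "value" : 500
-       }
-     ]
-   }
- ]
- --------------------------------------------------
- // NOTCONSOLE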
- For more examples see
- {stack-ov}/ml-configuring-detector-custom-rules.html[Configuring Detector Custom Rules].
- [float]
- [[ml-apilimits]]
- ==== Analysis Limits
- Limits can be applied for the resources required to hold the mathematical models in memory.
- These limits are approximate and can be set per job. They do not control the
- memory used by other processes, for example the Elasticsearch Java processes.
- If necessary, you can increase the limits after the job is created.
- The `analysis_limits` object has the following properties:
- `categorization_examples_limit`::
- (long) The maximum number of examples stored per category in memory and
- in the results data store. The default value is 4. If you increase this value,
- more examples are available; however, this requires more available storage.
- If you set this value to `0`, no examples are stored. +
- +
- --
- NOTE: The `categorization_examples_limit` only applies to analysis that uses categorization.
- For more information, see
- {xpack-ref}/ml-configuring-categories.html[Categorizing Log Messages].
- --
- `model_memory_limit`::
- (long or string) The approximate maximum amount of memory resources that are
- required for analytical processing. Once this limit is approached, data pruning
- becomes more aggressive. Upon exceeding this limit, new entities are not
- modeled. The default value for jobs created in version 6.1 and later is `1024mb`.
- This value will need to be increased for jobs that are expected to analyze high
- cardinality fields, but the default is set to a relatively small size to ensure
- that high resource usage is a conscious decision. The default value for jobs
- created in versions earlier than 6.1 is `4096mb`.
- +
- --
- If you specify a number instead of a string, the units are assumed to be MiB.
- Specifying a string is recommended for clarity. If you specify a byte size unit
- of `b` or `kb` and the number does not equate to a discrete number of megabytes,
- it is rounded down to the closest MiB. The minimum valid value is 1 MiB. If you
- specify a value less than 1 MiB, an error occurs. For more information about
- supported byte size units, see <<byte-units,Common options>>.
- If your `elasticsearch.yml` file contains an `xpack.ml.max_model_memory_limit`
- setting, an error occurs when you try to create jobs that have
- `model_memory_limit` values greater than that setting. For more information,
- see <<ml-settings>>.
- --
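- For example, an `analysis_limits` object that sets both properties explicitly
- might look like this (the values are illustrative):
- [source,js]
- --------------------------------------------------
- "analysis_limits" : {
-   "model_memory_limit" : "512mb",
-   "categorization_examples_limit" : 5
- }
- --------------------------------------------------
- // NOTCONSOLE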
- [float]
- [[ml-apimodelplotconfig]]
- ==== Model Plot Config
- This advanced configuration option stores model information along with the
- results. It provides a more detailed view into anomaly detection.
- WARNING: If you enable model plot, it can add considerable overhead to the
- performance of the system; it is not feasible for jobs with many entities.
- Model plot provides a simplified and indicative view of the model and its bounds.
- It does not display complex features such as multivariate correlations or multimodal data.
- As such, anomalies may occasionally be reported which cannot be seen in the model plot.
- Model plot can be configured when the job is created or updated later. It must be
- disabled if you experience performance issues.
- The `model_plot_config` object has the following properties:
- `enabled`::
- (boolean) If true, enables calculation and storage of the model bounds for
- each entity that is being analyzed. By default, this is not enabled.
- `terms`::
- experimental[] (string) Limits data collection to this comma separated list of
- partition or by field values. If terms are not specified or it is an empty
- string, no filtering is applied. For example, `CPU,NetworkIn,DiskWrites`.
- Wildcards are not supported. Only the specified `terms` can be viewed when
- using the Single Metric Viewer.
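- For example, the following illustrative fragment enables model plot and limits
- data collection to three hypothetical partition field values:
- [source,js]
- --------------------------------------------------
- "model_plot_config" : {
-   "enabled" : true,
-   "terms" : "CPU,NetworkIn,DiskWrites"
- }
- --------------------------------------------------
- // NOTCONSOLE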