- tag::aggregations[]
- If set, the {dfeed} performs aggregation searches. Support for aggregations is
- limited and should only be used with low cardinality data. For more information,
- see
- {stack-ov}/ml-configuring-aggregation.html[Aggregating data for faster performance].
- end::aggregations[]
- tag::allow-lazy-open[]
- Advanced configuration option. Specifies whether this job can open when there is
- insufficient {ml} node capacity for it to be immediately assigned to a node. The
- default value is `false`; if a {ml} node with capacity to run the job cannot
- immediately be found, the <<ml-open-job,open {anomaly-jobs} API>> returns an
- error. However, this is also subject to the cluster-wide
- `xpack.ml.max_lazy_ml_nodes` setting; see <<advanced-ml-settings>>. If this
- option is set to `true`, the <<ml-open-job,open {anomaly-jobs} API>> does not
- return an error and the job waits in the `opening` state until sufficient {ml}
- node capacity is available.
- end::allow-lazy-open[]
- tag::allow-lazy-start[]
- Whether this job should be allowed to start when there is insufficient {ml} node
- capacity for it to be immediately assigned to a node. The default is `false`,
- which means that the <<start-dfanalytics>> will return an error if a {ml} node
- with capacity to run the job cannot immediately be found. (However, this is also
- subject to the cluster-wide `xpack.ml.max_lazy_ml_nodes` setting - see
- <<advanced-ml-settings>>.) If this option is set to `true` then the
- <<start-dfanalytics>> will not return an error, and the job will wait in the
- `starting` state until sufficient {ml} node capacity is available.
- end::allow-lazy-start[]
- tag::allow-no-datafeeds[]
- Specifies what to do when the request:
- +
- --
- * Contains wildcard expressions and there are no {dfeeds} that match.
- * Contains the `_all` string or no identifiers and there are no matches.
- * Contains wildcard expressions and there are only partial matches.
- The default value is `true`, which returns an empty `datafeeds` array when
- there are no matches and the subset of results when there are partial matches.
- If this parameter is `false`, the request returns a `404` status code when there
- are no matches or only partial matches.
- --
- end::allow-no-datafeeds[]
- tag::allow-no-jobs[]
- Specifies what to do when the request:
- +
- --
- * Contains wildcard expressions and there are no jobs that match.
- * Contains the `_all` string or no identifiers and there are no matches.
- * Contains wildcard expressions and there are only partial matches.
- The default value is `true`, which returns an empty `jobs` array
- when there are no matches and the subset of results when there are partial
- matches. If this parameter is `false`, the request returns a `404` status code
- when there are no matches or only partial matches.
- --
- end::allow-no-jobs[]
- tag::allow-no-match[]
- Specifies what to do when the request:
- +
- --
- * Contains wildcard expressions and there are no {dfanalytics-jobs} that match.
- * Contains the `_all` string or no identifiers and there are no matches.
- * Contains wildcard expressions and there are only partial matches.
- The default value is `true`, which returns an empty `data_frame_analytics` array
- when there are no matches and the subset of results when there are partial
- matches. If this parameter is `false`, the request returns a `404` status code
- when there are no matches or only partial matches.
- --
- end::allow-no-match[]
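The three `allow_no_*` parameters above share the same semantics. The following is a minimal Python sketch of that behavior, not the server-side implementation. The helper name `resolve_ids` and the use of shell-style wildcard matching are assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def resolve_ids(expressions, existing_ids, allow_no_match=True):
    """Return (status_code, matched_ids) for a list of identifier expressions.

    Hypothetical helper that imitates the documented behavior only.
    """
    # An empty request or the `_all` string means "match everything".
    if not expressions or expressions == ["_all"]:
        expressions = ["*"]

    matched = []
    unmatched_expression = False
    for expression in expressions:
        hits = [i for i in sorted(existing_ids) if fnmatchcase(i, expression)]
        if not hits:
            unmatched_expression = True
        for hit in hits:
            if hit not in matched:
                matched.append(hit)

    # When the parameter is false, no match or a partial match is an error.
    if unmatched_expression and not allow_no_match:
        return 404, []
    # When it is true, return whatever subset matched (possibly empty).
    return 200, matched
```

With the default value, a request for `foo*` against a cluster that only contains `bar` yields an empty array. With `false`, the same request yields a `404` status code.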
- tag::analysis[]
- Defines the type of {dfanalytics} you want to perform on your source index. For
- example: `outlier_detection`. See <<ml-dfa-analysis-objects>>.
- end::analysis[]
- tag::analysis-config[]
- The analysis configuration, which specifies how to analyze the data.
- After you create a job, you cannot change the analysis configuration; all
- the properties are informational. An analysis configuration object has the
- following properties:
- `bucket_span`:::
- (<<time-units,time units>>)
- include::{docdir}/ml/ml-shared.asciidoc[tag=bucket-span]
- `categorization_field_name`:::
- (string)
- include::{docdir}/ml/ml-shared.asciidoc[tag=categorization-field-name]
- `categorization_filters`:::
- (array of strings)
- include::{docdir}/ml/ml-shared.asciidoc[tag=categorization-filters]
- `categorization_analyzer`:::
- (object or string)
- include::{docdir}/ml/ml-shared.asciidoc[tag=categorization-analyzer]
- `detectors`:::
- (array) An array of detector configuration objects. Detector configuration
- objects specify which data fields a job analyzes. They also specify which
- analytical functions are used. You can specify multiple detectors for a job.
- include::{docdir}/ml/ml-shared.asciidoc[tag=detector]
- +
- --
- NOTE: If the `detectors` array does not contain at least one detector,
- no analysis can occur and an error is returned.
- --
- `influencers`:::
- (array of strings)
- include::{docdir}/ml/ml-shared.asciidoc[tag=influencers]
- `latency`:::
- (time units)
- include::{docdir}/ml/ml-shared.asciidoc[tag=latency]
- `multivariate_by_fields`:::
- (boolean)
- include::{docdir}/ml/ml-shared.asciidoc[tag=multivariate-by-fields]
- `summary_count_field_name`:::
- (string)
- include::{docdir}/ml/ml-shared.asciidoc[tag=summary-count-field-name]
- end::analysis-config[]
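As an illustration of the properties above, the following Python sketch assembles an `analysis_config` object as a plain dictionary and rejects an empty `detectors` array, mirroring the note above. The helper name `build_analysis_config` and the field names `responsetime` and `airline` are hypothetical.

```python
def build_analysis_config(bucket_span, detectors, influencers=None):
    """Assemble an analysis_config dictionary (illustrative helper only)."""
    # The documentation states that at least one detector is required.
    if not detectors:
        raise ValueError("analysis_config requires at least one detector")
    config = {"bucket_span": bucket_span, "detectors": list(detectors)}
    if influencers:
        config["influencers"] = list(influencers)
    return config


# Hypothetical field names, chosen only for the example.
config = build_analysis_config(
    "15m",
    [
        {
            "function": "mean",
            "field_name": "responsetime",
            "by_field_name": "airline",
            "detector_description": "Mean response time by airline",
        }
    ],
    influencers=["airline"],
)
```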
- tag::analysis-limits[]
- Limits can be applied for the resources required to hold the mathematical models
- in memory. These limits are approximate and can be set per job. They do not
- control the memory used by other processes, for example the {es} Java
- processes. If necessary, you can increase the limits after the job is created.
- The `analysis_limits` object has the following properties:
- `categorization_examples_limit`:::
- (long)
- include::{docdir}/ml/ml-shared.asciidoc[tag=categorization-examples-limit]
- `model_memory_limit`:::
- (long or string)
- include::{docdir}/ml/ml-shared.asciidoc[tag=model-memory-limit]
- end::analysis-limits[]
- tag::analyzed-fields[]
- Specify `includes` and/or `excludes` patterns to select which fields will be
- included in the analysis. If `analyzed_fields` is not set, only the relevant
- fields will be included; for example, all the numeric fields for {oldetection}.
- For the supported field types, see <<ml-put-dfanalytics-supported-fields>>. Also
- see the <<explain-dfanalytics>>, which helps you understand field selection.
- `includes`:::
- (Optional, array) An array of strings that defines the fields that will be
- included in the analysis.
-
- `excludes`:::
- (Optional, array) An array of strings that defines the fields that will be
- excluded from the analysis. You do not need to add fields with unsupported
- data types to `excludes`; these fields are excluded from the analysis
- automatically.
- end::analyzed-fields[]
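The interplay of `includes` and `excludes` can be sketched in Python as follows. This is an illustration under two assumptions: the candidate list already contains only fields with supported types, and an exclude pattern takes precedence over an include pattern. The helper name `select_fields` and the field names are hypothetical.

```python
from fnmatch import fnmatchcase


def select_fields(candidates, includes=None, excludes=None):
    """Filter candidate field names by include and exclude patterns."""
    includes = includes or ["*"]   # assumption: no includes means "all candidates"
    excludes = excludes or []
    selected = []
    for name in candidates:
        included = any(fnmatchcase(name, pattern) for pattern in includes)
        excluded = any(fnmatchcase(name, pattern) for pattern in excludes)
        # Assumption: excludes win when both kinds of pattern match.
        if included and not excluded:
            selected.append(name)
    return selected


fields = select_fields(["price", "qty", "meta.id"], includes=["*"], excludes=["meta.*"])
```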
- tag::background-persist-interval[]
- Advanced configuration option. The time between each periodic persistence of the
- model. The default value is a randomized value between 3 and 4 hours, which
- avoids all jobs persisting at exactly the same time. The smallest allowed value
- is 1 hour.
- +
- --
- TIP: For very large models (several GB), persistence could take 10-20 minutes,
- so do not set the `background_persist_interval` value too low.
- --
- end::background-persist-interval[]
- tag::bucket-span[]
- The size of the interval that the analysis is aggregated into, typically between
- `5m` and `1h`. The default value is `5m`. For more information about time units,
- see <<time-units>>.
- end::bucket-span[]
- tag::by-field-name[]
- The field used to split the data. In particular, this property is used for
- analyzing the splits with respect to their own history. It is used for finding
- unusual values in the context of the split.
- end::by-field-name[]
- tag::categorization-analyzer[]
- If `categorization_field_name` is specified, you can also define the analyzer
- that is used to interpret the categorization field. This property cannot be used
- at the same time as `categorization_filters`. The categorization analyzer
- specifies how the `categorization_field` is interpreted by the categorization
- process. The syntax is very similar to that used to define the `analyzer` in the
- <<indices-analyze,Analyze endpoint>>. For more information, see
- {stack-ov}/ml-configuring-categories.html[Categorizing log messages].
- +
- --
- The `categorization_analyzer` field can be specified either as a string or as an
- object. If it is a string it must refer to a
- <<analysis-analyzers,built-in analyzer>> or one added by another plugin. If it
- is an object it has the following properties:
- --
- `char_filter`::::
- (array of strings or objects)
- include::{docdir}/ml/ml-shared.asciidoc[tag=char-filter]
- `tokenizer`::::
- (string or object)
- include::{docdir}/ml/ml-shared.asciidoc[tag=tokenizer]
- `filter`::::
- (array of strings or objects)
- include::{docdir}/ml/ml-shared.asciidoc[tag=filter]
- end::categorization-analyzer[]
- tag::categorization-examples-limit[]
- The maximum number of examples stored per category in memory and in the results
- data store. The default value is 4. If you increase this value, more examples
- are available; however, it requires that you have more storage available. If you
- set this value to `0`, no examples are stored.
- +
- --
- NOTE: The `categorization_examples_limit` only applies to analysis that uses
- categorization. For more information, see
- {stack-ov}/ml-configuring-categories.html[Categorizing log messages].
- --
- end::categorization-examples-limit[]
- tag::categorization-field-name[]
- If this property is specified, the values of the specified field will be
- categorized. The resulting categories must be used in a detector by setting
- `by_field_name`, `over_field_name`, or `partition_field_name` to the keyword
- `mlcategory`. For more information, see
- {stack-ov}/ml-configuring-categories.html[Categorizing log messages].
- end::categorization-field-name[]
- tag::categorization-filters[]
- If `categorization_field_name` is specified, you can also define optional
- filters. This property expects an array of regular expressions. The expressions
- are used to filter out matching sequences from the categorization field values.
- You can use this functionality to fine tune the categorization by excluding
- sequences from consideration when categories are defined. For example, you can
- exclude SQL statements that appear in your log files. For more information, see
- {stack-ov}/ml-configuring-categories.html[Categorizing log messages]. This
- property cannot be used at the same time as `categorization_analyzer`. If you
- only want to define simple regular expression filters that are applied prior to
- tokenization, setting this property is the easiest method. If you also want to
- customize the tokenizer or post-tokenization filtering, use the
- `categorization_analyzer` property instead and include the filters as
- `pattern_replace` character filters. The effect is exactly the same.
- end::categorization-filters[]
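The effect of `categorization_filters` can be approximated with a short Python sketch that removes every sequence matching one of the regular expressions before categorization. Python's `re` module stands in for the regular expression engine used by the product, which is an assumption, and the sample pattern and log message are invented for the example.

```python
import re


def apply_categorization_filters(value, filters):
    """Strip every sequence that matches one of the filter patterns."""
    for pattern in filters:
        value = re.sub(pattern, "", value)
    return value


# Hypothetical filter that drops an embedded SQL statement.
filters = [r"\[SQL: .*\]"]
message = "Query failed [SQL: SELECT * FROM users] after 3 retries"
filtered = apply_categorization_filters(message, filters)
```

After filtering, only the text around the SQL statement remains, so messages that differ solely in their SQL text can fall into the same category.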
- tag::char-filter[]
- One or more <<analysis-charfilters,character filters>>. In addition to the
- built-in character filters, other plugins can provide more character filters.
- This property is optional. If it is not specified, no character filters are
- applied prior to categorization. If you are customizing some other aspect of the
- analyzer and you need to achieve the equivalent of `categorization_filters`
- (which are not permitted when some other aspect of the analyzer is customized),
- add them here as
- <<analysis-pattern-replace-charfilter,pattern replace character filters>>.
- end::char-filter[]
- tag::compute-feature-influence[]
- If `true`, the feature influence calculation is enabled. Defaults to `true`.
- end::compute-feature-influence[]
- tag::chunking-config[]
- {dfeeds-cap} might be required to search over long time periods, such as several
- months or years. This search is split into time chunks in order to ensure the load
- on {es} is managed. Chunking configuration controls how the size of these time
- chunks is calculated and is an advanced configuration option.
- A chunking configuration object has the following properties:
- `mode`:::
- (string)
- include::{docdir}/ml/ml-shared.asciidoc[tag=mode]
- `time_span`:::
- (<<time-units,time units>>)
- include::{docdir}/ml/ml-shared.asciidoc[tag=time-span]
- end::chunking-config[]
- tag::custom-rules[]
- An array of custom rule objects, which enable you to customize the way detectors
- operate. For example, a rule may dictate to the detector conditions under which
- results should be skipped. For more examples, see
- {stack-ov}/ml-configuring-detector-custom-rules.html[Configuring detector custom rules].
- A custom rule has the following properties:
- +
- --
- `actions`::
- (array) The set of actions to be triggered when the rule applies. If
- more than one action is specified the effects of all actions are combined. The
- available actions include:
- * `skip_result`: The result will not be created. This is the default value.
- Unless you also specify `skip_model_update`, the model will be updated as usual
- with the corresponding series value.
- * `skip_model_update`: The value for that series will not be used to update the
- model. Unless you also specify `skip_result`, the results will be created as
- usual. This action is suitable when certain values are expected to be
- consistently anomalous and they affect the model in a way that negatively
- impacts the rest of the results.
- `scope`::
- (object) An optional scope of series where the rule applies. A rule must either
- have a non-empty scope or at least one condition. By default, the scope includes
- all series. Scoping is allowed for any of the fields that are also specified in
- `by_field_name`, `over_field_name`, or `partition_field_name`. To add a scope
- for a field, add the field name as a key in the scope object and set its value
- to an object with the following properties:
- `filter_id`:::
- (string) The id of the filter to be used.
- `filter_type`:::
- (string) Either `include` (the rule applies for values in the filter) or
- `exclude` (the rule applies for values not in the filter). Defaults to
- `include`.
- `conditions`::
- (array) An optional array of numeric conditions when the rule applies. A rule
- must either have a non-empty scope or at least one condition. Multiple
- conditions are combined together with a logical `AND`. A condition has the
- following properties:
- `applies_to`:::
- (string) Specifies the result property to which the condition applies. The
- available options are `actual`, `typical`, `diff_from_typical`, `time`.
- `operator`:::
- (string) Specifies the condition operator. The available options are `gt`
- (greater than), `gte` (greater than or equals), `lt` (less than) and `lte` (less
- than or equals).
- `value`:::
- (double) The value that is compared against the `applies_to` field using the
- `operator`.
- --
- +
- --
- NOTE: If your detector uses `lat_long`, `metric`, `rare`, or `freq_rare`
- functions, you can only specify `conditions` that apply to `time`.
- --
- end::custom-rules[]
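The following Python sketch shows one way to read the `scope` and `conditions` semantics described above: every scoped field must pass its filter, and all conditions are combined with a logical `AND`. It is an illustration, not the detector's implementation. The function names, the `record` layout, and the filter contents are hypothetical.

```python
import operator

# Map the documented operator names to Python comparisons.
OPERATORS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def conditions_met(conditions, record):
    """All conditions must hold (logical AND)."""
    return all(
        OPERATORS[c["operator"]](record[c["applies_to"]], c["value"])
        for c in conditions
    )


def in_scope(scope, record, filters):
    """Each scoped field must be inside (include) or outside (exclude) its filter."""
    for field, spec in scope.items():
        in_filter = record.get(field) in filters[spec["filter_id"]]
        include = spec.get("filter_type", "include") == "include"
        if in_filter != include:
            return False
    return True


def rule_applies(rule, record, filters):
    """A rule applies when its scope and all of its conditions match."""
    return in_scope(rule.get("scope", {}), record, filters) and conditions_met(
        rule.get("conditions", []), record
    )


# Hypothetical rule: skip results for airlines in a filter when actual < 10.
rule = {
    "actions": ["skip_result"],
    "scope": {"airline": {"filter_id": "safe_airlines", "filter_type": "include"}},
    "conditions": [{"applies_to": "actual", "operator": "lt", "value": 10.0}],
}
filters = {"safe_airlines": {"AAL"}}
```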
- tag::custom-settings[]
- Advanced configuration option. Contains custom meta data about the job. For
- example, it can contain custom URL information as shown in
- {stack-ov}/ml-configuring-url.html[Adding custom URLs to {ml} results].
- end::custom-settings[]
- tag::data-description[]
- The data description defines the format of the input data when you send data to
- the job by using the <<ml-post-data,post data>> API. Note that when you configure
- a {dfeed}, these properties are automatically set.
- +
- --
- When data is received via the <<ml-post-data,post data>> API, it is not stored
- in {es}. Only the results for {anomaly-detect} are retained.
- A data description object has the following properties:
- `format`:::
- (string) Only `JSON` format is supported at this time.
- `time_field`:::
- (string) The name of the field that contains the timestamp.
- The default value is `time`.
- `time_format`:::
- (string)
- include::{docdir}/ml/ml-shared.asciidoc[tag=time-format]
- --
- end::data-description[]
- tag::data-frame-analytics[]
- An array of {dfanalytics-job} resources, which are sorted by the `id` value in
- ascending order.
- `id`:::
- (string) The unique identifier of the {dfanalytics-job}.
- `source`:::
- (object) The configuration of how the analysis data is sourced. It has an
- `index` parameter and optionally a `query` and a `_source`.
-
- `index`::::
- (array) Index or indices on which to perform the analysis. It can be a single
- index or index pattern as well as an array of indices or patterns.
-
- `query`::::
- (object) The query that has been specified for the {dfanalytics-job}. The {es}
- query domain-specific language (<<query-dsl,DSL>>). This value corresponds to
- the query object in an {es} search POST body. By default, this property has the
- following value: `{"match_all": {}}`.
- `_source`::::
- (object) Contains the specified `includes` and/or `excludes` patterns that
- select which fields are present in the destination. Fields that are excluded
- cannot be included in the analysis.
-
- `includes`:::::
- (array) An array of strings that defines the fields that are included in the
- destination.
-
- `excludes`:::::
- (array) An array of strings that defines the fields that are excluded from the
- destination.
- `dest`:::
- (object) The destination configuration of the analysis.
- `index`::::
- (string) The _destination index_ that stores the results of the
- {dfanalytics-job}.
- `results_field`::::
- (string) The name of the field that stores the results of the analysis. Defaults
- to `ml`.
- `analysis`:::
- (object) The type of analysis that is performed on the `source`.
- `analyzed_fields`:::
- (object) Contains `includes` and/or `excludes` patterns that select which fields
- are included in the analysis.
-
- `includes`::::
- (Optional, array) An array of strings that defines the fields that are included
- in the analysis.
-
- `excludes`::::
- (Optional, array) An array of strings that defines the fields that are excluded
- from the analysis.
- `model_memory_limit`:::
- (string) The `model_memory_limit` that has been set for the {dfanalytics-job}.
- end::data-frame-analytics[]
- tag::data-frame-analytics-stats[]
- An array of statistics objects for {dfanalytics-jobs}, which are
- sorted by the `id` value in ascending order.
- `id`:::
- (string) The unique identifier of the {dfanalytics-job}.
-
- `state`:::
- (string) Current state of the {dfanalytics-job}.
-
- `progress`:::
- (array) The progress report of the {dfanalytics-job} by phase.
-
- `phase`:::
- (string) Defines the phase of the {dfanalytics-job}. Possible phases:
- `reindexing`, `loading_data`, `analyzing`, and `writing_results`.
-
- `progress_percent`:::
- (integer) The progress that the {dfanalytics-job} has made, expressed as a
- percentage.
- end::data-frame-analytics-stats[]
- tag::datafeed-id[]
- A character string that uniquely identifies the
- {dfeed}. This identifier can contain lowercase alphanumeric characters (a-z
- and 0-9), hyphens, and underscores. It must start and end with alphanumeric
- characters.
- end::datafeed-id[]
- tag::datafeed-id-wildcard[]
- Identifier for the {dfeed}. It can be a {dfeed} identifier or a wildcard
- expression.
- end::datafeed-id-wildcard[]
- tag::delayed-data-check-config[]
- Specifies whether the {dfeed} checks for missing data and the size of the
- window. For example: `{"enabled": true, "check_window": "1h"}`.
- +
- --
- The {dfeed} can optionally search over indices that have already been read in
- an effort to determine whether any data has subsequently been added to the index.
- If missing data is found, it is a good indication that the `query_delay` option
- is set too low and the data is being indexed after the {dfeed} has passed that
- moment in time. See
- {stack-ov}/ml-delayed-data-detection.html[Working with delayed data].
- This check runs only on real-time {dfeeds}.
- The configuration object has the following properties:
- `enabled`::
- (boolean) Specifies whether the {dfeed} periodically checks for delayed data.
- Defaults to `true`.
- `check_window`::
- (<<time-units,time units>>) The window of time that is searched for late data.
- This window of time ends with the latest finalized bucket. It defaults to
- `null`, which causes an appropriate `check_window` to be calculated when the
- real-time {dfeed} runs. In particular, the default `check_window` span
- calculation is based on the maximum of `2h` or `8 * bucket_span`.
- --
- end::delayed-data-check-config[]
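The default `check_window` calculation described above, the maximum of `2h` or `8 * bucket_span`, can be written as a one-line Python sketch. The function name `default_check_window` is hypothetical, and the actual calculation in the product may include details that are not documented here.

```python
from datetime import timedelta


def default_check_window(bucket_span: timedelta) -> timedelta:
    """Return the larger of two hours and eight bucket spans."""
    return max(timedelta(hours=2), 8 * bucket_span)
```

For a `5m` bucket span, eight buckets cover only 40 minutes, so the two-hour floor applies. For a `30m` bucket span, eight buckets cover four hours, which becomes the window.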
- tag::dependent-variable[]
- Defines which field of the document is to be predicted.
- This parameter is supplied by field name and must match one of the fields in
- the index being used to train. If this field is missing from a document, then
- that document will not be used for training, but a prediction with the trained
- model will be generated for it. It is also known as the continuous target variable.
- end::dependent-variable[]
- tag::description-dfa[]
- A description of the job.
- end::description-dfa[]
- tag::dest[]
- The destination configuration, consisting of `index` and
- optionally `results_field` (`ml` by default).
- `index`:::
- (Required, string) Defines the _destination index_ to store the results of
- the {dfanalytics-job}.
-
- `results_field`:::
- (Optional, string) Defines the name of the field in which to store the
- results of the analysis. Defaults to `ml`.
- end::dest[]
- tag::detector-description[]
- A description of the detector. For example, `Low event rate`.
- end::detector-description[]
- tag::detector-field-name[]
- The field that the detector uses in the function. If you use an event rate
- function such as `count` or `rare`, do not specify this field.
- +
- --
- NOTE: The `field_name` cannot contain double quotes or backslashes.
- --
- end::detector-field-name[]
- tag::detector-index[]
- A unique identifier for the detector. This identifier is based on the order of
- the detectors in the `analysis_config`, starting at zero. You can use this
- identifier when you want to update a specific detector.
- end::detector-index[]
- tag::detector[]
- A detector has the following properties:
- `by_field_name`::::
- (string)
- include::{docdir}/ml/ml-shared.asciidoc[tag=by-field-name]
- `custom_rules`::::
- (array)
- include::{docdir}/ml/ml-shared.asciidoc[tag=custom-rules]
- `detector_description`::::
- (string)
- include::{docdir}/ml/ml-shared.asciidoc[tag=detector-description]
- `detector_index`::::
- (integer)
- include::{docdir}/ml/ml-shared.asciidoc[tag=detector-index]
- `exclude_frequent`::::
- (string)
- include::{docdir}/ml/ml-shared.asciidoc[tag=exclude-frequent]
- `field_name`::::
- (string)
- include::{docdir}/ml/ml-shared.asciidoc[tag=detector-field-name]
- `function`::::
- (string)
- include::{docdir}/ml/ml-shared.asciidoc[tag=function]
- `over_field_name`::::
- (string)
- include::{docdir}/ml/ml-shared.asciidoc[tag=over-field-name]
- `partition_field_name`::::
- (string)
- include::{docdir}/ml/ml-shared.asciidoc[tag=partition-field-name]
- `use_null`::::
- (boolean)
- include::{docdir}/ml/ml-shared.asciidoc[tag=use-null]
- end::detector[]
- tag::eta[]
- The shrinkage applied to the weights. Smaller values result
- in larger forests which have better generalization error. However, the smaller
- the value the longer the training will take. For more information, see
- https://en.wikipedia.org/wiki/Gradient_boosting#Shrinkage[this wiki article]
- about shrinkage.
- end::eta[]
- tag::exclude-frequent[]
- Contains one of the following values: `all`, `none`, `by`, or `over`. If set,
- frequent entities are excluded from influencing the anomaly results. Entities
- can be considered frequent over time or frequent in a population. If you are
- working with both over and by fields, then you can set `exclude_frequent` to
- `all` for both fields, or to `by` or `over` for those specific fields.
- end::exclude-frequent[]
- tag::feature-bag-fraction[]
- Defines the fraction of features that will be used when
- selecting a random bag for each candidate split.
- end::feature-bag-fraction[]
- tag::feature-influence-threshold[]
- The minimum {olscore} that a document needs to have in order to calculate its
- {fiscore}. Value range: 0-1 (`0.1` by default).
- end::feature-influence-threshold[]
- tag::field-selection[]
- An array of objects that explain selection for each field, sorted by
- the field names. Each object in the array has the following properties:
- `name`:::
- (string) The field name.
- `mapping_types`:::
- (string) The mapping types of the field.
- `is_included`:::
- (boolean) Whether the field is selected to be included in the analysis.
- `is_required`:::
- (boolean) Whether the field is required.
- `feature_type`:::
- (string) The feature type of this field for the analysis. May be `categorical`
- or `numerical`.
- `reason`:::
- (string) The reason a field is not selected to be included in the analysis.
- end::field-selection[]
- tag::filter[]
- One or more <<analysis-tokenfilters,token filters>>. In addition to the built-in
- token filters, other plugins can provide more token filters. This property is
- optional. If it is not specified, no token filters are applied prior to
- categorization.
- end::filter[]
- tag::frequency[]
- The interval at which scheduled queries are made while the {dfeed} runs in real
- time. The default value is either the bucket span for short bucket spans, or,
- for longer bucket spans, a sensible fraction of the bucket span. For example:
- `150s`.
- end::frequency[]
- tag::from[]
- Skips the specified number of {dfanalytics-jobs}. The default value is `0`.
- end::from[]
- tag::function[]
- The analysis function that is used. For example, `count`, `rare`, `mean`, `min`,
- `max`, and `sum`. For more information, see
- {stack-ov}/ml-functions.html[Function reference].
- end::function[]
- tag::gamma[]
- Regularization parameter to prevent overfitting on the
- training dataset. Multiplies a linear penalty associated with the size of
- individual trees in the forest. The higher the value the more training will
- prefer smaller trees. The smaller this parameter the larger individual trees
- will be and the longer training will take.
- end::gamma[]
- tag::groups[]
- A list of job groups. A job can belong to no groups or many.
- end::groups[]
- tag::indices[]
- An array of index names. Wildcards are supported. For example:
- `["it_ops_metrics", "server*"]`.
- +
- --
- NOTE: If any indices are in remote clusters then `cluster.remote.connect` must
- not be set to `false` on any {ml} nodes.
- --
- end::indices[]
- tag::influencers[]
- A comma-separated list of influencer field names. Typically these can be the by,
- over, or partition fields that are used in the detector configuration. You might
- also want to use a field name that is not specifically named in a detector, but
- is available as part of the input data. When you use multiple detectors, the use
- of influencers is recommended as it aggregates results for each influencer entity.
- end::influencers[]
- tag::job-id-anomaly-detection[]
- Identifier for the {anomaly-job}.
- end::job-id-anomaly-detection[]
- tag::job-id-data-frame-analytics[]
- Identifier for the {dfanalytics-job}.
- end::job-id-data-frame-analytics[]
- tag::job-id-anomaly-detection-default[]
- Identifier for the {anomaly-job}. It can be a job identifier, a group name, or a
- wildcard expression. If you do not specify one of these options, the API returns
- information for all {anomaly-jobs}.
- end::job-id-anomaly-detection-default[]
- tag::job-id-data-frame-analytics-default[]
- Identifier for the {dfanalytics-job}. If you do not specify this option, the API
- returns information for the first hundred {dfanalytics-jobs}.
- end::job-id-data-frame-analytics-default[]
- tag::job-id-anomaly-detection-list[]
- An identifier for the {anomaly-jobs}. It can be a job
- identifier, a group name, or a comma-separated list of jobs or groups.
- end::job-id-anomaly-detection-list[]
- tag::job-id-anomaly-detection-wildcard[]
- Identifier for the {anomaly-job}. It can be a job identifier, a group name, or a
- wildcard expression.
- end::job-id-anomaly-detection-wildcard[]
- tag::job-id-anomaly-detection-wildcard-list[]
- Identifier for the {anomaly-job}. It can be a job identifier, a group name, a
- comma-separated list of jobs or groups, or a wildcard expression.
- end::job-id-anomaly-detection-wildcard-list[]
- tag::job-id-anomaly-detection-define[]
- Identifier for the {anomaly-job}. This identifier can contain lowercase
- alphanumeric characters (a-z and 0-9), hyphens, and underscores. It must start
- and end with alphanumeric characters.
- end::job-id-anomaly-detection-define[]
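The character rules above can be checked before a job is created. The following is a minimal sketch in Python, not the validation that {es} itself performs: the function name `is_valid_job_id` is hypothetical, and the pattern covers only the rules stated here (any other restrictions are out of scope).

```python
import re

# Derived from the stated rule: lowercase a-z, 0-9, hyphens and underscores,
# with an alphanumeric character at both the start and the end.
JOB_ID_PATTERN = re.compile(r"[a-z0-9](?:[a-z0-9_-]*[a-z0-9])?")


def is_valid_job_id(job_id: str) -> bool:
    """Return True if job_id satisfies the character rules described above."""
    return JOB_ID_PATTERN.fullmatch(job_id) is not None


if __name__ == "__main__":
    for candidate in ("total-requests", "_leading_underscore", "MixedCase", "ends-"):
        print(candidate, is_valid_job_id(candidate))
```

The same pattern applies to {dfanalytics-job} identifiers, which follow identical rules.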
- tag::job-id-data-frame-analytics-define[]
- Identifier for the {dfanalytics-job}. This identifier can contain lowercase
- alphanumeric characters (a-z and 0-9), hyphens, and underscores. It must start
- and end with alphanumeric characters.
- end::job-id-data-frame-analytics-define[]
- tag::job-id-datafeed[]
- The unique identifier for the job to which the {dfeed} sends data.
- end::job-id-datafeed[]
- tag::lambda[]
- Regularization parameter to prevent overfitting on the
- training dataset. Multiplies an L2 regularization term which applies to leaf
- weights of the individual trees in the forest. The higher the value, the more
- training will attempt to keep leaf weights small. This makes the prediction
- function smoother at the expense of potentially not being able to capture
- relevant relationships between the features and the {depvar}. The smaller this
- parameter, the larger individual trees will be and the longer training will take.
- end::lambda[]
- tag::latency[]
- The size of the window in which to expect data that is out of time order. The
- default value is 0 (no latency). If you specify a non-zero value, it must be
- greater than or equal to one second. For more information about time units, see
- <<time-units>>.
- +
- --
- NOTE: Latency is only applicable when you send data by using
- the <<ml-post-data,post data>> API.
- --
- end::latency[]
- tag::max-empty-searches[]
- If a real-time {dfeed} has never seen any data (including during any initial
- training period) then it will automatically stop itself and close its associated
- job after this many real-time searches that return no documents. In other words,
- it will stop after `frequency` times `max_empty_searches` of real-time operation.
- If not set then a {dfeed} with no end time that sees no data will remain started
- until it is explicitly stopped. By default this setting is not set.
- end::max-empty-searches[]
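To make the `frequency` times `max_empty_searches` relationship concrete, here is a small Python sketch. The helper name `empty_search_window` and the example values are illustrative assumptions, not part of any {es} client.

```python
from datetime import timedelta


def empty_search_window(frequency: timedelta, max_empty_searches: int) -> timedelta:
    """Real-time duration after which a datafeed that never saw data stops."""
    return frequency * max_empty_searches


if __name__ == "__main__":
    # Hypothetical settings: a search every 150 seconds, at most 10 empty searches.
    print(empty_search_window(timedelta(seconds=150), 10))
```

With these hypothetical settings, the {dfeed} would stop after 25 minutes of real-time operation without data.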
- tag::maximum-number-trees[]
- Defines the maximum number of trees the forest is allowed
- to contain. The maximum value is 2000.
- end::maximum-number-trees[]
- tag::memory-estimation[]
- An object containing the memory estimates. The object has the
- following properties:
- `expected_memory_without_disk`:::
- (string) Estimated memory usage under the assumption that the whole
- {dfanalytics} should happen in memory (i.e. without overflowing to disk).
- `expected_memory_with_disk`:::
- (string) Estimated memory usage under the assumption that overflowing to disk is
- allowed during {dfanalytics}. `expected_memory_with_disk` is usually smaller
- than `expected_memory_without_disk` because using disk limits the main
- memory needed to perform {dfanalytics}.
- end::memory-estimation[]
- tag::method[]
- Sets the method that {oldetection} uses. If the method is not set, {oldetection}
- uses an ensemble of different methods and normalizes and combines their
- individual {olscores} to obtain the overall {olscore}. We recommend using the
- ensemble method. Available methods are `lof`, `ldof`, `distance_kth_nn`, and
- `distance_knn`.
- end::method[]
- tag::mode[]
- There are three available modes:
- +
- --
- * `auto`: The chunk size is dynamically calculated. This is the default and
- recommended value.
- * `manual`: Chunking is applied according to the specified `time_span`.
- * `off`: No chunking is applied.
- --
- end::mode[]
- tag::model-memory-limit[]
- The approximate maximum amount of memory resources that are required for
- analytical processing. Once this limit is approached, data pruning becomes
- more aggressive. Upon exceeding this limit, new entities are not modeled. The
- default value for jobs created in version 6.1 and later is `1024mb`.
- This value will need to be increased for jobs that are expected to analyze high
- cardinality fields, but the default is set to a relatively small size to ensure
- that high resource usage is a conscious decision. The default value for jobs
- created in versions earlier than 6.1 is `4096mb`.
- +
- --
- If you specify a number instead of a string, the units are assumed to be MiB.
- Specifying a string is recommended for clarity. If you specify a byte size unit
- of `b` or `kb` and the number does not equate to a discrete number of megabytes,
- it is rounded down to the closest MiB. The minimum valid value is 1 MiB. If you
- specify a value less than 1 MiB, an error occurs. For more information about
- supported byte size units, see <<byte-units>>.
- If your `elasticsearch.yml` file contains an `xpack.ml.max_model_memory_limit`
- setting, an error occurs when you try to create jobs that have
- `model_memory_limit` values greater than that setting. For more information,
- see <<ml-settings>>.
- --
- end::model-memory-limit[]
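The unit handling described above (bare numbers treated as MiB, `b` and `kb` values rounded down to whole MiB, and a 1 MiB minimum) can be sketched as follows. This is an illustration under stated assumptions rather than the server's parsing logic: the function name `model_memory_limit_mib` is hypothetical, only the `b`, `kb`, `mb`, and `gb` suffixes are handled, and each unit is assumed to be a power of 1024.

```python
import re

MIB = 1024 ** 2
# Assumption: byte size units are powers of 1024.
_UNIT_BYTES = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def model_memory_limit_mib(value) -> int:
    """Normalize a model_memory_limit value to whole MiB, rounding down."""
    if isinstance(value, int) and not isinstance(value, bool):
        # A bare number is interpreted as MiB.
        total_bytes = value * MIB
    else:
        match = re.fullmatch(r"(\d+)(b|kb|mb|gb)", str(value).strip().lower())
        if match is None:
            raise ValueError(f"unsupported model_memory_limit: {value!r}")
        total_bytes = int(match.group(1)) * _UNIT_BYTES[match.group(2)]
    mib = total_bytes // MIB  # round down to the closest MiB
    if mib < 1:
        raise ValueError("model_memory_limit must be at least 1 MiB")
    return mib


if __name__ == "__main__":
    for example in (512, "1024mb", "1536kb", "2097153b"):
        print(example, "->", model_memory_limit_mib(example), "MiB")
```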
- tag::model-memory-limit-dfa[]
- The approximate maximum amount of memory resources that are permitted for
- analytical processing. The default value for {dfanalytics-jobs} is `1gb`. If
- your `elasticsearch.yml` file contains an `xpack.ml.max_model_memory_limit`
- setting, an error occurs when you try to create {dfanalytics-jobs} that have
- `model_memory_limit` values greater than that setting. For more information, see
- <<ml-settings>>.
- end::model-memory-limit-dfa[]
- tag::model-plot-config[]
- This advanced configuration option stores model information along with the
- results. It provides a more detailed view into {anomaly-detect}.
- +
- --
- WARNING: If you enable model plot it can add considerable overhead to the
- performance of the system; it is not feasible for jobs with many entities.
- Model plot provides a simplified and indicative view of the model and its
- bounds. It does not display complex features such as multivariate correlations
- or multimodal data. As such, anomalies may occasionally be reported which cannot
- be seen in the model plot.
- Model plot config can be configured when the job is created or updated later. It
- must be disabled if performance issues are experienced.
- The `model_plot_config` object has the following properties:
- `enabled`:::
- (boolean) If true, enables calculation and storage of the model bounds for
- each entity that is being analyzed. By default, this is not enabled.
- `terms`:::
- experimental[] (string) Limits data collection to this comma-separated list of
- partition or by field values. If terms are not specified or it is an empty
- string, no filtering is applied. For example, "CPU,NetworkIn,DiskWrites".
- Wildcards are not supported. Only the specified `terms` can be viewed when
- using the Single Metric Viewer.
- --
- end::model-plot-config[]
- tag::model-snapshot-id[]
- A numerical character string that uniquely identifies the model snapshot. For
- example, `1491007364`. For more information about model snapshots, see
- <<ml-snapshot-resource>>.
- end::model-snapshot-id[]
- tag::model-snapshot-retention-days[]
- The time in days that model snapshots are retained for the job. Older snapshots
- are deleted. The default value is `1`, which means snapshots are retained for
- one day (twenty-four hours).
- end::model-snapshot-retention-days[]
- tag::multivariate-by-fields[]
- This functionality is reserved for internal use. It is not supported for use in
- customer environments and is not subject to the support SLA of official GA
- features.
- +
- --
- If set to `true`, the analysis will automatically find correlations between
- metrics for a given `by` field value and report anomalies when those
- correlations cease to hold. For example, suppose CPU and memory usage on host A
- is usually highly correlated with the same metrics on host B. Perhaps this
- correlation occurs because they are running a load-balanced application.
- If you enable this property, then anomalies will be reported when, for example,
- CPU usage on host A is high and the value of CPU usage on host B is low. That
- is to say, you'll see an anomaly when the CPU of host A is unusual given
- the CPU of host B.
- NOTE: To use the `multivariate_by_fields` property, you must also specify
- `by_field_name` in your detector.
- --
- end::multivariate-by-fields[]
- tag::n-neighbors[]
- Defines the value for how many nearest neighbors each method of
- {oldetection} will use to calculate its {olscore}. When the value is not set,
- different values will be used for different ensemble members. This helps
- improve diversity in the ensemble. Therefore, only override this if you are
- confident that the value you choose is appropriate for the data set.
- end::n-neighbors[]
- tag::num-top-classes[]
- Defines the number of categories for which the predicted
- probabilities are reported. It must be non-negative. If it is greater than the
- total number of categories to predict (in the {version} version of the {stack},
- it's two), all category probabilities are reported. Defaults to 2.
- end::num-top-classes[]
- tag::over-field-name[]
- The field used to split the data. In particular, this property is used for
- analyzing the splits with respect to the history of all splits. It is used for
- finding unusual values in the population of all splits. For more information,
- see {stack-ov}/ml-configuring-pop.html[Performing population analysis].
- end::over-field-name[]
- tag::outlier-fraction[]
- Sets the proportion of the data set that is assumed to be outlying prior to
- {oldetection}. For example, 0.05 means it is assumed that 5% of values are real
- outliers and 95% are inliers.
- end::outlier-fraction[]
- tag::partition-field-name[]
- The field used to segment the analysis. When you use this property, you have
- completely independent baselines for each value of this field.
- end::partition-field-name[]
- tag::prediction-field-name[]
- Defines the name of the prediction field in the results.
- Defaults to `<dependent_variable>_prediction`.
- end::prediction-field-name[]
- tag::randomize-seed[]
- Defines the seed for the random generator that is used to pick which documents
- will be used for training. By default it is randomly generated. Set it to a
- specific value to ensure the same documents are used for training, assuming other
- related parameters (for example, `source`, `analyzed_fields`, etc.) are the
- same.
- end::randomize-seed[]
- tag::query[]
- The {es} query domain-specific language (DSL). This value corresponds to the
- query object in an {es} search POST body. All the options that are supported by
- {es} can be used, as this object is passed verbatim to {es}. By default, this
- property has the following value: `{"match_all": {"boost": 1}}`.
- end::query[]
- tag::query-delay[]
- The number of seconds behind real time that data is queried. For example, if
- data from 10:04 a.m. might not be searchable in {es} until 10:06 a.m., set this
- property to 120 seconds. The default value is randomly selected between `60s`
- and `120s`. This randomness improves the query performance when there are
- multiple jobs running on the same node.
- end::query-delay[]
- tag::renormalization-window-days[]
- Advanced configuration option. The period over which adjustments to the score
- are applied, as new data is seen. The default value is the longer of 30 days or
- 100 `bucket_spans`.
- end::renormalization-window-days[]
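The default "longer of 30 days or 100 `bucket_spans`" rule amounts to taking a maximum. A minimal Python sketch, with a hypothetical helper name and example bucket spans chosen only for illustration:

```python
from datetime import timedelta


def default_renormalization_window(bucket_span: timedelta) -> timedelta:
    """Return the longer of 30 days or 100 bucket spans."""
    return max(timedelta(days=30), 100 * bucket_span)


if __name__ == "__main__":
    # 100 x 15 minutes is 25 hours, so the 30-day floor applies.
    print(default_renormalization_window(timedelta(minutes=15)))
    # 100 x 1 day exceeds 30 days, so the window is 100 days.
    print(default_renormalization_window(timedelta(days=1)))
```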
- tag::results-index-name[]
- A text string that affects the name of the {ml} results index. The default value
- is `shared`, which generates an index named `.ml-anomalies-shared`.
- end::results-index-name[]
- tag::results-retention-days[]
- Advanced configuration option. The number of days for which job results are
- retained. Once per day at 00:30 (server time), results older than this period
- are deleted from {es}. The default value is null, which means results are
- retained.
- end::results-retention-days[]
- tag::script-fields[]
- Specifies scripts that evaluate custom expressions and return script fields to
- the {dfeed}. The detector configuration objects in a job can contain functions
- that use these script fields. For more information, see
- {stack-ov}/ml-configuring-transform.html[Transforming data with script fields]
- and <<request-body-search-script-fields,Script fields>>.
- end::script-fields[]
- tag::scroll-size[]
- The `size` parameter that is used in {es} searches. The default value is `1000`.
- end::scroll-size[]
- tag::size[]
- Specifies the maximum number of {dfanalytics-jobs} to obtain. The default value
- is `100`.
- end::size[]
- tag::source-put-dfa[]
- The configuration of how to source the analysis data. It requires an
- `index`. Optionally, `query` and `_source` may be specified.
- `index`:::
- (Required, string or array) Index or indices on which to perform the
- analysis. It can be a single index or index pattern as well as an array of
- indices or patterns.
-
- `query`:::
- (Optional, object) The {es} query domain-specific language
- (<<query-dsl,DSL>>). This value corresponds to the query object in an {es}
- search POST body. All the options that are supported by {es} can be used,
- as this object is passed verbatim to {es}. By default, this property has
- the following value: `{"match_all": {}}`.
- `_source`:::
- (Optional, object) Specify `includes` and/or `excludes` patterns to select
- which fields will be present in the destination. Fields that are excluded
- cannot be included in the analysis.
-
- `includes`::::
- (array) An array of strings that defines the fields that will be
- included in the destination.
-
- `excludes`::::
- (array) An array of strings that defines the fields that will be
- excluded from the destination.
- end::source-put-dfa[]
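As an illustration of how the `index`, `query`, and `_source` properties fit together, the following Python snippet builds a `source` object and prints it as JSON. The index name and field patterns are hypothetical placeholders; only the property names and the default `match_all` query come from the description above.

```python
import json

# Hypothetical index and field names, used only to show the shape of the object.
source = {
    "index": ["my-source-index"],
    "query": {"match_all": {}},  # the documented default query
    "_source": {
        "includes": ["feature_*", "label"],
        "excludes": ["feature_debug"],
    },
}

if __name__ == "__main__":
    print(json.dumps(source, indent=2))
```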
- tag::standardization-enabled[]
- If `true`, then the following operation is performed on the columns before
- computing outlier scores: (x_i - mean(x_i)) / sd(x_i). Defaults to `true`. For
- more information, see
- https://en.wikipedia.org/wiki/Feature_scaling#Standardization_(Z-score_Normalization)[this wiki page about standardization].
- end::standardization-enabled[]
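The standardization formula above, (x_i - mean(x_i)) / sd(x_i), can be sketched for a single column in Python. This is an illustrative sketch, not the {ml} implementation: the description does not say whether `sd` is the population or sample standard deviation, so the population form is assumed here, and mapping a zero-variance column to zeros is also an assumption.

```python
from statistics import mean, pstdev


def standardize(column):
    """Apply (x - mean) / sd to every value of one column."""
    mu = mean(column)
    sd = pstdev(column)  # assumption: population standard deviation
    if sd == 0:
        # Assumption: a constant column carries no signal, so map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sd for x in column]


if __name__ == "__main__":
    print(standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
```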
- tag::summary-count-field-name[]
- If this property is specified, the data that is fed to the job is expected to be
- pre-summarized. This property value is the name of the field that contains the
- count of raw data points that have been summarized. The same
- `summary_count_field_name` applies to all detectors in the job.
- +
- --
- NOTE: The `summary_count_field_name` property cannot be used with the `metric`
- function.
- --
- end::summary-count-field-name[]
- tag::timeout-start[]
- Controls the amount of time to wait until the {dfanalytics-job} starts. Defaults
- to 20 seconds.
- end::timeout-start[]
- tag::timeout-stop[]
- Controls the amount of time to wait until the {dfanalytics-job} stops. Defaults
- to 20 seconds.
- end::timeout-stop[]
- tag::time-format[]
- The time format, which can be `epoch`, `epoch_ms`, or a custom pattern. The
- default value is `epoch`, which refers to UNIX or Epoch time (the number of
- seconds since 1 Jan 1970). The value `epoch_ms` indicates that time is measured
- in milliseconds since the epoch. The `epoch` and `epoch_ms` time formats accept
- either integer or real values. +
- +
- --
- NOTE: Custom patterns must conform to the Java `DateTimeFormatter` class.
- When you use date-time formatting patterns, it is recommended that you provide
- the full date, time and time zone. For example: `yyyy-MM-dd'T'HH:mm:ssX`.
- If the pattern that you specify is not sufficient to produce a complete
- timestamp, job creation fails.
- --
- end::time-format[]
- tag::time-span[]
- The time span that each search will be querying. This setting is only applicable
- when the mode is set to `manual`. For example: `3h`.
- end::time-span[]
- tag::tokenizer[]
- The name or definition of the <<analysis-tokenizers,tokenizer>> to use after
- character filters are applied. This property is compulsory if
- `categorization_analyzer` is specified as an object. Machine learning provides a
- tokenizer called `ml_classic` that tokenizes in the same way as the
- non-customizable tokenizer in older versions of the product. If you want to use
- that tokenizer but change the character or token filters, specify
- `"tokenizer": "ml_classic"` in your `categorization_analyzer`.
- end::tokenizer[]
- tag::training-percent[]
- Defines the percentage of the eligible documents that will
- be used for training. Documents that are ignored by the analysis (for example,
- those that contain arrays) won’t be included in the calculation of the used
- percentage. Defaults to `100`.
- end::training-percent[]
- tag::use-null[]
- Defines whether a new series is used as the null series when there is no value
- for the by or partition fields. The default value is `false`.
- end::use-null[]
|