- tag::aggregations[]
- If set, the {dfeed} performs aggregation searches. Support for aggregations is
- limited and should be used only with low cardinality data. For more information,
- see
- {ml-docs}/ml-configuring-aggregation.html[Aggregating data for faster performance].
- end::aggregations[]
- tag::allow-lazy-open[]
- Advanced configuration option. Specifies whether this job can open when there is
- insufficient {ml} node capacity for it to be immediately assigned to a node. The
- default value is `false`; if a {ml} node with capacity to run the job cannot
- immediately be found, the <<ml-open-job,open {anomaly-jobs} API>> returns an
- error. However, this is also subject to the cluster-wide
- `xpack.ml.max_lazy_ml_nodes` setting; see <<advanced-ml-settings>>. If this
- option is set to `true`, the <<ml-open-job,open {anomaly-jobs} API>> does not
- return an error and the job waits in the `opening` state until sufficient {ml}
- node capacity is available.
- end::allow-lazy-open[]
- tag::allow-no-datafeeds[]
- Specifies what to do when the request:
- +
- --
- * Contains wildcard expressions and there are no {dfeeds} that match.
- * Contains the `_all` string or no identifiers and there are no matches.
- * Contains wildcard expressions and there are only partial matches.
- The default value is `true`, which returns an empty `datafeeds` array when
- there are no matches and the subset of results when there are partial matches.
- If this parameter is `false`, the request returns a `404` status code when there
- are no matches or only partial matches.
- --
- end::allow-no-datafeeds[]
- tag::allow-no-jobs[]
- Specifies what to do when the request:
- +
- --
- * Contains wildcard expressions and there are no jobs that match.
- * Contains the `_all` string or no identifiers and there are no matches.
- * Contains wildcard expressions and there are only partial matches.
- The default value is `true`, which returns an empty `jobs` array
- when there are no matches and the subset of results when there are partial
- matches. If this parameter is `false`, the request returns a `404` status code
- when there are no matches or only partial matches.
- --
- end::allow-no-jobs[]
- tag::allow-no-match[]
- Specifies what to do when the request:
- +
- --
- * Contains wildcard expressions and there are no {dfanalytics-jobs} that match.
- * Contains the `_all` string or no identifiers and there are no matches.
- * Contains wildcard expressions and there are only partial matches.
- The default value is `true`, which returns an empty `data_frame_analytics` array
- when there are no matches and the subset of results when there are partial
- matches. If this parameter is `false`, the request returns a `404` status code
- when there are no matches or only partial matches.
- --
- end::allow-no-match[]
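The matching behavior described by the `allow_no_*` parameters above can be sketched in a few lines of Python. This is only an illustrative model, not the server implementation: the `resolve` function and the `NotFound` exception are hypothetical names, shell-style `*` matching stands in for the real wildcard logic, and the handling of a missing non-wildcard identifier (always an error) is an assumption that the text above does not state.

```python
from fnmatch import fnmatchcase


class NotFound(Exception):
    """Stands in for an HTTP 404 response (hypothetical name)."""


def resolve(expressions, existing_ids, allow_no_match=True):
    """Return the identifiers matched by the given expressions."""
    # An empty request or the `_all` string selects everything.
    if not expressions or expressions == ["_all"]:
        expressions = ["*"]

    matched = []
    wildcard_without_hits = False
    for expr in expressions:
        hits = [i for i in existing_ids if fnmatchcase(i, expr)]
        if not hits:
            if "*" not in expr:
                # Assumption: a missing exact identifier is always an error.
                raise NotFound(expr)
            wildcard_without_hits = True
        matched.extend(h for h in hits if h not in matched)

    # With allow_no_match=False, no matches or partial matches produce a 404.
    if wildcard_without_hits and not allow_no_match:
        raise NotFound(", ".join(expressions))
    return sorted(matched)
```

With the default of `true`, a request such as `resolve(["job-*", "nope-*"], ids)` returns only the subset that matched; with `allow_no_match=False` the same request raises `NotFound`.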
- tag::analysis[]
- Defines the type of {dfanalytics} you want to perform on your source index. For
- example: `outlier_detection`. See <<ml-dfa-analysis-objects>>.
- end::analysis[]
- tag::analysis-config[]
- The analysis configuration, which specifies how to analyze the data. After you
- create a job, you cannot change the analysis configuration; all the properties
- are informational.
- end::analysis-config[]
- tag::analysis-limits[]
- Limits can be applied for the resources required to hold the mathematical models
- in memory. These limits are approximate and can be set per job. They do not
- control the memory used by other processes, for example the {es} Java processes.
- end::analysis-limits[]
- tag::assignment-explanation-anomaly-jobs[]
- For open {anomaly-jobs} only, contains messages relating to the selection
- of a node to run the job.
- end::assignment-explanation-anomaly-jobs[]
- tag::assignment-explanation-datafeeds[]
- For started {dfeeds} only, contains messages relating to the selection of a
- node.
- end::assignment-explanation-datafeeds[]
- tag::assignment-explanation-dfanalytics[]
- Contains messages relating to the selection of a node.
- end::assignment-explanation-dfanalytics[]
- tag::background-persist-interval[]
- Advanced configuration option. The time between each periodic persistence of the
- model. The default value is a randomized value between 3 and 4 hours, which
- avoids all jobs persisting at exactly the same time. The smallest allowed value
- is 1 hour.
- +
- --
- TIP: For very large models (several GB), persistence could take 10-20 minutes,
- so do not set the `background_persist_interval` value too low.
- --
- end::background-persist-interval[]
- tag::bucket-allocation-failures-count[]
- The number of buckets for which new entities in incoming data were not processed
- due to insufficient model memory. This situation is also signified by a
- `memory_status` property value of `hard_limit`.
- end::bucket-allocation-failures-count[]
- tag::bucket-count[]
- The number of buckets processed.
- end::bucket-count[]
- tag::bucket-count-anomaly-jobs[]
- The number of bucket results produced by the job.
- end::bucket-count-anomaly-jobs[]
- tag::bucket-span[]
- The size of the interval that the analysis is aggregated into, typically between
- `5m` and `1h`. The default value is `5m`. If the {anomaly-job} uses a {dfeed}
- with {ml-docs}/ml-configuring-aggregation.html[aggregations], this value must be
- divisible by the interval of the date histogram aggregation. For more
- information, see {ml-docs}/ml-buckets.html[Buckets].
- end::bucket-span[]
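The divisibility requirement between `bucket_span` and the date histogram interval can be checked ahead of time. The following is a minimal sketch, assuming both values are simple strings such as `5m` or `1h`; the helper names `to_seconds` and `bucket_span_is_compatible` are hypothetical and only a handful of time units are handled.

```python
import re

# Seconds per unit for the few suffixes this sketch understands.
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def to_seconds(value: str) -> int:
    """Convert a value such as '5m' or '1h' to a number of seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", value)
    if match is None:
        raise ValueError(f"unsupported time value: {value!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def bucket_span_is_compatible(bucket_span: str, histogram_interval: str) -> bool:
    """True when the bucket span is a whole multiple of the histogram interval."""
    return to_seconds(bucket_span) % to_seconds(histogram_interval) == 0
```

For example, a `15m` bucket span is compatible with a `5m` histogram interval, while a `10m` bucket span is not compatible with a `3m` interval.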
- tag::bucket-span-results[]
- The length of the bucket in seconds. This value matches the `bucket_span`
- that is specified in the job.
- end::bucket-span-results[]
- tag::bucket-time-exponential-average[]
- Exponential moving average of all bucket processing times, in milliseconds.
- end::bucket-time-exponential-average[]
- tag::bucket-time-exponential-average-hour[]
- Exponentially-weighted moving average of bucket processing times
- calculated in a 1 hour time window, in milliseconds.
- end::bucket-time-exponential-average-hour[]
- tag::bucket-time-maximum[]
- Maximum among all bucket processing times, in milliseconds.
- end::bucket-time-maximum[]
- tag::bucket-time-minimum[]
- Minimum among all bucket processing times, in milliseconds.
- end::bucket-time-minimum[]
- tag::bucket-time-total[]
- Sum of all bucket processing times, in milliseconds.
- end::bucket-time-total[]
- tag::by-field-name[]
- The field used to split the data. In particular, this property is used for
- analyzing the splits with respect to their own history. It is used for finding
- unusual values in the context of the split.
- end::by-field-name[]
- tag::calendar-id[]
- A string that uniquely identifies a calendar.
- end::calendar-id[]
- tag::categorization-analyzer[]
- If `categorization_field_name` is specified, you can also define the analyzer
- that is used to interpret the categorization field. This property cannot be used
- at the same time as `categorization_filters`. The categorization analyzer
- specifies how the `categorization_field` is interpreted by the categorization
- process. The syntax is very similar to that used to define the `analyzer` in the
- <<indices-analyze,Analyze endpoint>>. For more information, see
- {ml-docs}/ml-configuring-categories.html[Categorizing log messages].
- +
- The `categorization_analyzer` field can be specified either as a string or as an
- object. If it is a string, it must refer to a
- <<analysis-analyzers,built-in analyzer>> or one added by another plugin. If it
- is an object, it has the following properties:
- +
- .Properties of `categorization_analyzer`
- [%collapsible%open]
- =====
- `char_filter`::::
- (array of strings or objects)
- include::{docdir}/ml/ml-shared.asciidoc[tag=char-filter]
- `tokenizer`::::
- (string or object)
- include::{docdir}/ml/ml-shared.asciidoc[tag=tokenizer]
- `filter`::::
- (array of strings or objects)
- include::{docdir}/ml/ml-shared.asciidoc[tag=filter]
- =====
- end::categorization-analyzer[]
- tag::categorization-examples-limit[]
- The maximum number of examples stored per category in memory and in the results
- data store. The default value is 4. If you increase this value, more examples
- are available, but more storage is required. If you
- set this value to `0`, no examples are stored.
- +
- NOTE: The `categorization_examples_limit` only applies to analysis that uses
- categorization. For more information, see
- {ml-docs}/ml-configuring-categories.html[Categorizing log messages].
- end::categorization-examples-limit[]
- tag::categorization-field-name[]
- If this property is specified, the values of the specified field will be
- categorized. The resulting categories must be used in a detector by setting
- `by_field_name`, `over_field_name`, or `partition_field_name` to the keyword
- `mlcategory`. For more information, see
- {ml-docs}/ml-configuring-categories.html[Categorizing log messages].
- end::categorization-field-name[]
- tag::categorization-filters[]
- If `categorization_field_name` is specified, you can also define optional
- filters. This property expects an array of regular expressions. The expressions
- are used to filter out matching sequences from the categorization field values.
- You can use this functionality to fine tune the categorization by excluding
- sequences from consideration when categories are defined. For example, you can
- exclude SQL statements that appear in your log files. For more information, see
- {ml-docs}/ml-configuring-categories.html[Categorizing log messages]. This
- property cannot be used at the same time as `categorization_analyzer`. If you
- only want to define simple regular expression filters that are applied prior to
- tokenization, setting this property is the easiest method. If you also want to
- customize the tokenizer or post-tokenization filtering, use the
- `categorization_analyzer` property instead and include the filters as
- `pattern_replace` character filters. The effect is exactly the same.
- end::categorization-filters[]
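To illustrate the effect of `categorization_filters`, the sketch below removes every sequence that matches one of the regular expressions before the text would be tokenized. It only models the idea in Python: the function name, the sample pattern, and the sample log line are hypothetical, and Python's regular expression dialect may differ in detail from the one used by the server.

```python
import re


def apply_categorization_filters(message, filters):
    """Strip every sequence that matches one of the filter patterns."""
    for pattern in filters:
        message = re.sub(pattern, "", message)
    return message


# Hypothetical filter that drops an embedded SQL statement from a log line.
filters = [r"\[statement:.*\]"]
line = "Query failed [statement:SELECT * FROM t] after 3 retries"
filtered = apply_categorization_filters(line, filters)
```

Only the text that remains after filtering would be considered when categories are defined, so the SQL statement no longer influences the category.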
- tag::categorization-status[]
- The status of categorization for the job. Contains one of the following values:
- +
- --
- * `ok`: Categorization is performing acceptably well (or not being used at all).
- * `warn`: Categorization is detecting a distribution of categories that suggests
- the input data is inappropriate for categorization. Problems could be that there
- is only one category, more than 90% of categories are rare, the number of
- categories is greater than 50% of the number of categorized documents, there are
- no frequently matched categories, or more than 50% of categories are dead.
- --
- end::categorization-status[]
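The `warn` conditions listed above can be expressed as a small decision function. This is a sketch based only on the thresholds stated in the description; the function and parameter names are hypothetical, and the real implementation may apply additional conditions that are not described here.

```python
def categorization_status(categorized_docs, total, frequent, rare, dead):
    """Return 'ok' or 'warn' from category counts, following the listed rules."""
    if total == 0:
        # Assumption: no categories means categorization is not in use.
        return "ok"
    warn = (
        total == 1                          # only one category
        or rare > 0.9 * total               # more than 90% of categories are rare
        or total > 0.5 * categorized_docs   # categories exceed 50% of documents
        or frequent == 0                    # no frequently matched categories
        or dead > 0.5 * total               # more than 50% of categories are dead
    )
    return "warn" if warn else "ok"
```

A job with 1000 categorized documents and 20 categories, of which 5 are frequent, 3 are rare, and 1 is dead, evaluates to `ok` under these rules.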
- tag::categorized-doc-count[]
- The number of documents that have had a field categorized.
- end::categorized-doc-count[]
- tag::char-filter[]
- One or more <<analysis-charfilters,character filters>>. In addition to the
- built-in character filters, other plugins can provide more character filters.
- This property is optional. If it is not specified, no character filters are
- applied prior to categorization. If you are customizing some other aspect of the
- analyzer and you need to achieve the equivalent of `categorization_filters`
- (which are not permitted when some other aspect of the analyzer is customized),
- add them here as
- <<analysis-pattern-replace-charfilter,pattern replace character filters>>.
- end::char-filter[]
- tag::chunking-config[]
- {dfeeds-cap} might be required to search over long time periods, for several
- months or years. This search is split into time chunks in order to ensure the
- load on {es} is managed. Chunking configuration controls how the size of these
- time chunks is calculated and is an advanced configuration option.
- +
- .Properties of `chunking_config`
- [%collapsible%open]
- ====
- `mode`:::
- (string)
- include::{docdir}/ml/ml-shared.asciidoc[tag=mode]
- `time_span`:::
- (<<time-units,time units>>)
- include::{docdir}/ml/ml-shared.asciidoc[tag=time-span]
- ====
- end::chunking-config[]
- tag::custom-rules[]
- An array of custom rule objects, which enable you to customize the way detectors
- operate. For example, a rule may dictate to the detector conditions under which
- results should be skipped. For more examples, see
- {ml-docs}/ml-configuring-detector-custom-rules.html[Customizing detectors with custom rules].
- end::custom-rules[]
- tag::custom-rules-actions[]
- The set of actions to be triggered when the rule applies. If
- more than one action is specified, the effects of all actions are combined. The
- available actions include:
- * `skip_result`: The result will not be created. This is the default value.
- Unless you also specify `skip_model_update`, the model will be updated as usual
- with the corresponding series value.
- * `skip_model_update`: The value for that series will not be used to update the
- model. Unless you also specify `skip_result`, the results will be created as
- usual. This action is suitable when certain values are expected to be
- consistently anomalous and they affect the model in a way that negatively
- impacts the rest of the results.
- end::custom-rules-actions[]
- tag::custom-rules-scope[]
- An optional scope of series where the rule applies. A rule must either
- have a non-empty scope or at least one condition. By default, the scope includes
- all series. Scoping is allowed for any of the fields that are also specified in
- `by_field_name`, `over_field_name`, or `partition_field_name`. To add a scope
- for a field, add the field name as a key in the scope object and set its value
- to an object with the following properties:
- end::custom-rules-scope[]
- tag::custom-rules-scope-filter-id[]
- The id of the filter to be used.
- end::custom-rules-scope-filter-id[]
- tag::custom-rules-scope-filter-type[]
- Either `include` (the rule applies for values in the filter) or `exclude` (the
- rule applies for values not in the filter). Defaults to `include`.
- end::custom-rules-scope-filter-type[]
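The difference between the `include` and `exclude` filter types can be shown with a short membership check. This is a minimal sketch with hypothetical names; the filter is modelled as a plain Python set of values rather than a stored filter referenced by its id.

```python
def value_in_scope(value, filter_values, filter_type="include"):
    """Decide whether a rule applies to a field value under a scope filter."""
    if filter_type == "include":
        # The rule applies for values in the filter.
        return value in filter_values
    if filter_type == "exclude":
        # The rule applies for values not in the filter.
        return value not in filter_values
    raise ValueError(f"unknown filter_type: {filter_type!r}")


# Hypothetical filter contents.
safe_domains = {"example.com", "example.org"}
```

With these values, `value_in_scope("example.com", safe_domains)` is true, whereas the same call with `filter_type="exclude"` is false.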
- tag::custom-rules-conditions[]
- An optional array of numeric conditions when the rule applies. A rule must
- either have a non-empty scope or at least one condition. Multiple conditions are
- combined together with a logical `AND`. A condition has the following
- properties:
- end::custom-rules-conditions[]
- tag::custom-rules-conditions-applies-to[]
- Specifies the result property to which the condition applies. The available
- options are `actual`, `typical`, `diff_from_typical`, and `time`. If your detector
- uses `lat_long`, `metric`, `rare`, or `freq_rare` functions, you can only
- specify conditions that apply to `time`.
- end::custom-rules-conditions-applies-to[]
- tag::custom-rules-conditions-operator[]
- Specifies the condition operator. The available options are `gt` (greater than),
- `gte` (greater than or equal to), `lt` (less than), and `lte` (less than or
- equal to).
- end::custom-rules-conditions-operator[]
- tag::custom-rules-conditions-value[]
- The value that is compared against the `applies_to` field using the `operator`.
- end::custom-rules-conditions-value[]
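The way numeric conditions combine can be sketched as follows. Each condition is modelled as a dictionary with the `applies_to`, `operator`, and `value` properties described above, and all conditions must hold for the rule to apply (logical `AND`). The function name and the sample values are hypothetical; this is an illustration, not the detector's implementation.

```python
import operator

# Map the documented operator names to Python comparison functions.
OPERATORS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def conditions_hold(conditions, result):
    """True when every condition is satisfied by the result's properties."""
    return all(
        OPERATORS[c["operator"]](result[c["applies_to"]], c["value"])
        for c in conditions
    )


conditions = [
    {"applies_to": "actual", "operator": "lt", "value": 10.0},
    {"applies_to": "typical", "operator": "gte", "value": 5.0},
]
```

A result with `actual` of 3.0 and `typical` of 8.0 satisfies both conditions; raising `actual` to 12.0 breaks the first condition, so the rule no longer applies.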
- tag::custom-settings[]
- Advanced configuration option. Contains custom metadata about the job. For
- example, it can contain custom URL information as shown in
- {ml-docs}/ml-configuring-url.html[Adding custom URLs to {ml} results].
- end::custom-settings[]
- tag::data-description[]
- The data description defines the format of the input data when you send data to
- the job by using the <<ml-post-data,post data>> API. Note that when you configure
- a {dfeed}, these properties are automatically set. When data is received via
- the <<ml-post-data,post data>> API, it is not stored in {es}. Only the results
- for {anomaly-detect} are retained.
- +
- .Properties of `data_description`
- [%collapsible%open]
- ====
- `format`:::
- (string) Only `JSON` format is supported at this time.
- `time_field`:::
- (string) The name of the field that contains the timestamp.
- The default value is `time`.
- `time_format`:::
- (string)
- include::{docdir}/ml/ml-shared.asciidoc[tag=time-format]
- ====
- end::data-description[]
- tag::data-frame-analytics-stats[]
- An array of statistics objects for {dfanalytics-jobs}, which are
- sorted by the `id` value in ascending order.
- //Begin analysis_stats
- `analysis_stats`::
- (object)
- An object containing statistical data about the analysis.
- +
- .Properties of `analysis_stats`
- [%collapsible%open]
- ====
- //Begin classification_stats
- `classification_stats`:::
- (object)
- An object containing statistical data about the {classanalysis}.
- +
- .Properties of `classification_stats`
- [%collapsible%open]
- =====
- //Begin class_hyperparameters
- `hyperparameters`::::
- (object)
- An object containing the parameters of the {classanalysis}.
- +
- .Properties of `hyperparameters`
- [%collapsible%open]
- ======
- tag::dfas-alpha[]
- `alpha`::::
- (double)
- Regularization factor to penalize deeper trees when training decision trees.
- end::dfas-alpha[]
- `class_assignment_objective`::::
- (string)
- Defines whether class assignment maximizes the accuracy or the minimum recall
- metric. Possible values are `maximize_accuracy` and `maximize_minimum_recall`.
- tag::dfas-downsample-factor[]
- `downsample_factor`::::
- (double)
- The value of the downsample factor.
- end::dfas-downsample-factor[]
- tag::dfas-eta[]
- `eta`::::
- (double)
- The value of the eta hyperparameter.
- end::dfas-eta[]
- tag::dfas-eta-growth[]
- `eta_growth_rate_per_tree`::::
- (double)
- Specifies the rate at which the `eta` increases for each new tree that is added to the
- forest. For example, a rate of `1.05` increases `eta` by 5%.
- end::dfas-eta-growth[]
- tag::dfas-feature-bag-fraction[]
- `feature_bag_fraction`::::
- (double)
- The fraction of features that is used when selecting a random bag for each
- candidate split.
- end::dfas-feature-bag-fraction[]
- tag::dfas-gamma[]
- `gamma`::::
- (double)
- Regularization factor to penalize trees with large numbers of nodes.
- end::dfas-gamma[]
- tag::dfas-lambda[]
- `lambda`::::
- (double)
- Regularization factor to penalize large leaf weights.
- end::dfas-lambda[]
- tag::dfas-max-attempts[]
- `max_attempts_to_add_tree`::::
- (integer)
- If the algorithm fails to determine a non-trivial tree (more than a single
- leaf), this parameter determines how many of such consecutive failures are
- tolerated. Once the number of attempts exceeds the threshold, the forest
- training stops.
- end::dfas-max-attempts[]
- tag::dfas-max-optimization-rounds[]
- `max_optimization_rounds_per_hyperparameter`::::
- (integer)
- A multiplier responsible for determining the maximum number of
- hyperparameter optimization steps in the Bayesian optimization procedure.
- The maximum number of steps is determined based on the number of undefined hyperparameters
- times the maximum optimization rounds per hyperparameter.
- end::dfas-max-optimization-rounds[]
- tag::dfas-max-trees[]
- `max_trees`::::
- (integer)
- The maximum number of trees in the forest.
- end::dfas-max-trees[]
- tag::dfas-num-folds[]
- `num_folds`::::
- (integer)
- The maximum number of folds for the cross-validation procedure.
- end::dfas-num-folds[]
- tag::dfas-num-splits[]
- `num_splits_per_feature`::::
- (integer)
- Determines the maximum number of splits for every feature that can occur in a
- decision tree when the tree is trained.
- end::dfas-num-splits[]
- tag::dfas-soft-limit[]
- `soft_tree_depth_limit`::::
- (double)
- Tree depth limit is used for calculating the tree depth penalty. This is a soft
- limit; it can be exceeded.
- end::dfas-soft-limit[]
- tag::dfas-soft-tolerance[]
- `soft_tree_depth_tolerance`::::
- (double)
- Tree depth tolerance is used for calculating the tree depth penalty. This is a
- soft limit; it can be exceeded.
- end::dfas-soft-tolerance[]
- ======
- //End class_hyperparameters
- tag::dfas-iteration[]
- `iteration`::::
- (integer)
- The number of iterations of the analysis.
- end::dfas-iteration[]
- tag::dfas-timestamp[]
- `timestamp`::::
- (date)
- The timestamp when the statistics were reported in milliseconds since the epoch.
- end::dfas-timestamp[]
- //Begin class_timing_stats
- tag::dfas-timing-stats[]
- `timing_stats`::::
- (object)
- An object containing time statistics about the {dfanalytics-job}.
- end::dfas-timing-stats[]
- +
- .Properties of `timing_stats`
- [%collapsible%open]
- ======
- tag::dfas-timing-stats-elapsed[]
- `elapsed_time`::::
- (integer)
- Runtime of the analysis in milliseconds.
- end::dfas-timing-stats-elapsed[]
- tag::dfas-timing-stats-iteration[]
- `iteration_time`::::
- (integer)
- Runtime of the latest iteration of the analysis in milliseconds.
- end::dfas-timing-stats-iteration[]
- ======
- //End class_timing_stats
- //Begin class_validation_loss
- tag::dfas-validation-loss[]
- `validation_loss`::::
- (object)
- An object containing information about validation loss.
- end::dfas-validation-loss[]
- +
- .Properties of `validation_loss`
- [%collapsible%open]
- ======
- tag::dfas-validation-loss-type[]
- `loss_type`::::
- (string)
- The type of the loss metric. For example, `binomial_logistic`.
- end::dfas-validation-loss-type[]
- tag::dfas-validation-loss-fold[]
- `fold_values`::::
- (array of strings)
- Validation loss values for every added decision tree during the forest growing
- procedure.
- end::dfas-validation-loss-fold[]
- ======
- //End class_validation_loss
- =====
- //End classification_stats
- //Begin outlier_detection_stats
- `outlier_detection_stats`:::
- (object)
- An object containing statistical data about the {oldetection} job.
- +
- .Properties of `outlier_detection_stats`
- [%collapsible%open]
- =====
- //Begin parameters
- `parameters`::::
- (object)
- The list of job parameters specified by the user or determined by algorithmic
- heuristics.
- +
- .Properties of `parameters`
- [%collapsible%open]
- ======
- `compute_feature_influence`::::
- (boolean)
- If true, feature influence calculation is enabled.
- `feature_influence_threshold`::::
- (double)
- The minimum {olscore} that a document needs to have to calculate its feature
- influence score.
- `method`::::
- (string)
- The method that {oldetection} uses. Possible values are `lof`, `ldof`,
- `distance_kth_nn`, `distance_knn`, and `ensemble`.
- `n_neighbors`::::
- (integer)
- The value for how many nearest neighbors each method of {oldetection} uses to
- calculate its outlier score.
- `outlier_fraction`::::
- (double)
- The proportion of the data set that is assumed to be outlying prior to
- {oldetection}.
- `standardization_enabled`::::
- (boolean)
- If true, then the following operation is performed on the columns before
- computing {olscores}: (x_i - mean(x_i)) / sd(x_i).
- ======
- //End parameters
- include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-timestamp]
- //Begin od_timing_stats
- include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-timing-stats]
- +
- .Property of `timing_stats`
- [%collapsible%open]
- ======
- include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-timing-stats-elapsed]
- ======
- //End od_timing_stats
- =====
- //End outlier_detection_stats
- //Begin regression_stats
- `regression_stats`:::
- (object)
- An object containing statistical data about the {reganalysis}.
- +
- .Properties of `regression_stats`
- [%collapsible%open]
- =====
- //Begin reg_hyperparameters
- `hyperparameters`::::
- (object)
- An object containing the parameters of the {reganalysis}.
- +
- .Properties of `hyperparameters`
- [%collapsible%open]
- ======
- include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-alpha]
- include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-downsample-factor]
- include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-eta]
- include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-eta-growth]
- include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-feature-bag-fraction]
- include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-gamma]
- include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-lambda]
- include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-max-attempts]
- include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-max-optimization-rounds]
- include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-max-trees]
- include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-num-folds]
- include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-num-splits]
- include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-soft-limit]
- include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-soft-tolerance]
- ======
- //End reg_hyperparameters
- include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-iteration]
- include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-timestamp]
- //Begin reg_timing_stats
- include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-timing-stats]
- +
- .Properties of `timing_stats`
- [%collapsible%open]
- ======
- include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-timing-stats-elapsed]
- include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-timing-stats-iteration]
- ======
- //End reg_timing_stats
- //Begin reg_validation_loss
- include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-validation-loss]
- +
- .Properties of `validation_loss`
- [%collapsible%open]
- ======
- include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-validation-loss-type]
- include::{docdir}/ml/ml-shared.asciidoc[tag=dfas-validation-loss-fold]
- ======
- //End reg_validation_loss
- =====
- //End regression_stats
- ====
- //End analysis_stats
- `assignment_explanation`:::
- (string)
- For running jobs only, contains messages relating to the selection of a node to
- run the job.
- //Begin data_counts
- `data_counts`:::
- (object)
- An object containing statistical data about the documents in the analysis.
- +
- .Properties of `data_counts`
- [%collapsible%open]
- ====
- `skipped_docs_count`:::
- (integer)
- The number of documents that are skipped during the analysis because they
- contained values that are not supported by the analysis. For example,
- {oldetection} does not support missing fields so it skips documents with missing
- fields. Likewise, all types of analysis skip documents that contain arrays with
- more than one element.
- `test_docs_count`:::
- (integer)
- The number of documents that are not used for training the model and can be used
- for testing.
- `training_docs_count`:::
- (integer)
- The number of documents that are used for training the model.
- ====
- //End data_counts
- `id`:::
- (string)
- The unique identifier of the {dfanalytics-job}.
- `memory_usage`:::
- (Optional, object)
- An object describing memory usage of the analytics. It is present only after the
- job is started and memory usage is reported.
- `memory_usage`.`peak_usage_bytes`:::
- (long)
- The number of bytes used at the highest peak of memory usage.
- `memory_usage`.`timestamp`:::
- (date)
- The timestamp when memory usage was calculated.
- `node`:::
- (object)
- Contains properties for the node that runs the job. This information is
- available only for running jobs.
- `node`.`attributes`:::
- (object)
- Lists node attributes such as `ml.machine_memory`, `ml.max_open_jobs`, and
- `xpack.installed`.
- `node`.`ephemeral_id`:::
- (string)
- The ephemeral ID of the node.
- `node`.`id`:::
- (string)
- The unique identifier of the node.
- `node`.`name`:::
- (string)
- The node name.
- `node`.`transport_address`:::
- (string)
- The host and port where transport HTTP connections are accepted.
- `progress`:::
- (array) The progress report of the {dfanalytics-job} by phase.
-
- `progress`.`phase`:::
- (string) Defines the phase of the {dfanalytics-job}. Possible phases:
- `reindexing`, `loading_data`, `analyzing`, and `writing_results`.
-
- `progress`.`progress_percent`:::
- (integer) The progress that the {dfanalytics-job} has made expressed in
- percentage.
- `state`:::
- (string) Current state of the {dfanalytics-job}.
- end::data-frame-analytics-stats[]
- tag::datafeed-id[]
- A numerical character string that uniquely identifies the
- {dfeed}. This identifier can contain lowercase alphanumeric characters (a-z
- and 0-9), hyphens, and underscores. It must start and end with alphanumeric
- characters.
- end::datafeed-id[]
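The identifier rule above can be checked with a short regular expression. This is a minimal sketch of the rule exactly as worded here; the function name `is_valid_id` is hypothetical, and the real server-side validation may apply further constraints (such as a length limit) that this section does not describe.

```python
import re

# Lowercase alphanumerics, hyphens and underscores, starting and ending
# with an alphanumeric character (the rule as stated in the text above).
_ID_PATTERN = re.compile(r"[a-z0-9](?:[a-z0-9_-]*[a-z0-9])?")


def is_valid_id(identifier: str) -> bool:
    """Return True if the identifier satisfies the documented character rule."""
    return _ID_PATTERN.fullmatch(identifier) is not None
```

A single character such as `a` passes, while `-feed`, `feed_` and `Feed` are rejected.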
- tag::datafeed-id-wildcard[]
- Identifier for the {dfeed}. It can be a {dfeed} identifier or a wildcard
- expression.
- end::datafeed-id-wildcard[]
- tag::dead-category-count[]
- The number of categories created by categorization that will never be assigned
- again because another category's definition makes it a superset of the dead
- category. (Dead categories are a side effect of the way categorization has no
- prior training.)
- end::dead-category-count[]
- tag::delayed-data-check-config[]
- Specifies whether the {dfeed} checks for missing data and the size of the
- window. For example: `{"enabled": true, "check_window": "1h"}`.
- +
- The {dfeed} can optionally search over indices that have already been read in
- an effort to determine whether any data has subsequently been added to the
- index. If missing data is found, it is a good indication that the `query_delay`
- option is set too low and the data is being indexed after the {dfeed} has passed
- that moment in time. See
- {ml-docs}/ml-delayed-data-detection.html[Working with delayed data].
- +
- This check runs only on real-time {dfeeds}.
- +
- .Properties of `delayed_data_check_config`
- [%collapsible%open]
- ====
- `check_window`::
- (<<time-units,time units>>) The window of time that is searched for late data.
- This window of time ends with the latest finalized bucket. It defaults to
- `null`, which causes an appropriate `check_window` to be calculated when the
- real-time {dfeed} runs. In particular, the default `check_window` span
- calculation is based on the maximum of `2h` or `8 * bucket_span`.
- `enabled`::
- (boolean) Specifies whether the {dfeed} periodically checks for delayed data.
- Defaults to `true`.
- ====
- end::delayed-data-check-config[]
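As an illustration of the default described for `check_window`, the following sketch computes the window from a bucket span. The function name `effective_check_window` is hypothetical; only the `max(2h, 8 * bucket_span)` rule comes from the text above.

```python
from datetime import timedelta
from typing import Optional


def effective_check_window(
    bucket_span: timedelta, check_window: Optional[timedelta] = None
) -> timedelta:
    """Return the explicit check_window, or the documented default if it is null."""
    if check_window is not None:
        return check_window
    # Default: the larger of 2 hours and 8 bucket spans.
    return max(timedelta(hours=2), 8 * bucket_span)
```

With a 5-minute bucket span the default stays at two hours; with a 1-hour bucket span it grows to eight hours.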
- tag::dependent-variable[]
- Defines which field of the document is to be predicted.
- This parameter is supplied by field name and must match one of the fields in
- the index being used to train. If this field is missing from a document, then
- that document will not be used for training, but a prediction with the trained
- model will be generated for it. It is also known as the continuous target variable.
- end::dependent-variable[]
- tag::desc-results[]
- If true, the results are sorted in descending order.
- end::desc-results[]
- tag::description-dfa[]
- A description of the job.
- end::description-dfa[]
- tag::dest[]
- The destination configuration, consisting of `index` and optionally
- `results_field` (`ml` by default).
- +
- .Properties of `dest`
- [%collapsible%open]
- ====
- `index`:::
- (Required, string) Defines the _destination index_ to store the results of the
- {dfanalytics-job}.
-
- `results_field`:::
- (Optional, string) Defines the name of the field in which to store the results
- of the analysis. Defaults to `ml`.
- ====
- end::dest[]
- tag::detector-description[]
- A description of the detector. For example, `Low event rate`.
- end::detector-description[]
- tag::detector-field-name[]
- The field that the detector uses in the function. If you use an event rate
- function such as `count` or `rare`, do not specify this field.
- +
- --
- NOTE: The `field_name` cannot contain double quotes or backslashes.
- --
- end::detector-field-name[]
- tag::detector-index[]
- A unique identifier for the detector. This identifier is based on the order of
- the detectors in the `analysis_config`, starting at zero.
- end::detector-index[]
- tag::dfas-alpha[]
- Regularization factor to penalize deeper trees when training decision trees.
- end::dfas-alpha[]
- tag::dfas-downsample-factor[]
- The value of the downsample factor.
- end::dfas-downsample-factor[]
- tag::dfas-eta[]
- The value of the eta hyperparameter.
- end::dfas-eta[]
- tag::dfas-eta-growth[]
- Specifies the rate at which the `eta` increases for each new tree that is added
- to the forest. For example, a rate of `1.05` increases `eta` by 5%.
- end::dfas-eta-growth[]
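To make the growth rate concrete, the sketch below lists the `eta` value used for each successive tree. It assumes the rate compounds multiplicatively and that no upper bound is applied to `eta`; both are assumptions for illustration, and `eta_schedule` is a hypothetical helper name.

```python
from typing import List


def eta_schedule(initial_eta: float, eta_growth_rate: float, num_trees: int) -> List[float]:
    """Return the eta applied to each tree, assuming compounding growth and no cap."""
    return [initial_eta * eta_growth_rate ** i for i in range(num_trees)]
```

For example, `eta_schedule(0.1, 1.05, 3)` starts at `0.1` and raises the value by 5% for each subsequent tree.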
- tag::dfas-feature-bag-fraction[]
- The fraction of features that is used when selecting a random bag for each
- candidate split.
- end::dfas-feature-bag-fraction[]
- tag::dfas-gamma[]
- Regularization factor to penalize trees with large numbers of nodes.
- end::dfas-gamma[]
- tag::dfas-lambda[]
- Regularization factor to penalize large leaf weights.
- end::dfas-lambda[]
- tag::dfas-max-attempts[]
- If the algorithm fails to determine a non-trivial tree (more than a single
- leaf), this parameter determines how many of such consecutive failures are
- tolerated. Once the number of attempts exceeds the threshold, the forest
- training stops.
- end::dfas-max-attempts[]
- tag::dfas-max-optimization-rounds[]
- A multiplier responsible for determining the maximum number of
- hyperparameter optimization steps in the Bayesian optimization procedure.
- The maximum number of steps is determined based on the number of undefined
- hyperparameters times the maximum optimization rounds per hyperparameter.
- end::dfas-max-optimization-rounds[]
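The step limit described above is a simple product. In this sketch a hyperparameter counts as undefined when its value is `None`; that representation and the function name `max_optimization_steps` are assumptions made for the example.

```python
from typing import Dict, Optional


def max_optimization_steps(
    hyperparameters: Dict[str, Optional[float]], max_rounds_per_hyperparameter: int
) -> int:
    """Number of undefined hyperparameters times the rounds allowed per hyperparameter."""
    undefined = [name for name, value in hyperparameters.items() if value is None]
    return len(undefined) * max_rounds_per_hyperparameter
```

With `eta` and `lambda` left undefined, `gamma` fixed, and two rounds per hyperparameter, the limit is four steps.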
- tag::dfas-max-trees[]
- The maximum number of trees in the forest.
- end::dfas-max-trees[]
- tag::dfas-num-folds[]
- The maximum number of folds for the cross-validation procedure.
- end::dfas-num-folds[]
- tag::dfas-num-splits[]
- Determines the maximum number of splits for every feature that can occur in a
- decision tree when the tree is trained.
- end::dfas-num-splits[]
- tag::dfas-soft-limit[]
- Tree depth limit is used for calculating the tree depth penalty. This is a soft
- limit; it can be exceeded.
- end::dfas-soft-limit[]
- tag::dfas-soft-tolerance[]
- Tree depth tolerance is used for calculating the tree depth penalty. This is a
- soft limit; it can be exceeded.
- end::dfas-soft-tolerance[]
- tag::dfas-iteration[]
- The number of iterations on the analysis.
- end::dfas-iteration[]
- tag::dfas-timestamp[]
- The timestamp when the statistics were reported in milliseconds since the epoch.
- end::dfas-timestamp[]
- tag::dfas-timing-stats[]
- An object containing time statistics about the {dfanalytics-job}.
- end::dfas-timing-stats[]
- tag::dfas-timing-stats-elapsed[]
- Runtime of the analysis in milliseconds.
- end::dfas-timing-stats-elapsed[]
- tag::dfas-timing-stats-iteration[]
- Runtime of the latest iteration of the analysis in milliseconds.
- end::dfas-timing-stats-iteration[]
- tag::dfas-validation-loss[]
- An object containing information about validation loss.
- end::dfas-validation-loss[]
- tag::dfas-validation-loss-fold[]
- Validation loss values for every added decision tree during the forest growing
- procedure.
- end::dfas-validation-loss-fold[]
- tag::dfas-validation-loss-type[]
- The type of the loss metric. For example, `binomial_logistic`.
- end::dfas-validation-loss-type[]
- tag::earliest-record-timestamp[]
- The timestamp of the chronologically earliest input document.
- end::earliest-record-timestamp[]
- tag::empty-bucket-count[]
- The number of buckets which did not contain any data. If your data
- contains many empty buckets, consider increasing your `bucket_span` or using
- functions that are tolerant to gaps in data such as `mean`, `non_null_sum` or
- `non_zero_count`.
- end::empty-bucket-count[]
- tag::eta[]
- Advanced configuration option. The shrinkage applied to the weights. Smaller
- values result in larger forests which have a better generalization error. However,
- the smaller the value the longer the training will take. For more information
- about shrinkage, see
- https://en.wikipedia.org/wiki/Gradient_boosting#Shrinkage[this wiki article].
- end::eta[]
- tag::exclude-frequent[]
- Contains one of the following values: `all`, `none`, `by`, or `over`. If set,
- frequent entities are excluded from influencing the anomaly results. Entities
- can be considered frequent over time or frequent in a population. If you are
- working with both over and by fields, then you can set `exclude_frequent` to
- `all` for both fields, or to `by` or `over` for those specific fields.
- end::exclude-frequent[]
- tag::exclude-interim-results[]
- If `true`, the output excludes interim results. By default, interim results are
- included.
- end::exclude-interim-results[]
- tag::feature-bag-fraction[]
- Advanced configuration option. Defines the fraction of features that will be
- used when selecting a random bag for each candidate split.
- end::feature-bag-fraction[]
- tag::filter[]
- One or more <<analysis-tokenfilters,token filters>>. In addition to the built-in
- token filters, other plugins can provide more token filters. This property is
- optional. If it is not specified, no token filters are applied prior to
- categorization.
- end::filter[]
- tag::filter-id[]
- A string that uniquely identifies a filter.
- end::filter-id[]
- tag::forecast-total[]
- The number of individual forecasts currently available for the job. A value of
- `1` or more indicates that forecasts exist.
- end::forecast-total[]
- tag::frequency[]
- The interval at which scheduled queries are made while the {dfeed} runs in real
- time. The default value is either the bucket span for short bucket spans, or,
- for longer bucket spans, a sensible fraction of the bucket span. For example:
- `150s`. When `frequency` is shorter than the bucket span, interim results for
- the last (partial) bucket are written then eventually overwritten by the full
- bucket results. If the {dfeed} uses aggregations, this value must be divisible
- by the interval of the date histogram aggregation.
- end::frequency[]
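The divisibility requirement for {dfeeds} that use aggregations can be checked before a configuration is submitted. This is a minimal sketch; `frequency_is_compatible` is a hypothetical helper, and the check covers only the rule stated above.

```python
from datetime import timedelta


def frequency_is_compatible(frequency: timedelta, histogram_interval: timedelta) -> bool:
    """Return True if frequency is a whole multiple of the date histogram interval."""
    return frequency % histogram_interval == timedelta(0)
```

A `frequency` of `150s` is compatible with a 30-second histogram interval but not with a 60-second one.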
- tag::frequent-category-count[]
- The number of categories that match more than 1% of categorized documents.
- end::frequent-category-count[]
- tag::from[]
- Skips the specified number of {dfanalytics-jobs}. The default value is `0`.
- end::from[]
- tag::function[]
- The analysis function that is used. For example, `count`, `rare`, `mean`, `min`,
- `max`, and `sum`. For more information, see
- {ml-docs}/ml-functions.html[Function reference].
- end::function[]
- tag::gamma[]
- Advanced configuration option. Regularization parameter to prevent overfitting
- on the training data set. Multiplies a linear penalty associated with the size of
- individual trees in the forest. The higher the value the more training will
- prefer smaller trees. The smaller this parameter the larger individual trees
- will be and the longer training will take.
- end::gamma[]
- tag::groups[]
- A list of job groups. A job can belong to no groups or many.
- end::groups[]
- tag::indices[]
- An array of index names. Wildcards are supported. For example:
- `["it_ops_metrics", "server*"]`.
- +
- --
- NOTE: If any indices are in remote clusters then `node.remote_cluster_client`
- must not be set to `false` on any {ml} nodes.
- --
- end::indices[]
- tag::indices-options[]
- Specifies index expansion options that are used during search.
- +
- --
- For example:
- ```
- {
- "expand_wildcards": ["all"],
- "ignore_unavailable": true,
- "allow_no_indices": false,
- "ignore_throttled": true
- }
- ```
- For more information about these options, see <<multi-index>>.
- --
- end::indices-options[]
- tag::inference-config-classification-num-top-classes[]
- Specifies the number of top class predictions to return. Defaults to 0.
- end::inference-config-classification-num-top-classes[]
- tag::inference-config-classification-num-top-feature-importance-values[]
- Specifies the maximum number of
- {ml-docs}/dfa-classification.html#dfa-classification-feature-importance[feature
- importance] values per document. By default, it is zero and no feature
- importance calculation occurs.
- end::inference-config-classification-num-top-feature-importance-values[]
- tag::inference-config-classification-top-classes-results-field[]
- Specifies the field to which the top classes are written. Defaults to
- `top_classes`.
- end::inference-config-classification-top-classes-results-field[]
- tag::inference-config-regression-num-top-feature-importance-values[]
- Specifies the maximum number of
- {ml-docs}/dfa-regression.html#dfa-regression-feature-importance[feature
- importance] values per document. By default, it is zero and no feature importance
- calculation occurs.
- end::inference-config-regression-num-top-feature-importance-values[]
- tag::inference-config-results-field[]
- The field that is added to incoming documents to contain the inference
- prediction. Defaults to `predicted_value`.
- end::inference-config-results-field[]
- tag::influencers[]
- A comma-separated list of influencer field names. Typically these can be the by,
- over, or partition fields that are used in the detector configuration. You might
- also want to use a field name that is not specifically named in a detector, but
- is available as part of the input data. When you use multiple detectors, the use
- of influencers is recommended as it aggregates results for each influencer
- entity.
- end::influencers[]
- tag::input-bytes[]
- The number of bytes of input data posted to the {anomaly-job}.
- end::input-bytes[]
- tag::input-field-count[]
- The total number of fields in input documents posted to the {anomaly-job}. This
- count includes fields that are not used in the analysis. However, be aware that
- if you are using a {dfeed}, it extracts only the required fields from the
- documents it retrieves before posting them to the job.
- end::input-field-count[]
- tag::input-record-count[]
- The number of input documents posted to the {anomaly-job}.
- end::input-record-count[]
- tag::invalid-date-count[]
- The number of input documents with either a missing date field or a date that
- could not be parsed.
- end::invalid-date-count[]
- tag::is-interim[]
- If `true`, this is an interim result. In other words, the results are calculated
- based on partial input data.
- end::is-interim[]
- tag::job-id-anomaly-detection[]
- Identifier for the {anomaly-job}.
- end::job-id-anomaly-detection[]
- tag::job-id-data-frame-analytics[]
- Identifier for the {dfanalytics-job}.
- end::job-id-data-frame-analytics[]
- tag::job-id-anomaly-detection-default[]
- Identifier for the {anomaly-job}. It can be a job identifier, a group name, or a
- wildcard expression. If you do not specify one of these options, the API returns
- information for all {anomaly-jobs}.
- end::job-id-anomaly-detection-default[]
- tag::job-id-data-frame-analytics-default[]
- Identifier for the {dfanalytics-job}. If you do not specify this option, the API
- returns information for the first hundred {dfanalytics-jobs}.
- end::job-id-data-frame-analytics-default[]
- tag::job-id-anomaly-detection-list[]
- An identifier for the {anomaly-jobs}. It can be a job
- identifier, a group name, or a comma-separated list of jobs or groups.
- end::job-id-anomaly-detection-list[]
- tag::job-id-anomaly-detection-wildcard[]
- Identifier for the {anomaly-job}. It can be a job identifier, a group name, or a
- wildcard expression.
- end::job-id-anomaly-detection-wildcard[]
- tag::job-id-anomaly-detection-wildcard-list[]
- Identifier for the {anomaly-job}. It can be a job identifier, a group name, a
- comma-separated list of jobs or groups, or a wildcard expression.
- end::job-id-anomaly-detection-wildcard-list[]
- tag::job-id-anomaly-detection-define[]
- Identifier for the {anomaly-job}. This identifier can contain lowercase
- alphanumeric characters (a-z and 0-9), hyphens, and underscores. It must start
- and end with alphanumeric characters.
- end::job-id-anomaly-detection-define[]
- tag::job-id-data-frame-analytics-define[]
- Identifier for the {dfanalytics-job}. This identifier can contain lowercase
- alphanumeric characters (a-z and 0-9), hyphens, and underscores. It must start
- and end with alphanumeric characters.
- end::job-id-data-frame-analytics-define[]
- tag::job-id-datafeed[]
- The unique identifier for the job to which the {dfeed} sends data.
- end::job-id-datafeed[]
- tag::lambda[]
- Advanced configuration option. Regularization parameter to prevent overfitting
- on the training data set. Multiplies an L2 regularization term which applies to
- leaf weights of the individual trees in the forest. The higher the value the
- more training will attempt to keep leaf weights small. This makes the prediction
- function smoother at the expense of potentially not being able to capture
- relevant relationships between the features and the {depvar}. The smaller this
- parameter the larger individual trees will be and the longer training will take.
- end::lambda[]
- tag::last-data-time[]
- The timestamp at which data was last analyzed, according to server time.
- end::last-data-time[]
- tag::latency[]
- The size of the window in which to expect data that is out of time order. The
- default value is 0 (no latency). If you specify a non-zero value, it must be
- greater than or equal to one second. For more information about time units, see
- <<time-units>>.
- +
- --
- NOTE: Latency is only applicable when you send data by using
- the <<ml-post-data,post data>> API.
- --
- end::latency[]
- tag::latest-empty-bucket-timestamp[]
- The timestamp of the last bucket that did not contain any data.
- end::latest-empty-bucket-timestamp[]
- tag::latest-record-timestamp[]
- The timestamp of the chronologically latest input document.
- end::latest-record-timestamp[]
- tag::latest-sparse-record-timestamp[]
- The timestamp of the last bucket that was considered sparse.
- end::latest-sparse-record-timestamp[]
- tag::max-empty-searches[]
- If a real-time {dfeed} has never seen any data (including during any initial
- training period) then it will automatically stop itself and close its associated
- job after this many real-time searches that return no documents. In other words,
- it will stop after `frequency` times `max_empty_searches` of real-time
- operation. If not set then a {dfeed} with no end time that sees no data will
- remain started until it is explicitly stopped. By default this setting is not
- set.
- end::max-empty-searches[]
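The real-time duration implied by `max_empty_searches` follows directly from `frequency`. The sketch below uses a hypothetical helper name and returns `None` when the setting is not set, which mirrors the "remain started" case described above.

```python
from datetime import timedelta
from typing import Optional


def time_until_auto_stop(
    frequency: timedelta, max_empty_searches: Optional[int]
) -> Optional[timedelta]:
    """Real-time operation before the datafeed stops itself, or None if it never does."""
    if max_empty_searches is None:
        return None
    return frequency * max_empty_searches
```

With a `frequency` of `150s` and `max_empty_searches` of 10, the {dfeed} would stop after 25 minutes without data.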
- tag::max-trees[]
- Advanced configuration option. Defines the maximum number of trees the forest is
- allowed to contain. The maximum value is 2000.
- end::max-trees[]
- tag::missing-field-count[]
- The number of input documents that are missing a field that the {anomaly-job} is
- configured to analyze. Input documents with missing fields are still processed
- because it is possible that not all fields are missing.
- +
- --
- NOTE: If you are using {dfeeds} or posting data to the job in JSON format, a
- high `missing_field_count` is often not an indication of data issues. It is not
- necessarily a cause for concern.
- --
- end::missing-field-count[]
- tag::mode[]
- There are three available modes:
- +
- --
- * `auto`: The chunk size is dynamically calculated. This is the default and
- recommended value.
- * `manual`: Chunking is applied according to the specified `time_span`.
- * `off`: No chunking is applied.
- --
- end::mode[]
- tag::model-bytes[]
- The number of bytes of memory used by the models. This is the maximum value
- since the last time the model was persisted. If the job is closed, this value
- indicates the latest size.
- end::model-bytes[]
- tag::model-bytes-exceeded[]
- The number of bytes over the high limit for memory usage at the last allocation
- failure.
- end::model-bytes-exceeded[]
- tag::model-id[]
- The unique identifier of the trained {infer} model.
- end::model-id[]
- tag::model-memory-limit[]
- The approximate maximum amount of memory resources that are required for
- analytical processing. Once this limit is approached, data pruning becomes
- more aggressive. Upon exceeding this limit, new entities are not modeled. The
- default value for jobs created in version 6.1 and later is `1024mb`.
- This value will need to be increased for jobs that are expected to analyze high
- cardinality fields, but the default is set to a relatively small size to ensure
- that high resource usage is a conscious decision. The default value for jobs
- created in versions earlier than 6.1 is `4096mb`.
- +
- If you specify a number instead of a string, the units are assumed to be MiB.
- Specifying a string is recommended for clarity. If you specify a byte size unit
- of `b` or `kb` and the number does not equate to a discrete number of megabytes,
- it is rounded down to the closest MiB. The minimum valid value is 1 MiB. If you
- specify a value less than 1 MiB, an error occurs. For more information about
- supported byte size units, see <<byte-units>>.
- +
- If your `elasticsearch.yml` file contains an `xpack.ml.max_model_memory_limit`
- setting, an error occurs when you try to create jobs that have
- `model_memory_limit` values greater than that setting. For more information,
- see <<ml-settings>>.
- end::model-memory-limit[]
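The unit handling described above can be sketched as follows. This is an illustrative parser, not the server's implementation: the function name is hypothetical, only whole-number values are accepted, and the unit suffixes are assumed to be binary multiples (`kb` = 1024 bytes).

```python
import re
from typing import Union

MIB = 1024 * 1024
# Assumed binary multiples for each byte size suffix.
_UNIT_BYTES = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4, "pb": 1024**5}


def model_memory_limit_mib(value: Union[int, str]) -> int:
    """Normalize a model_memory_limit value to whole MiB, rounding down."""
    if isinstance(value, int):
        mib = value  # A bare number is assumed to be in MiB.
    else:
        match = re.fullmatch(r"(\d+)(b|kb|mb|gb|tb|pb)", value.strip().lower())
        if match is None:
            raise ValueError(f"unrecognized byte size: {value!r}")
        total_bytes = int(match.group(1)) * _UNIT_BYTES[match.group(2)]
        mib = total_bytes // MIB  # Round down to the closest MiB.
    if mib < 1:
        raise ValueError("model_memory_limit must be at least 1 MiB")
    return mib
```

For example, `"1536kb"` is rounded down to 1 MiB, while `"512kb"` raises an error because it is below the 1 MiB minimum.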
- tag::model-memory-limit-anomaly-jobs[]
- The upper limit for model memory usage, checked on increasing values.
- end::model-memory-limit-anomaly-jobs[]
- tag::model-memory-status[]
- The status of the mathematical models, which can have one of the following
- values:
- +
- --
- * `ok`: The models stayed below the configured value.
- * `soft_limit`: The models used more than 60% of the configured memory limit
- and older unused models will be pruned to free up space.
- * `hard_limit`: The models used more space than the configured memory limit.
- As a result, not all incoming data was processed.
- --
- end::model-memory-status[]
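The three statuses can be read as thresholds on memory use relative to the configured limit. The sketch below is a simplification based only on the descriptions above; the function name is hypothetical, and the actual status is reported by the job rather than computed by the caller.

```python
def model_memory_status(model_bytes: int, memory_limit_bytes: int) -> str:
    """Classify memory use against the configured limit, per the list above."""
    if model_bytes > memory_limit_bytes:
        return "hard_limit"
    # More than 60% of the limit; integer arithmetic avoids rounding issues.
    if model_bytes * 10 > memory_limit_bytes * 6:
        return "soft_limit"
    return "ok"
```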
- tag::model-plot-config[]
- This advanced configuration option stores model information along with the
- results. It provides a more detailed view into {anomaly-detect}.
- +
- --
- WARNING: If you enable model plot it can add considerable overhead to the
- performance of the system; it is not feasible for jobs with many entities.
- Model plot provides a simplified and indicative view of the model and its
- bounds. It does not display complex features such as multivariate correlations
- or multimodal data. As such, anomalies may occasionally be reported which cannot
- be seen in the model plot.
- Model plot config can be configured when the job is created or updated later. It
- must be disabled if performance issues are experienced.
- --
- end::model-plot-config[]
- tag::model-plot-config-enabled[]
- If true, enables calculation and storage of the model bounds for each entity
- that is being analyzed. By default, this is not enabled.
- end::model-plot-config-enabled[]
- tag::model-plot-config-terms[]
- Limits data collection to this comma-separated list of partition or by field
- values. If terms are not specified or it is an empty string, no filtering is
- applied. For example, "CPU,NetworkIn,DiskWrites". Wildcards are not supported.
- Only the specified `terms` can be viewed when using the Single Metric Viewer.
- end::model-plot-config-terms[]
- tag::model-snapshot-id[]
- A numerical character string that uniquely identifies the model snapshot. For
- example, `1575402236000`.
- end::model-snapshot-id[]
- tag::model-snapshot-retention-days[]
- Advanced configuration option. The period of time (in days) that model snapshots
- are retained. Age is calculated relative to the timestamp of the newest model
- snapshot. The default value is `1`, which means snapshots that are one day
- (twenty-four hours) older than the newest snapshot are deleted.
- end::model-snapshot-retention-days[]
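The retention rule is relative to the newest snapshot rather than to the current time. The following sketch selects the snapshots that cleanup would remove, skipping any marked with `retain`. The tuple layout and function name are hypothetical, and treating a snapshot exactly at the cutoff as kept is an assumption.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# (snapshot_id, timestamp, retain) is a hypothetical layout for this example.
Snapshot = Tuple[str, datetime, bool]


def snapshots_to_delete(snapshots: List[Snapshot], retention_days: int = 1) -> List[str]:
    """Return IDs of snapshots older than the newest one by more than retention_days."""
    if not snapshots:
        return []
    newest = max(timestamp for _, timestamp, _ in snapshots)
    cutoff = newest - timedelta(days=retention_days)
    return [
        snapshot_id
        for snapshot_id, timestamp, retain in snapshots
        if timestamp < cutoff and not retain
    ]
```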
- tag::model-timestamp[]
- The timestamp of the last record when the model stats were gathered.
- end::model-timestamp[]
- tag::multivariate-by-fields[]
- This functionality is reserved for internal use. It is not supported for use in
- customer environments and is not subject to the support SLA of official GA
- features.
- +
- --
- If set to `true`, the analysis will automatically find correlations between
- metrics for a given `by` field value and report anomalies when those
- correlations cease to hold. For example, suppose CPU and memory usage on host A
- is usually highly correlated with the same metrics on host B. Perhaps this
- correlation occurs because they are running a load-balanced application.
- If you enable this property, then anomalies will be reported when, for example,
- CPU usage on host A is high and the value of CPU usage on host B is low. That
- is to say, you'll see an anomaly when the CPU of host A is unusual given
- the CPU of host B.
- NOTE: To use the `multivariate_by_fields` property, you must also specify
- `by_field_name` in your detector.
- --
- end::multivariate-by-fields[]
- tag::node-address[]
- The network address of the node.
- end::node-address[]
- tag::node-datafeeds[]
- For started {dfeeds} only, this information pertains to the node upon which the
- {dfeed} is started.
- end::node-datafeeds[]
- tag::node-ephemeral-id[]
- The ephemeral ID of the node.
- end::node-ephemeral-id[]
- tag::node-id[]
- The unique identifier of the node.
- end::node-id[]
- tag::node-jobs[]
- Contains properties for the node that runs the job. This information is
- available only for open jobs.
- end::node-jobs[]
- tag::open-time[]
- For open jobs only, the elapsed time for which the job has been open.
- end::open-time[]
- tag::out-of-order-timestamp-count[]
- The number of input documents that are out of time sequence and outside
- of the latency window. This information is applicable only when you provide data
- to the {anomaly-job} by using the <<ml-post-data,post data API>>. These out of
- order documents are discarded, since jobs require time series data to be in
- ascending chronological order.
- end::out-of-order-timestamp-count[]
- tag::over-field-name[]
- The field used to split the data. In particular, this property is used for
- analyzing the splits with respect to the history of all splits. It is used for
- finding unusual values in the population of all splits. For more information,
- see {ml-docs}/ml-configuring-pop.html[Performing population analysis].
- end::over-field-name[]
- tag::partition-field-name[]
- The field used to segment the analysis. When you use this property, you have
- completely independent baselines for each value of this field.
- end::partition-field-name[]
- tag::prediction-field-name[]
- Defines the name of the prediction field in the results.
- Defaults to `<dependent_variable>_prediction`.
- end::prediction-field-name[]
- tag::processed-field-count[]
- The total number of fields in all the documents that have been processed by the
- {anomaly-job}. Only fields that are specified in the detector configuration
- object contribute to this count. The timestamp is not included in this count.
- end::processed-field-count[]
- tag::processed-record-count[]
- The number of input documents that have been processed by the {anomaly-job}.
- This value includes documents with missing fields, since they are nonetheless
- analyzed. If you use {dfeeds} and have aggregations in your search query, the
- `processed_record_count` is the number of aggregation results processed, not the
- number of {es} documents.
- end::processed-record-count[]
- tag::randomize-seed[]
- Defines the seed to the random generator that is used to pick which documents
- will be used for training. By default it is randomly generated. Set it to a
- specific value to ensure the same documents are used for training assuming other
- related parameters (for example, `source`, `analyzed_fields`, etc.) are the
- same.
- end::randomize-seed[]
- tag::query[]
- The {es} query domain-specific language (DSL). This value corresponds to the
- query object in an {es} search POST body. All the options that are supported by
- {es} can be used, as this object is passed verbatim to {es}. By default, this
- property has the following value: `{"match_all": {"boost": 1}}`.
- end::query[]
- tag::query-delay[]
- The number of seconds behind real time that data is queried. For example, if
- data from 10:04 a.m. might not be searchable in {es} until 10:06 a.m., set this
- property to 120 seconds. The default value is randomly selected between `60s`
- and `120s`. This randomness improves the query performance when there are
- multiple jobs running on the same node. For more information, see
- {ml-docs}/ml-delayed-data-detection.html[Handling delayed data].
- end::query-delay[]
- tag::rare-category-count[]
- The number of categories that match just one categorized document.
- end::rare-category-count[]
- tag::renormalization-window-days[]
- Advanced configuration option. The period over which adjustments to the score
- are applied, as new data is seen. The default value is the longer of 30 days or
- 100 `bucket_spans`.
- end::renormalization-window-days[]
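The default renormalization window described above is the longer of two durations. A minimal sketch, with a hypothetical function name:

```python
from datetime import timedelta


def default_renormalization_window(bucket_span: timedelta) -> timedelta:
    """Return the longer of 30 days and 100 bucket spans."""
    return max(timedelta(days=30), 100 * bucket_span)
```

A 15-minute bucket span yields the 30-day floor, whereas a 1-day bucket span yields 100 days.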
- tag::results-index-name[]
- A text string that affects the name of the {ml} results index. The default value
- is `shared`, which generates an index named `.ml-anomalies-shared`.
- end::results-index-name[]
- tag::results-retention-days[]
- Advanced configuration option. The period of time (in days) that results are
- retained. Age is calculated relative to the timestamp of the latest bucket
- result. If this property has a non-null value, once per day at 00:30 (server
- time), results that are the specified number of days older than the latest
- bucket result are deleted from {es}. The default value is null, which means all
- results are retained.
- end::results-retention-days[]
- tag::retain[]
- If `true`, this snapshot will not be deleted during automatic cleanup of
- snapshots older than `model_snapshot_retention_days`. However, this snapshot
- will be deleted when the job is deleted. The default value is `false`.
- end::retain[]
- tag::script-fields[]
- Specifies scripts that evaluate custom expressions and return script fields to
- the {dfeed}. The detector configuration objects in a job can contain functions
- that use these script fields. For more information, see
- {ml-docs}/ml-configuring-transform.html[Transforming data with script fields]
- and <<request-body-search-script-fields,Script fields>>.
- end::script-fields[]
- tag::scroll-size[]
- The `size` parameter that is used in {es} searches. The default value is `1000`.
- end::scroll-size[]
- tag::search-bucket-avg[]
- The average search time per bucket, in milliseconds.
- end::search-bucket-avg[]
- tag::search-count[]
- The number of searches run by the {dfeed}.
- end::search-count[]
- tag::search-exp-avg-hour[]
- The exponential average search time per hour, in milliseconds.
- end::search-exp-avg-hour[]
- tag::search-time[]
- The total time the {dfeed} spent searching, in milliseconds.
- end::search-time[]
- tag::size[]
- Specifies the maximum number of {dfanalytics-jobs} to obtain. The default value
- is `100`.
- end::size[]
- tag::snapshot-id[]
- Identifier for the model snapshot.
- end::snapshot-id[]
- tag::sparse-bucket-count[]
- The number of buckets that contained few data points compared to the expected
- number of data points. If your data contains many sparse buckets, consider using
- a longer `bucket_span`.
- end::sparse-bucket-count[]
- tag::state-anomaly-job[]
- The status of the {anomaly-job}, which can be one of the following values:
- +
- --
- * `closed`: The job finished successfully with its model state persisted. The
- job must be opened before it can accept further data.
- * `closing`: The job close action is in progress and has not yet completed. A
- closing job cannot accept further data.
- * `failed`: The job did not finish successfully due to an error. This situation
- can occur due to invalid input data, a fatal error occurring during the
- analysis, or an external interaction such as the process being killed by the
- Linux out of memory (OOM) killer. If the job has irrevocably failed, it must be
- force closed and then deleted. If the {dfeed} can be corrected, the job can be
- closed and then re-opened.
- * `opened`: The job is available to receive and process data.
- * `opening`: The job open action is in progress and has not yet completed.
- --
- end::state-anomaly-job[]
- tag::state-datafeed[]
- The status of the {dfeed}, which can be one of the following values:
- +
- --
- * `starting`: The {dfeed} has been requested to start but has not yet started.
- * `started`: The {dfeed} is actively receiving data.
- * `stopping`: The {dfeed} has been requested to stop gracefully and is
- completing its final action.
- * `stopped`: The {dfeed} is stopped and will not receive data until it is
- re-started.
- --
- end::state-datafeed[]
- tag::summary-count-field-name[]
- If this property is specified, the data that is fed to the job is expected to be
- pre-summarized. This property value is the name of the field that contains the
- count of raw data points that have been summarized. The same
- `summary_count_field_name` applies to all detectors in the job.
- +
- --
- NOTE: The `summary_count_field_name` property cannot be used with the `metric`
- function.
- --
- end::summary-count-field-name[]
- tag::tags[]
- A comma-delimited string of tags. An {infer} model can have many tags, or none.
- When supplied, only {infer} models that contain all the supplied tags are
- returned.
- end::tags[]
- tag::timeout-start[]
- Controls the amount of time to wait until the {dfanalytics-job} starts. Defaults
- to 20 seconds.
- end::timeout-start[]
- tag::timeout-stop[]
- Controls the amount of time to wait until the {dfanalytics-job} stops. Defaults
- to 20 seconds.
- end::timeout-stop[]
- tag::time-format[]
- The time format, which can be `epoch`, `epoch_ms`, or a custom pattern. The
- default value is `epoch`, which refers to UNIX or Epoch time (the number of
- seconds since 1 Jan 1970). The value `epoch_ms` indicates that time is measured
- in milliseconds since the epoch. The `epoch` and `epoch_ms` time formats accept
- either integer or real values. +
- +
- NOTE: Custom patterns must conform to the Java `DateTimeFormatter` class.
- When you use date-time formatting patterns, it is recommended that you provide
- the full date, time and time zone. For example: `yyyy-MM-dd'T'HH:mm:ssX`.
- If the pattern that you specify is not sufficient to produce a complete
- timestamp, job creation fails.
- end::time-format[]
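
To show how the custom pattern in the note relates to the `epoch` and `epoch_ms` formats, the following Java sketch parses a timestamp with `DateTimeFormatter` and converts it to seconds and milliseconds since the epoch. The class name, method name, and sample timestamp are hypothetical and chosen only for illustration.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

public class TimeFormatSketch {
    // Hypothetical helper: parses a timestamp with a custom pattern that
    // includes a time zone offset, and returns seconds since the epoch.
    public static long toEpochSeconds(String timestamp, String pattern) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern);
        return OffsetDateTime.parse(timestamp, formatter).toEpochSecond();
    }

    public static void main(String[] args) {
        String pattern = "yyyy-MM-dd'T'HH:mm:ssX"; // full date, time, and time zone
        long epoch = toEpochSeconds("2024-01-15T10:30:00Z", pattern);
        long epochMs = epoch * 1000L; // the same instant in epoch_ms form
        System.out.println(epoch + " " + epochMs);
    }
}
```

A pattern without a zone or offset, such as `yyyy-MM-dd HH:mm:ss`, would not parse to an `OffsetDateTime` here, which mirrors the requirement that the pattern be sufficient to produce a complete timestamp.
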
- tag::time-span[]
- The time span that each search queries. This setting is only applicable
- when the mode is set to `manual`. For example: `3h`.
- end::time-span[]
- tag::timestamp-results[]
- The start time of the bucket for which these results were calculated.
- end::timestamp-results[]
- tag::tokenizer[]
- The name or definition of the <<analysis-tokenizers,tokenizer>> to use after
- character filters are applied. This property is compulsory if
- `categorization_analyzer` is specified as an object. Machine learning provides a
- tokenizer called `ml_classic` that tokenizes in the same way as the
- non-customizable tokenizer in older versions of the product. If you want to use
- that tokenizer but change the character or token filters, specify
- `"tokenizer": "ml_classic"` in your `categorization_analyzer`.
- end::tokenizer[]
- tag::total-by-field-count[]
- The number of `by` field values that were analyzed by the models. This value is
- cumulative for all detectors in the job.
- end::total-by-field-count[]
- tag::total-category-count[]
- The number of categories created by categorization.
- end::total-category-count[]
- tag::total-over-field-count[]
- The number of `over` field values that were analyzed by the models. This value
- is cumulative for all detectors in the job.
- end::total-over-field-count[]
- tag::total-partition-field-count[]
- The number of `partition` field values that were analyzed by the models. This
- value is cumulative for all detectors in the job.
- end::total-partition-field-count[]
- tag::training-percent[]
- Defines the percentage of eligible documents that are used for training.
- Documents that are ignored by the analysis (for example, those that contain
- arrays with more than one value) are not included in the calculation of the
- used percentage. Defaults to `100`.
- end::training-percent[]
- tag::use-null[]
- Defines whether a new series is used as the null series when there is no value
- for the `by` or `partition` fields. The default value is `false`.
- end::use-null[]
|