123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592593594595596597598599600601602603604605606607608609610611612613614615616617618619620621622623624625626627628629630631632633634635636637638639640641642643644645646647648649650651652653654655656657658659660661662663664665666667668669670671672673674675676677678679680681682683684685686687688689690691692693694695696697698699700701702703704705706707708709710711712713714715716717718719720721722723724725726727728729730731732733734735736737738739740741742743744745746747748749750751752753754755756757758759760761762763764765766767768769770771772773774775776777778779780781782783784785786787788789790791792793794795796797798799800801802803804805806807808809810811812813814815816817818819820821822823824825826827828829830831832833834835836837838839840841842843844845846847848849850851852853854855856857858859860861862863864865866867868869870871872873874875876877878879880881882883884885886887888889890891892893894895896897898899900901902903904905906907908909910911912913914915916917918919920921922 |
- [role="xpack"]
- [[put-dfanalytics]]
- = Create {dfanalytics-jobs} API
- [subs="attributes"]
- ++++
- <titleabbrev>Create {dfanalytics-jobs}</titleabbrev>
- ++++
- Instantiates a {dfanalytics-job}.
- [[ml-put-dfanalytics-request]]
- == {api-request-title}
- `PUT _ml/data_frame/analytics/<data_frame_analytics_id>`
- [[ml-put-dfanalytics-prereq]]
- == {api-prereq-title}
- Requires the following privileges:
- * cluster: `manage_ml` (the `machine_learning_admin` built-in role grants this
- privilege)
- * source indices: `read`, `view_index_metadata`
- * destination index: `read`, `create_index`, `manage` and `index`
- NOTE: The {dfanalytics-job} remembers which roles the user who created it had at
- the time of creation. When you start the job, it performs the analysis using
- those same roles. If you provide
- <<http-clients-secondary-authorization,secondary authorization headers>>,
- those credentials are used instead.
- [[ml-put-dfanalytics-desc]]
- == {api-description-title}
- This API creates a {dfanalytics-job} that performs an analysis on the source
- indices and stores the outcome in a destination index.
- If the destination index does not exist, it is created automatically when you
- start the job. See <<start-dfanalytics>>.
- If you supply only a subset of the {regression} or {classification} parameters,
- {ml-docs}/hyperparameters.html[hyperparameter optimization] occurs. It
- determines a value for each of the undefined parameters.
- [[ml-put-dfanalytics-path-params]]
- == {api-path-parms-title}
- `<data_frame_analytics_id>`::
- (Required, string)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=job-id-data-frame-analytics-define]
- [role="child_attributes"]
- [[ml-put-dfanalytics-request-body]]
- == {api-request-body-title}
- `allow_lazy_start`::
- (Optional, Boolean)
- Specifies whether this job can start when there is insufficient {ml} node
- capacity for it to be immediately assigned to a node. The default is `false`; if
- a {ml} node with capacity to run the job cannot immediately be found, the API
- returns an error. However, this is also subject to the cluster-wide
- `xpack.ml.max_lazy_ml_nodes` setting. See <<advanced-ml-settings>>. If this
- option is set to `true`, the API does not return an error and the job waits in
- the `starting` state until sufficient {ml} node capacity is available.
- //Begin analysis
- `analysis`::
- (Required, object)
- The analysis configuration, which contains the information necessary to perform
- one of the following types of analysis: {classification}, {oldetection}, or
- {regression}.
- +
- .Properties of `analysis`
- [%collapsible%open]
- ====
- //Begin classification
- `classification`:::
- (Required^*^, object)
- The configuration information necessary to perform
- {ml-docs}/ml-dfa-classification.html[{classification}].
- +
- TIP: Advanced parameters are for fine-tuning {classanalysis}. They are set
- automatically by hyperparameter optimization to give the minimum validation
- error. It is highly recommended to use the default values unless you fully
- understand the function of these parameters.
- +
- .Properties of `classification`
- [%collapsible%open]
- =====
- `alpha`::::
- (Optional, double)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-alpha]
- `class_assignment_objective`::::
- (Optional, string)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=class-assignment-objective]
- `dependent_variable`::::
- (Required, string)
- +
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dependent-variable]
- +
- The data type of the field must be numeric (`integer`, `short`, `long`, `byte`),
- categorical (`ip` or `keyword`), or boolean. There must be no more than 30
- different values in this field.
- `downsample_factor`::::
- (Optional, double)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-downsample-factor]
- `early_stopping_enabled`::::
- (Optional, Boolean)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-early-stopping-enabled]
- `eta`::::
- (Optional, double)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=eta]
- `eta_growth_rate_per_tree`::::
- (Optional, double)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-eta-growth]
- `feature_bag_fraction`::::
- (Optional, double)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=feature-bag-fraction]
- `feature_processors`::::
- (Optional, list)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors]
- +
- .Properties of `feature_processors`
- [%collapsible%open]
- ======
- `frequency_encoding`::::
- (object)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-frequency]
- +
- .Properties of `frequency_encoding`
- [%collapsible%open]
- =======
- `feature_name`::::
- (Required, string)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-feat-name]
- `field`::::
- (Required, string)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-field]
- `frequency_map`::::
- (Required, object)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-frequency-map]
- =======
- `multi_encoding`::::
- (object)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-multi]
- +
- .Properties of `multi_encoding`
- [%collapsible%open]
- =======
- `processors`::::
- (Required, array)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-multi-proc]
- =======
- `n_gram_encoding`::::
- (object)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-ngram]
- +
- .Properties of `n_gram_encoding`
- [%collapsible%open]
- =======
- `feature_prefix`::::
- (Optional, string)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-ngram-feat-pref]
- `field`::::
- (Required, string)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-ngram-field]
- `length`::::
- (Optional, integer)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-ngram-length]
- `n_grams`::::
- (Required, array)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-ngram-ngrams]
- `start`::::
- (Optional, integer)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-ngram-start]
- =======
- `one_hot_encoding`::::
- (object)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-one-hot]
- +
- .Properties of `one_hot_encoding`
- [%collapsible%open]
- =======
- `field`::::
- (Required, string)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-field]
- `hot_map`::::
- (Required, string)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-one-hot-map]
- =======
- `target_mean_encoding`::::
- (object)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-target-mean]
- +
- .Properties of `target_mean_encoding`
- [%collapsible%open]
- =======
- `default_value`::::
- (Required, integer)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-target-mean-default]
- `feature_name`::::
- (Required, string)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-feat-name]
- `field`::::
- (Required, string)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-field]
- `target_map`::::
- (Required, object)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-target-mean-map]
- =======
- ======
- `gamma`::::
- (Optional, double)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=gamma]
- `lambda`::::
- (Optional, double)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=lambda]
- `max_optimization_rounds_per_hyperparameter`::::
- (Optional, integer)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-max-optimization-rounds]
- `max_trees`::::
- (Optional, integer)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=max-trees]
- `num_top_classes`::::
- (Optional, integer)
- Defines the number of categories for which the predicted probabilities are
- reported. It must be non-negative or -1. If it is -1 or greater than the total
- number of categories, probabilities are reported for all categories; if you have
- a large number of categories, there could be a significant effect on the size of your destination index. Defaults to 2.
- +
- --
- NOTE: To use the
- {ml-docs}/ml-dfa-classification.html#ml-dfanalytics-class-aucroc[AUC ROC]
- evaluation method, `num_top_classes` must be set to `-1` or a value greater than
- or equal to the total number of categories.
- --
- `num_top_feature_importance_values`::::
- (Optional, integer)
- Advanced configuration option. Specifies the maximum number of
- {ml-docs}/ml-feature-importance.html[{feat-imp}] values per document to return.
- By default, it is zero and no {feat-imp} calculation occurs.
- `prediction_field_name`::::
- (Optional, string)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=prediction-field-name]
- `randomize_seed`::::
- (Optional, long)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=randomize-seed]
- `soft_tree_depth_limit`::::
- (Optional, double)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-soft-limit]
- `soft_tree_depth_tolerance`::::
- (Optional, double)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-soft-tolerance]
- `training_percent`::::
- (Optional, integer)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=training-percent]
- //End classification
- =====
- //Begin outlier_detection
- `outlier_detection`:::
- (Required^*^, object)
- The configuration information necessary to perform
- {ml-docs}/ml-dfa-finding-outliers.html[{oldetection}]:
- +
- .Properties of `outlier_detection`
- [%collapsible%open]
- =====
- `compute_feature_influence`::::
- (Optional, Boolean)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=compute-feature-influence]
- `feature_influence_threshold`::::
- (Optional, double)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=feature-influence-threshold]
- `method`::::
- (Optional, string)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=method]
- `n_neighbors`::::
- (Optional, integer)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=n-neighbors]
- `outlier_fraction`::::
- (Optional, double)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=outlier-fraction]
- `standardization_enabled`::::
- (Optional, Boolean)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=standardization-enabled]
- //End outlier_detection
- =====
- //Begin regression
- `regression`:::
- (Required^*^, object)
- The configuration information necessary to perform
- {ml-docs}/ml-dfa-regression.html[{regression}].
- +
- TIP: Advanced parameters are for fine-tuning {reganalysis}. They are set
- automatically by hyperparameter optimization to give the minimum validation
- error. It is highly recommended to use the default values unless you fully
- understand the function of these parameters.
- +
- .Properties of `regression`
- [%collapsible%open]
- =====
- `alpha`::::
- (Optional, double)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-alpha]
- `dependent_variable`::::
- (Required, string)
- +
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dependent-variable]
- +
- The data type of the field must be numeric.
- `downsample_factor`::::
- (Optional, double)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-downsample-factor]
- `early_stopping_enabled`::::
- (Optional, Boolean)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-early-stopping-enabled]
- `eta`::::
- (Optional, double)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=eta]
- `eta_growth_rate_per_tree`::::
- (Optional, double)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-eta-growth]
- `feature_bag_fraction`::::
- (Optional, double)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=feature-bag-fraction]
- `feature_processors`::::
- (Optional, list)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors]
- `gamma`::::
- (Optional, double)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=gamma]
- `lambda`::::
- (Optional, double)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=lambda]
- `loss_function`::::
- (Optional, string)
- The loss function used during {regression}. Available options are `mse` (mean
- squared error), `msle` (mean squared logarithmic error), `huber` (Pseudo-Huber
- loss). Defaults to `mse`.
- Refer to
- {ml-docs}/dfa-regression-lossfunction.html[Loss functions for {regression} analyses]
- to learn more.
- `loss_function_parameter`::::
- (Optional, double)
- A positive number that is used as a parameter to the `loss_function`.
- `max_optimization_rounds_per_hyperparameter`::::
- (Optional, integer)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-max-optimization-rounds]
- `max_trees`::::
- (Optional, integer)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=max-trees]
- `num_top_feature_importance_values`::::
- (Optional, integer)
- Advanced configuration option. Specifies the maximum number of
- {ml-docs}/ml-feature-importance.html[{feat-imp}] values per document to return.
- By default, it is zero and no {feat-imp} calculation occurs.
- `prediction_field_name`::::
- (Optional, string)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=prediction-field-name]
- `randomize_seed`::::
- (Optional, long)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=randomize-seed]
- `soft_tree_depth_limit`::::
- (Optional, double)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-soft-limit]
- `soft_tree_depth_tolerance`::::
- (Optional, double)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-soft-tolerance]
- `training_percent`::::
- (Optional, integer)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=training-percent]
- =====
- //End regression
- ====
- //End analysis
- //Begin analyzed_fields
- `analyzed_fields`::
- (Optional, object)
- Specify `includes` and/or `excludes` patterns to select which fields will be
- included in the analysis. The patterns specified in `excludes` are applied last,
- therefore `excludes` takes precedence. In other words, if the same field is
- specified in both `includes` and `excludes`, then the field will not be included
- in the analysis.
- +
- --
- [[dfa-supported-fields]]
- The supported fields for each type of analysis are as follows:
- * {oldetection-cap} requires numeric or boolean data to analyze. The algorithms
- don't support missing values therefore fields that have data types other than
- numeric or boolean are ignored. Documents where included fields contain missing
- values, null values, or an array are also ignored. Therefore the `dest` index
- may contain documents that don't have an {olscore}.
- * {regression-cap} supports fields that are numeric, `boolean`, `text`,
- `keyword`, and `ip`. It is also tolerant of missing values. Fields that are
- supported are included in the analysis, other fields are ignored. Documents
- where included fields contain an array with two or more values are also
- ignored. Documents in the `dest` index that don’t contain a results field are
- not included in the {reganalysis}.
- * {classification-cap} supports fields that are numeric, `boolean`, `text`,
- `keyword`, and `ip`. It is also tolerant of missing values. Fields that are
- supported are included in the analysis, other fields are ignored. Documents
- where included fields contain an array with two or more values are also ignored.
- Documents in the `dest` index that don’t contain a results field are not
- included in the {classanalysis}. {classanalysis-cap} can be improved by mapping
- ordinal variable values to a single number. For example, in case of age ranges,
- you can model the values as "0-14" = 0, "15-24" = 1, "25-34" = 2, and so on.
- If `analyzed_fields` is not set, only the relevant fields will be included. For
- example, all the numeric fields for {oldetection}. For more information about
- field selection, see <<explain-dfanalytics>>.
- --
- +
- .Properties of `analyzed_fields`
- [%collapsible%open]
- ====
- `excludes`:::
- (Optional, array)
- An array of strings that defines the fields that will be excluded from the
- analysis. You do not need to add fields with unsupported data types to
- `excludes`, these fields are excluded from the analysis automatically.
- `includes`:::
- (Optional, array)
- An array of strings that defines the fields that will be included in the
- analysis.
- //End analyzed_fields
- ====
- `description`::
- (Optional, string)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=description-dfa]
- `dest`::
- (Required, object)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dest]
- `max_num_threads`::
- (Optional, integer)
- The maximum number of threads to be used by the analysis.
- The default value is `1`. Using more threads may decrease the time
- necessary to complete the analysis at the cost of using more CPU.
- Note that the process may use additional threads for operational
- functionality other than the analysis itself.
- `model_memory_limit`::
- (Optional, string)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=model-memory-limit-dfa]
- `source`::
- (object)
- The configuration of how to source the analysis data. It requires an `index`.
- Optionally, `query`, `runtime_mappings`, and `_source` may be specified.
- +
- .Properties of `source`
- [%collapsible%open]
- ====
- `index`:::
- (Required, string or array) Index or indices on which to perform the analysis.
- It can be a single index or index pattern as well as an array of indices or
- patterns.
- +
- WARNING: If your source indices contain documents with the same IDs, only the
- document that is indexed last appears in the destination index.
- `query`:::
- (Optional, object) The {es} query domain-specific language (<<query-dsl,DSL>>).
- This value corresponds to the query object in an {es} search POST body. All the
- options that are supported by {es} can be used, as this object is passed
- verbatim to {es}. By default, this property has the following value:
- `{"match_all": {}}`.
- `runtime_mappings`:::
- (Optional, object) Definitions of runtime fields that will become part of the
- mapping of the destination index.
- `_source`:::
- (Optional, object) Specify `includes` and/or `excludes` patterns to select which
- fields will be present in the destination. Fields that are excluded cannot be
- included in the analysis.
- +
- .Properties of `_source`
- [%collapsible%open]
- =====
- `includes`::::
- (array) An array of strings that defines the fields that will be included in the
- destination.
- `excludes`::::
- (array) An array of strings that defines the fields that will be excluded from
- the destination.
- =====
- ====
- [[ml-put-dfanalytics-example]]
- == {api-examples-title}
- [[ml-put-dfanalytics-example-preprocess]]
- === Preprocessing actions example
- The following example shows how to limit the scope of the analysis to certain
- fields, specify excluded fields in the destination index, and use a query to
- filter your data before analysis.
- [source,console]
- --------------------------------------------------
- PUT _ml/data_frame/analytics/model-flight-delays-pre
- {
- "source": {
- "index": [
- "kibana_sample_data_flights" <1>
- ],
- "query": { <2>
- "range": {
- "DistanceKilometers": {
- "gt": 0
- }
- }
- },
- "_source": { <3>
- "includes": [],
- "excludes": [
- "FlightDelay",
- "FlightDelayType"
- ]
- }
- },
- "dest": { <4>
- "index": "df-flight-delays",
- "results_field": "ml-results"
- },
- "analysis": {
- "regression": {
- "dependent_variable": "FlightDelayMin",
- "training_percent": 90
- }
- },
- "analyzed_fields": { <5>
- "includes": [],
- "excludes": [
- "FlightNum"
- ]
- },
- "model_memory_limit": "100mb"
- }
- --------------------------------------------------
- // TEST[skip:setup kibana sample data]
- <1> Source index to analyze.
- <2> This query filters out entire documents that will not be present in the
- destination index.
- <3> The `_source` object defines fields in the data set that will be included or
- excluded in the destination index.
- <4> Defines the destination index that contains the results of the analysis and
- the fields of the source index specified in the `_source` object. Also defines
- the name of the `results_field`.
- <5> Specifies fields to be included in or excluded from the analysis. This does
- not affect whether the fields will be present in the destination index, only
- affects whether they are used in the analysis.
- In this example, we can see that all the fields of the source index are included
- in the destination index except `FlightDelay` and `FlightDelayType` because
- these are defined as excluded fields by the `excludes` parameter of the
- `_source` object. The `FlightNum` field is included in the destination index,
- however it is not included in the analysis because it is explicitly specified as
- excluded field by the `excludes` parameter of the `analyzed_fields` object.
- [[ml-put-dfanalytics-example-od]]
- === {oldetection-cap} example
- The following example creates the `loganalytics` {dfanalytics-job}, the analysis
- type is `outlier_detection`:
- [source,console]
- --------------------------------------------------
- PUT _ml/data_frame/analytics/loganalytics
- {
- "description": "Outlier detection on log data",
- "source": {
- "index": "logdata"
- },
- "dest": {
- "index": "logdata_out"
- },
- "analysis": {
- "outlier_detection": {
- "compute_feature_influence": true,
- "outlier_fraction": 0.05,
- "standardization_enabled": true
- }
- }
- }
- --------------------------------------------------
- // TEST[setup:setup_logdata]
- The API returns the following result:
- [source,console-result]
- ----
- {
- "id" : "loganalytics",
- "create_time" : 1656364565517,
- "version" : "8.4.0",
- "authorization" : {
- "roles" : [
- "superuser"
- ]
- },
- "description" : "Outlier detection on log data",
- "source" : {
- "index" : [
- "logdata"
- ],
- "query" : {
- "match_all" : { }
- }
- },
- "dest" : {
- "index" : "logdata_out",
- "results_field" : "ml"
- },
- "analysis" : {
- "outlier_detection" : {
- "compute_feature_influence" : true,
- "outlier_fraction" : 0.05,
- "standardization_enabled" : true
- }
- },
- "model_memory_limit" : "1gb",
- "allow_lazy_start" : false,
- "max_num_threads" : 1
- }
- ----
- // TESTRESPONSE[s/1656364565517/$body.$_path/]
- // TESTRESPONSE[s/"version": "8.4.0"/"version": $body.version/]
- // TESTRESPONSE[s/"authorization" : \{[^}]*\},//]
- [[ml-put-dfanalytics-example-r]]
- === {regression-cap} examples
- The following example creates the `house_price_regression_analysis`
- {dfanalytics-job}, the analysis type is `regression`:
- [source,console]
- --------------------------------------------------
- PUT _ml/data_frame/analytics/house_price_regression_analysis
- {
- "source": {
- "index": "houses_sold_last_10_yrs"
- },
- "dest": {
- "index": "house_price_predictions"
- },
- "analysis":
- {
- "regression": {
- "dependent_variable": "price"
- }
- }
- }
- --------------------------------------------------
- // TEST[setup:setup_houseprices]
- The API returns the following result:
- [source,console-result]
- ----
- {
- "id" : "house_price_regression_analysis",
- "create_time" : 1656364845151,
- "version" : "8.4.0",
- "authorization" : {
- "roles" : [
- "superuser"
- ]
- },
- "source" : {
- "index" : [
- "houses_sold_last_10_yrs"
- ],
- "query" : {
- "match_all" : { }
- }
- },
- "dest" : {
- "index" : "house_price_predictions",
- "results_field" : "ml"
- },
- "analysis" : {
- "regression" : {
- "dependent_variable" : "price",
- "prediction_field_name" : "price_prediction",
- "training_percent" : 100.0,
- "randomize_seed" : -3578554885299300212,
- "loss_function" : "mse",
- "early_stopping_enabled" : true
- }
- },
- "model_memory_limit" : "1gb",
- "allow_lazy_start" : false,
- "max_num_threads" : 1
- }
- ----
- // TESTRESPONSE[s/1656364845151/$body.$_path/]
- // TESTRESPONSE[s/"version": "8.4.0"/"version": $body.version/]
- // TESTRESPONSE[s/"authorization" : \{[^}]*\},//]
- // TESTRESPONSE[s/-3578554885299300212/$body.$_path/]
- The following example creates a job and specifies a training percent:
- [source,console]
- --------------------------------------------------
- PUT _ml/data_frame/analytics/student_performance_mathematics_0.3
- {
- "source": {
- "index": "student_performance_mathematics"
- },
- "dest": {
- "index":"student_performance_mathematics_reg"
- },
- "analysis":
- {
- "regression": {
- "dependent_variable": "G3",
- "training_percent": 70, <1>
- "randomize_seed": 19673948271 <2>
- }
- }
- }
- --------------------------------------------------
- // TEST[skip:TBD]
- <1> The percentage of the data set that is used for training the model.
- <2> The seed that is used to randomly pick which data is used for training.
- The following example uses custom feature processors to transform the
- categorical values for `DestWeather` into numerical values using one-hot,
- target-mean, and frequency encoding techniques:
- [source,console]
- --------------------------------------------------
- PUT _ml/data_frame/analytics/flight_prices
- {
- "source": {
- "index": [
- "kibana_sample_data_flights"
- ]
- },
- "dest": {
- "index": "kibana_sample_flight_prices"
- },
- "analysis": {
- "regression": {
- "dependent_variable": "AvgTicketPrice",
- "num_top_feature_importance_values": 2,
- "feature_processors": [
- {
- "frequency_encoding": {
- "field": "DestWeather",
- "feature_name": "DestWeather_frequency",
- "frequency_map": {
- "Rain": 0.14604811155570188,
- "Heavy Fog": 0.14604811155570188,
- "Thunder & Lightning": 0.14604811155570188,
- "Cloudy": 0.14604811155570188,
- "Damaging Wind": 0.14604811155570188,
- "Hail": 0.14604811155570188,
- "Sunny": 0.14604811155570188,
- "Clear": 0.14604811155570188
- }
- }
- },
- {
- "target_mean_encoding": {
- "field": "DestWeather",
- "feature_name": "DestWeather_targetmean",
- "target_map": {
- "Rain": 626.5588814585794,
- "Heavy Fog": 626.5588814585794,
- "Thunder & Lightning": 626.5588814585794,
- "Hail": 626.5588814585794,
- "Damaging Wind": 626.5588814585794,
- "Cloudy": 626.5588814585794,
- "Clear": 626.5588814585794,
- "Sunny": 626.5588814585794
- },
- "default_value": 624.0249512020454
- }
- },
- {
- "one_hot_encoding": {
- "field": "DestWeather",
- "hot_map": {
- "Rain": "DestWeather_Rain",
- "Heavy Fog": "DestWeather_Heavy Fog",
- "Thunder & Lightning": "DestWeather_Thunder & Lightning",
- "Cloudy": "DestWeather_Cloudy",
- "Damaging Wind": "DestWeather_Damaging Wind",
- "Hail": "DestWeather_Hail",
- "Clear": "DestWeather_Clear",
- "Sunny": "DestWeather_Sunny"
- }
- }
- }
- ]
- }
- },
- "analyzed_fields": {
- "includes": [
- "AvgTicketPrice",
- "Cancelled",
- "DestWeather",
- "FlightDelayMin",
- "DistanceMiles"
- ]
- },
- "model_memory_limit": "30mb"
- }
- --------------------------------------------------
- // TEST[skip:TBD]
- NOTE: These custom feature processors are optional; automatic
- {ml-docs}/ml-feature-encoding.html[feature encoding] still occurs for all
- categorical features.
- [[ml-put-dfanalytics-example-c]]
- === {classification-cap} example
- The following example creates the `loan_classification` {dfanalytics-job}, the
- analysis type is `classification`:
- [source,console]
- --------------------------------------------------
- PUT _ml/data_frame/analytics/loan_classification
- {
- "source" : {
- "index": "loan-applicants"
- },
- "dest" : {
- "index": "loan-applicants-classified"
- },
- "analysis" : {
- "classification": {
- "dependent_variable": "label",
- "training_percent": 75,
- "num_top_classes": 2
- }
- }
- }
- --------------------------------------------------
- // TEST[skip:TBD]
|