--
:api: put-data-frame-analytics
:request: PutDataFrameAnalyticsRequest
:response: PutDataFrameAnalyticsResponse
--
[role="xpack"]
[id="{upid}-{api}"]
=== Create {dfanalytics-jobs} API

Creates a new {dfanalytics-job}.
The API accepts a +{request}+ object as a request and returns a +{response}+.

[id="{upid}-{api}-request"]
==== Request

A +{request}+ requires the following argument:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request]
--------------------------------------------------
<1> The configuration of the {dfanalytics-job} to create

[id="{upid}-{api}-config"]
==== {dfanalytics-cap} configuration

The `DataFrameAnalyticsConfig` object contains all the details about the
{dfanalytics-job} configuration and contains the following arguments:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-config]
--------------------------------------------------
<1> The {dfanalytics-job} ID
<2> The source index and query from which to gather data
<3> The destination index
<4> The analysis to be performed
<5> The fields to be included in / excluded from the analysis
<6> The memory limit for the model created as part of the analysis process
<7> Optionally, a human-readable description
<8> The maximum number of threads to be used by the analysis. Defaults to 1.
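As a rough illustration, the arguments above can be assembled like this. This is a minimal sketch, assuming the builder methods of the high-level REST client classes (`DataFrameAnalyticsConfig`, `DataFrameAnalyticsSource`, `DataFrameAnalyticsDest`, `OutlierDetection`); the job ID and index names are placeholders:

["source","java"]
--------------------------------------------------
// Sketch only: placeholder names, default outlier detection analysis.
DataFrameAnalyticsSource sourceConfig = DataFrameAnalyticsSource.builder()
    .setIndex("my-source-index")                   // where to read the data from
    .build();

DataFrameAnalyticsDest destConfig = DataFrameAnalyticsDest.builder()
    .setIndex("my-dest-index")                     // where results are written
    .build();

DataFrameAnalyticsConfig config = DataFrameAnalyticsConfig.builder()
    .setId("my-analytics-job")                     // the job ID
    .setSource(sourceConfig)
    .setDest(destConfig)
    .setAnalysis(OutlierDetection.createDefault()) // the analysis to perform
    .setModelMemoryLimit(new ByteSizeValue(1, ByteSizeUnit.GB))
    .setDescription("example outlier detection job")
    .setMaxNumThreads(1)
    .build();
--------------------------------------------------

Any argument left unset keeps its server-side default; the sections below describe each building block in turn.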
[id="{upid}-{api}-query-config"]
==== SourceConfig

The index and the query from which to collect data.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-source-config]
--------------------------------------------------
<1> Constructing a new DataFrameAnalyticsSource
<2> The source index
<3> The query from which to gather the data. If query is not set, a `match_all` query is used by default.
<4> Runtime mappings that will be added to the destination index mapping.
<5> Source filtering to select which fields will exist in the destination index.

===== QueryConfig

The query with which to select data from the source.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-query-config]
--------------------------------------------------

==== DestinationConfig

The index to which data should be written by the {dfanalytics-job}.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-dest-config]
--------------------------------------------------
<1> Constructing a new DataFrameAnalyticsDest
<2> The destination index

==== Analysis

The analysis to be performed.
Currently, the supported analyses include: +OutlierDetection+, +Classification+, +Regression+.

===== Outlier detection

+OutlierDetection+ analysis can be created in one of two ways:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-outlier-detection-default]
--------------------------------------------------
<1> Constructing a new OutlierDetection object with the default strategy to determine outliers
or

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-outlier-detection-customized]
--------------------------------------------------
<1> Constructing a new OutlierDetection object
<2> The method used to perform the analysis
<3> The number of neighbors taken into account during analysis
<4> The minimum `outlier_score` required to compute feature influence
<5> Whether to compute feature influence
<6> The proportion of the data set that is assumed to be outlying prior to outlier detection
<7> Whether to apply standardization to feature values

===== Classification

+Classification+ analysis requires setting the +dependent_variable+ and
has a number of other optional parameters:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-classification]
--------------------------------------------------
<1> Constructing a new Classification builder object with the required dependent variable
<2> The lambda regularization parameter. A non-negative double.
<3> The gamma regularization parameter. A non-negative double.
<4> The applied shrinkage. A double in [0.001, 1].
<5> The maximum number of trees the forest is allowed to contain. An integer in [1, 2000].
<6> The fraction of features which will be used when selecting a random bag for each candidate split. A double in (0, 1].
<7> If set, feature importance for the top most important features will be computed.
<8> The name of the prediction field in the results object.
<9> The percentage of training-eligible rows to be used in training. Defaults to 100%.
<10> The seed to be used by the random generator that picks which rows are used in training.
<11> The optimization objective to target when assigning class labels. Defaults to `maximize_minimum_recall`.
<12> The number of top classes (or -1 which denotes all classes) to be reported in the results. Defaults to 2.
<13> Custom feature processors that will create new features for analysis from the included document fields. Note, automatic categorical {ml-docs}/ml-feature-encoding.html[feature encoding] still occurs for all features.
<14> The alpha regularization parameter. A non-negative double.
<15> The growth rate of the shrinkage parameter. A double in [0.5, 2.0].
<16> The soft tree depth limit. A non-negative double.
<17> The soft tree depth tolerance. Controls how much the soft tree depth limit is respected. A double greater than or equal to 0.01.
<18> The amount by which to downsample the data for stochastic gradient estimates. A double in (0, 1.0].
<19> The maximum number of optimisation rounds used for hyperparameter optimisation per parameter. An integer in [0, 20].
<20> Whether to enable early stopping to finish the training process if it is not finding better models.

===== Regression

+Regression+ analysis requires setting the +dependent_variable+ and
has a number of other optional parameters:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-regression]
--------------------------------------------------
<1> Constructing a new Regression builder object with the required dependent variable
<2> The lambda regularization parameter. A non-negative double.
<3> The gamma regularization parameter. A non-negative double.
<4> The applied shrinkage. A double in [0.001, 1].
<5> The maximum number of trees the forest is allowed to contain. An integer in [1, 2000].
<6> The fraction of features which will be used when selecting a random bag for each candidate split. A double in (0, 1].
<7> If set, feature importance for the top most important features will be computed.
<8> The name of the prediction field in the results object.
<9> The percentage of training-eligible rows to be used in training. Defaults to 100%.
<10> The seed to be used by the random generator that picks which rows are used in training.
<11> The loss function used for regression. Defaults to `mse`.
<12> An optional parameter to the loss function.
<13> Custom feature processors that will create new features for analysis from the included document fields. Note, automatic categorical {ml-docs}/ml-feature-encoding.html[feature encoding] still occurs for all features.
<14> The alpha regularization parameter. A non-negative double.
<15> The growth rate of the shrinkage parameter. A double in [0.5, 2.0].
<16> The soft tree depth limit. A non-negative double.
<17> The soft tree depth tolerance. Controls how much the soft tree depth limit is respected. A double greater than or equal to 0.01.
<18> The amount by which to downsample the data for stochastic gradient estimates. A double in (0, 1.0].
<19> The maximum number of optimisation rounds used for hyperparameter optimisation per parameter. An integer in [0, 20].
<20> Whether to enable early stopping to finish the training process if it is not finding better models.

==== Analyzed fields

A FetchContext object containing the fields to be included in / excluded from the analysis:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-analyzed-fields]
--------------------------------------------------

include::../execution.asciidoc[]

[id="{upid}-{api}-response"]
==== Response

The returned +{response}+ contains the newly created {dfanalytics-job}.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-response]
--------------------------------------------------
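Putting it together, creating the job and reading back the created configuration might look like the following sketch. Here `config` stands for a previously built `DataFrameAnalyticsConfig`, and `client` is assumed to be an initialized `RestHighLevelClient`:

["source","java"]
--------------------------------------------------
// Sketch only: requires a running cluster and an initialized client.
PutDataFrameAnalyticsRequest request = new PutDataFrameAnalyticsRequest(config);

PutDataFrameAnalyticsResponse response =
    client.machineLearning().putDataFrameAnalytics(request, RequestOptions.DEFAULT);

// The response echoes back the created configuration, including any
// defaults filled in by the server.
DataFrameAnalyticsConfig created = response.getConfig();
--------------------------------------------------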