--
:api: put-data-frame-analytics
:request: PutDataFrameAnalyticsRequest
:response: PutDataFrameAnalyticsResponse
--
[role="xpack"]
[id="{upid}-{api}"]
=== Put {dfanalytics-jobs} API

beta::[]

Creates a new {dfanalytics-job}.
The API accepts a +{request}+ object as a request and returns a +{response}+.

[id="{upid}-{api}-request"]
==== Put {dfanalytics-jobs} request

A +{request}+ requires the following argument:
["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request]
--------------------------------------------------
<1> The configuration of the {dfanalytics-job} to create
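
To make the shape of the call concrete, a minimal request could be assembled as in the following hand-written sketch. It is not the tagged snippet above: the job ID and index names are invented placeholders, and the builder and class names are assumed from the 7.x Java high-level REST client.

["source","java"]
--------------------------------------------------
// Minimal sketch: build a configuration and wrap it in the request object.
DataFrameAnalyticsConfig config = DataFrameAnalyticsConfig.builder()
    .setId("my-analytics-job")
    .setSource(DataFrameAnalyticsSource.builder().setIndex("my-source-index").build())
    .setDest(DataFrameAnalyticsDest.builder().setIndex("my-dest-index").build())
    .setAnalysis(OutlierDetection.createDefault())
    .build();
PutDataFrameAnalyticsRequest request = new PutDataFrameAnalyticsRequest(config);
--------------------------------------------------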
[id="{upid}-{api}-config"]
==== {dfanalytics-cap} configuration

The `DataFrameAnalyticsConfig` object holds all the details of the {dfanalytics-job}
configuration and accepts the following arguments:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-config]
--------------------------------------------------
<1> The {dfanalytics-job} ID
<2> The source index and query from which to gather data
<3> The destination index
<4> The analysis to be performed
<5> The fields to be included in or excluded from the analysis
<6> The memory limit for the model created as part of the analysis process
<7> Optionally, a human-readable description
<8> The maximum number of threads to be used by the analysis. Defaults to 1.
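
The arguments above map onto the builder roughly as follows. This is a hand-written sketch rather than the tagged snippet: all values are placeholders, `sourceConfig` and `destConfig` stand for objects built as in the sections below, and the setter names are assumed from the 7.x client.

["source","java"]
--------------------------------------------------
DataFrameAnalyticsConfig config = DataFrameAnalyticsConfig.builder()
    .setId("my-analytics-job")                                   // the job ID
    .setSource(sourceConfig)                                     // source index and query
    .setDest(destConfig)                                         // destination index
    .setAnalysis(OutlierDetection.createDefault())               // the analysis to perform
    .setAnalyzedFields(new FetchSourceContext(true,
        new String[] { "included_field" },
        new String[] { "excluded_field" }))                      // fields in/out of the analysis
    .setModelMemoryLimit(new ByteSizeValue(1, ByteSizeUnit.GB))  // model memory limit
    .setDescription("my first analytics job")                    // optional description
    .setMaxNumThreads(1)                                         // maximum number of threads
    .build();
--------------------------------------------------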
[id="{upid}-{api}-query-config"]
==== SourceConfig

The index and the query from which to collect data.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-source-config]
--------------------------------------------------
<1> Constructing a new DataFrameAnalyticsSource
<2> The source index
<3> The query from which to gather the data. If the query is not set, a `match_all` query is used by default.
<4> Source filtering to select which fields will exist in the destination index.
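
A hand-written sketch of such a source, with invented index and field names; the builder methods are assumed from the 7.x client:

["source","java"]
--------------------------------------------------
DataFrameAnalyticsSource sourceConfig = DataFrameAnalyticsSource.builder()
    .setIndex("my-source-index")                                    // the source index
    .setQueryConfig(new QueryConfig(QueryBuilders.matchAllQuery())) // match_all is the default anyway
    .setSourceFiltering(new FetchSourceContext(true,
        new String[] { "field_to_keep" },
        new String[] { "field_to_drop" }))                          // source filtering
    .build();
--------------------------------------------------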
===== QueryConfig

The query with which to select data from the source.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-query-config]
--------------------------------------------------
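
As a sketch, any `QueryBuilder` can be wrapped; the field name and value here are invented examples:

["source","java"]
--------------------------------------------------
// Select only documents matching a term; field and value are placeholders.
QueryConfig queryConfig = new QueryConfig(QueryBuilders.termQuery("user", "kimchy"));
--------------------------------------------------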
==== DestinationConfig

The index to which data should be written by the {dfanalytics-job}.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-dest-config]
--------------------------------------------------
<1> Constructing a new DataFrameAnalyticsDest
<2> The destination index
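
A sketch with a placeholder index name:

["source","java"]
--------------------------------------------------
DataFrameAnalyticsDest destConfig = DataFrameAnalyticsDest.builder()
    .setIndex("my-dest-index")   // the destination index (invented name)
    .build();
--------------------------------------------------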
==== Analysis

The analysis to be performed.
Currently, the supported analyses are +OutlierDetection+, +Classification+, and +Regression+.

===== Outlier detection

An +OutlierDetection+ analysis can be created in one of two ways:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-outlier-detection-default]
--------------------------------------------------
<1> Constructing a new OutlierDetection object with the default strategy to determine outliers
or

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-outlier-detection-customized]
--------------------------------------------------
<1> Constructing a new OutlierDetection object
<2> The method used to perform the analysis
<3> The number of neighbors taken into account during analysis
<4> The minimum `outlier_score` required to compute feature influence
<5> Whether to compute feature influence
<6> The proportion of the data set that is assumed to be outlying prior to outlier detection
<7> Whether to apply standardization to feature values
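
Both construction styles can be sketched as follows; the parameter values are arbitrary illustrations rather than recommendations, and the setter names are assumed from the 7.x client:

["source","java"]
--------------------------------------------------
// Default strategy.
OutlierDetection defaultOutlierDetection = OutlierDetection.createDefault();

// Customized strategy; all values below are arbitrary examples.
OutlierDetection customizedOutlierDetection = OutlierDetection.builder()
    .setMethod(OutlierDetection.Method.DISTANCE_KNN) // analysis method
    .setNNeighbors(5)                                // number of neighbors
    .setFeatureInfluenceThreshold(0.1)               // min outlier_score for feature influence
    .setComputeFeatureInfluence(true)                // compute feature influence
    .setOutlierFraction(0.05)                        // assumed proportion of outliers
    .setStandardizationEnabled(true)                 // standardize feature values
    .build();
--------------------------------------------------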
===== Classification

+Classification+ analysis requires setting the +dependent_variable+ and
supports a number of other optional parameters:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-classification]
--------------------------------------------------
<1> Constructing a new Classification builder object with the required dependent variable
<2> The lambda regularization parameter. A non-negative double.
<3> The gamma regularization parameter. A non-negative double.
<4> The applied shrinkage. A double in [0.001, 1].
<5> The maximum number of trees the forest is allowed to contain. An integer in [1, 2000].
<6> The fraction of features which will be used when selecting a random bag for each candidate split. A double in (0, 1].
<7> If set, feature importance for the top most important features will be computed.
<8> The name of the prediction field in the results object.
<9> The percentage of training-eligible rows to be used in training. Defaults to 100%.
<10> The seed to be used by the random generator that picks which rows are used in training.
<11> The optimization objective to target when assigning class labels. Defaults to `maximize_minimum_recall`.
<12> The number of top classes (or -1, which denotes all classes) to be reported in the results. Defaults to 2.
<13> Custom feature processors that will create new features for analysis from the included document
fields. Note that automatic categorical {ml-docs}/ml-feature-encoding.html[feature encoding] still occurs for all features.
<14> The alpha regularization parameter. A non-negative double.
<15> The growth rate of the shrinkage parameter. A double in [0.5, 2.0].
<16> The soft tree depth limit. A non-negative double.
<17> The soft tree depth tolerance. Controls how strictly the soft tree depth limit is respected. A double greater than or equal to 0.01.
<18> The amount by which to downsample the data for stochastic gradient estimates. A double in (0, 1.0].
<19> The maximum number of optimization rounds used for hyperparameter optimization per parameter. An integer in [0, 20].
<20> Whether to enable early stopping, which ends the training process early when it is no longer finding better models.
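
A subset of these parameters can be sketched as below. Only the dependent variable is required; the other values are arbitrary examples within the documented ranges, and the setter names are assumed from the 7.x client:

["source","java"]
--------------------------------------------------
Classification classification = Classification.builder("my_dependent_variable")
    .setLambda(1.0)                                // lambda regularization
    .setGamma(5.5)                                 // gamma regularization
    .setEta(0.5)                                   // shrinkage, in [0.001, 1]
    .setMaxTrees(50)                               // maximum number of trees
    .setFeatureBagFraction(0.4)                    // feature bag fraction, in (0, 1]
    .setPredictionFieldName("my_prediction_field") // prediction field name
    .setTrainingPercent(50.0)                      // train on half of the eligible rows
    .setNumTopClasses(2)                           // report the top two classes
    .build();
--------------------------------------------------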
===== Regression

+Regression+ analysis requires setting the +dependent_variable+ and
supports a number of other optional parameters:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-regression]
--------------------------------------------------
<1> Constructing a new Regression builder object with the required dependent variable
<2> The lambda regularization parameter. A non-negative double.
<3> The gamma regularization parameter. A non-negative double.
<4> The applied shrinkage. A double in [0.001, 1].
<5> The maximum number of trees the forest is allowed to contain. An integer in [1, 2000].
<6> The fraction of features which will be used when selecting a random bag for each candidate split. A double in (0, 1].
<7> If set, feature importance for the top most important features will be computed.
<8> The name of the prediction field in the results object.
<9> The percentage of training-eligible rows to be used in training. Defaults to 100%.
<10> The seed to be used by the random generator that picks which rows are used in training.
<11> The loss function used for regression. Defaults to `mse`.
<12> An optional parameter to the loss function.
<13> Custom feature processors that will create new features for analysis from the included document
fields. Note that automatic categorical {ml-docs}/ml-feature-encoding.html[feature encoding] still occurs for all features.
<14> The alpha regularization parameter. A non-negative double.
<15> The growth rate of the shrinkage parameter. A double in [0.5, 2.0].
<16> The soft tree depth limit. A non-negative double.
<17> The soft tree depth tolerance. Controls how strictly the soft tree depth limit is respected. A double greater than or equal to 0.01.
<18> The amount by which to downsample the data for stochastic gradient estimates. A double in (0, 1.0].
<19> The maximum number of optimization rounds used for hyperparameter optimization per parameter. An integer in [0, 20].
<20> Whether to enable early stopping, which ends the training process early when it is no longer finding better models.
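
As with classification, a subset of the parameters can be sketched as below; the values are arbitrary examples and the setter names are assumed from the 7.x client:

["source","java"]
--------------------------------------------------
Regression regression = Regression.builder("my_dependent_variable")
    .setLambda(1.0)                                // lambda regularization
    .setGamma(5.5)                                 // gamma regularization
    .setEta(0.5)                                   // shrinkage, in [0.001, 1]
    .setMaxTrees(50)                               // maximum number of trees
    .setFeatureBagFraction(0.4)                    // feature bag fraction, in (0, 1]
    .setPredictionFieldName("my_prediction_field") // prediction field name
    .setTrainingPercent(50.0)                      // train on half of the eligible rows
    .setLossFunction(Regression.LossFunction.MSE)  // the default loss function
    .build();
--------------------------------------------------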
==== Analyzed fields

A +FetchSourceContext+ object containing the fields to be included in or excluded from the analysis:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-analyzed-fields]
--------------------------------------------------
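
A sketch with invented field names; the boolean enables `_source` fetching, followed by the arrays of included and excluded fields:

["source","java"]
--------------------------------------------------
// true = fetch _source; then included and excluded field name arrays.
FetchSourceContext analyzedFields = new FetchSourceContext(true,
    new String[] { "included_field_name" },
    new String[] { "excluded_field_name" });
--------------------------------------------------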
include::../execution.asciidoc[]

[id="{upid}-{api}-response"]
==== Response

The returned +{response}+ contains the newly created {dfanalytics-job}.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-response]
--------------------------------------------------