[DOCS] Moves data frame analytics job resource definitions into APIs (#50021)

István Zoltán Szabó 5 years ago
parent
commit
3857e3d94f

+ 216 - 0
docs/reference/ml/df-analytics/apis/analysisobjects.asciidoc

@@ -0,0 +1,216 @@
+[role="xpack"]
+[testenv="platinum"]
+[[ml-dfa-analysis-objects]]
+=== Analysis configuration objects
+
+{dfanalytics-cap} resources contain `analysis` objects. For example, when you
+create a {dfanalytics-job}, you must define the type of analysis it performs. 
+This page lists all the available parameters that you can use in the `analysis` 
+object, grouped by {dfanalytics} type.
+
+
+[discrete]
+[[oldetection-resources]]
+==== {oldetection-cap} configuration objects
+
+An `outlier_detection` configuration object has the following properties:
+
+`compute_feature_influence`::
+(Optional, boolean) 
+include::{docdir}/ml/ml-shared.asciidoc[tag=compute-feature-influence]
+  
+`feature_influence_threshold`:: 
+(Optional, double) 
+include::{docdir}/ml/ml-shared.asciidoc[tag=feature-influence-threshold]
+
+`method`::
+(Optional, string)
+include::{docdir}/ml/ml-shared.asciidoc[tag=method]
+  
+`n_neighbors`::
+(Optional, integer)
+include::{docdir}/ml/ml-shared.asciidoc[tag=n-neighbors]
+  
+`outlier_fraction`::
+(Optional, double) 
+include::{docdir}/ml/ml-shared.asciidoc[tag=outlier-fraction]
+  
+`standardization_enabled`::
+(Optional, boolean) 
+include::{docdir}/ml/ml-shared.asciidoc[tag=standardization-enabled]
+
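+For illustration, a minimal sketch of an {oldetection} job that sets some of 
+these properties explicitly. The index names are hypothetical:
+
+[source,console]
+--------------------------------------------------
+PUT _ml/data_frame/analytics/weblog_outliers
+{
+  "source": {
+    "index": "weblogs"
+  },
+  "dest": {
+    "index": "weblogs_with_outliers"
+  },
+  "analysis": {
+    "outlier_detection": {
+      "method": "lof",
+      "n_neighbors": 5,
+      "feature_influence_threshold": 0.1
+    }
+  }
+}
+--------------------------------------------------
+// TEST[skip:TBD]
+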
+
+[discrete]
+[[reganalysis-resources]]
+==== {regression-cap} configuration objects
+
+[source,console]
+--------------------------------------------------
+PUT _ml/data_frame/analytics/house_price_regression_analysis
+{
+  "source": {
+    "index": "houses_sold_last_10_yrs" <1>
+  },
+  "dest": {
+    "index": "house_price_predictions" <2>
+  },
+  "analysis": 
+    {
+      "regression": { <3>
+        "dependent_variable": "price" <4>
+      }
+    }
+}
+--------------------------------------------------
+// TEST[skip:TBD]
+
+<1> Training data is taken from source index `houses_sold_last_10_yrs`.
+<2> Analysis results will be output to destination index 
+`house_price_predictions`.
+<3> The regression analysis configuration object.
+<4> Regression analysis will use the field `price` to train on. As no other 
+parameters have been specified, it will train on 100% of the eligible data, 
+store its predictions in the destination index field `price_prediction`, and 
+use built-in hyperparameter optimization to minimize the validation error.
+
+
+[float]
+[[regression-resources-standard]]
+===== Standard parameters
+
+`dependent_variable`::
+(Required, string)
+include::{docdir}/ml/ml-shared.asciidoc[tag=dependent-variable]
++
+--
+The data type of the field must be numeric.
+--
+
+`prediction_field_name`::
+(Optional, string)
+include::{docdir}/ml/ml-shared.asciidoc[tag=prediction-field-name]
+
+`training_percent`::
+(Optional, integer)
+include::{docdir}/ml/ml-shared.asciidoc[tag=training-percent]
+
+`randomize_seed`::
+(Optional, long)
+include::{docdir}/ml/ml-shared.asciidoc[tag=randomize-seed]
+
+
+[float]
+[[regression-resources-advanced]]
+===== Advanced parameters
+
+Advanced parameters are for fine-tuning {reganalysis}. If you do not supply 
+them, they are set automatically by 
+<<ml-hyperparam-optimization,hyperparameter optimization>> to give the minimum 
+validation error. It is highly recommended to use the default values unless you 
+fully understand the function of these parameters.
+
+`eta`::
+(Optional, double) 
+include::{docdir}/ml/ml-shared.asciidoc[tag=eta]
+
+`feature_bag_fraction`::
+(Optional, double) 
+include::{docdir}/ml/ml-shared.asciidoc[tag=feature-bag-fraction]
+
+`maximum_number_trees`::
+(Optional, integer) 
+include::{docdir}/ml/ml-shared.asciidoc[tag=maximum-number-trees]
+
+`gamma`::
+(Optional, double) 
+include::{docdir}/ml/ml-shared.asciidoc[tag=gamma]
+
+`lambda`::
+(Optional, double) 
+include::{docdir}/ml/ml-shared.asciidoc[tag=lambda]
+
+
+[discrete]
+[[classanalysis-resources]]
+==== {classification-cap} configuration objects 
+ 
+ 
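+As with {regression}, you configure {classanalysis} through the `analysis` 
+object. A minimal sketch, assuming a hypothetical index of loan applications 
+with a `defaulted` field to predict:
+
+[source,console]
+--------------------------------------------------
+PUT _ml/data_frame/analytics/loan_classification
+{
+  "source": {
+    "index": "loan_applications"
+  },
+  "dest": {
+    "index": "loan_classification_results"
+  },
+  "analysis": {
+    "classification": {
+      "dependent_variable": "defaulted",
+      "num_top_classes": 2
+    }
+  }
+}
+--------------------------------------------------
+// TEST[skip:TBD]
+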
+[float]
+[[classification-resources-standard]]
+===== Standard parameters
+ 
+`dependent_variable`::
+(Required, string)
+include::{docdir}/ml/ml-shared.asciidoc[tag=dependent-variable]
++
+--
+The data type of the field must be numeric or boolean.
+--
+  
+`num_top_classes`::
+(Optional, integer)
+include::{docdir}/ml/ml-shared.asciidoc[tag=num-top-classes]
+ 
+`prediction_field_name`::
+(Optional, string) 
+include::{docdir}/ml/ml-shared.asciidoc[tag=prediction-field-name]
+
+`training_percent`::
+(Optional, integer) 
+include::{docdir}/ml/ml-shared.asciidoc[tag=training-percent]
+
+`randomize_seed`::
+(Optional, long)
+include::{docdir}/ml/ml-shared.asciidoc[tag=randomize-seed]
+
+
+[float]
+[[classification-resources-advanced]]
+===== Advanced parameters
+
+Advanced parameters are for fine-tuning {classanalysis}. If you do not supply 
+them, they are set automatically by 
+<<ml-hyperparam-optimization,hyperparameter optimization>> to give the minimum 
+validation error. It is highly recommended to use the default values unless you 
+fully understand the function of these parameters.
+
+`eta`::
+(Optional, double) 
+include::{docdir}/ml/ml-shared.asciidoc[tag=eta]
+
+`feature_bag_fraction`::
+(Optional, double) 
+include::{docdir}/ml/ml-shared.asciidoc[tag=feature-bag-fraction]
+
+`maximum_number_trees`::
+(Optional, integer) 
+include::{docdir}/ml/ml-shared.asciidoc[tag=maximum-number-trees]
+
+`gamma`::
+(Optional, double) 
+include::{docdir}/ml/ml-shared.asciidoc[tag=gamma]
+
+`lambda`::
+(Optional, double) 
+include::{docdir}/ml/ml-shared.asciidoc[tag=lambda]
+
+[discrete]
+[[ml-hyperparam-optimization]]
+==== Hyperparameter optimization
+
+If you don't supply {regression} or {classification} parameters, hyperparameter 
+optimization is performed by default to set a value for each undefined 
+parameter. The starting point is calculated for data-dependent parameters by 
+examining the loss on the training data. Subject to the size constraint, this 
+operation provides an upper bound on the improvement in validation loss.
+
+A fixed number of optimization rounds is used; the number depends on how many 
+parameters are being optimized. The optimization starts with a random search, 
+then Bayesian optimization targeting maximum expected improvement is performed. 
+If you override any parameters, the optimization uses the values you provide 
+and calculates the values of the remaining parameters accordingly; the number 
+of rounds is reduced to match. The validation error is estimated in each round 
+by using 4-fold cross-validation.
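+
+For example, a sketch of the earlier house price {regression} job with two 
+advanced parameters overridden. Only the remaining undefined parameters are 
+then optimized; the values shown are illustrative, not recommendations:
+
+[source,console]
+--------------------------------------------------
+PUT _ml/data_frame/analytics/house_price_regression_tuned
+{
+  "source": {
+    "index": "houses_sold_last_10_yrs"
+  },
+  "dest": {
+    "index": "house_price_predictions_tuned"
+  },
+  "analysis": {
+    "regression": {
+      "dependent_variable": "price",
+      "eta": 0.05,
+      "lambda": 1.0
+    }
+  }
+}
+--------------------------------------------------
+// TEST[skip:TBD]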

+ 4 - 0
docs/reference/ml/df-analytics/apis/delete-dfanalytics.asciidoc

@@ -11,17 +11,20 @@ Deletes an existing {dfanalytics-job}.
 
 experimental[]
 
+
 [[ml-delete-dfanalytics-request]]
 ==== {api-request-title}
 
 `DELETE _ml/data_frame/analytics/<data_frame_analytics_id>`
 
+
 [[ml-delete-dfanalytics-prereq]]
 ==== {api-prereq-title}
 
 * You must have the `machine_learning_admin` built-in role to use this API. For 
 more information, see <<security-privileges>> and <<built-in-roles>>.
 
+
 [[ml-delete-dfanalytics-path-params]]
 ==== {api-path-parms-title}
 
@@ -29,6 +32,7 @@ information, see <<security-privileges>> and <<built-in-roles>>.
 (Required, string)
 include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-data-frame-analytics]
 
+
 [[ml-delete-dfanalytics-example]]
 ==== {api-examples-title}
 

+ 0 - 298
docs/reference/ml/df-analytics/apis/dfanalyticsresources.asciidoc

@@ -1,298 +0,0 @@
-[role="xpack"]
-[testenv="platinum"]
-[[ml-dfanalytics-resources]]
-=== {dfanalytics-cap} job resources
-
-{dfanalytics-cap} resources relate to APIs such as <<put-dfanalytics>> and
-<<get-dfanalytics>>.	
-
-[discrete]	
-[[ml-dfanalytics-properties]]	
-==== {api-definitions-title}
-
-`analysis`::
-  (object) The type of analysis that is performed on the `source`. For example: 
-  `outlier_detection` or `regression`. For more information, see 
-  <<dfanalytics-types>>.
-  
-`analyzed_fields`::
-  (Optional, object) Specify `includes` and/or `excludes` patterns to select
-  which fields will be included in the analysis. If `analyzed_fields` is not set,
-  only the relevant fields will be included. For example, all the numeric fields
-  for {oldetection}. For the supported field types, see <<ml-put-dfanalytics-supported-fields>>.
-  Also see the <<explain-dfanalytics>> which helps understand field selection.
-    
-  `includes`:::
-    (Optional, array) An array of strings that defines the fields that will be included in 
-    the analysis.
-      
-  `excludes`:::
-    (Optional, array) An array of strings that defines the fields that will be excluded 
-    from the analysis.
-  
-
-[source,console]
---------------------------------------------------
-PUT _ml/data_frame/analytics/loganalytics
-{
-  "source": {
-    "index": "logdata"
-  },
-  "dest": {
-    "index": "logdata_out"
-  },
-  "analysis": {
-    "outlier_detection": {
-    }
-  },
-  "analyzed_fields": {
-        "includes": [ "request.bytes", "response.counts.error" ],
-        "excludes": [ "source.geo" ]
-  }
-}
---------------------------------------------------
-// TEST[setup:setup_logdata]
-
-`description`::
-  (Optional, string) A description of the job.
-
-`dest`::
-  (object) The destination configuration of the analysis.
-  
-  `index`:::
-    (Required, string) Defines the _destination index_ to store the results of 
-    the {dfanalytics-job}.
-  
-  `results_field`:::
-    (Optional, string) Defines the name of the field in which to store the 
-    results of the analysis. Default to `ml`.
-
-`id`::
-  (string) The unique identifier for the {dfanalytics-job}. This identifier can 
-  contain lowercase alphanumeric characters (a-z and 0-9), hyphens, and 
-  underscores. It must start and end with alphanumeric characters. This property 
-  is informational; you cannot change the identifier for existing jobs.
-  
-`model_memory_limit`::
-  (string) The approximate maximum amount of memory resources that are 
-  permitted for analytical processing. The default value for {dfanalytics-jobs} 
-  is `1gb`. If your `elasticsearch.yml` file contains an 
-  `xpack.ml.max_model_memory_limit` setting, an error occurs when you try to 
-  create {dfanalytics-jobs} that have `model_memory_limit` values greater than 
-  that setting. For more information, see <<ml-settings>>.
-
-`source`::
-  (object) The configuration of how to source the analysis data. It requires an `index`.
-  Optionally, `query` and `_source` may be specified.
-  
-  `index`:::
-    (Required, string or array) Index or indices on which to perform the 
-    analysis. It can be a single index or index pattern as well as an array of 
-    indices or patterns.
-    
-  `query`:::
-    (Optional, object) The {es} query domain-specific language 
-    (<<query-dsl,DSL>>). This value corresponds to the query object in an {es} 
-    search POST body. All the options that are supported by {es} can be used, 
-    as this object is passed verbatim to {es}. By default, this property has 
-    the following value: `{"match_all": {}}`.
-
-  `_source`:::
-    (Optional, object) Specify `includes` and/or `excludes` patterns to select
-    which fields will be present in the destination. Fields that are excluded
-    cannot be included in the analysis.
-        
-      `includes`::::
-        (array) An array of strings that defines the fields that will be included in 
-        the destination.
-          
-      `excludes`::::
-        (array) An array of strings that defines the fields that will be excluded 
-        from the destination.
-
-[[dfanalytics-types]]
-==== Analysis objects
-
-{dfanalytics-cap} resources contain `analysis` objects. For example, when you
-create a {dfanalytics-job}, you must define the type of analysis it performs.
-
-[discrete]
-[[oldetection-resources]]
-==== {oldetection-cap} configuration objects 
-
-An `outlier_detection` configuration object has the following properties:
-
-`compute_feature_influence`::
-  (boolean) If `true`, the feature influence calculation is enabled. Defaults to 
-  `true`.
-  
-`feature_influence_threshold`:: 
-  (double) The minimum {olscore} that a document needs to have in order to 
-  calculate its {fiscore}. Value range: 0-1 (`0.1` by default).
-
-`method`::
-  (string) Sets the method that {oldetection} uses. If the method is not set 
-  {oldetection} uses an ensemble of different methods and normalises and 
-  combines their individual {olscores} to obtain the overall {olscore}. We 
-  recommend to use the ensemble method. Available methods are `lof`, `ldof`, 
-  `distance_kth_nn`, `distance_knn`.
-  
-  `n_neighbors`::
-  (integer) Defines the value for how many nearest neighbors each method of 
-  {oldetection} will use to calculate its {olscore}. When the value is not set, 
-  different values will be used for different ensemble members. This helps 
-  improve diversity in the ensemble. Therefore, only override this if you are 
-  confident that the value you choose is appropriate for the data set.
-  
-`outlier_fraction`::
-  (double) Sets the proportion of the data set that is assumed to be outlying prior to 
-  {oldetection}. For example, 0.05 means it is assumed that 5% of values are real outliers 
-  and 95% are inliers.
-  
-`standardization_enabled`::
-  (boolean) If `true`, then the following operation is performed on the columns 
-  before computing outlier scores: (x_i - mean(x_i)) / sd(x_i). Defaults to 
-  `true`. For more information, see 
-  https://en.wikipedia.org/wiki/Feature_scaling#Standardization_(Z-score_Normalization)[this wiki page about standardization].
-
-
-[discrete]
-[[regression-resources]]
-==== {regression-cap} configuration objects
-
-[source,console]
---------------------------------------------------
-PUT _ml/data_frame/analytics/house_price_regression_analysis
-{
-  "source": {
-    "index": "houses_sold_last_10_yrs" <1>
-  },
-  "dest": {
-    "index": "house_price_predictions" <2>
-  },
-  "analysis": 
-    {
-      "regression": { <3>
-        "dependent_variable": "price" <4>
-      }
-    }
-}
---------------------------------------------------
-// TEST[skip:TBD]
-
-<1> Training data is taken from source index `houses_sold_last_10_yrs`.
-<2> Analysis results will be output to destination index 
-`house_price_predictions`.
-<3> The regression analysis configuration object.
-<4> Regression analysis will use field `price` to train on. As no other 
-parameters have been specified it will train on 100% of eligible data, store its 
-prediction in destination index field `price_prediction` and use in-built 
-hyperparameter optimization to give minimum validation errors.
-
-
-[float]
-[[regression-resources-standard]]
-===== Standard parameters
-
-include::{docdir}/ml/ml-shared.asciidoc[tag=dependent_variable]
-+
---
-The data type of the field must be numeric.
---
-
-include::{docdir}/ml/ml-shared.asciidoc[tag=prediction_field_name]
-
-include::{docdir}/ml/ml-shared.asciidoc[tag=training_percent]
-
-include::{docdir}/ml/ml-shared.asciidoc[tag=randomize_seed]
-
-
-[float]
-[[regression-resources-advanced]]
-===== Advanced parameters
-
-Advanced parameters are for fine-tuning {reganalysis}. They are set 
-automatically by <<ml-hyperparameter-optimization,hyperparameter optimization>> 
-to give minimum validation error. It is highly recommended to use the default 
-values unless you fully understand the function of these parameters. If these 
-parameters are not supplied, their values are automatically tuned to give 
-minimum validation error.
-
-include::{docdir}/ml/ml-shared.asciidoc[tag=eta]
-
-include::{docdir}/ml/ml-shared.asciidoc[tag=feature_bag_fraction]
-
-include::{docdir}/ml/ml-shared.asciidoc[tag=maximum_number_trees]
-
-include::{docdir}/ml/ml-shared.asciidoc[tag=gamma]
-
-include::{docdir}/ml/ml-shared.asciidoc[tag=lambda]
-
-
-[discrete]
-[[classification-resources]]
-==== {classification-cap} configuration objects 
- 
- 
-[float]
-[[classification-resources-standard]]
-===== Standard parameters
- 
-include::{docdir}/ml/ml-shared.asciidoc[tag=dependent_variable]
-+
---
-The data type of the field must be numeric or boolean.
---
-  
-`num_top_classes`::
-  (Optional, integer) Defines the number of categories for which the predicted 
-  probabilities are reported. It must be non-negative. If it is greater than the 
-  total number of categories (in the {version} version of the {stack}, it's two) 
-  to predict then we will report all category probabilities. Defaults to 2.
- 
-include::{docdir}/ml/ml-shared.asciidoc[tag=prediction_field_name]
-
-include::{docdir}/ml/ml-shared.asciidoc[tag=training_percent]
-
-include::{docdir}/ml/ml-shared.asciidoc[tag=randomize_seed]
-
-
-[float]
-[[classification-resources-advanced]]
-===== Advanced parameters
-
-Advanced parameters are for fine-tuning {classanalysis}. They are set 
-automatically by <<ml-hyperparameter-optimization,hyperparameter optimization>> 
-to give minimum validation error. It is highly recommended to use the default 
-values unless you fully understand the function of these parameters. If these 
-parameters are not supplied, their values are automatically tuned to give 
-minimum validation error.
-
-include::{docdir}/ml/ml-shared.asciidoc[tag=eta]
-
-include::{docdir}/ml/ml-shared.asciidoc[tag=feature_bag_fraction]
-
-include::{docdir}/ml/ml-shared.asciidoc[tag=maximum_number_trees]
-
-include::{docdir}/ml/ml-shared.asciidoc[tag=gamma]
-
-include::{docdir}/ml/ml-shared.asciidoc[tag=lambda]
-
-
-[[ml-hyperparameter-optimization]]
-===== Hyperparameter optimization
-
-If you don't supply {regression} or {classification} parameters, hyperparameter 
-optimization will be performed by default to set a value for the undefined 
-parameters. The starting point is calculated for data dependent parameters by 
-examining the loss on the training data. Subject to the size constraint, this 
-operation provides an upper bound on the improvement in validation loss.
-
-A fixed number of rounds is used for optimization which depends on the number of 
-parameters being optimized. The optimization starts with random search, then 
-Bayesian optimization is performed that is targeting maximum expected 
-improvement. If you override any parameters, then the optimization will 
-calculate the value of the remaining parameters accordingly and use the value 
-you provided for the overridden parameter. The number of rounds are reduced 
-respectively. The validation error is estimated in each round by using 4-fold 
-cross validation.

+ 100 - 11
docs/reference/ml/df-analytics/apis/evaluate-dfanalytics.asciidoc

@@ -12,6 +12,7 @@ Evaluates the {dfanalytics} for an annotated index.
 
 experimental[]
 
+
 [[ml-evaluate-dfanalytics-request]]
 ==== {api-request-title}
 
@@ -37,26 +38,113 @@ result field to be present.
 [[ml-evaluate-dfanalytics-request-body]]
 ==== {api-request-body-title}
 
-`index`::
-  (Required, object) Defines the `index` in which the evaluation will be
-  performed.
-
-`query`::
-  (Optional, object) A query clause that retrieves a subset of data from the 
-  source index. See <<query-dsl>>.
-
 `evaluation`::
-  (Required, object) Defines the type of evaluation you want to perform. See 
-  <<ml-evaluate-dfanalytics-resources>>.
+(Required, object) Defines the type of evaluation you want to perform. The 
+value of this object can be different depending on the type of evaluation you 
+want to perform. See <<ml-evaluate-dfanalytics-resources>>.
 +
 --
 Available evaluation types:
-
 * `binary_soft_classification`
 * `regression`
 * `classification`
 --
 
+`index`::
+(Required, object) Defines the `index` in which the evaluation will be
+performed.
+
+`query`::
+(Optional, object) A query clause that retrieves a subset of data from the 
+source index. See <<query-dsl>>.
+
+[[ml-evaluate-dfanalytics-resources]]
+==== {dfanalytics-cap} evaluation resources
+
+[[binary-sc-resources]]
+===== Binary soft classification configuration objects
+
+Binary soft classification evaluates the results of an analysis which outputs 
+the probability that each document belongs to a certain class. For example, in 
+the context of {oldetection}, the analysis outputs the probability that each 
+document is an outlier.
+
+`actual_field`::
+  (Required, string) The field of the `index` which contains the ground truth. 
+  The data type of this field can be boolean or integer. If the data type is 
+  integer, the value has to be either `0` (false) or `1` (true).
+
+`predicted_probability_field`::
+  (Required, string) The field of the `index` that defines the probability of 
+  whether the item belongs to the class in question or not. It's the field that 
+  contains the results of the analysis.
+
+`metrics`::
+  (Optional, object) Specifies the metrics that are used for the evaluation. 
+  Available metrics:
+  
+  `auc_roc`::
+    (Optional, object) The AUC ROC (area under the curve of the receiver 
+    operating characteristic) score and optionally the curve. Default value is 
+    `{"includes_curve": false}`.
+    
+  `precision`::
+    (Optional, object) Sets the different thresholds of the {olscore} at which 
+    the metric is calculated. Default value is `{"at": [0.25, 0.50, 0.75]}`.
+  
+  `recall`::
+    (Optional, object) Sets the different thresholds of the {olscore} at which 
+    the metric is calculated. Default value is `{"at": [0.25, 0.50, 0.75]}`.
+  
+  `confusion_matrix`::
+    (Optional, object) Sets the different thresholds of the {olscore} at which 
+    the metrics (`tp` - true positive, `fp` - false positive, `tn` - true 
+    negative, `fn` - false negative) are calculated. Default value is 
+    `{"at": [0.25, 0.50, 0.75]}`.
+
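+A sketch of an evaluation call using these objects, assuming a destination 
+index with an `is_outlier` ground truth field and the default `ml.outlier_score` 
+results field. All index and field names are hypothetical:
+
+[source,console]
+--------------------------------------------------
+POST _ml/data_frame/_evaluate
+{
+  "index": "weblogs_with_outliers",
+  "evaluation": {
+    "binary_soft_classification": {
+      "actual_field": "is_outlier",
+      "predicted_probability_field": "ml.outlier_score",
+      "metrics": {
+        "precision": { "at": [0.5, 0.75] },
+        "recall": { "at": [0.5, 0.75] }
+      }
+    }
+  }
+}
+--------------------------------------------------
+// TEST[skip:TBD]
+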
+    
+[[regression-evaluation-resources]]
+===== {regression-cap} evaluation objects
+
+{regression-cap} evaluation evaluates the results of a {regression} analysis 
+which outputs predicted values.
+
+`actual_field`::
+  (Required, string) The field of the `index` which contains the ground truth. 
+  The data type of this field must be numerical.
+  
+`predicted_field`::
+  (Required, string) The field in the `index` that contains the predicted value, 
+  in other words the results of the {regression} analysis.
+  
+`metrics`::
+  (Required, object) Specifies the metrics that are used for the evaluation. 
+  Available metrics are `r_squared` and `mean_squared_error`.
+  
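+A sketch of a {regression} evaluation, assuming a destination index produced by 
+a {regression} job on house prices with the default `ml` results field. The 
+names are illustrative:
+
+[source,console]
+--------------------------------------------------
+POST _ml/data_frame/_evaluate
+{
+  "index": "house_price_predictions",
+  "evaluation": {
+    "regression": {
+      "actual_field": "price",
+      "predicted_field": "ml.price_prediction",
+      "metrics": {
+        "r_squared": {},
+        "mean_squared_error": {}
+      }
+    }
+  }
+}
+--------------------------------------------------
+// TEST[skip:TBD]
+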
+  
+[[classification-evaluation-resources]]
+===== {classification-cap} evaluation objects
+
+{classification-cap} evaluation evaluates the results of a {classanalysis} which 
+outputs a prediction that identifies to which of the classes each document 
+belongs.
+
+`actual_field`::
+  (Required, string) The field of the `index` which contains the ground truth. 
+  The data type of this field must be keyword.
+  
+`metrics`::
+  (Required, object) Specifies the metrics that are used for the evaluation. 
+  The available metric is `multiclass_confusion_matrix`.
+  
+`predicted_field`::
+  (Required, string) The field in the `index` that contains the predicted value, 
+  in other words the results of the {classanalysis}. The data type of this field 
+  is string. You need to add `.keyword` to the predicted field name (the name 
+  you put in the {classanalysis} object as `prediction_field_name`, or its 
+  default value if you didn't specify it explicitly). For example, 
+  `predicted_field` : `ml.animal_class_prediction.keyword`.
+
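+
+A sketch of a {classanalysis} evaluation, reusing the `.keyword` example above. 
+The index and field names are hypothetical:
+
+[source,console]
+--------------------------------------------------
+POST _ml/data_frame/_evaluate
+{
+  "index": "animal_classification_results",
+  "evaluation": {
+    "classification": {
+      "actual_field": "animal_class",
+      "predicted_field": "ml.animal_class_prediction.keyword",
+      "metrics": {
+        "multiclass_confusion_matrix": {}
+      }
+    }
+  }
+}
+--------------------------------------------------
+// TEST[skip:TBD]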
 
 ////
 [[ml-evaluate-dfanalytics-results]]
@@ -75,6 +163,7 @@ Available evaluation types:
 `recall`::: TBD
 ////
 
+
 [[ml-evaluate-dfanalytics-example]]
 ==== {api-examples-title}
 

+ 0 - 128
docs/reference/ml/df-analytics/apis/evaluateresources.asciidoc

@@ -1,128 +0,0 @@
-[role="xpack"]
-[testenv="platinum"]
-[[ml-evaluate-dfanalytics-resources]]
-=== {dfanalytics-cap} evaluation resources
-
-Evaluation configuration objects relate to the <<evaluate-dfanalytics>>.
-
-[discrete]
-[[ml-evaluate-dfanalytics-properties]]
-==== {api-definitions-title}
-
-`evaluation`::
-  (object) Defines the type of evaluation you want to perform. The value of this 
-  object can be different depending on the type of evaluation you want to 
-  perform.
-+
---
-Available evaluation types:
-* `binary_soft_classification`
-* `regression`
-* `classification`
---
-  
-`query`::
-  (object) A query clause that retrieves a subset of data from the source index. 
-  See <<query-dsl>>. The evaluation only applies to those documents of the index 
-  that match the query.
-
-
-[[binary-sc-resources]]
-==== Binary soft classification configuration objects
-
-Binary soft classification evaluates the results of an analysis which outputs 
-the probability that each document belongs to a certain class. For 
-example, in the context of outlier detection, the analysis outputs the 
-probability whether each document is an outlier.
-
-[discrete]
-[[binary-sc-resources-properties]]
-===== {api-definitions-title}
-
-`actual_field`::
-  (string) The field of the `index` which contains the `ground truth`. 
-  The data type of this field can be boolean or integer. If the data type is 
-  integer, the value has to be either `0` (false) or `1` (true).
-
-`predicted_probability_field`::
-  (string) The field of the `index` that defines the probability of 
-  whether the item belongs to the class in question or not. It's the field that 
-  contains the results of the analysis.
-
-`metrics`::
-  (object) Specifies the metrics that are used for the evaluation. 
-  Available metrics:
-  
-  `auc_roc`::
-    (object) The AUC ROC (area under the curve of the receiver operating 
-    characteristic) score and optionally the curve.
-    Default value is {"includes_curve": false}.
-    
-  `precision`::
-    (object) Set the different thresholds of the {olscore} at where the metric 
-    is calculated.
-    Default value is {"at": [0.25, 0.50, 0.75]}.
-  
-  `recall`::
-    (object) Set the different thresholds of the {olscore} at where the metric 
-    is calculated.
-    Default value is {"at": [0.25, 0.50, 0.75]}.
-  
-  `confusion_matrix`::
-    (object) Set the different thresholds of the {olscore} at where the metrics 
-    (`tp` - true positive, `fp` - false positive, `tn` - true negative, `fn` - 
-    false negative) are calculated.
-    Default value is {"at": [0.25, 0.50, 0.75]}.
-
-    
-[[regression-evaluation-resources]]
-==== {regression-cap} evaluation objects
-
-{regression-cap} evaluation evaluates the results of a {regression} analysis 
-which outputs a prediction of values.
-
-
-[discrete]
-[[regression-evaluation-resources-properties]]
-===== {api-definitions-title}
-
-`actual_field`::
-  (string) The field of the `index` which contains the `ground truth`. The data 
-  type of this field must be numerical.
-  
-`predicted_field`::
-  (string) The field in the `index` that contains the predicted value, 
-  in other words the results of the {regression} analysis.
-  
-`metrics`::
-  (object) Specifies the metrics that are used for the evaluation. Available 
-  metrics are `r_squared` and `mean_squared_error`.
-  
-  
-[[classification-evaluation-resources]]
-==== {classification-cap} evaluation objects
-
-{classification-cap} evaluation evaluates the results of a {classanalysis} which 
-outputs a prediction that identifies to which of the classes each document 
-belongs.
-
-
-[discrete]
-[[classification-evaluation-resources-properties]]
-===== {api-definitions-title}
-
-`actual_field`::
-  (string) The field of the `index` which contains the ground truth. The data 
-  type of this field must be keyword.
-  
-`metrics`::
-  (object) Specifies the metrics that are used for the evaluation. Available 
-  metric is `multiclass_confusion_matrix`.
-  
-`predicted_field`::
-  (string) The field in the `index` that contains the predicted value, in other 
-  words the results of the {classanalysis}. The data type of this field is 
-  string. You need to add `.keyword` to the predicted field name (the name you 
-  put in the {classanalysis} object as `prediction_field_name` or the default 
-  value of the same field if you didn't specified explicitly). For example, 
-  `predicted_field` : `ml.animal_class_prediction.keyword`.

+ 19 - 35
docs/reference/ml/df-analytics/apis/explain-dfanalytics.asciidoc

@@ -12,6 +12,7 @@ Explains a {dataframe-analytics-config}.
 
 experimental[]
 
+
 [[ml-explain-dfanalytics-request]]
 ==== {api-request-title}
 
@@ -23,22 +24,28 @@ experimental[]
 
 `POST _ml/data_frame/analytics/<data_frame_analytics_id>/_explain`
 
+
 [[ml-explain-dfanalytics-prereq]]
 ==== {api-prereq-title}
 
 * You must have the `monitor_ml` privilege to use this API. For more
 information, see <<security-privileges>> and <<built-in-roles>>.
 
+
 [[ml-explain-dfanalytics-desc]]
 ==== {api-description-title}
 
-This API provides explanations for a {dataframe-analytics-config} that either exists already or one that has not been created yet.
+This API provides explanations for a {dataframe-analytics-config} that either 
+already exists or has not been created yet.
 The following explanations are provided:
 
-* which fields are included or not in the analysis and why
-* how much memory is estimated to be required. The estimate can be used when deciding the appropriate value for `model_memory_limit` setting later on.
+* which fields are included or not in the analysis and why
+* how much memory is estimated to be required. The estimate can be used when 
+  deciding the appropriate value for the `model_memory_limit` setting later on.
 
+
 [[ml-explain-dfanalytics-path-params]]
 ==== {api-path-parms-title}
 
@@ -46,13 +53,14 @@ about either an existing {dfanalytics-job} or one that has not been created yet.
 (Optional, string)
 include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-data-frame-analytics]
 
+
 [[ml-explain-dfanalytics-request-body]]
 ==== {api-request-body-title}
 
 `data_frame_analytics_config`::
-  (Optional, object) Intended configuration of {dfanalytics-job}. For more information, see
-  <<ml-dfanalytics-resources>>.
-  Note that `id` and `dest` don't need to be provided in the context of this API.
+  (Optional, object) The intended configuration of the {dfanalytics-job}. Note 
+  that `id` and `dest` don't need to be provided in the context of this API.
+
 
 [[ml-explain-dfanalytics-results]]
 ==== {api-response-body-title}
@@ -60,38 +68,13 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-data-frame-analytics]
 The API returns a response that contains the following:
 
 `field_selection`::
-  (array) An array of objects that explain selection for each field, sorted by the field names.
-   Each object in the array has the following properties:
-
-  `name`:::
-    (string) The field name.
-
-  `mapping_types`:::
-    (string) The mapping types of the field.
-
-  `is_included`:::
-    (boolean) Whether the field is selected to be included in the analysis.
-
-  `is_required`:::
-    (boolean) Whether the field is required.
-
-  `feature_type`:::
-    (string) The feature type of this field for the analysis. May be `categorical` or `numerical`.
-
-  `reason`:::
-    (string) The reason a field is not selected to be included in the analysis.
+(array)
+include::{docdir}/ml/ml-shared.asciidoc[tag=field-selection]
 
 `memory_estimation`::
-  (object) An object containing the memory estimates. The object has the following properties:
-
-  `expected_memory_without_disk`:::
-    (string) Estimated memory usage under the assumption that the whole {dfanalytics} should happen in memory
-    (i.e. without overflowing to disk).
+(object) 
+include::{docdir}/ml/ml-shared.asciidoc[tag=memory-estimation]
 
-  `expected_memory_with_disk`:::
-    (string) Estimated memory usage under the assumption that overflowing to disk is allowed during {dfanalytics}.
-    `expected_memory_with_disk` is usually smaller than `expected_memory_without_disk` as using disk allows to
-    limit the main memory needed to perform {dfanalytics}.
 
 [[ml-explain-dfanalytics-example]]
 ==== {api-examples-title}
@@ -114,6 +97,7 @@ POST _ml/data_frame/analytics/_explain
 --------------------------------------------------
 // TEST[skip:TBD]
 
+
 The API returns the following results:
 
 [source,console-result]

+ 6 - 23
docs/reference/ml/df-analytics/apis/get-dfanalytics-stats.asciidoc

@@ -48,12 +48,12 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-data-frame-analytics-default]
 include::{docdir}/ml/ml-shared.asciidoc[tag=allow-no-match]
 
 `from`::
-  (Optional, integer) Skips the specified number of {dfanalytics-jobs}. The
-  default value is `0`.
+(Optional, integer) 
+include::{docdir}/ml/ml-shared.asciidoc[tag=from]
 
 `size`::
-  (Optional, integer) Specifies the maximum number of {dfanalytics-jobs} to
-  obtain. The default value is `100`.
+(Optional, integer) 
+include::{docdir}/ml/ml-shared.asciidoc[tag=size]
 
 
 [[ml-get-dfanalytics-stats-response-body]]
@@ -62,25 +62,8 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=allow-no-match]
 The API returns the following information:
 
 `data_frame_analytics`::
-  (array) An array of statistics objects for {dfanalytics-jobs}, which are
-  sorted by the `id` value in ascending order.
-  
-  `id`::
-    (string) The unique identifier of the {dfanalytics-job}.
-    
-  `state`::
-    (string) Current state of the {dfanalytics-job}.
-    
-  `progress`::
-    (array) The progress report of the {dfanalytics-job} by phase.
-    
-    `phase`::
-      (string) Defines the phase of the {dfanalytics-job}. Possible phases: 
-      `reindexing`, `loading_data`, `analyzing`, and `writing_results`.
-      
-    `progress_percent`::
-      (integer) The progress that the {dfanalytics-job} has made expressed in 
-      percentage.
+(array) 
+include::{docdir}/ml/ml-shared.asciidoc[tag=data-frame-analytics-stats]
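+
+A hypothetical response shape for a single job; the names and values are 
+illustrative, not actual output:
+
+[source,console-result]
+--------------------------------------------------
+{
+  "count": 1,
+  "data_frame_analytics": [
+    {
+      "id": "loganalytics",
+      "state": "stopped",
+      "progress": [
+        { "phase": "reindexing", "progress_percent": 100 },
+        { "phase": "loading_data", "progress_percent": 100 },
+        { "phase": "analyzing", "progress_percent": 100 },
+        { "phase": "writing_results", "progress_percent": 100 }
+      ]
+    }
+  ]
+}
+--------------------------------------------------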
 
 
 [[ml-get-dfanalytics-stats-response-codes]]

+ 27 - 26
docs/reference/ml/df-analytics/apis/get-dfanalytics.asciidoc

@@ -11,6 +11,7 @@ Retrieves configuration information for {dfanalytics-jobs}.
 
 experimental[]
 
+
 [[ml-get-dfanalytics-request]]
 ==== {api-request-title}
 
@@ -22,11 +23,13 @@ experimental[]
 
 `GET _ml/data_frame/analytics/_all`
 
+
 [[ml-get-dfanalytics-prereq]]
 ==== {api-prereq-title}
 
-* You must have `monitor_ml` privilege to use this API. For more 
-information, see <<security-privileges>> and <<built-in-roles>>.
+* You must have the `monitor_ml` privilege to use this API. For more 
+information, see <<security-privileges>> and <<built-in-roles>>.
+
 
 [[ml-get-dfanalytics-desc]]
 ==== {api-description-title}
@@ -34,47 +37,44 @@ information, see <<security-privileges>> and <<built-in-roles>>.
 You can get information for multiple {dfanalytics-jobs} in a single API request 
 by using a comma-separated list of {dfanalytics-jobs} or a wildcard expression.
 
+
 [[ml-get-dfanalytics-path-params]]
 ==== {api-path-parms-title}
 
 `<data_frame_analytics_id>`::
-  (Optional, string) Identifier for the {dfanalytics-job}. If you do not specify
-  one of these options, the API returns information for the first hundred
-  {dfanalytics-jobs}. You can get information for all {dfanalytics-jobs} by 
-  using _all, by specifying `*` as the `<data_frame_analytics_id>`, or by 
-  omitting the `<data_frame_analytics_id>`.
+(Optional, string) 
+include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-data-frame-analytics-default]
++
+--
+You can get information for all {dfanalytics-jobs} by using `_all`, by 
+specifying `*` as the `<data_frame_analytics_id>`, or by omitting the 
+`<data_frame_analytics_id>`.
+--
+
 
 [[ml-get-dfanalytics-query-params]]
 ==== {api-query-parms-title}
 
 `allow_no_match`::
-  (Optional, boolean) Specifies what to do when the request:
-+
---
-* Contains wildcard expressions and there are no {dfanalytics-jobs} that match.
-* Contains the `_all` string or no identifiers and there are no matches.
-* Contains wildcard expressions and there are only partial matches. 
-
-The default value is `true`, which returns an empty `data_frame_analytics` array 
-when there are no matches and the subset of results when there are partial 
-matches. If this parameter is `false`, the request returns a `404` status code 
-when there are no matches or only partial matches.
---
+(Optional, boolean) 
+include::{docdir}/ml/ml-shared.asciidoc[tag=allow-no-match]
 
 `from`::
-  (Optional, integer) Skips the specified number of {dfanalytics-jobs}. The
-  default value is `0`.
+(Optional, integer) 
+include::{docdir}/ml/ml-shared.asciidoc[tag=from]
 
 `size`::
-  (Optional, integer) Specifies the maximum number of {dfanalytics-jobs} to 
-  obtain. The default value is `100`.
-  
+(Optional, integer) 
+include::{docdir}/ml/ml-shared.asciidoc[tag=size]
+
+
 [[ml-get-dfanalytics-results]]
 ==== {api-response-body-title}
 
 `data_frame_analytics`::
-  (array) An array of {dfanalytics-job} resources. For more information, see
-  <<ml-dfanalytics-resources>>.
+(array) 
+include::{docdir}/ml/ml-shared.asciidoc[tag=data-frame-analytics]
+
 
 [[ml-get-dfanalytics-response-codes]]
 ==== {api-response-codes-title}
@@ -83,6 +83,7 @@ when there are no matches or only partial matches.
   If `allow_no_match` is `false`, this code indicates that there are no
   resources that match the request or only partial matches for the request.
 
+
 [[ml-get-dfanalytics-example]]
 ==== {api-examples-title}
 

+ 2 - 0
docs/reference/ml/df-analytics/apis/index.asciidoc

@@ -14,6 +14,8 @@ You can use the following APIs to perform {ml} {dfanalytics} activities.
 * <<evaluate-dfanalytics,Evaluate {dfanalytics}>>
 * <<explain-dfanalytics,Explain {dfanalytics}>>
 
+For the `analysis` object resources, see <<ml-dfa-analysis-objects>>.
+
 See also <<ml-apis>>.
 
 //CREATE

+ 41 - 76
docs/reference/ml/df-analytics/apis/put-dfanalytics.asciidoc

@@ -93,91 +93,55 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-data-frame-analytics-define]
 ==== {api-request-body-title}
 
 `analysis`::
-  (Required, object) Defines the type of {dfanalytics} you want to perform on 
-  your source index. For example: `outlier_detection`. See 
-  <<dfanalytics-types>>.
+(Required, object)
+include::{docdir}/ml/ml-shared.asciidoc[tag=analysis]
   
 `analyzed_fields`::
-  (Optional, object) Specify `includes` and/or `excludes` patterns to select
-  which fields will be included in the analysis. If `analyzed_fields` is not 
-  set, only the relevant fields will be included. For example, all the numeric 
-  fields for {oldetection}. For the supported field types, see 
-  <<ml-put-dfanalytics-supported-fields>>. Also see the <<explain-dfanalytics>> 
-  which helps understand field selection.
-
-  `includes`:::
-    (Optional, array) An array of strings that defines the fields that will be 
-    included in the analysis.
-    
-  `excludes`:::
-    (Optional, array) An array of strings that defines the fields that will be 
-    excluded from the analysis. You do not need to add fields with unsupported 
-    data types to `excludes`, these fields are excluded from the analysis 
-    automatically.
+(Optional, object)
+include::{docdir}/ml/ml-shared.asciidoc[tag=analyzed-fields]
+
+[source,console]
+--------------------------------------------------
+PUT _ml/data_frame/analytics/loganalytics
+{
+  "source": {
+    "index": "logdata"
+  },
+  "dest": {
+    "index": "logdata_out"
+  },
+  "analysis": {
+    "outlier_detection": {
+    }
+  },
+  "analyzed_fields": {
+        "includes": [ "request.bytes", "response.counts.error" ],
+        "excludes": [ "source.geo" ]
+  }
+}
+--------------------------------------------------
+// TEST[setup:setup_logdata]
+
 
 `description`::
-  (Optional, string) A description of the job.
+(Optional, string)
+include::{docdir}/ml/ml-shared.asciidoc[tag=description-dfa]
 
 `dest`::
-  (Required, object) The destination configuration, consisting of `index` and 
-  optionally `results_field` (`ml` by default).
-  
-    `index`:::
-      (Required, string) Defines the _destination index_ to store the results of 
-      the {dfanalytics-job}.
-    
-    `results_field`:::
-      (Optional, string) Defines the name of the field in which to store the 
-      results of the analysis. Default to `ml`.
+(Required, object)
+include::{docdir}/ml/ml-shared.asciidoc[tag=dest]
   
 `model_memory_limit`::
-  (Optional, string) The approximate maximum amount of memory resources that are 
-  permitted for analytical processing. The default value for {dfanalytics-jobs} 
-  is `1gb`. If your `elasticsearch.yml` file contains an 
-  `xpack.ml.max_model_memory_limit` setting, an error occurs when you try to 
-  create {dfanalytics-jobs} that have `model_memory_limit` values greater than 
-  that setting. For more information, see <<ml-settings>>.
+(Optional, string)
+include::{docdir}/ml/ml-shared.asciidoc[tag=model-memory-limit-dfa]
   
 `source`::
-  (object) The configuration of how to source the analysis data. It requires an 
-  `index`. Optionally, `query` and `_source` may be specified.
-  
-  `index`:::
-    (Required, string or array) Index or indices on which to perform the 
-    analysis. It can be a single index or index pattern as well as an array of 
-    indices or patterns.
-    
-  `query`:::
-    (Optional, object) The {es} query domain-specific language 
-    (<<query-dsl,DSL>>). This value corresponds to the query object in an {es} 
-    search POST body. All the options that are supported by {es} can be used, 
-    as this object is passed verbatim to {es}. By default, this property has 
-    the following value: `{"match_all": {}}`.
-
-  `_source`:::
-    (Optional, object) Specify `includes` and/or `excludes` patterns to select
-    which fields will be present in the destination. Fields that are excluded
-    cannot be included in the analysis.
-        
-      `includes`::::
-        (array) An array of strings that defines the fields that will be 
-        included in the destination.
-          
-      `excludes`::::
-        (array) An array of strings that defines the fields that will be 
-        excluded from the destination.
+(object)
+include::{docdir}/ml/ml-shared.asciidoc[tag=source-put-dfa]
 
 `allow_lazy_start`::
-  (Optional, boolean) Whether this job should be allowed to start when there
-  is insufficient {ml} node capacity for it to be immediately assigned to a node.
-  The default is `false`, which means that the <<start-dfanalytics>>
-  will return an error if a {ml} node with capacity to run the
-  job cannot immediately be found. (However, this is also subject to
-  the cluster-wide `xpack.ml.max_lazy_ml_nodes` setting - see
-  <<advanced-ml-settings>>.) If this option is set to `true` then
-  the <<start-dfanalytics>> will not return an error, and the job will
-  wait in the `starting` state until sufficient {ml} node capacity
-  is available.
+(Optional, boolean) 
+include::{docdir}/ml/ml-shared.asciidoc[tag=allow-lazy-start]
 
 
 [[ml-put-dfanalytics-example]]
@@ -405,9 +369,10 @@ PUT _ml/data_frame/analytics/student_performance_mathematics_0.3
 --------------------------------------------------
 // TEST[skip:TBD]
 
-<1> The `training_percent` defines the percentage of the data set that will be used 
-for training the model.
-<2> The `randomize_seed` is the seed used to randomly pick which data is used for training.
+<1> The `training_percent` defines the percentage of the data set that will be 
+used for training the model.
+<2> The `randomize_seed` is the seed used to randomly pick which data is used 
+for training.
 
 
 [[ml-put-dfanalytics-example-c]]

+ 2 - 2
docs/reference/ml/df-analytics/apis/start-dfanalytics.asciidoc

@@ -36,8 +36,8 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-data-frame-analytics-define]
 ==== {api-query-parms-title}
 
 `timeout`::
-  (Optional, time) Controls the amount of time to wait until the
-  {dfanalytics-job} starts. The default value is 20 seconds.
+(Optional, <<time-units,time units>>) 
+include::{docdir}/ml/ml-shared.asciidoc[tag=timeout-start]
 
 [[ml-start-dfanalytics-example]]
 ==== {api-examples-title}

+ 5 - 5
docs/reference/ml/df-analytics/apis/stop-dfanalytics.asciidoc

@@ -49,16 +49,16 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=job-id-data-frame-analytics-define]
 ==== {api-query-parms-title}  
   
 `allow_no_match`::
-  (Optional, boolean) If `false` and the `data_frame_analytics_id` does not
-  match any {dfanalytics-job} an error will be returned. The default value is
-  `true`.
+(Optional, boolean) 
+include::{docdir}/ml/ml-shared.asciidoc[tag=allow-no-match]
+
 
 `force`::
   (Optional, boolean) If true, the {dfanalytics-job} is stopped forcefully.
   
 `timeout`::
-  (Optional, time) Controls the amount of time to wait until the
-  {dfanalytics-job} stops. The default value is 20 seconds.
+(Optional, <<time-units,time units>>) 
+include::{docdir}/ml/ml-shared.asciidoc[tag=timeout-stop]
 
 
 [[ml-stop-dfanalytics-example]]

+ 338 - 54
docs/reference/ml/ml-shared.asciidoc

@@ -8,7 +8,8 @@ end::aggregations[]
 tag::allow-lazy-open[]
 Advanced configuration option. Specifies whether this job can open when there is 
 insufficient {ml} node capacity for it to be immediately assigned to a node. The
-default value is `false`; if a {ml} node with capacity to run the job cannot immediately be found, the <<ml-open-job,open {anomaly-jobs} API>> returns an
+default value is `false`; if a {ml} node with capacity to run the job cannot 
+immediately be found, the <<ml-open-job,open {anomaly-jobs} API>> returns an 
 error. However, this is also subject to the cluster-wide 
 `xpack.ml.max_lazy_ml_nodes` setting; see <<advanced-ml-settings>>. If this
 option is set to `true`, the <<ml-open-job,open {anomaly-jobs} API>> does not
@@ -16,6 +17,18 @@ return an error and the job waits in the `opening` state until sufficient {ml}
 node capacity is available.
 end::allow-lazy-open[]
 
+
+tag::allow-lazy-start[]
+Whether this job should be allowed to start when there is insufficient {ml} node 
+capacity for it to be immediately assigned to a node. The default is `false`, 
+which means that the <<start-dfanalytics>> will return an error if a {ml} node 
+with capacity to run the job cannot immediately be found. (However, this is also 
+subject to the cluster-wide `xpack.ml.max_lazy_ml_nodes` setting; see 
+<<advanced-ml-settings>>.) If this option is set to `true`, the 
+<<start-dfanalytics>> will not return an error, and the job will wait in the 
+`starting` state until sufficient {ml} node capacity is available.
+end::allow-lazy-start[]
+
 tag::allow-no-datafeeds[]
 Specifies what to do when the request:
 +
@@ -61,10 +74,16 @@ when there are no matches or only partial matches.
 --
 end::allow-no-match[]
 
+tag::analysis[]
+Defines the type of {dfanalytics} you want to perform on your source index. For 
+example: `outlier_detection`. See <<ml-dfa-analysis-objects>>.
+end::analysis[]
+
 tag::analysis-config[]
 The analysis configuration, which specifies how to analyze the data.
 After you create a job, you cannot change the analysis configuration; all
-the properties are informational. An analysis configuration object has the following properties:
+the properties are informational. An analysis configuration object has the 
+following properties:
 
 `bucket_span`:::
 (<<time-units,time units>>)
@@ -128,6 +147,25 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=categorization-examples-limit]
 include::{docdir}/ml/ml-shared.asciidoc[tag=model-memory-limit]
 end::analysis-limits[]
 
+tag::analyzed-fields[]
+Specify `includes` and/or `excludes` patterns to select which fields will be 
+included in the analysis. If `analyzed_fields` is not set, only the relevant 
+fields will be included. For example, all the numeric fields for {oldetection}. 
+For the supported field types, see <<ml-put-dfanalytics-supported-fields>>. See 
+also <<explain-dfanalytics>>, which helps you understand field selection.
+
+`includes`:::
+  (Optional, array) An array of strings that defines the fields that will be 
+  included in the analysis.
+  
+`excludes`:::
+  (Optional, array) An array of strings that defines the fields that will be 
+  excluded from the analysis. You do not need to add fields with unsupported 
+  data types to `excludes`; these fields are excluded from the analysis 
+  automatically.
+end::analyzed-fields[]
+
+
 tag::background-persist-interval[]
 Advanced configuration option. The time between each periodic persistence of the 
 model. The default value is a randomized value between 3 to 4 hours, which
@@ -208,7 +246,9 @@ tag::categorization-filters[]
 If `categorization_field_name` is specified, you can also define optional
 filters. This property expects an array of regular expressions. The expressions
 are used to filter out matching sequences from the categorization field values.
-You can use this functionality to fine tune the categorization by excluding sequences from consideration when categories are defined. For example, you can exclude SQL statements that appear in your log files. For more information, see
+You can use this functionality to fine-tune the categorization by excluding 
+sequences from consideration when categories are defined. For example, you can 
+exclude SQL statements that appear in your log files. For more information, see 
 {stack-ov}/ml-configuring-categories.html[Categorizing log messages]. This
 property cannot be used at the same time as `categorization_analyzer`. If you
 only want to define simple regular expression filters that are applied prior to 
@@ -229,6 +269,11 @@ add them here as
 <<analysis-pattern-replace-charfilter,pattern replace character filters>>.
 end::char-filter[]
 
+
+tag::compute-feature-influence[]
+If `true`, the feature influence calculation is enabled. Defaults to `true`.
+end::compute-feature-influence[]
+
 tag::chunking-config[]
 {dfeeds-cap} might be required to search over long time periods, for several months
 or years. This search is split into time chunks in order to ensure the load
@@ -280,7 +325,8 @@ to an object with the following properties:
 
 `filter_type`:::
 (string) Either `include` (the rule applies for values in the filter) or
-`exclude` (the rule applies for values not in the filter). Defaults to `include`.
+`exclude` (the rule applies for values not in the filter). Defaults to 
+`include`.
 
 `conditions`::
 (array) An optional array of numeric conditions when the rule applies. A rule
@@ -339,6 +385,92 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=time-format]
 --
 end::data-description[]
 
+tag::data-frame-analytics[]
+An array of {dfanalytics-job} resources, which are sorted by the `id` value in 
+ascending order.
+
+`id`:::
+(string) The unique identifier of the {dfanalytics-job}.
+
+`source`:::
+(object) The configuration of how the analysis data is sourced. It has an 
+`index` parameter and optionally a `query` and a `_source`.
+  
+`index`::::
+(array) Index or indices on which to perform the analysis. It can be a single 
+index or index pattern as well as an array of indices or patterns.
+    
+`query`::::
+(object) The query that has been specified for the {dfanalytics-job}, written 
+in the {es} query domain-specific language (<<query-dsl,DSL>>). This value 
+corresponds to the query object in an {es} search POST body. By default, this 
+property has the following value: `{"match_all": {}}`.
+
+`_source`::::
+(object) Contains the specified `includes` and/or `excludes` patterns that 
+select which fields are present in the destination. Fields that are excluded 
+cannot be included in the analysis.
+        
+`includes`:::::
+(array) An array of strings that defines the fields that are included in the 
+destination.
+          
+`excludes`:::::
+(array) An array of strings that defines the fields that are excluded from the 
+destination.
+
+`dest`:::
+(object) The destination configuration of the analysis.
+
+`index`::::
+(string) The _destination index_ that stores the results of the 
+{dfanalytics-job}.
+
+`results_field`::::
+(string) The name of the field that stores the results of the analysis. Defaults 
+to `ml`.
+
+`analysis`:::
+(object) The type of analysis that is performed on the `source`.
+
+`analyzed_fields`:::
+(object) Contains `includes` and/or `excludes` patterns that select which fields 
+are included in the analysis.
+    
+`includes`::::
+(Optional, array) An array of strings that defines the fields that are included 
+in the analysis.
+      
+`excludes`::::
+(Optional, array) An array of strings that defines the fields that are excluded 
+from the analysis.
+
+`model_memory_limit`:::
+(string) The `model_memory_limit` that has been set for the {dfanalytics-job}.
+end::data-frame-analytics[]
+
+tag::data-frame-analytics-stats[]
+An array of statistics objects for {dfanalytics-jobs}, which are
+sorted by the `id` value in ascending order.
+
+`id`:::
+(string) The unique identifier of the {dfanalytics-job}.
+  
+`state`:::
+(string) Current state of the {dfanalytics-job}.
+  
+`progress`:::
+(array) The progress report of the {dfanalytics-job} by phase.
+  
+`phase`:::
+(string) Defines the phase of the {dfanalytics-job}. Possible phases: 
+`reindexing`, `loading_data`, `analyzing`, and `writing_results`.
+    
+`progress_percent`:::
+(integer) The progress that the {dfanalytics-job} has made, expressed as a 
+percentage.
+end::data-frame-analytics-stats[]
+
 tag::datafeed-id[]
 A numerical character string that uniquely identifies the
 {dfeed}. This identifier can contain lowercase alphanumeric characters (a-z
@@ -380,14 +512,30 @@ calculation is based on the maximum of `2h` or `8 * bucket_span`.
 --
 end::delayed-data-check-config[]
 
-tag::dependent_variable[]
-`dependent_variable`::
-(Required, string) Defines which field of the document is to be predicted. 
+tag::dependent-variable[]
+Defines which field of the document is to be predicted.
 This parameter is supplied by field name and must match one of the fields in 
 the index being used to train. If this field is missing from a document, then 
 that document will not be used for training, but a prediction with the trained 
+model will be generated for it. It is also known as the continuous target 
+variable.
-end::dependent_variable[]
+end::dependent-variable[]
+
+tag::description-dfa[]
+A description of the job.
+end::description-dfa[]
+
+tag::dest[]
+The destination configuration, consisting of `index` and 
+optionally `results_field` (`ml` by default).
+
+  `index`:::
+    (Required, string) Defines the _destination index_ to store the results of 
+    the {dfanalytics-job}.
+  
+  `results_field`:::
+    (Optional, string) Defines the name of the field in which to store the 
+    results of the analysis. Defaults to `ml`.
+end::dest[]
 
 tag::detector-description[]
 A description of the detector. For example, `Low event rate`.
@@ -455,8 +603,7 @@ include::{docdir}/ml/ml-shared.asciidoc[tag=use-null]
 end::detector[]
 
 tag::eta[]
-`eta`::
-(Optional, double) The shrinkage applied to the weights. Smaller values result 
+The shrinkage applied to the weights. Smaller values result 
 in larger forests which have better generalization error. However, the smaller 
 the value the longer the training will take. For more information, see 
 https://en.wikipedia.org/wiki/Gradient_boosting#Shrinkage[this wiki article] 
@@ -471,11 +618,39 @@ working with both over and by fields, then you can set `exclude_frequent` to
 `all` for both fields, or to `by` or `over` for those specific fields.
 end::exclude-frequent[]
 
-tag::feature_bag_fraction[]
-`feature_bag_fraction`::
-(Optional, double) Defines the fraction of features that will be used when 
+tag::feature-bag-fraction[]
+Defines the fraction of features that will be used when 
 selecting a random bag for each candidate split. 
-end::feature_bag_fraction[]
+end::feature-bag-fraction[]
+
+tag::feature-influence-threshold[]
+The minimum {olscore} that a document needs to have in order to calculate its 
+{fiscore}. Value range: 0-1 (`0.1` by default).
+end::feature-influence-threshold[]
+
+tag::field-selection[]
+An array of objects that explain selection for each field, sorted by 
+the field names. Each object in the array has the following properties:
+
+`name`:::
+(string) The field name.
+
+`mapping_types`:::
+(string) The mapping types of the field.
+
+`is_included`:::
+(boolean) Whether the field is selected to be included in the analysis.
+
+`is_required`:::
+(boolean) Whether the field is required.
+
+`feature_type`:::
+(string) The feature type of this field for the analysis. May be `categorical` 
+or `numerical`.
+
+`reason`:::
+(string) The reason a field is not selected to be included in the analysis.
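+
+For example, a single object in the array might look like the following 
+sketch; the field name and values are illustrative:
+
+[source,js]
+--------------------------------------------------
+{
+  "name": "total_spend",  // placeholder field name
+  "mapping_types": ["float"],
+  "is_included": true,
+  "is_required": false,
+  "feature_type": "numerical"
+}
+--------------------------------------------------
+// NOTCONSOLE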
+end::field-selection[]
 
 tag::filter[]
 One or more <<analysis-tokenfilters,token filters>>. In addition to the built-in 
@@ -491,6 +666,10 @@ for longer bucket spans, a sensible fraction of the bucket span. For example:
 `150s`.
 end::frequency[]
 
+tag::from[]
+Skips the specified number of {dfanalytics-jobs}. The default value is `0`.
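+
+For example, the following sketch of a request skips the first ten 
+{dfanalytics-jobs} and returns up to five of the remaining ones (`size` is 
+documented separately):
+
+[source,console]
+--------------------------------------------------
+GET _ml/data_frame/analytics?from=10&size=5
+--------------------------------------------------
+// TEST[skip:TBD]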
+end::from[]
+
 tag::function[]
 The analysis function that is used. For example, `count`, `rare`, `mean`, `min`, 
 `max`, and `sum`. For more information, see
@@ -498,8 +677,7 @@ The analysis function that is used. For example, `count`, `rare`, `mean`, `min`,
 end::function[]
 
 tag::gamma[]
-`gamma`::
-(Optional, double) Regularization parameter to prevent overfitting on the 
+Regularization parameter to prevent overfitting on the 
 training dataset. Multiplies a linear penalty associated with the size of 
 individual trees in the forest. The higher the value, the more training will 
 prefer smaller trees. The smaller this parameter, the larger individual trees 
@@ -538,7 +716,9 @@ Identifier for the {dfanalytics-job}.
 end::job-id-data-frame-analytics[]
 
 tag::job-id-anomaly-detection-default[]
-Identifier for the {anomaly-job}. It can be a job identifier, a group name, or a wildcard expression. If you do not specify one of these options, the API returns information for all {anomaly-jobs}.
+Identifier for the {anomaly-job}. It can be a job identifier, a group name, or a 
+wildcard expression. If you do not specify one of these options, the API returns 
+information for all {anomaly-jobs}.
 end::job-id-anomaly-detection-default[]
 
 tag::job-id-data-frame-analytics-default[]
@@ -552,7 +732,8 @@ identifier, a group name, or a comma-separated list of jobs or groups.
 end::job-id-anomaly-detection-list[]
 
 tag::job-id-anomaly-detection-wildcard[]
-Identifier for the {anomaly-job}. It can be a job identifier, a group name, or a wildcard expression.
+Identifier for the {anomaly-job}. It can be a job identifier, a group name, or a 
+wildcard expression.
 end::job-id-anomaly-detection-wildcard[]
 
 tag::job-id-anomaly-detection-wildcard-list[]
@@ -561,9 +742,9 @@ comma-separated list of jobs or groups, or a wildcard expression.
 end::job-id-anomaly-detection-wildcard-list[]
 
 tag::job-id-anomaly-detection-define[]
-Identifier for the {anomaly-job}. This identifier can contain lowercase alphanumeric
-characters (a-z and 0-9), hyphens, and underscores. It must start and end with
-alphanumeric characters.
+Identifier for the {anomaly-job}. This identifier can contain lowercase 
+alphanumeric characters (a-z and 0-9), hyphens, and underscores. It must start 
+and end with alphanumeric characters.
 end::job-id-anomaly-detection-define[]
 
 tag::job-id-data-frame-analytics-define[]
@@ -577,8 +758,7 @@ The unique identifier for the job to which the {dfeed} sends data.
 end::job-id-datafeed[]
 
 tag::lambda[]
-`lambda`::
-(Optional, double) Regularization parameter to prevent overfitting on the 
+Regularization parameter to prevent overfitting on the 
 training dataset. Multiplies an L2 regularization term which applies to leaf 
 weights of the individual trees in the forest. The higher the value the more 
 training will attempt to keep leaf weights small. This makes the prediction  
@@ -589,7 +769,9 @@ end::lambda[]
 
 tag::latency[]
 The size of the window in which to expect data that is out of time order. The 
-default value is 0 (no latency). If you specify a non-zero value, it must be greater than or equal to one second. For more information about time units, see <<time-units>>.
+default value is 0 (no latency). If you specify a non-zero value, it must be 
+greater than or equal to one second. For more information about time units, see 
+<<time-units>>.
 +
 --
 NOTE: Latency is only applicable when you send data by using
 If not set, a {dfeed} with no end time that sees no data will remain started 
 until it is explicitly stopped. By default, this setting is not set.
 end::max-empty-searches[]
 
-tag::maximum_number_trees[]
-`maximum_number_trees`::
-(Optional, integer) Defines the maximum number of trees the forest is allowed 
+tag::maximum-number-trees[]
+Defines the maximum number of trees the forest is allowed 
 to contain. The maximum value is 2000.
-end::maximum_number_trees[]
+end::maximum-number-trees[]
+
+tag::memory-estimation[]
+An object containing the memory estimates. The object has the 
+following properties:
+
+`expected_memory_without_disk`:::
+(string) Estimated memory usage under the assumption that the whole 
+{dfanalytics} should happen in memory (i.e. without overflowing to disk).
+
+`expected_memory_with_disk`:::
+(string) Estimated memory usage under the assumption that overflowing to disk 
+is allowed during {dfanalytics}. `expected_memory_with_disk` is usually smaller 
+than `expected_memory_without_disk` because using disk allows you to limit the 
+main memory needed to perform {dfanalytics}.
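+
+For example, the object might look like the following sketch; the values are 
+illustrative and depend entirely on the data set:
+
+[source,js]
+--------------------------------------------------
+"memory_estimation": {
+  "expected_memory_without_disk": "128mb",
+  "expected_memory_with_disk": "32mb"
+}
+--------------------------------------------------
+// NOTCONSOLE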
+end::memory-estimation[]
+
+tag::method[]
+Sets the method that {oldetection} uses. If the method is not set, {oldetection} 
+uses an ensemble of different methods and normalizes and combines their 
+individual {olscores} to obtain the overall {olscore}. We recommend using the 
+ensemble method. Available methods are `lof`, `ldof`, `distance_kth_nn`, and 
+`distance_knn`.
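+
+For example, to bypass the ensemble and use a single method, the analysis 
+could be configured as in the following sketch; the parameter values are 
+illustrative:
+
+[source,js]
+--------------------------------------------------
+"analysis": {
+  "outlier_detection": {
+    "method": "distance_knn",  // illustrative choice of method
+    "n_neighbors": 5
+  }
+}
+--------------------------------------------------
+// NOTCONSOLE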
+end::method[]
 
 tag::mode[]
 There are three available modes: 
@@ -649,20 +853,30 @@ see <<ml-settings>>.
 --
 end::model-memory-limit[]
 
+tag::model-memory-limit-dfa[]
+The approximate maximum amount of memory resources that are permitted for 
+analytical processing. The default value for {dfanalytics-jobs} is `1gb`. If 
+your `elasticsearch.yml` file contains an `xpack.ml.max_model_memory_limit` 
+setting, an error occurs when you try to create {dfanalytics-jobs} that have 
+`model_memory_limit` values greater than that setting. For more information, see 
+<<ml-settings>>.
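+
+For example, to permit a job up to 2 GB for analytical processing, the limit 
+could be set as in this sketch:
+
+[source,js]
+--------------------------------------------------
+"model_memory_limit": "2gb"
+--------------------------------------------------
+// NOTCONSOLE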
+end::model-memory-limit-dfa[]
+
 tag::model-plot-config[]
 This advanced configuration option stores model information along with the
 results. It provides a more detailed view into {anomaly-detect}.
 +
 --
-WARNING: If you enable model plot it can add considerable overhead to the performance
-of the system; it is not feasible for jobs with many entities.
+WARNING: If you enable model plot, it can add considerable overhead to the 
+performance of the system; it is not feasible for jobs with many entities.
 
-Model plot provides a simplified and indicative view of the model and its bounds.
-It does not display complex features such as multivariate correlations or multimodal data.
-As such, anomalies may occasionally be reported which cannot be seen in the model plot.
+Model plot provides a simplified and indicative view of the model and its 
+bounds. It does not display complex features such as multivariate correlations 
+or multimodal data. As such, anomalies may occasionally be reported which cannot 
+be seen in the model plot.
 
-Model plot config can be configured when the job is created or updated later. It must be
-disabled if performance issues are experienced.
+Model plot can be configured when the job is created or updated later. It must 
+be disabled if performance issues are experienced.
 
 The `model_plot_config` object has the following properties:
 
@@ -713,6 +927,21 @@ NOTE: To use the `multivariate_by_fields` property, you must also specify
 --
 end::multivariate-by-fields[]
 
+tag::n-neighbors[]
+Defines how many nearest neighbors each method of {oldetection} uses to 
+calculate its {olscore}. When the value is not set, different values are used 
+for different ensemble members, which helps improve diversity in the ensemble. 
+Therefore, only override this if you are confident that the value you choose is 
+appropriate for the data set.
+end::n-neighbors[]
+
+tag::num-top-classes[]
+Defines the number of categories for which the predicted 
+probabilities are reported. It must be non-negative. If it is greater than the 
+total number of categories to predict (in version {version} of the {stack}, 
+this is two), all category probabilities are reported. Defaults to 2.
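+
+For example, a classification analysis that reports the probability of only 
+the most likely category might look like the following sketch; the field name 
+is a placeholder:
+
+[source,js]
+--------------------------------------------------
+"analysis": {
+  "classification": {
+    "dependent_variable": "label",  // placeholder field name
+    "num_top_classes": 1
+  }
+}
+--------------------------------------------------
+// NOTCONSOLE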
+end::num-top-classes[]
+
 tag::over-field-name[]
 The field used to split the data. In particular, this property is used for 
 analyzing the splits with respect to the history of all splits. It is used for 
@@ -720,16 +949,29 @@ finding unusual values in the population of all splits. For more information,
 see {stack-ov}/ml-configuring-pop.html[Performing population analysis].
 end::over-field-name[]
 
+tag::outlier-fraction[]
+Sets the proportion of the data set that is assumed to be outlying prior to 
+{oldetection}. For example, 0.05 means it is assumed that 5% of values are real 
+outliers and 95% are inliers.
+end::outlier-fraction[]
+
 tag::partition-field-name[]
 The field used to segment the analysis. When you use this property, you have 
 completely independent baselines for each value of this field.
 end::partition-field-name[]
 
-tag::prediction_field_name[]
-`prediction_field_name`::
-(Optional, string) Defines the name of the prediction field in the results. 
+tag::prediction-field-name[]
+Defines the name of the prediction field in the results. 
 Defaults to `<dependent_variable>_prediction`.
-end::prediction_field_name[]
+end::prediction-field-name[]
+
+tag::randomize-seed[]
+Defines the seed for the random generator that is used to pick which documents 
+are used for training. By default, it is randomly generated. Set it to a 
+specific value to ensure the same documents are used for training, assuming 
+that other related parameters (for example, `source` and `analyzed_fields`) are 
+the same.
+end::randomize-seed[]
 
 tag::query[]
 The {es} query domain-specific language (DSL). This value corresponds to the
@@ -776,6 +1018,48 @@ tag::scroll-size[]
 The `size` parameter that is used in {es} searches. The default value is `1000`.
 end::scroll-size[]
 
+tag::size[]
+Specifies the maximum number of {dfanalytics-jobs} to obtain. The default value 
+is `100`.
+end::size[]
+
+tag::source-put-dfa[]
+The configuration of how to source the analysis data. It requires an 
+`index`. Optionally, `query` and `_source` may be specified.
+
+`index`:::
+  (Required, string or array) Index or indices on which to perform the 
+  analysis. It can be a single index or index pattern as well as an array of 
+  indices or patterns.
+  
+`query`:::
+  (Optional, object) The {es} query domain-specific language 
+  (<<query-dsl,DSL>>). This value corresponds to the query object in an {es} 
+  search POST body. All the options that are supported by {es} can be used, 
+  as this object is passed verbatim to {es}. By default, this property has 
+  the following value: `{"match_all": {}}`.
+
+`_source`:::
+  (Optional, object) Specify `includes` and/or `excludes` patterns to select
+  which fields will be present in the destination. Fields that are excluded
+  cannot be included in the analysis.
+      
+    `includes`::::
+      (array) An array of strings that defines the fields that will be 
+      included in the destination.
+        
+    `excludes`::::
+      (array) An array of strings that defines the fields that will be 
+      excluded from the destination.
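+
+Putting these together, a source configuration might look like the following 
+sketch; the index and field names are placeholders:
+
+[source,js]
+--------------------------------------------------
+"source": {
+  "index": "my-src-index",  // placeholder index name
+  "query": { "match_all": {} },
+  "_source": {
+    "includes": ["price", "location"],
+    "excludes": ["internal_id"]
+  }
+}
+--------------------------------------------------
+// NOTCONSOLE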
+end::source-put-dfa[]
+
+tag::standardization-enabled[]
+If `true`, then the following operation is performed on the columns before 
+computing outlier scores: `(x_i - mean(x_i)) / sd(x_i)`. Defaults to `true`. For 
+more information, see 
+https://en.wikipedia.org/wiki/Feature_scaling#Standardization_(Z-score_Normalization)[this wiki page about standardization].
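+
+For example, standardization could be turned off as in the following sketch:
+
+[source,js]
+--------------------------------------------------
+"analysis": {
+  "outlier_detection": {
+    "standardization_enabled": false
+  }
+}
+--------------------------------------------------
+// NOTCONSOLE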
+end::standardization-enabled[]
+
 tag::summary-count-field-name[]
 If this property is specified, the data that is fed to the job is expected to be
 pre-summarized. This property value is the name of the field that contains the 
@@ -789,6 +1073,16 @@ function.
 --
 end::summary-count-field-name[]
 
+tag::timeout-start[]
+Controls the amount of time to wait until the {dfanalytics-job} starts. Defaults 
+to 20 seconds.
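+
+For example, the following sketch of a request waits up to two minutes for the 
+job to start; the job name is a placeholder:
+
+[source,console]
+--------------------------------------------------
+POST _ml/data_frame/analytics/my_job/_start?timeout=2m
+--------------------------------------------------
+// TEST[skip:TBD]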
+end::timeout-start[]
+
+tag::timeout-stop[]
+Controls the amount of time to wait until the {dfanalytics-job} stops. Defaults 
+to 20 seconds.
+end::timeout-stop[]
+
 tag::time-format[]
 The time format, which can be `epoch`, `epoch_ms`, or a custom pattern. The
 default value is `epoch`, which refers to UNIX or Epoch time (the number of 
@@ -800,8 +1094,8 @@ either integer or real values. +
 NOTE: Custom patterns must conform to the Java `DateTimeFormatter` class.
 When you use date-time formatting patterns, it is recommended that you provide
 the full date, time and time zone. For example: `yyyy-MM-dd'T'HH:mm:ssX`.
-If the pattern that you specify is not sufficient to produce a complete timestamp,
-job creation fails.
+If the pattern that you specify is not sufficient to produce a complete 
+timestamp, job creation fails.
 
 --
 end::time-format[]
@@ -821,22 +1115,12 @@ that tokenizer but change the character or token filters, specify
 `"tokenizer": "ml_classic"` in your `categorization_analyzer`.
 end::tokenizer[]
 
-tag::training_percent[]
-`training_percent`::
-(Optional, integer) Defines what percentage of the eligible documents that will 
+tag::training-percent[]
+Defines what percentage of the eligible documents will 
 be used for training. Documents that are ignored by the analysis (for example, 
 those that contain arrays) are not included in the calculation of the used 
 percentage. Defaults to `100`.
-end::training_percent[]
-
-tag::randomize_seed[]
-`randomize_seed`::
-(Optional, long) Defines the seed to the random generator that is used to pick
-which documents will be used for training. By default it is randomly generated.
-Set it to a specific value to ensure the same documents are used for training
-assuming other related parameters (e.g. `source`, `analyzed_fields`, etc.) are the same.
-end::randomize_seed[]
-
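+
+For example, a {reganalysis} that trains on half of the eligible documents 
+might be configured as in the following sketch; the field name is a 
+placeholder:
+
+[source,js]
+--------------------------------------------------
+"analysis": {
+  "regression": {
+    "dependent_variable": "price",  // placeholder field name
+    "training_percent": 50
+  }
+}
+--------------------------------------------------
+// NOTCONSOLE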
+end::training-percent[]
 
 tag::use-null[]
 Defines whether a new series is used as the null series when there is no value 

+ 4 - 4
docs/reference/rest-api/defs.asciidoc

@@ -5,15 +5,15 @@
 These resource definitions are used in APIs related to {ml-features} and
 {security-features} and in {kib} advanced {ml} job configuration options.
 
-* <<ml-dfanalytics-resources,{dfanalytics-cap}>>
-* <<ml-evaluate-dfanalytics-resources,Evaluate {dfanalytics}>>
+
+* <<ml-dfa-analysis-objects>>
 * <<ml-snapshot-resource,{anomaly-detect-cap} model snapshots>>
 * <<ml-results-resource,{anomaly-detect-cap} results>>
 * <<role-mapping-resources,Role mappings>>
 * <<transform-resource,{transforms-cap}>>
 
-include::{es-repo-dir}/ml/df-analytics/apis/dfanalyticsresources.asciidoc[]
-include::{es-repo-dir}/ml/df-analytics/apis/evaluateresources.asciidoc[]
+
+include::{es-repo-dir}/ml/df-analytics/apis/analysisobjects.asciidoc[]
 include::{es-repo-dir}/ml/anomaly-detection/apis/snapshotresource.asciidoc[]
 include::{xes-repo-dir}/rest-api/security/role-mapping-resources.asciidoc[]
 include::{es-repo-dir}/ml/anomaly-detection/apis/resultsresource.asciidoc[]