123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216 |
- [role="xpack"]
- [testenv="platinum"]
- [[ml-dfa-analysis-objects]]
- === Analysis configuration objects
- {dfanalytics-cap} resources contain `analysis` objects. For example, when you
- create a {dfanalytics-job}, you must define the type of analysis it performs.
- This page lists all the available parameters that you can use in the `analysis`
- object grouped by {dfanalytics} types.
- [discrete]
- [[oldetection-resources]]
- ==== {oldetection-cap} configuration objects
- An `outlier_detection` configuration object has the following properties:
- `compute_feature_influence`::
- (Optional, boolean)
- include::{docdir}/ml/ml-shared.asciidoc[tag=compute-feature-influence]
-
- `feature_influence_threshold`::
- (Optional, double)
- include::{docdir}/ml/ml-shared.asciidoc[tag=feature-influence-threshold]
- `method`::
- (Optional, string)
- include::{docdir}/ml/ml-shared.asciidoc[tag=method]
-
- `n_neighbors`::
- (Optional, integer)
- include::{docdir}/ml/ml-shared.asciidoc[tag=n-neighbors]
-
- `outlier_fraction`::
- (Optional, double)
- include::{docdir}/ml/ml-shared.asciidoc[tag=outlier-fraction]
-
- `standardization_enabled`::
- (Optional, boolean)
- include::{docdir}/ml/ml-shared.asciidoc[tag=standardization-enabled]
- [discrete]
- [[reganalysis-resources]]
- ==== {regression-cap} configuration objects
- [source,console]
- --------------------------------------------------
- PUT _ml/data_frame/analytics/house_price_regression_analysis
- {
- "source": {
- "index": "houses_sold_last_10_yrs" <1>
- },
- "dest": {
- "index": "house_price_predictions" <2>
- },
- "analysis":
- {
- "regression": { <3>
- "dependent_variable": "price" <4>
- }
- }
- }
- --------------------------------------------------
- // TEST[skip:TBD]
- <1> Training data is taken from source index `houses_sold_last_10_yrs`.
- <2> Analysis results will be output to destination index
- `house_price_predictions`.
- <3> The regression analysis configuration object.
- <4> Regression analysis will use field `price` to train on. As no other
- parameters have been specified it will train on 100% of eligible data, store its
- prediction in destination index field `price_prediction` and use in-built
- hyperparameter optimization to give minimum validation errors.
- [float]
- [[regression-resources-standard]]
- ===== Standard parameters
- `dependent_variable`::
- (Required, string)
- include::{docdir}/ml/ml-shared.asciidoc[tag=dependent-variable]
- +
- --
- The data type of the field must be numeric.
- --
- `prediction_field_name`::
- (Optional, string)
- include::{docdir}/ml/ml-shared.asciidoc[tag=prediction-field-name]
- `training_percent`::
- (Optional, integer)
- include::{docdir}/ml/ml-shared.asciidoc[tag=training-percent]
- `randomize_seed`::
- (Optional, long)
- include::{docdir}/ml/ml-shared.asciidoc[tag=randomize-seed]
- [float]
- [[regression-resources-advanced]]
- ===== Advanced parameters
- Advanced parameters are for fine-tuning {reganalysis}. They are set
- automatically by <<ml-hyperparam-optimization,hyperparameter optimization>>
- to give minimum validation error. It is highly recommended to use the default
- values unless you fully understand the function of these parameters. If these
- parameters are not supplied, their values are automatically tuned to give
- minimum validation error.
- `eta`::
- (Optional, double)
- include::{docdir}/ml/ml-shared.asciidoc[tag=eta]
- `feature_bag_fraction`::
- (Optional, double)
- include::{docdir}/ml/ml-shared.asciidoc[tag=feature-bag-fraction]
- `maximum_number_trees`::
- (Optional, integer)
- include::{docdir}/ml/ml-shared.asciidoc[tag=maximum-number-trees]
- `gamma`::
- (Optional, double)
- include::{docdir}/ml/ml-shared.asciidoc[tag=gamma]
- `lambda`::
- (Optional, double)
- include::{docdir}/ml/ml-shared.asciidoc[tag=lambda]
- [discrete]
- [[classanalysis-resources]]
- ==== {classification-cap} configuration objects
-
-
- [float]
- [[classification-resources-standard]]
- ===== Standard parameters
-
- `dependent_variable`::
- (Required, string)
- include::{docdir}/ml/ml-shared.asciidoc[tag=dependent-variable]
- +
- --
- The data type of the field must be numeric or boolean.
- --
-
- `num_top_classes`::
- (Optional, integer)
- include::{docdir}/ml/ml-shared.asciidoc[tag=num-top-classes]
-
- `prediction_field_name`::
- (Optional, string)
- include::{docdir}/ml/ml-shared.asciidoc[tag=prediction-field-name]
- `training_percent`::
- (Optional, integer)
- include::{docdir}/ml/ml-shared.asciidoc[tag=training-percent]
- `randomize_seed`::
- (Optional, long)
- include::{docdir}/ml/ml-shared.asciidoc[tag=randomize-seed]
- [float]
- [[classification-resources-advanced]]
- ===== Advanced parameters
- Advanced parameters are for fine-tuning {classanalysis}. They are set
- automatically by <<ml-hyperparam-optimization,hyperparameter optimization>>
- to give minimum validation error. It is highly recommended to use the default
- values unless you fully understand the function of these parameters. If these
- parameters are not supplied, their values are automatically tuned to give
- minimum validation error.
- `eta`::
- (Optional, double)
- include::{docdir}/ml/ml-shared.asciidoc[tag=eta]
- `feature_bag_fraction`::
- (Optional, double)
- include::{docdir}/ml/ml-shared.asciidoc[tag=feature-bag-fraction]
- `maximum_number_trees`::
- (Optional, integer)
- include::{docdir}/ml/ml-shared.asciidoc[tag=maximum-number-trees]
- `gamma`::
- (Optional, double)
- include::{docdir}/ml/ml-shared.asciidoc[tag=gamma]
- `lambda`::
- (Optional, double)
- include::{docdir}/ml/ml-shared.asciidoc[tag=lambda]
- [discrete]
- [[ml-hyperparam-optimization]]
- ==== Hyperparameter optimization
- If you don't supply {regression} or {classification} parameters, hyperparameter
- optimization will be performed by default to set a value for the undefined
- parameters. The starting point is calculated for data dependent parameters by
- examining the loss on the training data. Subject to the size constraint, this
- operation provides an upper bound on the improvement in validation loss.
- A fixed number of rounds is used for optimization which depends on the number of
- parameters being optimized. The optimization starts with random search, then
- Bayesian optimization is performed that is targeting maximum expected
- improvement. If you override any parameters, then the optimization will
- calculate the value of the remaining parameters accordingly and use the value
- you provided for the overridden parameter. The number of rounds are reduced
- respectively. The validation error is estimated in each round by using 4-fold
- cross validation.
|