[role="xpack"]
[testenv="platinum"]
[[ml-dfa-analysis-objects]]
=== Analysis configuration objects

{dfanalytics-cap} resources contain `analysis` objects. For example, when you
create a {dfanalytics-job}, you must define the type of analysis it performs.
This page lists all the available parameters that you can use in the `analysis`
object grouped by {dfanalytics} types.

[discrete]
[[oldetection-resources]]
==== {oldetection-cap} configuration objects

An `outlier_detection` configuration object has the following properties:

`compute_feature_influence`::
(Optional, boolean)
include::{docdir}/ml/ml-shared.asciidoc[tag=compute-feature-influence]

`feature_influence_threshold`::
(Optional, double)
include::{docdir}/ml/ml-shared.asciidoc[tag=feature-influence-threshold]

`method`::
(Optional, string)
include::{docdir}/ml/ml-shared.asciidoc[tag=method]

`n_neighbors`::
(Optional, integer)
include::{docdir}/ml/ml-shared.asciidoc[tag=n-neighbors]

`outlier_fraction`::
(Optional, double)
include::{docdir}/ml/ml-shared.asciidoc[tag=outlier-fraction]

`standardization_enabled`::
(Optional, boolean)
include::{docdir}/ml/ml-shared.asciidoc[tag=standardization-enabled]
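
As a rough sketch of how these properties fit together, the following request
creates a job with an `outlier_detection` analysis and sets every property
explicitly. The job ID, the index names, and all property values are
hypothetical and chosen only for illustration; they are not recommendations.

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/example_outlier_job <1>
{
  "source": {
    "index": "example_source_index" <2>
  },
  "dest": {
    "index": "example_outlier_results" <2>
  },
  "analysis": {
    "outlier_detection": {
      "compute_feature_influence": true,
      "feature_influence_threshold": 0.1,
      "method": "lof", <3>
      "n_neighbors": 20,
      "outlier_fraction": 0.05,
      "standardization_enabled": true
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> Hypothetical job ID.
<2> Hypothetical index names.
<3> Illustrative values. All of these properties are optional, so any of them
can be omitted.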

[discrete]
[[reganalysis-resources]]
==== {regression-cap} configuration objects

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/house_price_regression_analysis
{
  "source": {
    "index": "houses_sold_last_10_yrs" <1>
  },
  "dest": {
    "index": "house_price_predictions" <2>
  },
  "analysis": {
    "regression": { <3>
      "dependent_variable": "price" <4>
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> Training data is taken from the source index `houses_sold_last_10_yrs`.
<2> Analysis results are written to the destination index
`house_price_predictions`.
<3> The regression analysis configuration object.
<4> Regression analysis trains on the `price` field. Because no other
parameters are specified, it trains on 100% of the eligible data, stores its
prediction in the destination index field `price_prediction`, and uses built-in
hyperparameter optimization to minimize validation error.

[float]
[[regression-resources-standard]]
===== Standard parameters

`dependent_variable`::
(Required, string)
include::{docdir}/ml/ml-shared.asciidoc[tag=dependent-variable]
+
--
The data type of the field must be numeric.
--

`prediction_field_name`::
(Optional, string)
include::{docdir}/ml/ml-shared.asciidoc[tag=prediction-field-name]

`training_percent`::
(Optional, integer)
include::{docdir}/ml/ml-shared.asciidoc[tag=training-percent]

`randomize_seed`::
(Optional, long)
include::{docdir}/ml/ml-shared.asciidoc[tag=randomize-seed]
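
The standard parameters could be combined as in the following sketch, which
reuses the house price job and its indices. The values of
`prediction_field_name`, `training_percent`, and `randomize_seed` are
assumptions made for illustration only.

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/house_price_regression_analysis
{
  "source": {
    "index": "houses_sold_last_10_yrs"
  },
  "dest": {
    "index": "house_price_predictions"
  },
  "analysis": {
    "regression": {
      "dependent_variable": "price",
      "prediction_field_name": "predicted_price", <1>
      "training_percent": 80, <2>
      "randomize_seed": 42 <3>
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> Illustrative field name for the prediction in the destination index.
<2> Illustrative value: train on 80% of the eligible documents.
<3> Illustrative seed value.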

[float]
[[regression-resources-advanced]]
===== Advanced parameters

Advanced parameters are for fine-tuning {reganalysis}. If you do not supply
them, <<ml-hyperparam-optimization,hyperparameter optimization>> sets their
values automatically to minimize validation error. It is highly recommended to
use the default values unless you fully understand the function of these
parameters.

`eta`::
(Optional, double)
include::{docdir}/ml/ml-shared.asciidoc[tag=eta]

`feature_bag_fraction`::
(Optional, double)
include::{docdir}/ml/ml-shared.asciidoc[tag=feature-bag-fraction]

`maximum_number_trees`::
(Optional, integer)
include::{docdir}/ml/ml-shared.asciidoc[tag=maximum-number-trees]

`gamma`::
(Optional, double)
include::{docdir}/ml/ml-shared.asciidoc[tag=gamma]

`lambda`::
(Optional, double)
include::{docdir}/ml/ml-shared.asciidoc[tag=lambda]
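
For readers who do want to set the advanced parameters by hand, a request might
look like the following sketch. It reuses the house price job. Every value
shown is an arbitrary illustration, not a tuned or recommended setting.

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/house_price_regression_analysis
{
  "source": {
    "index": "houses_sold_last_10_yrs"
  },
  "dest": {
    "index": "house_price_predictions"
  },
  "analysis": {
    "regression": {
      "dependent_variable": "price",
      "eta": 0.1, <1>
      "feature_bag_fraction": 0.5, <1>
      "maximum_number_trees": 100, <1>
      "gamma": 0.5, <1>
      "lambda": 1.0 <1>
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> Illustrative value only. Any advanced parameter that is omitted is set by
hyperparameter optimization instead.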

[discrete]
[[classanalysis-resources]]
==== {classification-cap} configuration objects

[float]
[[classification-resources-standard]]
===== Standard parameters

`dependent_variable`::
(Required, string)
include::{docdir}/ml/ml-shared.asciidoc[tag=dependent-variable]
+
--
The data type of the field must be numeric (`integer`, `short`, `long`, `byte`),
categorical (`ip`, `keyword`, `text`), or boolean.
--

`num_top_classes`::
(Optional, integer)
include::{docdir}/ml/ml-shared.asciidoc[tag=num-top-classes]

`prediction_field_name`::
(Optional, string)
include::{docdir}/ml/ml-shared.asciidoc[tag=prediction-field-name]

`training_percent`::
(Optional, integer)
include::{docdir}/ml/ml-shared.asciidoc[tag=training-percent]

`randomize_seed`::
(Optional, long)
include::{docdir}/ml/ml-shared.asciidoc[tag=randomize-seed]
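
A {classification} job that uses the standard parameters might be created as in
the following sketch. The job ID, the index names, the `churned` field, and all
parameter values are hypothetical and serve only to show the shape of the
request.

[source,console]
--------------------------------------------------
PUT _ml/data_frame/analytics/example_classification_job <1>
{
  "source": {
    "index": "example_customers" <2>
  },
  "dest": {
    "index": "example_customer_predictions" <2>
  },
  "analysis": {
    "classification": {
      "dependent_variable": "churned", <3>
      "num_top_classes": 2, <4>
      "training_percent": 75 <4>
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> Hypothetical job ID.
<2> Hypothetical index names.
<3> Hypothetical field, assumed here to be a boolean.
<4> Illustrative values only.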

[float]
[[classification-resources-advanced]]
===== Advanced parameters

Advanced parameters are for fine-tuning {classanalysis}. If you do not supply
them, <<ml-hyperparam-optimization,hyperparameter optimization>> sets their
values automatically to minimize validation error. It is highly recommended to
use the default values unless you fully understand the function of these
parameters.

`eta`::
(Optional, double)
include::{docdir}/ml/ml-shared.asciidoc[tag=eta]

`feature_bag_fraction`::
(Optional, double)
include::{docdir}/ml/ml-shared.asciidoc[tag=feature-bag-fraction]

`maximum_number_trees`::
(Optional, integer)
include::{docdir}/ml/ml-shared.asciidoc[tag=maximum-number-trees]

`gamma`::
(Optional, double)
include::{docdir}/ml/ml-shared.asciidoc[tag=gamma]

`lambda`::
(Optional, double)
include::{docdir}/ml/ml-shared.asciidoc[tag=lambda]

[discrete]
[[ml-hyperparam-optimization]]
==== Hyperparameter optimization

If you don't supply {regression} or {classification} parameters, hyperparameter
optimization runs by default to set a value for each undefined parameter. The
starting point for data-dependent parameters is calculated by examining the
loss on the training data. Subject to the size constraint, this operation
provides an upper bound on the improvement in validation loss.

A fixed number of rounds is used for optimization; the number depends on how
many parameters are being optimized. The optimization starts with a random
search and then continues with Bayesian optimization that targets maximum
expected improvement. If you override any parameters, the optimization uses the
values you provided for them and calculates the values of the remaining
parameters accordingly. The number of rounds is reduced correspondingly. The
validation error is estimated in each round by using 4-fold cross validation.
  169. cross validation.