tag::dependent_variable[]
`dependent_variable`::
(Required, string) Defines which field of the document is to be predicted.
This parameter is supplied by field name and must match one of the fields in
the index being used to train. If this field is missing from a document, that
document is not used for training, but a prediction with the trained model is
generated for it. It is also known as the continuous target variable.
end::dependent_variable[]
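
For illustration, the sketch below shows where `dependent_variable` might sit
in a regression job configuration. The job ID, index names, and field name are
hypothetical, and the exact request shape can vary between versions.

[source,console]
----
PUT _ml/data_frame/analytics/house_price_model
{
  "source": { "index": "houses" },
  "dest": { "index": "houses_predictions" },
  "analysis": {
    "regression": {
      "dependent_variable": "price"
    }
  }
}
----
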
tag::eta[]
`eta`::
(Optional, double) The shrinkage applied to the weights. Smaller values result
in larger forests which have better generalization error. However, the smaller
the value, the longer the training will take. For more information, see
https://en.wikipedia.org/wiki/Gradient_boosting#Shrinkage[this Wikipedia article]
about shrinkage.
end::eta[]
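
As background, shrinkage scales the contribution of each new tree added to the
ensemble. A standard formulation, along the lines of the linked article (the
symbols here are illustrative, not taken from this file), is:

[latexmath]
++++
F_m(x) = F_{m-1}(x) + \eta \, h_m(x), \qquad 0 < \eta \le 1
++++

where F is the ensemble prediction, h_m is the newly fitted tree, and eta
controls how much of it is added. A smaller eta means each tree contributes
less, so more trees are needed to fit the data, which is why smaller values
lead to larger forests and longer training.
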
tag::feature_bag_fraction[]
`feature_bag_fraction`::
(Optional, double) Defines the fraction of features that will be used when
selecting a random bag for each candidate split.
end::feature_bag_fraction[]
tag::gamma[]
`gamma`::
(Optional, double) Regularization parameter to prevent overfitting on the
training dataset. Multiplies a linear penalty associated with the size of
individual trees in the forest. The higher the value, the more training will
prefer smaller trees. The smaller this parameter, the larger individual trees
will be and the longer training will take.
end::gamma[]
tag::lambda[]
`lambda`::
(Optional, double) Regularization parameter to prevent overfitting on the
training dataset. Multiplies an L2 regularization term which applies to leaf
weights of the individual trees in the forest. The higher the value, the more
training will attempt to keep leaf weights small. This makes the prediction
function smoother at the expense of potentially not being able to capture
relevant relationships between the features and the {depvar}. The smaller this
parameter, the larger individual trees will be and the longer training will
take.
end::lambda[]
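
Read together, `gamma` and `lambda` can be thought of as weighting a per-tree
penalty of roughly the following form. This is only a sketch based on the two
descriptions above (a tree with T leaves and leaf weights w_j); the exact
objective used internally is not spelled out here.

[latexmath]
++++
\Omega(\text{tree}) = \gamma \, T + \lambda \sum_{j=1}^{T} w_j^{2}
++++

A larger `gamma` favors trees with fewer leaves, while a larger `lambda` keeps
the leaf weights small, which is what makes the prediction function smoother.
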
tag::maximum_number_trees[]
`maximum_number_trees`::
(Optional, integer) Defines the maximum number of trees the forest is allowed
to contain. The maximum value is 2000.
end::maximum_number_trees[]
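
To show how the tree hyperparameters above fit together, here is a sketch of
an `analysis` object with all of them set. The values are purely illustrative,
not recommendations.

[source,js]
----
"analysis": {
  "regression": {
    "dependent_variable": "price",
    "eta": 0.05,
    "feature_bag_fraction": 0.6,
    "gamma": 0.7,
    "lambda": 1.0,
    "maximum_number_trees": 500
  }
}
----
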
tag::prediction_field_name[]
`prediction_field_name`::
(Optional, string) Defines the name of the prediction field in the results.
Defaults to `<dependent_variable>_prediction`.
end::prediction_field_name[]
tag::training_percent[]
`training_percent`::
(Optional, integer) Defines what percentage of the eligible documents will be
used for training. Documents that are ignored by the analysis (for example
those that contain arrays) are not included in the calculation of this
percentage. Defaults to `100`.
end::training_percent[]
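
As a final sketch, the two result-related parameters above can be combined as
follows; the field name and the 75 percent split are hypothetical.

[source,js]
----
"analysis": {
  "regression": {
    "dependent_variable": "price",
    "prediction_field_name": "predicted_price",
    "training_percent": 75
  }
}
----

With these settings, roughly 75% of the eligible documents would be used for
training, and results would be written to `predicted_price` instead of the
default `price_prediction`.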