--
:api: put-data-frame-analytics
:request: PutDataFrameAnalyticsRequest
:response: PutDataFrameAnalyticsResponse
--
[role="xpack"]
[id="{upid}-{api}"]
=== Create {dfanalytics-jobs} API

beta::[]

Creates a new {dfanalytics-job}.
The API accepts a +{request}+ object as a request and returns a +{response}+.

[id="{upid}-{api}-request"]
==== Request

A +{request}+ requires the following argument:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-request]
--------------------------------------------------
<1> The configuration of the {dfanalytics-job} to create

[id="{upid}-{api}-config"]
==== {dfanalytics-cap} configuration

The `DataFrameAnalyticsConfig` object holds the full configuration of the
{dfanalytics-job} and takes the following arguments:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-config]
--------------------------------------------------
<1> The {dfanalytics-job} ID
<2> The source index and query from which to gather data
<3> The destination index
<4> The analysis to be performed
<5> The fields to be included in or excluded from the analysis
<6> The memory limit for the model created as part of the analysis process
<7> Optionally, a human-readable description
<8> The maximum number of threads to be used by the analysis. Defaults to 1.

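
The arguments above map onto the body of the underlying REST request. The
following is a minimal sketch of that shape built from plain Java maps, without
the client library. The class name, index names, field names, and the `1gb`
limit are illustrative assumptions; the keys follow the create
{dfanalytics-jobs} REST API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of the request body that corresponds to the configuration arguments. */
final class AnalyticsConfigSketch {

    /** REST path for creating a job; the job ID travels in the path, not in the body. */
    static String endpoint(String jobId) {
        return "/_ml/data_frame/analytics/" + jobId;
    }

    /** Builds the body as nested maps, in the same order as the callouts. */
    static Map<String, Object> body(String sourceIndex, String destIndex) {
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("index", sourceIndex);

        Map<String, Object> dest = new LinkedHashMap<>();
        dest.put("index", destIndex);

        // Outlier detection with no explicit settings (arbitrary choice for this sketch).
        Map<String, Object> analysis = new LinkedHashMap<>();
        analysis.put("outlier_detection", new LinkedHashMap<String, Object>());

        // Hypothetical field names.
        Map<String, Object> analyzedFields = new LinkedHashMap<>();
        analyzedFields.put("includes", Arrays.asList("field_1", "field_2"));
        analyzedFields.put("excludes", Arrays.asList("field_3"));

        Map<String, Object> body = new LinkedHashMap<>();
        body.put("source", source);
        body.put("dest", dest);
        body.put("analysis", analysis);
        body.put("analyzed_fields", analyzedFields);
        body.put("model_memory_limit", "1gb");   // illustrative value
        body.put("description", "example job");  // optional
        body.put("max_num_threads", 1);          // the documented default
        return body;
    }
}
```

Calling `AnalyticsConfigSketch.body("my-source", "my-dest")` yields a map with
seven top-level keys, one for each callout after the job ID.
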
[id="{upid}-{api}-query-config"]
==== SourceConfig

The index and the query from which to collect data.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-source-config]
--------------------------------------------------
<1> Constructing a new DataFrameAnalyticsSource
<2> The source index
<3> The query from which to gather the data. If no query is set, a `match_all` query is used by default.
<4> Runtime mappings that will be added to the destination index mapping.
<5> Source filtering to select which fields will exist in the destination index.

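
The `match_all` default can be pictured with a small sketch that assembles the
`source` section by hand. The class and method names are hypothetical, and the
default is applied for you when you use the client; the sketch only makes the
fallback explicit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of the source section with the documented match_all fallback. */
final class SourceSectionSketch {

    /** Returns the source section; a null query falls back to match_all. */
    static Map<String, Object> sourceSection(String index, Map<String, Object> query) {
        Map<String, Object> effectiveQuery = query;
        if (effectiveQuery == null) {
            // No query supplied: select every document in the source index.
            effectiveQuery = new LinkedHashMap<>();
            effectiveQuery.put("match_all", new LinkedHashMap<String, Object>());
        }
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("index", index);
        source.put("query", effectiveQuery);
        return source;
    }
}
```
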
===== QueryConfig

The query with which to select data from the source.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-query-config]
--------------------------------------------------

==== DestinationConfig

The index to which data should be written by the {dfanalytics-job}.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-dest-config]
--------------------------------------------------
<1> Constructing a new DataFrameAnalyticsDest
<2> The destination index

==== Analysis

The analysis to be performed.
Currently, the supported analyses are +OutlierDetection+, +Classification+, and +Regression+.

===== Outlier detection

An +OutlierDetection+ analysis can be created in one of two ways:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-outlier-detection-default]
--------------------------------------------------
<1> Constructing a new OutlierDetection object with the default strategy to determine outliers

or

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-outlier-detection-customized]
--------------------------------------------------
<1> Constructing a new OutlierDetection object
<2> The method used to perform the analysis
<3> The number of neighbors taken into account during analysis
<4> The minimum `outlier_score` required to compute feature influence
<5> Whether to compute feature influence
<6> The proportion of the data set that is assumed to be outlying prior to outlier detection
<7> Whether to apply standardization to feature values

===== Classification

+Classification+ analysis requires the +dependent_variable+ to be set and
accepts a number of optional parameters:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-classification]
--------------------------------------------------
<1> Constructing a new Classification builder object with the required dependent variable
<2> The lambda regularization parameter. A non-negative double.
<3> The gamma regularization parameter. A non-negative double.
<4> The applied shrinkage. A double in [0.001, 1].
<5> The maximum number of trees the forest is allowed to contain. An integer in [1, 2000].
<6> The fraction of features used when selecting a random bag for each candidate split. A double in (0, 1].
<7> If set, feature importance is computed for the most important features.
<8> The name of the prediction field in the results object.
<9> The percentage of training-eligible rows to be used in training. Defaults to 100%.
<10> The seed to be used by the random generator that picks which rows are used in training.
<11> The optimization objective to target when assigning class labels. Defaults to `maximize_minimum_recall`.
<12> The number of top classes (or -1, which denotes all classes) to be reported in the results. Defaults to 2.
<13> Custom feature processors that will create new features for analysis from the included document
fields. Note that automatic categorical {ml-docs}/ml-feature-encoding.html[feature encoding] still occurs for all features.
<14> The alpha regularization parameter. A non-negative double.
<15> The growth rate of the shrinkage parameter. A double in [0.5, 2.0].
<16> The soft tree depth limit. A non-negative double.
<17> The soft tree depth tolerance. Controls how much the soft tree depth limit is respected. A double greater than or equal to 0.01.
<18> The amount by which to downsample the data for stochastic gradient estimates. A double in (0, 1.0].
<19> The maximum number of optimization rounds used per hyperparameter during hyperparameter optimization. An integer in [0, 20].
<20> Whether to enable early stopping, which finishes the training process if it is not finding better models.

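
Because most of these hyperparameters have a documented range, it can help to
check values before building the request. The following is a minimal sketch of
such a check in plain Java. The class is hypothetical, and the map keys are
assumed to be the REST API names for these parameters (for example `eta` for
the applied shrinkage); the ranges are the ones listed in the callouts.
Parameters that are not set are skipped, since all of them are optional.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.DoublePredicate;

/** Sketch: checks hyperparameter values against the documented ranges. */
final class HyperparameterRangeCheck {

    /** Returns one message per out-of-range value; an empty list means all set values are valid. */
    static List<String> violations(Map<String, Double> params) {
        List<String> out = new ArrayList<>();
        check(out, params, "lambda", v -> v >= 0, ">= 0");
        check(out, params, "gamma", v -> v >= 0, ">= 0");
        check(out, params, "eta", v -> v >= 0.001 && v <= 1, "[0.001, 1]");
        check(out, params, "max_trees", v -> v >= 1 && v <= 2000 && isWhole(v), "integer in [1, 2000]");
        check(out, params, "feature_bag_fraction", v -> v > 0 && v <= 1, "(0, 1]");
        check(out, params, "alpha", v -> v >= 0, ">= 0");
        check(out, params, "eta_growth_rate_per_tree", v -> v >= 0.5 && v <= 2.0, "[0.5, 2.0]");
        check(out, params, "soft_tree_depth_limit", v -> v >= 0, ">= 0");
        check(out, params, "soft_tree_depth_tolerance", v -> v >= 0.01, ">= 0.01");
        check(out, params, "downsample_factor", v -> v > 0 && v <= 1.0, "(0, 1.0]");
        check(out, params, "max_optimization_rounds_per_hyperparameter",
            v -> v >= 0 && v <= 20 && isWhole(v), "integer in [0, 20]");
        return out;
    }

    private static boolean isWhole(double v) {
        return v == Math.rint(v);
    }

    private static void check(List<String> out, Map<String, Double> params, String name,
                              DoublePredicate inRange, String range) {
        Double value = params.get(name);
        // Unset parameters are optional and therefore not reported.
        if (value != null && !inRange.test(value)) {
            out.add(name + ": expected " + range + ", got " + value);
        }
    }
}
```

The same ranges apply to the +Regression+ parameters that share these names.
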
===== Regression

+Regression+ analysis requires the +dependent_variable+ to be set and
accepts a number of optional parameters:

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-regression]
--------------------------------------------------
<1> Constructing a new Regression builder object with the required dependent variable
<2> The lambda regularization parameter. A non-negative double.
<3> The gamma regularization parameter. A non-negative double.
<4> The applied shrinkage. A double in [0.001, 1].
<5> The maximum number of trees the forest is allowed to contain. An integer in [1, 2000].
<6> The fraction of features used when selecting a random bag for each candidate split. A double in (0, 1].
<7> If set, feature importance is computed for the most important features.
<8> The name of the prediction field in the results object.
<9> The percentage of training-eligible rows to be used in training. Defaults to 100%.
<10> The seed to be used by the random generator that picks which rows are used in training.
<11> The loss function used for regression. Defaults to `mse`.
<12> An optional parameter to the loss function.
<13> Custom feature processors that will create new features for analysis from the included document
fields. Note that automatic categorical {ml-docs}/ml-feature-encoding.html[feature encoding] still occurs for all features.
<14> The alpha regularization parameter. A non-negative double.
<15> The growth rate of the shrinkage parameter. A double in [0.5, 2.0].
<16> The soft tree depth limit. A non-negative double.
<17> The soft tree depth tolerance. Controls how much the soft tree depth limit is respected. A double greater than or equal to 0.01.
<18> The amount by which to downsample the data for stochastic gradient estimates. A double in (0, 1.0].
<19> The maximum number of optimization rounds used per hyperparameter during hyperparameter optimization. An integer in [0, 20].
<20> Whether to enable early stopping, which finishes the training process if it is not finding better models.

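
The training percentage and the randomization seed work together: the seed
makes the choice of training rows reproducible. The sketch below illustrates
that idea only. It is not the algorithm the {dfanalytics-job} uses, and the
class and method names are hypothetical.

```java
import java.util.Random;

/** Illustration of a seeded, reproducible choice of training rows. */
final class TrainingSplitSketch {

    /** Marks each row as training or not; the same seed always gives the same marks. */
    static boolean[] pickTrainingRows(int rowCount, double trainingPercent, long seed) {
        Random random = new Random(seed);
        boolean[] inTraining = new boolean[rowCount];
        for (int i = 0; i < rowCount; i++) {
            // nextDouble() is in [0, 1), so 100 percent selects every row.
            inTraining[i] = random.nextDouble() * 100.0 < trainingPercent;
        }
        return inTraining;
    }
}
```

Two calls with the same arguments return identical arrays, which is the
property a fixed seed is meant to provide.
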
==== Analyzed fields

A FetchContext object containing the fields to be included in or excluded from the analysis.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-analyzed-fields]
--------------------------------------------------

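
The effect of the include and exclude lists can be sketched as a simple filter
over field names. This is a simplified illustration with hypothetical names. It
matches exact names only, and it assumes that an empty include list means "all
fields" and that excludes are applied after includes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Simplified illustration of include/exclude filtering on exact field names. */
final class AnalyzedFieldsSketch {

    /** Keeps a field if it is included (or nothing is listed) and not excluded. */
    static List<String> select(List<String> fields, Set<String> includes, Set<String> excludes) {
        List<String> selected = new ArrayList<>();
        for (String field : fields) {
            boolean included = includes.isEmpty() || includes.contains(field);
            if (included && !excludes.contains(field)) {
                selected.add(field);
            }
        }
        return selected;
    }
}
```
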
include::../execution.asciidoc[]

[id="{upid}-{api}-response"]
==== Response

The returned +{response}+ contains the newly created {dfanalytics-job}.

["source","java",subs="attributes,callouts,macros"]
--------------------------------------------------
include-tagged::{doc-tests-file}[{api}-response]
--------------------------------------------------