[role="xpack"]
[testenv="platinum"]
[[evaluate-dfanalytics]]
=== Evaluate {dfanalytics} API
[subs="attributes"]
++++
<titleabbrev>Evaluate {dfanalytics}</titleabbrev>
++++

Evaluates the {dfanalytics} for an annotated index.

experimental[]

[[ml-evaluate-dfanalytics-request]]
==== {api-request-title}

`POST _ml/data_frame/_evaluate`

[[ml-evaluate-dfanalytics-prereq]]
==== {api-prereq-title}

If the {es} {security-features} are enabled, you must have the following
privileges:

* cluster: `monitor_ml`

For more information, see <<security-privileges>> and <<built-in-roles>>.

[[ml-evaluate-dfanalytics-desc]]
==== {api-description-title}

The API packages together commonly used evaluation metrics for various types of
machine learning features. This has been designed for use on indexes created by
{dfanalytics}. Evaluation requires both a ground truth field and an analytics
result field to be present.

[[ml-evaluate-dfanalytics-request-body]]
==== {api-request-body-title}

`evaluation`::
(Required, object) Defines the type of evaluation you want to perform.
See <<ml-evaluate-dfanalytics-resources>>.
+
--
Available evaluation types:

* `binary_soft_classification`
* `regression`
* `classification`
--

`index`::
(Required, string) Defines the index in which the evaluation will be
performed.

`query`::
(Optional, object) A query clause that retrieves a subset of data from the
source index. See <<query-dsl>>.
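
Taken together, a request names the index, optionally narrows it with a query,
and specifies exactly one evaluation type. The following sketch shows the
overall shape with placeholder index and field names; complete, runnable
examples follow in <<ml-evaluate-dfanalytics-example>>:

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "my_dest_index",
  "query": {
    "term": { "ml.is_training": { "value": false } }
  },
  "evaluation": {
    "regression": {
      "actual_field": "my_actual_value_field",
      "predicted_field": "ml.my_value_prediction"
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]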

[[ml-evaluate-dfanalytics-resources]]
==== {dfanalytics-cap} evaluation resources

[[binary-sc-resources]]
===== Binary soft classification evaluation objects

Binary soft classification evaluates the results of an analysis which outputs
the probability that each document belongs to a certain class. For example, in
the context of {oldetection}, the analysis outputs the probability that each
document is an outlier.

`actual_field`::
(Required, string) The field of the `index` which contains the `ground truth`.
The data type of this field can be boolean or integer. If the data type is
integer, the value has to be either `0` (false) or `1` (true).

`predicted_probability_field`::
(Required, string) The field of the `index` that contains the probability that
the item belongs to the class in question. It's the field that contains the
results of the analysis.

`metrics`::
(Optional, object) Specifies the metrics that are used for the evaluation.
Available metrics:

`auc_roc`:::
(Optional, object) The AUC ROC (area under the curve of the receiver operating
characteristic) score and optionally the curve. Default value is
`{"includes_curve": false}`.

`confusion_matrix`:::
(Optional, object) Sets the different thresholds of the {olscore} at which the
metrics (`tp` - true positive, `fp` - false positive, `tn` - true negative,
`fn` - false negative) are calculated. Default value is
`{"at": [0.25, 0.50, 0.75]}`.

`precision`:::
(Optional, object) Sets the different thresholds of the {olscore} at which the
metric is calculated. Default value is `{"at": [0.25, 0.50, 0.75]}`.

`recall`:::
(Optional, object) Sets the different thresholds of the {olscore} at which the
metric is calculated. Default value is `{"at": [0.25, 0.50, 0.75]}`.
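
For example, to calculate precision and recall at custom {olscore} thresholds
and also return the points of the ROC curve, the defaults can be overridden as
in the following sketch. It reuses the field names from
<<ml-evaluate-binary-soft-class-example,the example below>>, and the
`includes_curve` flag follows the default value documented above:

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "my_analytics_dest_index",
  "evaluation": {
    "binary_soft_classification": {
      "actual_field": "is_outlier",
      "predicted_probability_field": "ml.outlier_score",
      "metrics": {
        "auc_roc": { "includes_curve": true },
        "precision": { "at": [0.4, 0.6] },
        "recall": { "at": [0.4, 0.6] }
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]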

[[regression-evaluation-resources]]
===== {regression-cap} evaluation objects

{regression-cap} evaluation evaluates the results of a {regression} analysis
which outputs a prediction of values.

`actual_field`::
(Required, string) The field of the `index` which contains the `ground truth`.
The data type of this field must be numerical.

`predicted_field`::
(Required, string) The field in the `index` that contains the predicted value,
in other words the results of the {regression} analysis.

`metrics`::
(Optional, object) Specifies the metrics that are used for the evaluation.
Available metrics:

`mse`:::
(Optional, object) Average squared difference between the predicted values and
the actual (`ground truth`) values. For more information, read
https://en.wikipedia.org/wiki/Mean_squared_error[this wiki article].

`msle`:::
(Optional, object) Average squared difference between the logarithm of the
predicted values and the logarithm of the actual (`ground truth`) values.

`huber`:::
(Optional, object) Pseudo-Huber loss function. For more information, read
https://en.wikipedia.org/wiki/Huber_loss#Pseudo-Huber_loss_function[this wiki
article].

`r_squared`:::
(Optional, object) Proportion of the variance in the dependent variable that is
predictable from the independent variables. For more information, read
https://en.wikipedia.org/wiki/Coefficient_of_determination[this wiki article].
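
Individual metrics are requested by listing them as (possibly empty) objects
under `metrics`, following the pattern of the examples below. The following
sketch requests all four {regression} metrics at once, reusing the index and
field names from <<ml-evaluate-regression-example,the {regression} example
below>>:

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "house_price_predictions",
  "evaluation": {
    "regression": {
      "actual_field": "price",
      "predicted_field": "ml.price_prediction",
      "metrics": {
        "mse": {},
        "msle": {},
        "huber": {},
        "r_squared": {}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]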

[[classification-evaluation-resources]]
===== {classification-cap} evaluation objects

{classification-cap} evaluation evaluates the results of a {classanalysis} which
outputs a prediction that identifies to which of the classes each document
belongs.

`actual_field`::
(Required, string) The field of the `index` which contains the `ground truth`.
The data type of this field must be categorical.

`predicted_field`::
(Required, string) The field in the `index` that contains the predicted value,
in other words the results of the {classanalysis}.

`metrics`::
(Optional, object) Specifies the metrics that are used for the evaluation.
Available metrics:

`accuracy`:::
(Optional, object) Accuracy of predictions (per-class and overall).

`multiclass_confusion_matrix`:::
(Optional, object) Multiclass confusion matrix.

`precision`:::
(Optional, object) Precision of predictions (per-class and average).

`recall`:::
(Optional, object) Recall of predictions (per-class and average).
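
Several of these metrics can be computed in one call by combining them in the
`metrics` object. The following sketch builds on
<<ml-evaluate-classification-example,the {classification} example below>> and
requests all four metrics together:

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "animal_classification",
  "evaluation": {
    "classification": {
      "actual_field": "animal_class",
      "predicted_field": "ml.animal_class_prediction",
      "metrics": {
        "accuracy": {},
        "precision": {},
        "recall": {},
        "multiclass_confusion_matrix": {}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]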

////
[[ml-evaluate-dfanalytics-results]]
==== {api-response-body-title}

`binary_soft_classification`::
(object) If you chose to do binary soft classification, the API returns the
following evaluation metrics:

`auc_roc`::: TBD

`confusion_matrix`::: TBD

`precision`::: TBD

`recall`::: TBD
////

[[ml-evaluate-dfanalytics-example]]
==== {api-examples-title}

[[ml-evaluate-binary-soft-class-example]]
===== Binary soft classification

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "my_analytics_dest_index",
  "evaluation": {
    "binary_soft_classification": {
      "actual_field": "is_outlier",
      "predicted_probability_field": "ml.outlier_score"
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

The API returns the following results:

[source,console-result]
----
{
  "binary_soft_classification": {
    "auc_roc": {
      "score": 0.92584757746414444
    },
    "confusion_matrix": {
      "0.25": {
        "tp": 5,
        "fp": 9,
        "tn": 204,
        "fn": 5
      },
      "0.5": {
        "tp": 1,
        "fp": 5,
        "tn": 208,
        "fn": 9
      },
      "0.75": {
        "tp": 0,
        "fp": 4,
        "tn": 209,
        "fn": 10
      }
    },
    "precision": {
      "0.25": 0.35714285714285715,
      "0.5": 0.16666666666666666,
      "0.75": 0
    },
    "recall": {
      "0.25": 0.5,
      "0.5": 0.1,
      "0.75": 0
    }
  }
}
----
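
The precision and recall values follow directly from the confusion matrix at
each threshold. At `0.25`, for example, precision is `tp / (tp + fp)` =
5 / (5 + 9) ≈ 0.357 and recall is `tp / (tp + fn)` = 5 / (5 + 5) = 0.5.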

[[ml-evaluate-regression-example]]
===== {regression-cap}

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "house_price_predictions", <1>
  "query": {
    "bool": {
      "filter": [
        { "term": { "ml.is_training": false } } <2>
      ]
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "price", <3>
      "predicted_field": "ml.price_prediction", <4>
      "metrics": {
        "r_squared": {},
        "mse": {}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> The output destination index from a {dfanalytics} {reganalysis}.
<2> In this example, a test/train split (`training_percent`) was defined for the
{reganalysis}. This query limits evaluation to the test split only.
<3> The ground truth value for the actual house price. This is required in order
to evaluate results.
<4> The predicted value for house price calculated by the {reganalysis}.

The following example calculates the training error:

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "student_performance_mathematics_reg",
  "query": {
    "term": {
      "ml.is_training": {
        "value": true <1>
      }
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "G3", <2>
      "predicted_field": "ml.G3_prediction", <3>
      "metrics": {
        "r_squared": {},
        "mse": {}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> In this example, a test/train split (`training_percent`) was defined for the
{reganalysis}. This query limits evaluation to the train split only, which means
that a training error is calculated.
<2> The field that contains the ground truth value for the actual student
performance. This is required in order to evaluate results.
<3> The field that contains the predicted value for student performance
calculated by the {reganalysis}.

The next example calculates the testing error. The only difference from the
previous example is that `ml.is_training` is set to `false` this time, so the
query excludes the train split from the evaluation.

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "student_performance_mathematics_reg",
  "query": {
    "term": {
      "ml.is_training": {
        "value": false <1>
      }
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "G3", <2>
      "predicted_field": "ml.G3_prediction", <3>
      "metrics": {
        "r_squared": {},
        "mse": {}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> In this example, a test/train split (`training_percent`) was defined for the
{reganalysis}. This query limits evaluation to the test split only, which means
that a testing error is calculated.
<2> The field that contains the ground truth value for the actual student
performance. This is required in order to evaluate results.
<3> The field that contains the predicted value for student performance
calculated by the {reganalysis}.

[[ml-evaluate-classification-example]]
===== {classification-cap}

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "animal_classification",
  "evaluation": {
    "classification": { <1>
      "actual_field": "animal_class", <2>
      "predicted_field": "ml.animal_class_prediction", <3>
      "metrics": {
        "multiclass_confusion_matrix": {} <4>
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> The evaluation type.
<2> The field that contains the ground truth value for the actual animal
classification. This is required in order to evaluate results.
<3> The field that contains the predicted value for animal classification
calculated by the {classanalysis}.
<4> Specifies the metric for the evaluation.

The API returns the following result:

[source,console-result]
--------------------------------------------------
{
  "classification" : {
    "multiclass_confusion_matrix" : {
      "confusion_matrix" : [
        {
          "actual_class" : "cat", <1>
          "actual_class_doc_count" : 12, <2>
          "predicted_classes" : [ <3>
            {
              "predicted_class" : "cat",
              "count" : 12 <4>
            },
            {
              "predicted_class" : "dog",
              "count" : 0 <5>
            }
          ],
          "other_predicted_class_doc_count" : 0 <6>
        },
        {
          "actual_class" : "dog",
          "actual_class_doc_count" : 11,
          "predicted_classes" : [
            {
              "predicted_class" : "dog",
              "count" : 7
            },
            {
              "predicted_class" : "cat",
              "count" : 4
            }
          ],
          "other_predicted_class_doc_count" : 0
        }
      ],
      "other_actual_class_count" : 0
    }
  }
}
--------------------------------------------------

<1> The name of the actual class that the analysis tried to predict.
<2> The number of documents in the index that belong to the `actual_class`.
<3> This object contains the list of the predicted classes and the number of
predictions associated with the class.
<4> The number of cats in the dataset that are correctly identified as cats.
<5> The number of cats in the dataset that are incorrectly classified as dogs.
<6> The number of documents that are classified as a class that is not listed as
a `predicted_class`.
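
Read row by row, the matrix yields the per-class metrics directly: for the
`dog` class, recall is 7 / 11 ≈ 0.64 (7 of the 11 actual dogs are predicted as
dogs), while precision for the `cat` class is 12 / (12 + 4) = 0.75, because 4
of the 16 documents predicted as `cat` are actually dogs.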