[role="xpack"]
[testenv="platinum"]
[[evaluate-dfanalytics]]
= Evaluate {dfanalytics} API

[subs="attributes"]
++++
<titleabbrev>Evaluate {dfanalytics}</titleabbrev>
++++

Evaluates the {dfanalytics} for an annotated index.

experimental[]
[[ml-evaluate-dfanalytics-request]]
== {api-request-title}

`POST _ml/data_frame/_evaluate`

[[ml-evaluate-dfanalytics-prereq]]
== {api-prereq-title}

If the {es} {security-features} are enabled, you must have the following
privileges:

* cluster: `monitor_ml`

For more information, see <<security-privileges>> and {ml-docs-setup-privileges}.
[[ml-evaluate-dfanalytics-desc]]
== {api-description-title}

The API packages together commonly used evaluation metrics for various types of
machine learning features. It is designed for use on indices created by
{dfanalytics}. Evaluation requires both a ground truth field and an analytics
result field to be present.
[[ml-evaluate-dfanalytics-request-body]]
== {api-request-body-title}

`evaluation`::
(Required, object) Defines the type of evaluation you want to perform.
See <<ml-evaluate-dfanalytics-resources>>.
+
--
Available evaluation types:

* `outlier_detection`
* `regression`
* `classification`
--

`index`::
(Required, object) Defines the `index` in which the evaluation will be
performed.

`query`::
(Optional, object) A query clause that retrieves a subset of data from the
source index. See <<query-dsl>>.
[[ml-evaluate-dfanalytics-resources]]
== {dfanalytics-cap} evaluation resources

[[oldetection-resources]]
=== {oldetection-cap} evaluation objects

{oldetection-cap} evaluates the results of an {oldetection} analysis which
outputs the probability that each document is an outlier.

`actual_field`::
(Required, string) The field of the `index` which contains the ground truth.
The data type of this field can be boolean or integer. If the data type is
integer, the value has to be either `0` (false) or `1` (true).

`predicted_probability_field`::
(Required, string) The field of the `index` that contains the probability that
the item in question belongs to the class. It is the field that contains the
results of the analysis.

`metrics`::
(Optional, object) Specifies the metrics that are used for the evaluation.
Available metrics:

`auc_roc`:::
(Optional, object) The AUC ROC (area under the curve of the receiver
operating characteristic) score and optionally the curve. Default value is
`{"include_curve": false}`.

`confusion_matrix`:::
(Optional, object) Sets the different thresholds of the {olscore} at which
the metrics (`tp` - true positive, `fp` - false positive, `tn` - true
negative, `fn` - false negative) are calculated. Default value is
`{"at": [0.25, 0.50, 0.75]}`.

`precision`:::
(Optional, object) Sets the different thresholds of the {olscore} at which
the metric is calculated. Default value is `{"at": [0.25, 0.50, 0.75]}`.

`recall`:::
(Optional, object) Sets the different thresholds of the {olscore} at which
the metric is calculated. Default value is `{"at": [0.25, 0.50, 0.75]}`.
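For illustration, the thresholded metrics can be sketched in plain Python as
follows. This is a minimal sketch for explanation only, not the {es}
implementation; the `thresholded_metrics` helper and its arguments are
hypothetical names chosen for this example.

[source,python]
----
# Illustrative sketch (not the Elasticsearch implementation): compute the
# thresholded confusion matrix, precision, and recall that the
# outlier_detection evaluation reports.

def thresholded_metrics(actual, predicted_probability, at=(0.25, 0.50, 0.75)):
    """actual: 0/1 ground truth values;
    predicted_probability: outlier scores in [0, 1]."""
    results = {}
    for threshold in at:
        tp = fp = tn = fn = 0
        for truth, prob in zip(actual, predicted_probability):
            predicted = prob >= threshold  # classify at this threshold
            if predicted and truth:
                tp += 1
            elif predicted and not truth:
                fp += 1
            elif not predicted and not truth:
                tn += 1
            else:
                fn += 1
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        results[threshold] = {"tp": tp, "fp": fp, "tn": tn, "fn": fn,
                              "precision": precision, "recall": recall}
    return results
----

Raising the threshold makes the classifier stricter, which is why recall can
only decrease as the threshold grows.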
[[regression-evaluation-resources]]
=== {regression-cap} evaluation objects

{regression-cap} evaluation evaluates the results of a {regression} analysis
which outputs a prediction of values.

`actual_field`::
(Required, string) The field of the `index` which contains the ground truth.
The data type of this field must be numerical.

`predicted_field`::
(Required, string) The field in the `index` that contains the predicted value,
in other words the results of the {regression} analysis.

`metrics`::
(Optional, object) Specifies the metrics that are used for the evaluation.
Available metrics:

`mse`:::
(Optional, object) Average squared difference between the predicted values and
the actual (ground truth) values. For more information, read
{wikipedia}/Mean_squared_error[this wiki article].

`msle`:::
(Optional, object) Average squared difference between the logarithm of the
predicted values and the logarithm of the actual (ground truth) values.

`huber`:::
(Optional, object) Pseudo-Huber loss function. For more information, read
{wikipedia}/Huber_loss#Pseudo-Huber_loss_function[this wiki article].

`r_squared`:::
(Optional, object) Proportion of the variance in the dependent variable that
is predictable from the independent variables. For more information, read
{wikipedia}/Coefficient_of_determination[this wiki article].
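The four regression metrics above can be sketched in plain Python as shown
below. This is an illustration only, not the {es} implementation; the `offset`
and `delta` parameters are assumptions made for this sketch.

[source,python]
----
import math

# Illustrative sketches (not the Elasticsearch implementation) of the
# regression evaluation metrics described above.

def mse(actual, predicted):
    # Mean squared error: average squared difference.
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def msle(actual, predicted, offset=1.0):
    # Mean squared logarithmic error; the offset keeps log() defined at 0.
    return sum((math.log(a + offset) - math.log(p + offset)) ** 2
               for a, p in zip(actual, predicted)) / len(actual)

def pseudo_huber(actual, predicted, delta=1.0):
    # Pseudo-Huber loss: quadratic for small residuals, linear for large ones.
    return sum(delta ** 2 * (math.sqrt(1 + ((a - p) / delta) ** 2) - 1)
               for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot
----

A perfect prediction gives `mse` of `0` and `r_squared` of `1`; unlike `mse`,
`msle` penalizes relative rather than absolute errors, and the pseudo-Huber
loss is less sensitive to outliers than `mse`.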
[[classification-evaluation-resources]]
=== {classification-cap} evaluation objects

{classification-cap} evaluation evaluates the results of a {classanalysis}
which outputs a prediction that identifies to which of the classes each
document belongs.

`actual_field`::
(Required, string) The field of the `index` which contains the ground truth.
The data type of this field must be categorical.

`predicted_field`::
(Optional, string) The field in the `index` which contains the predicted value,
in other words the results of the {classanalysis}.

`top_classes_field`::
(Optional, string) The field of the `index` which is an array of documents
of the form `{ "class_name": XXX, "class_probability": YYY }`.
This field must be defined as `nested` in the mappings.

`metrics`::
(Optional, object) Specifies the metrics that are used for the evaluation.
Available metrics:

`accuracy`:::
(Optional, object) Accuracy of predictions (per-class and overall).

`auc_roc`:::
(Optional, object) The AUC ROC (area under the curve of the receiver
operating characteristic) score and optionally the curve. It is calculated
for a specific class (provided as `class_name`) treated as positive.

`class_name`::::
(Required, string) Name of the only class that is treated as positive during
the AUC ROC calculation. Other classes are treated as negative
("one-vs-all" strategy). Documents which do not have `class_name` in the list
of their top classes are not taken into account for evaluation. The number of
documents taken into account is returned in the evaluation result (the
`auc_roc.doc_count` field).

`include_curve`::::
(Optional, boolean) Whether or not the curve should be returned in addition
to the score. Default value is `false`.

`multiclass_confusion_matrix`:::
(Optional, object) Multiclass confusion matrix.

`precision`:::
(Optional, object) Precision of predictions (per-class and average).

`recall`:::
(Optional, object) Recall of predictions (per-class and average).
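The shape of the `multiclass_confusion_matrix` result can be illustrated with
a plain Python sketch. This is not the {es} implementation; the
`multiclass_confusion_matrix` helper below is a hypothetical name and only
mirrors the structure of the response shown in the classification example.

[source,python]
----
from collections import Counter

# Illustrative sketch (not the Elasticsearch implementation): build a
# multiclass confusion matrix shaped like the API response.

def multiclass_confusion_matrix(actual, predicted):
    """actual, predicted: parallel lists of class labels."""
    classes = sorted(set(actual))
    pair_counts = Counter(zip(actual, predicted))  # (actual, predicted) pairs
    matrix = []
    for actual_class in classes:
        row = [{"predicted_class": c, "count": pair_counts[(actual_class, c)]}
               for c in classes]
        doc_count = actual.count(actual_class)
        matrix.append({
            "actual_class": actual_class,
            "actual_class_doc_count": doc_count,
            "predicted_classes": row,
            # Documents predicted as a class outside the listed ones.
            "other_predicted_class_doc_count":
                doc_count - sum(entry["count"] for entry in row),
        })
    return matrix
----

The diagonal entries (where `predicted_class` equals `actual_class`) count the
correct predictions; everything off the diagonal is a misclassification.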
////
[[ml-evaluate-dfanalytics-results]]
== {api-response-body-title}

`outlier_detection`::
(object) If you chose to do outlier detection, the API returns the
following evaluation metrics:

`auc_roc`::: TBD

`confusion_matrix`::: TBD

`precision`::: TBD

`recall`::: TBD
////
[[ml-evaluate-dfanalytics-example]]
== {api-examples-title}

[[ml-evaluate-oldetection-example]]
=== {oldetection-cap}

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "my_analytics_dest_index",
  "evaluation": {
    "outlier_detection": {
      "actual_field": "is_outlier",
      "predicted_probability_field": "ml.outlier_score"
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]
The API returns the following results:

[source,console-result]
----
{
  "outlier_detection": {
    "auc_roc": {
      "score": 0.92584757746414444
    },
    "confusion_matrix": {
      "0.25": {
        "tp": 5,
        "fp": 9,
        "tn": 204,
        "fn": 5
      },
      "0.5": {
        "tp": 1,
        "fp": 5,
        "tn": 208,
        "fn": 9
      },
      "0.75": {
        "tp": 0,
        "fp": 4,
        "tn": 209,
        "fn": 10
      }
    },
    "precision": {
      "0.25": 0.35714285714285715,
      "0.5": 0.16666666666666666,
      "0.75": 0
    },
    "recall": {
      "0.25": 0.5,
      "0.5": 0.1,
      "0.75": 0
    }
  }
}
----
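As a sanity check, the `precision` and `recall` values in this response follow
directly from the confusion matrix counts at each threshold. The snippet below
is illustrative Python, not part of the API:

[source,python]
----
# Recompute precision and recall from the confusion matrix counts in the
# response above: precision = tp / (tp + fp), recall = tp / (tp + fn).

confusion_matrix = {
    0.25: {"tp": 5, "fp": 9, "tn": 204, "fn": 5},
    0.5:  {"tp": 1, "fp": 5, "tn": 208, "fn": 9},
    0.75: {"tp": 0, "fp": 4, "tn": 209, "fn": 10},
}

precision = {t: cm["tp"] / (cm["tp"] + cm["fp"])
             for t, cm in confusion_matrix.items()}
recall = {t: cm["tp"] / (cm["tp"] + cm["fn"])
          for t, cm in confusion_matrix.items()}
----

For example, at threshold `0.25` precision is `5 / (5 + 9) = 0.357...` and
recall is `5 / (5 + 5) = 0.5`, matching the response.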
[[ml-evaluate-regression-example]]
=== {regression-cap}

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "house_price_predictions", <1>
  "query": {
    "bool": {
      "filter": [
        { "term": { "ml.is_training": false } } <2>
      ]
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "price", <3>
      "predicted_field": "ml.price_prediction", <4>
      "metrics": {
        "r_squared": {},
        "mse": {}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]
<1> The output destination index from a {dfanalytics} {reganalysis}.
<2> In this example, a test/train split (`training_percent`) was defined for
the {reganalysis}. This query limits evaluation to the test split only.
<3> The ground truth value for the actual house price. This is required in
order to evaluate results.
<4> The predicted value for the house price calculated by the {reganalysis}.
The following example calculates the training error:

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "student_performance_mathematics_reg",
  "query": {
    "term": {
      "ml.is_training": {
        "value": true <1>
      }
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "G3", <2>
      "predicted_field": "ml.G3_prediction", <3>
      "metrics": {
        "r_squared": {},
        "mse": {}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]
<1> In this example, a test/train split (`training_percent`) was defined for
the {reganalysis}. This query limits evaluation to the train split only, which
means that a training error is calculated.
<2> The field that contains the ground truth value for the actual student
performance. This is required in order to evaluate results.
<3> The field that contains the predicted value for student performance
calculated by the {reganalysis}.
The next example calculates the testing error. The only difference compared
with the previous example is that `ml.is_training` is set to `false` this time,
so the query excludes the train split from the evaluation.

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "student_performance_mathematics_reg",
  "query": {
    "term": {
      "ml.is_training": {
        "value": false <1>
      }
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "G3", <2>
      "predicted_field": "ml.G3_prediction", <3>
      "metrics": {
        "r_squared": {},
        "mse": {}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]
<1> In this example, a test/train split (`training_percent`) was defined for
the {reganalysis}. This query limits evaluation to the test split only, which
means that a testing error is calculated.
<2> The field that contains the ground truth value for the actual student
performance. This is required in order to evaluate results.
<3> The field that contains the predicted value for student performance
calculated by the {reganalysis}.
[[ml-evaluate-classification-example]]
=== {classification-cap}

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "animal_classification",
  "evaluation": {
    "classification": { <1>
      "actual_field": "animal_class", <2>
      "predicted_field": "ml.animal_class_prediction", <3>
      "metrics": {
        "multiclass_confusion_matrix": {} <4>
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]
<1> The evaluation type.
<2> The field that contains the ground truth value for the actual animal
classification. This is required in order to evaluate results.
<3> The field that contains the predicted value for the animal classification
by the {classanalysis}.
<4> Specifies the metric for the evaluation.
The API returns the following result:

[source,console-result]
--------------------------------------------------
{
  "classification": {
    "multiclass_confusion_matrix": {
      "confusion_matrix": [
        {
          "actual_class": "cat", <1>
          "actual_class_doc_count": 12, <2>
          "predicted_classes": [ <3>
            {
              "predicted_class": "cat",
              "count": 12 <4>
            },
            {
              "predicted_class": "dog",
              "count": 0 <5>
            }
          ],
          "other_predicted_class_doc_count": 0 <6>
        },
        {
          "actual_class": "dog",
          "actual_class_doc_count": 11,
          "predicted_classes": [
            {
              "predicted_class": "dog",
              "count": 7
            },
            {
              "predicted_class": "cat",
              "count": 4
            }
          ],
          "other_predicted_class_doc_count": 0
        }
      ],
      "other_actual_class_count": 0
    }
  }
}
--------------------------------------------------
<1> The name of the actual class that the analysis tried to predict.
<2> The number of documents in the index that belong to the `actual_class`.
<3> This object contains the list of the predicted classes and the number of
predictions associated with each class.
<4> The number of cats in the dataset that are correctly identified as cats.
<5> The number of cats in the dataset that are incorrectly classified as dogs.
<6> The number of documents that are classified as a class that is not listed
as a `predicted_class`.