[role="xpack"]
[testenv="platinum"]
[[evaluate-dfanalytics]]
= Evaluate {dfanalytics} API

[subs="attributes"]
++++
<titleabbrev>Evaluate {dfanalytics}</titleabbrev>
++++

Evaluates the {dfanalytics} for an annotated index.

experimental[]

[[ml-evaluate-dfanalytics-request]]
== {api-request-title}

`POST _ml/data_frame/_evaluate`

[[ml-evaluate-dfanalytics-prereq]]
== {api-prereq-title}

If the {es} {security-features} are enabled, you must have the following
privileges:

* cluster: `monitor_ml`

For more information, see <<security-privileges>> and
{ml-docs-setup-privileges}.

[[ml-evaluate-dfanalytics-desc]]
== {api-description-title}

The API packages together commonly used evaluation metrics for various types
of machine learning features. It is designed for use on indexes created by
{dfanalytics}. Evaluation requires both a ground truth field and an analytics
result field to be present.

[[ml-evaluate-dfanalytics-request-body]]
== {api-request-body-title}

`evaluation`::
(Required, object) Defines the type of evaluation you want to perform.
See <<ml-evaluate-dfanalytics-resources>>.
+
--
Available evaluation types:

* `outlier_detection`
* `regression`
* `classification`
--

`index`::
(Required, string) Defines the `index` in which the evaluation will be
performed.

`query`::
(Optional, object) A query clause that retrieves a subset of data from the
source index. See <<query-dsl>>.

[[ml-evaluate-dfanalytics-resources]]
== {dfanalytics-cap} evaluation resources

[[oldetection-resources]]
=== {oldetection-cap} evaluation objects

{oldetection-cap} evaluates the results of an {oldetection} analysis which
outputs the probability that each document is an outlier.

`actual_field`::
(Required, string) The field of the `index` which contains the `ground truth`.
The data type of this field can be boolean or integer. If the data type is
integer, the value has to be either `0` (false) or `1` (true).

`predicted_probability_field`::
(Required, string) The field of the `index` that defines the probability of
whether the item belongs to the class in question or not. It's the field that
contains the results of the analysis.

`metrics`::
(Optional, object) Specifies the metrics that are used for the evaluation.
Available metrics:

`auc_roc`:::
(Optional, object) The AUC ROC (area under the curve of the receiver
operating characteristic) score and optionally the curve. Default value is
{"include_curve": false}.

`confusion_matrix`:::
(Optional, object) Sets the different thresholds of the {olscore} at which
the metrics (`tp` - true positive, `fp` - false positive, `tn` - true
negative, `fn` - false negative) are calculated. Default value is
{"at": [0.25, 0.50, 0.75]}.

`precision`:::
(Optional, object) Sets the different thresholds of the {olscore} at which
the metric is calculated. Default value is {"at": [0.25, 0.50, 0.75]}.

`recall`:::
(Optional, object) Sets the different thresholds of the {olscore} at which
the metric is calculated. Default value is {"at": [0.25, 0.50, 0.75]}.
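
As a sketch of how these defaults can be overridden, the following request
asks for the ROC curve and for the threshold-based metrics at custom
{olscore} cut-offs. The index and field names are the ones used in the
{oldetection} example below; the specific thresholds are illustrative:

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "my_analytics_dest_index",
  "evaluation": {
    "outlier_detection": {
      "actual_field": "is_outlier",
      "predicted_probability_field": "ml.outlier_score",
      "metrics": {
        "auc_roc": { "include_curve": true },
        "confusion_matrix": { "at": [ 0.5 ] },
        "precision": { "at": [ 0.5, 0.9 ] },
        "recall": { "at": [ 0.5, 0.9 ] }
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:hypothetical example]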

[[regression-evaluation-resources]]
=== {regression-cap} evaluation objects

{regression-cap} evaluation evaluates the results of a {regression} analysis
which outputs a prediction of values.

`actual_field`::
(Required, string) The field of the `index` which contains the `ground truth`.
The data type of this field must be numerical.

`predicted_field`::
(Required, string) The field in the `index` that contains the predicted value,
in other words, the results of the {regression} analysis.

`metrics`::
(Optional, object) Specifies the metrics that are used for the evaluation. For
more information on `mse`, `msle`, and `huber`, consult
https://github.com/elastic/examples/tree/master/Machine%20Learning/Regression%20Loss%20Functions[the Jupyter notebook on regression loss functions].
Available metrics:

`mse`:::
(Optional, object) Average squared difference between the predicted values
and the actual (`ground truth`) values. For more information, read
{wikipedia}/Mean_squared_error[this wiki article].

`msle`:::
(Optional, object) Average squared difference between the logarithm of the
predicted values and the logarithm of the actual (`ground truth`) values.

`offset`::::
(Optional, double) Defines the transition point at which you switch from
minimizing quadratic error to minimizing quadratic log error. Defaults to
`1`.

`huber`:::
(Optional, object) Pseudo Huber loss function. For more information, read
{wikipedia}/Huber_loss#Pseudo-Huber_loss_function[this wiki article]. A
formula sketch follows this list.

`delta`::::
(Optional, double) Approximates 1/2 (prediction - actual)^2^ for values
much less than delta and approximates a straight line with slope delta for
values much larger than delta. Defaults to `1`. Delta needs to be greater
than `0`.

`r_squared`:::
(Optional, object) Proportion of the variance in the dependent variable that
is predictable from the independent variables. For more information, read
{wikipedia}/Coefficient_of_determination[this wiki article].
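
For reference, a sketch of the Pseudo-Huber loss behind the `huber` metric,
as given in the linked article, where latexmath:[a] is the difference between
the predicted and the actual value:

[latexmath]
++++
L_\delta(a) = \delta^2 \left( \sqrt{1 + (a/\delta)^2} - 1 \right)
++++

For latexmath:[|a| \ll \delta] this is approximately latexmath:[a^2/2], and
for latexmath:[|a| \gg \delta] it approaches a straight line with slope
latexmath:[\delta], matching the description of `delta` above.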

[[classification-evaluation-resources]]
=== {classification-cap} evaluation objects

{classification-cap} evaluation evaluates the results of a {classanalysis}
which outputs a prediction that identifies to which of the classes each
document belongs.

`actual_field`::
(Required, string) The field of the `index` which contains the `ground truth`.
The data type of this field must be categorical.

`predicted_field`::
(Optional, string) The field in the `index` that contains the predicted value,
in other words, the results of the {classanalysis}.

`top_classes_field`::
(Optional, string) The field of the `index` which is an array of documents
of the form `{ "class_name": XXX, "class_probability": YYY }`.
This field must be defined as `nested` in the mappings.

`metrics`::
(Optional, object) Specifies the metrics that are used for the evaluation.
Available metrics:

`accuracy`:::
(Optional, object) Accuracy of predictions (per-class and overall).

`auc_roc`:::
(Optional, object) The AUC ROC (area under the curve of the receiver
operating characteristic) score and optionally the curve. It is calculated
for a specific class (provided as `class_name`) treated as positive.

`class_name`::::
(Required, string) Name of the only class that is treated as positive during
AUC ROC calculation. Other classes are treated as negative ("one-vs-all"
strategy). All the evaluated documents must have `class_name` in the list of
their top classes.

`include_curve`::::
(Optional, boolean) Whether or not the curve should be returned in
addition to the score. Default value is `false`.

`multiclass_confusion_matrix`:::
(Optional, object) Multiclass confusion matrix.

`precision`:::
(Optional, object) Precision of predictions (per-class and average).

`recall`:::
(Optional, object) Recall of predictions (per-class and average).
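
As an illustration of how `class_name` and `top_classes_field` fit together,
the following sketch evaluates AUC ROC with `dog` treated as the positive
class. The index and `actual_field` match the classification example below;
the `ml.top_classes` field name is an assumption about where the analysis
wrote the top classes:

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "animal_classification",
  "evaluation": {
    "classification": {
      "actual_field": "animal_class",
      "top_classes_field": "ml.top_classes",
      "metrics": {
        "auc_roc": {
          "class_name": "dog",
          "include_curve": true
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:hypothetical example]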

////
[[ml-evaluate-dfanalytics-results]]
== {api-response-body-title}

`outlier_detection`::
(object) If you chose to do outlier detection, the API returns the
following evaluation metrics:

`auc_roc`::: TBD

`confusion_matrix`::: TBD

`precision`::: TBD

`recall`::: TBD
////

[[ml-evaluate-dfanalytics-example]]
== {api-examples-title}

[[ml-evaluate-oldetection-example]]
=== {oldetection-cap}

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "my_analytics_dest_index",
  "evaluation": {
    "outlier_detection": {
      "actual_field": "is_outlier",
      "predicted_probability_field": "ml.outlier_score"
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

The API returns the following results:

[source,console-result]
----
{
  "outlier_detection": {
    "auc_roc": {
      "score": 0.92584757746414444
    },
    "confusion_matrix": {
      "0.25": {
        "tp": 5,
        "fp": 9,
        "tn": 204,
        "fn": 5
      },
      "0.5": {
        "tp": 1,
        "fp": 5,
        "tn": 208,
        "fn": 9
      },
      "0.75": {
        "tp": 0,
        "fp": 4,
        "tn": 209,
        "fn": 10
      }
    },
    "precision": {
      "0.25": 0.35714285714285715,
      "0.5": 0.16666666666666666,
      "0.75": 0
    },
    "recall": {
      "0.25": 0.5,
      "0.5": 0.1,
      "0.75": 0
    }
  }
}
----
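
To see how the `precision` and `recall` values follow from the confusion
matrix: at threshold `0.25` there are 5 true positives, 9 false positives,
and 5 false negatives, so precision is 5 / (5 + 9) ≈ 0.357 and recall is
5 / (5 + 5) = 0.5, matching the values returned above.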

[[ml-evaluate-regression-example]]
=== {regression-cap}

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "house_price_predictions", <1>
  "query": {
    "bool": {
      "filter": [
        { "term": { "ml.is_training": false } } <2>
      ]
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "price", <3>
      "predicted_field": "ml.price_prediction", <4>
      "metrics": {
        "r_squared": {},
        "mse": {},
        "msle": {"offset": 10},
        "huber": {"delta": 1.5}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> The output destination index from a {dfanalytics} {reganalysis}.
<2> In this example, a test/train split (`training_percent`) was defined for
the {reganalysis}. This query limits evaluation to the test split only.
<3> The ground truth value for the actual house price. This is required in
order to evaluate results.
<4> The predicted value for house price calculated by the {reganalysis}.

The following example calculates the training error:

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "student_performance_mathematics_reg",
  "query": {
    "term": {
      "ml.is_training": {
        "value": true <1>
      }
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "G3", <2>
      "predicted_field": "ml.G3_prediction", <3>
      "metrics": {
        "r_squared": {},
        "mse": {},
        "msle": {},
        "huber": {}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> In this example, a test/train split (`training_percent`) was defined for
the {reganalysis}. This query limits evaluation to the train split only,
which means that a training error is calculated.
<2> The field that contains the ground truth value for the actual student
performance. This is required in order to evaluate results.
<3> The field that contains the predicted value for student performance
calculated by the {reganalysis}.

The next example calculates the testing error. The only difference compared
with the previous example is that `ml.is_training` is set to `false` this
time, so the query excludes the train split from the evaluation.

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "student_performance_mathematics_reg",
  "query": {
    "term": {
      "ml.is_training": {
        "value": false <1>
      }
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "G3", <2>
      "predicted_field": "ml.G3_prediction", <3>
      "metrics": {
        "r_squared": {},
        "mse": {},
        "msle": {},
        "huber": {}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> In this example, a test/train split (`training_percent`) was defined for
the {reganalysis}. This query limits evaluation to the test split only,
which means that a testing error is calculated.
<2> The field that contains the ground truth value for the actual student
performance. This is required in order to evaluate results.
<3> The field that contains the predicted value for student performance
calculated by the {reganalysis}.

[[ml-evaluate-classification-example]]
=== {classification-cap}

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "animal_classification",
  "evaluation": {
    "classification": { <1>
      "actual_field": "animal_class", <2>
      "predicted_field": "ml.animal_class_prediction", <3>
      "metrics": {
        "multiclass_confusion_matrix" : {} <4>
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> The evaluation type.
<2> The field that contains the ground truth value for the actual animal
classification. This is required in order to evaluate results.
<3> The field that contains the predicted value for animal classification by
the {classanalysis}.
<4> Specifies the metric for the evaluation.

The API returns the following result:

[source,console-result]
--------------------------------------------------
{
  "classification" : {
    "multiclass_confusion_matrix" : {
      "confusion_matrix" : [
        {
          "actual_class" : "cat", <1>
          "actual_class_doc_count" : 12, <2>
          "predicted_classes" : [ <3>
            {
              "predicted_class" : "cat",
              "count" : 12 <4>
            },
            {
              "predicted_class" : "dog",
              "count" : 0 <5>
            }
          ],
          "other_predicted_class_doc_count" : 0 <6>
        },
        {
          "actual_class" : "dog",
          "actual_class_doc_count" : 11,
          "predicted_classes" : [
            {
              "predicted_class" : "dog",
              "count" : 7
            },
            {
              "predicted_class" : "cat",
              "count" : 4
            }
          ],
          "other_predicted_class_doc_count" : 0
        }
      ],
      "other_actual_class_count" : 0
    }
  }
}
--------------------------------------------------

<1> The name of the actual class that the analysis tried to predict.
<2> The number of documents in the index that belong to the `actual_class`.
<3> This object contains the list of the predicted classes and the number of
predictions associated with the class.
<4> The number of cats in the dataset that are correctly identified as cats.
<5> The number of cats in the dataset that are incorrectly classified as
dogs.
<6> The number of documents that are classified as a class that is not listed
as a `predicted_class`.
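
Reading the matrix as a whole: all 12 actual cats were predicted as cats,
while 7 of the 11 actual dogs were predicted as dogs and 4 as cats. Had the
`accuracy` metric also been requested, the overall accuracy would be
(12 + 7) / (12 + 11) ≈ 0.826.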