[role="xpack"]
[testenv="platinum"]
[[evaluate-dfanalytics]]
= Evaluate {dfanalytics} API

[subs="attributes"]
++++
<titleabbrev>Evaluate {dfanalytics}</titleabbrev>
++++

Evaluates the {dfanalytics} for an annotated index.

[[ml-evaluate-dfanalytics-request]]
== {api-request-title}

`POST _ml/data_frame/_evaluate`

[[ml-evaluate-dfanalytics-prereq]]
== {api-prereq-title}

If the {es} {security-features} are enabled, you must have the following
privileges:

* cluster: `monitor_ml`

For more information, see <<security-privileges>> and
{ml-docs-setup-privileges}.

[[ml-evaluate-dfanalytics-desc]]
== {api-description-title}

The API packages together commonly used evaluation metrics for various types of
machine learning features. This has been designed for use on indexes created by
{dfanalytics}. Evaluation requires both a ground truth field and an analytics
result field to be present.

[[ml-evaluate-dfanalytics-request-body]]
== {api-request-body-title}

`evaluation`::
(Required, object) Defines the type of evaluation you want to perform.
See <<ml-evaluate-dfanalytics-resources>>.
+
--
Available evaluation types:

* `outlier_detection`
* `regression`
* `classification`
--

`index`::
(Required, object) Defines the `index` in which the evaluation will be
performed.

`query`::
(Optional, object) A query clause that retrieves a subset of data from the
source index. See <<query-dsl>>.
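
Put together, a minimal evaluation request looks like the following sketch. The
index and field names are illustrative placeholders:

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "my_analytics_dest_index", <1>
  "query": { "term": { "ml.is_training": { "value": false } } }, <2>
  "evaluation": { <3>
    "outlier_detection": {
      "actual_field": "is_outlier",
      "predicted_probability_field": "ml.outlier_score"
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> The annotated index that contains both the ground truth and the analysis
results.
<2> An optional query, here restricting the evaluation to documents that were
not used for training.
<3> Exactly one of the evaluation types listed above.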

[[ml-evaluate-dfanalytics-resources]]
== {dfanalytics-cap} evaluation resources

[[oldetection-resources]]
=== {oldetection-cap} evaluation objects

{oldetection-cap} evaluates the results of an {oldetection} analysis which
outputs the probability that each document is an outlier.

`actual_field`::
(Required, string) The field of the `index` which contains the `ground truth`.
The data type of this field can be boolean or integer. If the data type is
integer, the value has to be either `0` (false) or `1` (true).

`predicted_probability_field`::
(Required, string) The field of the `index` that defines the probability of
whether the item belongs to the class in question or not. It's the field that
contains the results of the analysis.

`metrics`::
(Optional, object) Specifies the metrics that are used for the evaluation. If
no metrics are specified, the following are returned by default:

* `auc_roc` (`include_curve`: false)
* `precision` (`at`: [0.25, 0.5, 0.75])
* `recall` (`at`: [0.25, 0.5, 0.75])
* `confusion_matrix` (`at`: [0.25, 0.5, 0.75])

`auc_roc`:::
(Optional, object) The AUC ROC (area under the curve of the receiver
operating characteristic) score and optionally the curve. Default value is
`{"include_curve": false}`.

`confusion_matrix`:::
(Optional, object) Sets the different thresholds of the {olscore} at which
the metrics (`tp` - true positive, `fp` - false positive, `tn` - true
negative, `fn` - false negative) are calculated. Default value is
`{"at": [0.25, 0.50, 0.75]}`.

`precision`:::
(Optional, object) Sets the different thresholds of the {olscore} at which
the metric is calculated. Default value is `{"at": [0.25, 0.50, 0.75]}`.

`recall`:::
(Optional, object) Sets the different thresholds of the {olscore} at which
the metric is calculated. Default value is `{"at": [0.25, 0.50, 0.75]}`.
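
If the defaults do not suit your data, the metrics and their thresholds can be
requested explicitly. A minimal sketch, reusing the field names from the
{oldetection} example later on this page:

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "my_analytics_dest_index",
  "evaluation": {
    "outlier_detection": {
      "actual_field": "is_outlier",
      "predicted_probability_field": "ml.outlier_score",
      "metrics": {
        "auc_roc": { "include_curve": true },
        "precision": { "at": [ 0.5, 0.9 ] },
        "recall": { "at": [ 0.5, 0.9 ] }
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]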

[[regression-evaluation-resources]]
=== {regression-cap} evaluation objects

{regression-cap} evaluation evaluates the results of a {regression} analysis
which outputs a prediction of values.

`actual_field`::
(Required, string) The field of the `index` which contains the `ground truth`.
The data type of this field must be numerical.

`predicted_field`::
(Required, string) The field in the `index` that contains the predicted value,
in other words the results of the {regression} analysis.

`metrics`::
(Optional, object) Specifies the metrics that are used for the evaluation. For
more information on `mse`, `msle`, and `huber`, consult
https://github.com/elastic/examples/tree/master/Machine%20Learning/Regression%20Loss%20Functions[the Jupyter notebook on regression loss functions].
If no metrics are specified, the following are returned by default:

* `mse`
* `r_squared`
* `huber` (`delta`: 1.0)

`mse`:::
(Optional, object) Average squared difference between the predicted values
and the actual (`ground truth`) value. For more information, read
{wikipedia}/Mean_squared_error[this wiki article].

`msle`:::
(Optional, object) Average squared difference between the logarithm of the
predicted values and the logarithm of the actual (`ground truth`) value.

`offset`::::
(Optional, double) Defines the transition point at which you switch from
minimizing quadratic error to minimizing quadratic log error. Defaults to
`1`.

`huber`:::
(Optional, object) Pseudo Huber loss function. For more information, read
{wikipedia}/Huber_loss#Pseudo-Huber_loss_function[this wiki article].

`delta`::::
(Optional, double) Approximates 1/2 (prediction - actual)^2^ for values
much less than delta and approximates a straight line with slope delta for
values much larger than delta. Defaults to `1`. Delta needs to be greater
than `0`.

`r_squared`:::
(Optional, object) Proportion of the variance in the dependent variable that
is predictable from the independent variables. For more information, read
{wikipedia}/Coefficient_of_determination[this wiki article].
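
For reference, the pseudo-Huber loss that the `huber` metric reports can be
sketched as follows, where `a` is the residual (predicted value minus actual
value) and `delta` is the transition point described above:

[latexmath]
++++
L_{\delta}(a) = \delta^{2}\left(\sqrt{1 + (a/\delta)^{2}} - 1\right)
++++

For residuals much smaller than `delta` this behaves like half the squared
error; for much larger residuals it grows linearly with slope `delta`.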

[[classification-evaluation-resources]]
=== {classification-cap} evaluation objects

{classification-cap} evaluation evaluates the results of a {classanalysis} which
outputs a prediction that identifies to which of the classes each document
belongs.

`actual_field`::
(Required, string) The field of the `index` which contains the `ground truth`.
The data type of this field must be categorical.

`predicted_field`::
(Optional, string) The field in the `index` which contains the predicted value,
in other words the results of the {classanalysis}.

`top_classes_field`::
(Optional, string) The field of the `index` which is an array of documents
of the form `{ "class_name": XXX, "class_probability": YYY }`.
This field must be defined as `nested` in the mappings.

`metrics`::
(Optional, object) Specifies the metrics that are used for the evaluation. If
no metrics are specified, the following are returned by default:

* `accuracy`
* `multiclass_confusion_matrix`
* `precision`
* `recall`

`accuracy`:::
(Optional, object) Accuracy of predictions (per-class and overall).

`auc_roc`:::
(Optional, object) The AUC ROC (area under the curve of the receiver
operating characteristic) score and optionally the curve.
It is calculated for a specific class (provided as `class_name`) treated as
positive.

`class_name`::::
(Required, string) Name of the only class that is treated as positive
during AUC ROC calculation. Other classes are treated as negative
("one-vs-all" strategy). All the evaluated documents must have
`class_name` in the list of their top classes.

`include_curve`::::
(Optional, Boolean) Whether or not the curve should be returned in
addition to the score. Default value is `false`.

`multiclass_confusion_matrix`:::
(Optional, object) Multiclass confusion matrix.

`precision`:::
(Optional, object) Precision of predictions (per-class and average).

`recall`:::
(Optional, object) Recall of predictions (per-class and average).
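
To compute only a chosen subset of these metrics, list them explicitly under
`metrics`. A minimal sketch, reusing the field names from the example later on
this page:

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "animal_classification",
  "evaluation": {
    "classification": {
      "actual_field": "animal_class",
      "predicted_field": "ml.animal_class_prediction",
      "metrics": {
        "accuracy": {},
        "precision": {},
        "recall": {}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]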

////
[[ml-evaluate-dfanalytics-results]]
== {api-response-body-title}

`outlier_detection`::
(object) If you chose to do outlier detection, the API returns the
following evaluation metrics:

`auc_roc`::: TBD

`confusion_matrix`::: TBD

`precision`::: TBD

`recall`::: TBD
////

[[ml-evaluate-dfanalytics-example]]
== {api-examples-title}

[[ml-evaluate-oldetection-example]]
=== {oldetection-cap}

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "my_analytics_dest_index",
  "evaluation": {
    "outlier_detection": {
      "actual_field": "is_outlier",
      "predicted_probability_field": "ml.outlier_score"
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

The API returns the following results:

[source,console-result]
----
{
  "outlier_detection": {
    "auc_roc": {
      "value": 0.92584757746414444
    },
    "confusion_matrix": {
      "0.25": {
        "tp": 5,
        "fp": 9,
        "tn": 204,
        "fn": 5
      },
      "0.5": {
        "tp": 1,
        "fp": 5,
        "tn": 208,
        "fn": 9
      },
      "0.75": {
        "tp": 0,
        "fp": 4,
        "tn": 209,
        "fn": 10
      }
    },
    "precision": {
      "0.25": 0.35714285714285715,
      "0.5": 0.16666666666666666,
      "0.75": 0
    },
    "recall": {
      "0.25": 0.5,
      "0.5": 0.1,
      "0.75": 0
    }
  }
}
----

[[ml-evaluate-regression-example]]
=== {regression-cap}

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "house_price_predictions", <1>
  "query": {
    "bool": {
      "filter": [
        { "term": { "ml.is_training": false } } <2>
      ]
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "price", <3>
      "predicted_field": "ml.price_prediction", <4>
      "metrics": {
        "r_squared": {},
        "mse": {},
        "msle": {"offset": 10},
        "huber": {"delta": 1.5}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> The output destination index from a {dfanalytics} {reganalysis}.
<2> In this example, a test/train split (`training_percent`) was defined for the
{reganalysis}. This query limits evaluation to be performed on the test split
only.
<3> The ground truth value for the actual house price. This is required in order
to evaluate results.
<4> The predicted value for house price calculated by the {reganalysis}.

The following example calculates the training error:

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "student_performance_mathematics_reg",
  "query": {
    "term": {
      "ml.is_training": {
        "value": true <1>
      }
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "G3", <2>
      "predicted_field": "ml.G3_prediction", <3>
      "metrics": {
        "r_squared": {},
        "mse": {},
        "msle": {},
        "huber": {}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> In this example, a test/train split (`training_percent`) was defined for the
{reganalysis}. This query limits evaluation to be performed on the train split
only. It means that a training error will be calculated.
<2> The field that contains the ground truth value for the actual student
performance. This is required in order to evaluate results.
<3> The field that contains the predicted value for student performance
calculated by the {reganalysis}.

The next example calculates the testing error. The only difference compared with
the previous example is that `ml.is_training` is set to `false` this time, so
the query excludes the train split from the evaluation.

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "student_performance_mathematics_reg",
  "query": {
    "term": {
      "ml.is_training": {
        "value": false <1>
      }
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "G3", <2>
      "predicted_field": "ml.G3_prediction", <3>
      "metrics": {
        "r_squared": {},
        "mse": {},
        "msle": {},
        "huber": {}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> In this example, a test/train split (`training_percent`) was defined for the
{reganalysis}. This query limits evaluation to be performed on the test split
only. It means that a testing error will be calculated.
<2> The field that contains the ground truth value for the actual student
performance. This is required in order to evaluate results.
<3> The field that contains the predicted value for student performance
calculated by the {reganalysis}.

[[ml-evaluate-classification-example]]
=== {classification-cap}

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "animal_classification",
  "evaluation": {
    "classification": { <1>
      "actual_field": "animal_class", <2>
      "predicted_field": "ml.animal_class_prediction", <3>
      "metrics": {
        "multiclass_confusion_matrix" : {} <4>
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> The evaluation type.
<2> The field that contains the ground truth value for the actual animal
classification. This is required in order to evaluate results.
<3> The field that contains the predicted value for animal classification by
the {classanalysis}.
<4> Specifies the metric for the evaluation.

The API returns the following result:

[source,console-result]
--------------------------------------------------
{
  "classification" : {
    "multiclass_confusion_matrix" : {
      "confusion_matrix" : [
        {
          "actual_class" : "cat", <1>
          "actual_class_doc_count" : 12, <2>
          "predicted_classes" : [ <3>
            {
              "predicted_class" : "cat",
              "count" : 12 <4>
            },
            {
              "predicted_class" : "dog",
              "count" : 0 <5>
            }
          ],
          "other_predicted_class_doc_count" : 0 <6>
        },
        {
          "actual_class" : "dog",
          "actual_class_doc_count" : 11,
          "predicted_classes" : [
            {
              "predicted_class" : "dog",
              "count" : 7
            },
            {
              "predicted_class" : "cat",
              "count" : 4
            }
          ],
          "other_predicted_class_doc_count" : 0
        }
      ],
      "other_actual_class_count" : 0
    }
  }
}
--------------------------------------------------

<1> The name of the actual class that the analysis tried to predict.
<2> The number of documents in the index that belong to the `actual_class`.
<3> This object contains the list of the predicted classes and the number of
predictions associated with the class.
<4> The number of cats in the dataset that are correctly identified as cats.
<5> The number of cats in the dataset that are incorrectly classified as dogs.
<6> The number of documents that are classified as a class that is not listed as
a `predicted_class`.

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "animal_classification",
  "evaluation": {
    "classification": { <1>
      "actual_field": "animal_class", <2>
      "metrics": {
        "auc_roc" : { <3>
          "class_name": "dog" <4>
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> The evaluation type.
<2> The field that contains the ground truth value for the actual animal
classification. This is required in order to evaluate results.
<3> Specifies the metric for the evaluation.
<4> Specifies the class name that is treated as positive during the evaluation;
all the other classes are treated as negative.

The API returns the following result:

[source,console-result]
--------------------------------------------------
{
  "classification" : {
    "auc_roc" : {
      "value" : 0.8941788639536681
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]