[role="xpack"]
[[evaluate-dfanalytics]]
= Evaluate {dfanalytics} API

[subs="attributes"]
++++
<titleabbrev>Evaluate {dfanalytics}</titleabbrev>
++++

Evaluates the {dfanalytics} for an annotated index.

[[ml-evaluate-dfanalytics-request]]
== {api-request-title}

`POST _ml/data_frame/_evaluate`

[[ml-evaluate-dfanalytics-prereq]]
== {api-prereq-title}

Requires the following privileges:

* cluster: `monitor_ml` (the `machine_learning_user` built-in role grants this
privilege)
* destination index: `read`

[[ml-evaluate-dfanalytics-desc]]
== {api-description-title}

The API packages together commonly used evaluation metrics for various types of
machine learning features. This has been designed for use on indexes created by
{dfanalytics}. Evaluation requires both a ground truth field and an analytics
result field to be present.

[[ml-evaluate-dfanalytics-request-body]]
== {api-request-body-title}

`evaluation`::
(Required, object) Defines the type of evaluation you want to perform.
See <<ml-evaluate-dfanalytics-resources>>.
+
--
Available evaluation types:

* `outlier_detection`
* `regression`
* `classification`
--

`index`::
(Required, string) Defines the index in which the evaluation will be
performed.

`query`::
(Optional, object) A query clause that retrieves a subset of data from the
source index. See <<query-dsl>>.
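
Taken together, a minimal request combines these parts. The following sketch is
illustrative only; `my-dest-index` is a placeholder, and the field names are
borrowed from the <<ml-evaluate-oldetection-example,{oldetection} example>>
below:

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "my-dest-index",
  "query": { "match_all": {} },
  "evaluation": {
    "outlier_detection": {
      "actual_field": "is_outlier",
      "predicted_probability_field": "ml.outlier_score"
    }
  }
}
--------------------------------------------------
// TEST[skip:illustrative sketch]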

[[ml-evaluate-dfanalytics-resources]]
== {dfanalytics-cap} evaluation resources

[[oldetection-resources]]
=== {oldetection-cap} evaluation objects

{oldetection-cap} evaluates the results of an {oldetection} analysis which
outputs the probability that each document is an outlier.

`actual_field`::
(Required, string) The field of the `index` which contains the `ground truth`.
The data type of this field can be boolean or integer. If the data type is
integer, the value has to be either `0` (false) or `1` (true).

`predicted_probability_field`::
(Required, string) The field of the `index` that defines the probability of
whether the item belongs to the class in question or not. It's the field that
contains the results of the analysis.

`metrics`::
(Optional, object) Specifies the metrics that are used for the evaluation. If
no metrics are specified, the following are returned by default:

* `auc_roc` (`include_curve`: false),
* `precision` (`at`: [0.25, 0.5, 0.75]),
* `recall` (`at`: [0.25, 0.5, 0.75]),
* `confusion_matrix` (`at`: [0.25, 0.5, 0.75]).

`auc_roc`:::
(Optional, object) The AUC ROC (area under the curve of the receiver
operating characteristic) score and optionally the curve. Default value is
`{"include_curve": false}`.

`confusion_matrix`:::
(Optional, object) Sets the different thresholds of the {olscore} at which
the metrics (`tp` - true positive, `fp` - false positive, `tn` - true
negative, `fn` - false negative) are calculated. Default value is
`{"at": [0.25, 0.50, 0.75]}`.

`precision`:::
(Optional, object) Sets the different thresholds of the {olscore} at which
the metric is calculated. Default value is `{"at": [0.25, 0.50, 0.75]}`.

`recall`:::
(Optional, object) Sets the different thresholds of the {olscore} at which
the metric is calculated. Default value is `{"at": [0.25, 0.50, 0.75]}`.
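
For example, to return the ROC curve along with the AUC ROC score and to
calculate precision at custom thresholds, list the metrics explicitly. This is
an illustrative sketch; the index and field names are reused from the
<<ml-evaluate-oldetection-example,{oldetection} example>> below:

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "my_analytics_dest_index",
  "evaluation": {
    "outlier_detection": {
      "actual_field": "is_outlier",
      "predicted_probability_field": "ml.outlier_score",
      "metrics": {
        "auc_roc": { "include_curve": true },
        "precision": { "at": [0.1, 0.5, 0.9] }
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:illustrative sketch]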

[[regression-evaluation-resources]]
=== {regression-cap} evaluation objects

{regression-cap} evaluation evaluates the results of a {regression} analysis
which outputs a prediction of values.

`actual_field`::
(Required, string) The field of the `index` which contains the `ground truth`.
The data type of this field must be numerical.

`predicted_field`::
(Required, string) The field in the `index` that contains the predicted value,
in other words the results of the {regression} analysis.

`metrics`::
(Optional, object) Specifies the metrics that are used for the evaluation. For
more information on `mse`, `msle`, and `huber`, consult
https://github.com/elastic/examples/tree/master/Machine%20Learning/Regression%20Loss%20Functions[the Jupyter notebook on regression loss functions].
If no metrics are specified, the following are returned by default:

* `mse`,
* `r_squared`,
* `huber` (`delta`: 1.0).

`mse`:::
(Optional, object) Average squared difference between the predicted values
and the actual (`ground truth`) values. For more information, read
{wikipedia}/Mean_squared_error[this wiki article].

`msle`:::
(Optional, object) Average squared difference between the logarithm of the
predicted values and the logarithm of the actual (`ground truth`) values.

`offset`::::
(Optional, double) Defines the transition point at which you switch from
minimizing quadratic error to minimizing quadratic log error. Defaults to
`1`.

`huber`:::
(Optional, object) Pseudo Huber loss function. For more information, read
{wikipedia}/Huber_loss#Pseudo-Huber_loss_function[this wiki article]; a closed
form is given after this list.

`delta`::::
(Optional, double) Approximates 1/2 (prediction - actual)^2^ for values
much less than `delta` and approximates a straight line with slope `delta` for
values much larger than `delta`. Defaults to `1`. `delta` needs to be greater
than `0`.

`r_squared`:::
(Optional, object) Proportion of the variance in the dependent variable that
is predictable from the independent variables. For more information, read
{wikipedia}/Coefficient_of_determination[this wiki article].
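
For reference, writing latexmath:[a] for the difference `prediction - actual`
and assuming the standard definition from the linked article, the pseudo-Huber
loss evaluated per document is:

[latexmath]
++++
L_\delta(a) = \delta^2 \left( \sqrt{1 + (a/\delta)^2} - 1 \right)
++++

which behaves like latexmath:[a^2/2] for latexmath:[|a| \ll \delta] and grows
linearly with slope latexmath:[\delta] for latexmath:[|a| \gg \delta], matching
the description of `delta` above.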

[[classification-evaluation-resources]]
=== {classification-cap} evaluation objects

{classification-cap} evaluation evaluates the results of a {classanalysis} which
outputs a prediction that identifies to which of the classes each document
belongs.

`actual_field`::
(Required, string) The field of the `index` which contains the `ground truth`.
The data type of this field must be categorical.

`predicted_field`::
(Optional, string) The field in the `index` which contains the predicted value,
in other words the results of the {classanalysis}.

`top_classes_field`::
(Optional, string) The field of the `index` which is an array of documents
of the form `{ "class_name": XXX, "class_probability": YYY }`.
This field must be defined as `nested` in the mappings. See the sketch after
this list for the expected shape.

`metrics`::
(Optional, object) Specifies the metrics that are used for the evaluation. If
no metrics are specified, the following are returned by default:

* `accuracy`,
* `multiclass_confusion_matrix`,
* `precision`,
* `recall`.

`accuracy`:::
(Optional, object) Accuracy of predictions (per-class and overall).

`auc_roc`:::
(Optional, object) The AUC ROC (area under the curve of the receiver
operating characteristic) score and optionally the curve. It is calculated
for a specific class (provided as `class_name`) that is treated as positive.

`class_name`::::
(Required, string) Name of the only class that is treated as positive
during AUC ROC calculation. Other classes are treated as negative
("one-vs-all" strategy). All the evaluated documents must have
`class_name` in the list of their top classes.

`include_curve`::::
(Optional, Boolean) Whether or not the curve should be returned in
addition to the score. Default value is `false`.

`multiclass_confusion_matrix`:::
(Optional, object) Multiclass confusion matrix.

`precision`:::
(Optional, object) Precision of predictions (per-class and average).

`recall`:::
(Optional, object) Recall of predictions (per-class and average).
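
For reference, a document suitable for evaluation with a `top_classes_field`
might look like the sketch below. The `animal_class` and
`ml.animal_class_prediction` fields are taken from the
<<ml-evaluate-classification-example,{classification} example>>; the
`ml.top_classes` field name is an illustrative placeholder, not a name mandated
by this API:

[source,js]
--------------------------------------------------
{
  "animal_class": "dog",
  "ml": {
    "animal_class_prediction": "dog",
    "top_classes": [
      { "class_name": "dog", "class_probability": 0.87 },
      { "class_name": "cat", "class_probability": 0.13 }
    ]
  }
}
--------------------------------------------------
// NOTCONSOLE

Here, the `ml.top_classes` field must be mapped as `nested` so that each
`class_name`/`class_probability` pair is indexed as its own nested document.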

////
[[ml-evaluate-dfanalytics-results]]
== {api-response-body-title}

`outlier_detection`::
(object) If you chose to do outlier detection, the API returns the
following evaluation metrics:

`auc_roc`::: TBD

`confusion_matrix`::: TBD

`precision`::: TBD

`recall`::: TBD
////

[[ml-evaluate-dfanalytics-example]]
== {api-examples-title}

[[ml-evaluate-oldetection-example]]
=== {oldetection-cap}

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "my_analytics_dest_index",
  "evaluation": {
    "outlier_detection": {
      "actual_field": "is_outlier",
      "predicted_probability_field": "ml.outlier_score"
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

The API returns the following results:

[source,console-result]
----
{
  "outlier_detection": {
    "auc_roc": {
      "value": 0.92584757746414444
    },
    "confusion_matrix": {
      "0.25": {
        "tp": 5,
        "fp": 9,
        "tn": 204,
        "fn": 5
      },
      "0.5": {
        "tp": 1,
        "fp": 5,
        "tn": 208,
        "fn": 9
      },
      "0.75": {
        "tp": 0,
        "fp": 4,
        "tn": 209,
        "fn": 10
      }
    },
    "precision": {
      "0.25": 0.35714285714285715,
      "0.5": 0.16666666666666666,
      "0.75": 0
    },
    "recall": {
      "0.25": 0.5,
      "0.5": 0.1,
      "0.75": 0
    }
  }
}
----

[[ml-evaluate-regression-example]]
=== {regression-cap}

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "house_price_predictions", <1>
  "query": {
    "bool": {
      "filter": [
        { "term": { "ml.is_training": false } } <2>
      ]
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "price", <3>
      "predicted_field": "ml.price_prediction", <4>
      "metrics": {
        "r_squared": {},
        "mse": {},
        "msle": {"offset": 10},
        "huber": {"delta": 1.5}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> The output destination index from a {dfanalytics} {reganalysis}.
<2> In this example, a test/train split (`training_percent`) was defined for the
{reganalysis}. This query limits evaluation to the test split only.
<3> The ground truth value for the actual house price. This is required in order
to evaluate results.
<4> The predicted value for the house price calculated by the {reganalysis}.

The following example calculates the training error:

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "student_performance_mathematics_reg",
  "query": {
    "term": {
      "ml.is_training": {
        "value": true <1>
      }
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "G3", <2>
      "predicted_field": "ml.G3_prediction", <3>
      "metrics": {
        "r_squared": {},
        "mse": {},
        "msle": {},
        "huber": {}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> In this example, a test/train split (`training_percent`) was defined for the
{reganalysis}. This query limits evaluation to the train split only, which means
that a training error is calculated.
<2> The field that contains the ground truth value for the actual student
performance. This is required in order to evaluate results.
<3> The field that contains the predicted value for student performance
calculated by the {reganalysis}.

The next example calculates the testing error. The only difference compared with
the previous example is that `ml.is_training` is set to `false` this time, so
the query excludes the train split from the evaluation.

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "student_performance_mathematics_reg",
  "query": {
    "term": {
      "ml.is_training": {
        "value": false <1>
      }
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "G3", <2>
      "predicted_field": "ml.G3_prediction", <3>
      "metrics": {
        "r_squared": {},
        "mse": {},
        "msle": {},
        "huber": {}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> In this example, a test/train split (`training_percent`) was defined for the
{reganalysis}. This query limits evaluation to the test split only, which means
that a testing error is calculated.
<2> The field that contains the ground truth value for the actual student
performance. This is required in order to evaluate results.
<3> The field that contains the predicted value for student performance
calculated by the {reganalysis}.

[[ml-evaluate-classification-example]]
=== {classification-cap}

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "animal_classification",
  "evaluation": {
    "classification": { <1>
      "actual_field": "animal_class", <2>
      "predicted_field": "ml.animal_class_prediction", <3>
      "metrics": {
        "multiclass_confusion_matrix" : {} <4>
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> The evaluation type.
<2> The field that contains the ground truth value for the actual animal
classification. This is required in order to evaluate results.
<3> The field that contains the predicted value for animal classification by
the {classanalysis}.
<4> Specifies the metric for the evaluation.

The API returns the following result:

[source,console-result]
--------------------------------------------------
{
  "classification" : {
    "multiclass_confusion_matrix" : {
      "confusion_matrix" : [
        {
          "actual_class" : "cat", <1>
          "actual_class_doc_count" : 12, <2>
          "predicted_classes" : [ <3>
            {
              "predicted_class" : "cat",
              "count" : 12 <4>
            },
            {
              "predicted_class" : "dog",
              "count" : 0 <5>
            }
          ],
          "other_predicted_class_doc_count" : 0 <6>
        },
        {
          "actual_class" : "dog",
          "actual_class_doc_count" : 11,
          "predicted_classes" : [
            {
              "predicted_class" : "dog",
              "count" : 7
            },
            {
              "predicted_class" : "cat",
              "count" : 4
            }
          ],
          "other_predicted_class_doc_count" : 0
        }
      ],
      "other_actual_class_count" : 0
    }
  }
}
--------------------------------------------------

<1> The name of the actual class that the analysis tried to predict.
<2> The number of documents in the index that belong to the `actual_class`.
<3> This object contains the list of the predicted classes and the number of
predictions associated with the class.
<4> The number of cats in the dataset that are correctly identified as cats.
<5> The number of cats in the dataset that are incorrectly classified as dogs.
<6> The number of documents that are classified as a class that is not listed
as a `predicted_class`.

The following example calculates the AUC ROC score for the `dog` class:

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "animal_classification",
  "evaluation": {
    "classification": { <1>
      "actual_field": "animal_class", <2>
      "metrics": {
        "auc_roc" : { <3>
          "class_name": "dog" <4>
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> The evaluation type.
<2> The field that contains the ground truth value for the actual animal
classification. This is required in order to evaluate results.
<3> Specifies the metric for the evaluation.
<4> Specifies the class name that is treated as positive during the evaluation;
all the other classes are treated as negative.

The API returns the following result:

[source,console-result]
--------------------------------------------------
{
  "classification" : {
    "auc_roc" : {
      "value" : 0.8941788639536681
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]