[role="xpack"]
[[evaluate-dfanalytics]]
= Evaluate {dfanalytics} API

[subs="attributes"]
++++
<titleabbrev>Evaluate {dfanalytics}</titleabbrev>
++++

Evaluates the {dfanalytics} for an annotated index.

[[ml-evaluate-dfanalytics-request]]
== {api-request-title}

`POST _ml/data_frame/_evaluate`

[[ml-evaluate-dfanalytics-prereq]]
== {api-prereq-title}

Requires the following privileges:

* cluster: `monitor_ml` (the `machine_learning_user` built-in role grants this
privilege)
* destination index: `read`

[[ml-evaluate-dfanalytics-desc]]
== {api-description-title}

The API packages together commonly used evaluation metrics for various types of
machine learning features. It is designed for use on indexes created by
{dfanalytics}. Evaluation requires both a ground truth field and an analytics
result field to be present.

[[ml-evaluate-dfanalytics-request-body]]
== {api-request-body-title}

`evaluation`::
(Required, object) Defines the type of evaluation you want to perform.
See <<ml-evaluate-dfanalytics-resources>>.
+
--
Available evaluation types:

* `outlier_detection`
* `regression`
* `classification`
--

`index`::
(Required, object) Defines the `index` in which the evaluation will be
performed.

`query`::
(Optional, object) A query clause that retrieves a subset of data from the
source index. See <<query-dsl>>.

[[ml-evaluate-dfanalytics-resources]]
== {dfanalytics-cap} evaluation resources

[[oldetection-resources]]
=== {oldetection-cap} evaluation objects

{oldetection-cap} evaluates the results of an {oldetection} analysis which
outputs the probability that each document is an outlier.

`actual_field`::
(Required, string) The field of the `index` which contains the `ground truth`.
The data type of this field can be boolean or integer. If the data type is
integer, the value has to be either `0` (false) or `1` (true).

`predicted_probability_field`::
(Required, string) The field of the `index` that defines the probability of
whether the item belongs to the class in question or not. It's the field that
contains the results of the analysis.

`metrics`::
(Optional, object) Specifies the metrics that are used for the evaluation. If
no metrics are specified, the following are returned by default:

* `auc_roc` (`include_curve`: false),
* `precision` (`at`: [0.25, 0.5, 0.75]),
* `recall` (`at`: [0.25, 0.5, 0.75]),
* `confusion_matrix` (`at`: [0.25, 0.5, 0.75]).

`auc_roc`:::
(Optional, object) The AUC ROC (area under the curve of the receiver
operating characteristic) score and optionally the curve. Default value is
{"include_curve": false}.

`confusion_matrix`:::
(Optional, object) Sets the different thresholds of the {olscore} at which
the metrics (`tp` - true positive, `fp` - false positive, `tn` - true
negative, `fn` - false negative) are calculated. Default value is
{"at": [0.25, 0.50, 0.75]}.

`precision`:::
(Optional, object) Sets the different thresholds of the {olscore} at which
the metric is calculated. Default value is {"at": [0.25, 0.50, 0.75]}.

`recall`:::
(Optional, object) Sets the different thresholds of the {olscore} at which
the metric is calculated. Default value is {"at": [0.25, 0.50, 0.75]}.
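
To make the threshold semantics concrete, here is a small plain-Python sketch
(illustrative only, not the {es} implementation; all names are invented for the
example) of how a confusion matrix, precision, and recall follow from the
outlier scores once each `at` threshold converts them into binary predictions:

```python
# Sketch: derive thresholded metrics from ground truth labels (0/1) and
# predicted outlier probabilities. A document counts as a predicted outlier
# when its score is at or above the threshold.
def threshold_metrics(actual, scores, at=(0.25, 0.5, 0.75)):
    results = {}
    for threshold in at:
        predicted = [1 if s >= threshold else 0 for s in scores]
        tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
        fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
        tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
        fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
        results[threshold] = {
            "confusion_matrix": {"tp": tp, "fp": fp, "tn": tn, "fn": fn},
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
        }
    return results
```

Raising the threshold typically trades recall for precision, which is why the
API reports each metric once per `at` value.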

[[regression-evaluation-resources]]
=== {regression-cap} evaluation objects

{regression-cap} evaluation evaluates the results of a {regression} analysis
which outputs a prediction of values.

`actual_field`::
(Required, string) The field of the `index` which contains the `ground truth`.
The data type of this field must be numerical.

`predicted_field`::
(Required, string) The field in the `index` that contains the predicted value,
in other words the results of the {regression} analysis.

`metrics`::
(Optional, object) Specifies the metrics that are used for the evaluation. For
more information on `mse`, `msle`, and `huber`, consult
https://github.com/elastic/examples/tree/master/Machine%20Learning/Regression%20Loss%20Functions[the Jupyter notebook on regression loss functions].
If no metrics are specified, the following are returned by default:

* `mse`,
* `r_squared`,
* `huber` (`delta`: 1.0).

`mse`:::
(Optional, object) Average squared difference between the predicted values
and the actual (`ground truth`) value. For more information, read
{wikipedia}/Mean_squared_error[this wiki article].

`msle`:::
(Optional, object) Average squared difference between the logarithm of the
predicted values and the logarithm of the actual (`ground truth`) value.

`offset`::::
(Optional, double) Defines the transition point at which you switch from
minimizing quadratic error to minimizing quadratic log error. Defaults to
`1`.

`huber`:::
(Optional, object) Pseudo Huber loss function. For more information, read
{wikipedia}/Huber_loss#Pseudo-Huber_loss_function[this wiki article].

`delta`::::
(Optional, double) Approximates 1/2 (prediction - actual)^2^ for values
much less than delta and approximates a straight line with slope delta for
values much larger than delta. Defaults to `1`. Delta needs to be greater
than `0`.

`r_squared`:::
(Optional, object) Proportion of the variance in the dependent variable that
is predictable from the independent variables. For more information, read
{wikipedia}/Coefficient_of_determination[this wiki article].
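
As a rough illustration of these four metrics, the following plain-Python
sketch (not the {es} implementation; the pseudo-Huber form follows the linked
Wikipedia article, and the `offset` handling is a simplified assumption)
computes them from paired actual and predicted values:

```python
import math

# Illustrative implementations of the regression evaluation metrics.
def mse(actual, predicted):
    # Mean squared error.
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def msle(actual, predicted, offset=1.0):
    # Mean squared logarithmic error; the offset is added to both values
    # before taking logs, so values at or near zero are handled.
    return sum(
        (math.log(p + offset) - math.log(a + offset)) ** 2
        for a, p in zip(actual, predicted)
    ) / len(actual)

def pseudo_huber(actual, predicted, delta=1.0):
    # Approximates (1/2)(p - a)^2 for small residuals and a straight line
    # with slope delta for large ones.
    return sum(
        delta ** 2 * (math.sqrt(1 + ((p - a) / delta) ** 2) - 1)
        for a, p in zip(actual, predicted)
    ) / len(actual)

def r_squared(actual, predicted):
    # 1 - (residual sum of squares / total sum of squares).
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot
```

A perfect prediction gives `mse` of 0 and `r_squared` of 1, while predicting
the mean of the actual values gives `r_squared` of 0.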

[[classification-evaluation-resources]]
=== {classification-cap} evaluation objects

{classification-cap} evaluation evaluates the results of a {classanalysis} which
outputs a prediction that identifies to which of the classes each document
belongs.

`actual_field`::
(Required, string) The field of the `index` which contains the `ground truth`.
The data type of this field must be categorical.

`predicted_field`::
(Optional, string) The field in the `index` which contains the predicted value,
in other words the results of the {classanalysis}.

`top_classes_field`::
(Optional, string) The field of the `index` which is an array of documents
of the form `{ "class_name": XXX, "class_probability": YYY }`.
This field must be defined as `nested` in the mappings.

`metrics`::
(Optional, object) Specifies the metrics that are used for the evaluation. If
no metrics are specified, the following are returned by default:

* `accuracy`,
* `multiclass_confusion_matrix`,
* `precision`,
* `recall`.

`accuracy`:::
(Optional, object) Accuracy of predictions (per-class and overall).

`auc_roc`:::
(Optional, object) The AUC ROC (area under the curve of the receiver
operating characteristic) score and optionally the curve.
It is calculated for a specific class (provided as "class_name") treated as
positive.

`class_name`::::
(Required, string) Name of the only class that is treated as positive
during AUC ROC calculation. Other classes are treated as negative
("one-vs-all" strategy). All the evaluated documents must have
`class_name` in the list of their top classes.

`include_curve`::::
(Optional, Boolean) Whether or not the curve should be returned in
addition to the score. Default value is false.

`multiclass_confusion_matrix`:::
(Optional, object) Multiclass confusion matrix.

`size`::::
(Optional, double) Specifies the size of the multiclass confusion matrix.
Defaults to `10` which results in a matrix of size 10x10.

`precision`:::
(Optional, object) Precision of predictions (per-class and average).

`recall`:::
(Optional, object) Recall of predictions (per-class and average).
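
The relationship between the default metrics can be sketched in plain Python
(illustrative only; function and key names are invented and the structure does
not mirror the API's response format exactly):

```python
from collections import Counter

# Sketch: compute overall accuracy, a multiclass confusion matrix, and
# per-class recall from actual and predicted class labels.
def evaluate_classification(actual, predicted):
    # Count each (actual, predicted) label pair.
    confusion = Counter(zip(actual, predicted))
    correct = sum(n for (a, p), n in confusion.items() if a == p)
    per_class_recall = {}
    for cls in set(actual):
        cls_total = sum(n for (a, _), n in confusion.items() if a == cls)
        per_class_recall[cls] = confusion[(cls, cls)] / cls_total
    return {
        "accuracy": correct / len(actual),
        "confusion_matrix": dict(confusion),
        "recall": per_class_recall,
    }
```

The diagonal of the confusion matrix holds the correct predictions, so overall
accuracy is the diagonal total divided by the document count, and each class's
recall is its diagonal entry divided by that class's document count.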

////
[[ml-evaluate-dfanalytics-results]]
== {api-response-body-title}

`outlier_detection`::
(object) If you chose to do outlier detection, the API returns the
following evaluation metrics:

`auc_roc`::: TBD

`confusion_matrix`::: TBD

`precision`::: TBD

`recall`::: TBD
////

[[ml-evaluate-dfanalytics-example]]
== {api-examples-title}

[[ml-evaluate-oldetection-example]]
=== {oldetection-cap}

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "my_analytics_dest_index",
  "evaluation": {
    "outlier_detection": {
      "actual_field": "is_outlier",
      "predicted_probability_field": "ml.outlier_score"
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

The API returns the following results:

[source,console-result]
----
{
  "outlier_detection": {
    "auc_roc": {
      "value": 0.92584757746414444
    },
    "confusion_matrix": {
      "0.25": {
        "tp": 5,
        "fp": 9,
        "tn": 204,
        "fn": 5
      },
      "0.5": {
        "tp": 1,
        "fp": 5,
        "tn": 208,
        "fn": 9
      },
      "0.75": {
        "tp": 0,
        "fp": 4,
        "tn": 209,
        "fn": 10
      }
    },
    "precision": {
      "0.25": 0.35714285714285715,
      "0.5": 0.16666666666666666,
      "0.75": 0
    },
    "recall": {
      "0.25": 0.5,
      "0.5": 0.1,
      "0.75": 0
    }
  }
}
----

[[ml-evaluate-regression-example]]
=== {regression-cap}

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "house_price_predictions", <1>
  "query": {
    "bool": {
      "filter": [
        { "term": { "ml.is_training": false } } <2>
      ]
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "price", <3>
      "predicted_field": "ml.price_prediction", <4>
      "metrics": {
        "r_squared": {},
        "mse": {},
        "msle": {"offset": 10},
        "huber": {"delta": 1.5}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]
<1> The output destination index from a {dfanalytics} {reganalysis}.
<2> In this example, a test/train split (`training_percent`) was defined for the
{reganalysis}. This query limits evaluation to be performed on the test split
only.
<3> The ground truth value for the actual house price. This is required in order
to evaluate results.
<4> The predicted value for house price calculated by the {reganalysis}.

The following example calculates the training error:

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "student_performance_mathematics_reg",
  "query": {
    "term": {
      "ml.is_training": {
        "value": true <1>
      }
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "G3", <2>
      "predicted_field": "ml.G3_prediction", <3>
      "metrics": {
        "r_squared": {},
        "mse": {},
        "msle": {},
        "huber": {}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]
<1> In this example, a test/train split (`training_percent`) was defined for the
{reganalysis}. This query limits evaluation to be performed on the train split
only. It means that a training error will be calculated.
<2> The field that contains the ground truth value for the actual student
performance. This is required in order to evaluate results.
<3> The field that contains the predicted value for student performance
calculated by the {reganalysis}.

The next example calculates the testing error. The only difference compared with
the previous example is that `ml.is_training` is set to `false` this time, so
the query excludes the train split from the evaluation.

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "student_performance_mathematics_reg",
  "query": {
    "term": {
      "ml.is_training": {
        "value": false <1>
      }
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "G3", <2>
      "predicted_field": "ml.G3_prediction", <3>
      "metrics": {
        "r_squared": {},
        "mse": {},
        "msle": {},
        "huber": {}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]
<1> In this example, a test/train split (`training_percent`) was defined for the
{reganalysis}. This query limits evaluation to be performed on the test split
only. It means that a testing error will be calculated.
<2> The field that contains the ground truth value for the actual student
performance. This is required in order to evaluate results.
<3> The field that contains the predicted value for student performance
calculated by the {reganalysis}.

[[ml-evaluate-classification-example]]
=== {classification-cap}

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "animal_classification",
  "evaluation": {
    "classification": { <1>
      "actual_field": "animal_class", <2>
      "predicted_field": "ml.animal_class_prediction", <3>
      "metrics": {
        "multiclass_confusion_matrix" : {} <4>
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]
<1> The evaluation type.
<2> The field that contains the ground truth value for the actual animal
classification. This is required in order to evaluate results.
<3> The field that contains the predicted value for animal classification by
the {classanalysis}.
<4> Specifies the metric for the evaluation.

The API returns the following result:

[source,console-result]
--------------------------------------------------
{
  "classification" : {
    "multiclass_confusion_matrix" : {
      "confusion_matrix" : [
        {
          "actual_class" : "cat", <1>
          "actual_class_doc_count" : 12, <2>
          "predicted_classes" : [ <3>
            {
              "predicted_class" : "cat",
              "count" : 12 <4>
            },
            {
              "predicted_class" : "dog",
              "count" : 0 <5>
            }
          ],
          "other_predicted_class_doc_count" : 0 <6>
        },
        {
          "actual_class" : "dog",
          "actual_class_doc_count" : 11,
          "predicted_classes" : [
            {
              "predicted_class" : "dog",
              "count" : 7
            },
            {
              "predicted_class" : "cat",
              "count" : 4
            }
          ],
          "other_predicted_class_doc_count" : 0
        }
      ],
      "other_actual_class_count" : 0
    }
  }
}
--------------------------------------------------
<1> The name of the actual class that the analysis tried to predict.
<2> The number of documents in the index that belong to the `actual_class`.
<3> This object contains the list of the predicted classes and the number of
predictions associated with the class.
<4> The number of cats in the dataset that are correctly identified as cats.
<5> The number of cats in the dataset that are incorrectly classified as dogs.
<6> The number of documents that are classified as a class that is not listed as
a `predicted_class`.

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "animal_classification",
  "evaluation": {
    "classification": { <1>
      "actual_field": "animal_class", <2>
      "metrics": {
        "auc_roc" : { <3>
          "class_name": "dog" <4>
        }
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]
<1> The evaluation type.
<2> The field that contains the ground truth value for the actual animal
classification. This is required in order to evaluate results.
<3> Specifies the metric for the evaluation.
<4> Specifies the class name that is treated as positive during the evaluation;
all other classes are treated as negative.

The API returns the following result:

[source,console-result]
--------------------------------------------------
{
  "classification" : {
    "auc_roc" : {
      "value" : 0.8941788639536681
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]