[role="xpack"]
[testenv="platinum"]
[[evaluate-dfanalytics]]
=== Evaluate {dfanalytics} API

[subs="attributes"]
++++
<titleabbrev>Evaluate {dfanalytics}</titleabbrev>
++++

Evaluates the {dfanalytics} for an annotated index.

experimental[]

[[ml-evaluate-dfanalytics-request]]
==== {api-request-title}

`POST _ml/data_frame/_evaluate`

[[ml-evaluate-dfanalytics-prereq]]
==== {api-prereq-title}

If the {es} {security-features} are enabled, you must have the following
privileges:

* cluster: `monitor_ml`

For more information, see <<security-privileges>> and <<built-in-roles>>.

[[ml-evaluate-dfanalytics-desc]]
==== {api-description-title}

The API packages together commonly used evaluation metrics for various types of
machine learning features. This has been designed for use on indexes created by
{dfanalytics}. Evaluation requires both a ground truth field and an analytics
result field to be present.

[[ml-evaluate-dfanalytics-request-body]]
==== {api-request-body-title}

`evaluation`::
(Required, object) Defines the type of evaluation you want to perform. The
value of this object varies depending on the type of evaluation. See
<<ml-evaluate-dfanalytics-resources>>.
+
--
Available evaluation types:

* `binary_soft_classification`
* `regression`
* `classification`
--

`index`::
(Required, string) Defines the `index` in which the evaluation will be
performed.

`query`::
(Optional, object) A query clause that retrieves a subset of data from the
source index. See <<query-dsl>>.
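
Taken together, these three parts form the request body. The following is a
minimal sketch of that shape; the index and field names are reused from the
binary soft classification example later on this page, and the `match_all`
query stands in for any query clause:

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "my_analytics_dest_index",
  "query": { "match_all": {} },
  "evaluation": {
    "binary_soft_classification": {
      "actual_field": "is_outlier",
      "predicted_probability_field": "ml.outlier_score"
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]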

[[ml-evaluate-dfanalytics-resources]]
==== {dfanalytics-cap} evaluation resources

[[binary-sc-resources]]
===== Binary soft classification configuration objects

Binary soft classification evaluates the results of an analysis which outputs
the probability that each document belongs to a certain class. For example, in
the context of {oldetection}, the analysis outputs the probability that each
document is an outlier.

`actual_field`::
(Required, string) The field of the `index` which contains the ground truth.
The data type of this field can be boolean or integer. If the data type is
integer, the value has to be either `0` (false) or `1` (true).

`predicted_probability_field`::
(Required, string) The field of the `index` that contains the probability that
the item belongs to the class in question. It's the field that contains the
results of the analysis.

`metrics`::
(Optional, object) Specifies the metrics that are used for the evaluation.
Available metrics:

`auc_roc`:::
(Optional, object) The AUC ROC (area under the curve of the receiver operating
characteristic) score and optionally the curve. Default value is
`{"includes_curve": false}`.

`precision`:::
(Optional, object) Sets the different thresholds of the {olscore} at which the
metric is calculated. Default value is `{"at": [0.25, 0.50, 0.75]}`.

`recall`:::
(Optional, object) Sets the different thresholds of the {olscore} at which the
metric is calculated. Default value is `{"at": [0.25, 0.50, 0.75]}`.

`confusion_matrix`:::
(Optional, object) Sets the different thresholds of the {olscore} at which the
metrics (`tp` - true positive, `fp` - false positive, `tn` - true negative,
`fn` - false negative) are calculated. Default value is
`{"at": [0.25, 0.50, 0.75]}`.

[[regression-evaluation-resources]]
===== {regression-cap} evaluation objects

{regression-cap} evaluation evaluates the results of a {regression} analysis
which outputs a prediction of values.

`actual_field`::
(Required, string) The field of the `index` which contains the ground truth.
The data type of this field must be numerical.

`predicted_field`::
(Required, string) The field in the `index` that contains the predicted value,
in other words the results of the {regression} analysis.

`metrics`::
(Required, object) Specifies the metrics that are used for the evaluation.
Available metrics are `r_squared` and `mean_squared_error`.

[[classification-evaluation-resources]]
===== {classification-cap} evaluation objects

{classification-cap} evaluation evaluates the results of a {classanalysis}
which outputs a prediction that identifies to which of the classes each
document belongs.

`actual_field`::
(Required, string) The field of the `index` which contains the ground truth.
The data type of this field must be keyword.

`metrics`::
(Required, object) Specifies the metrics that are used for the evaluation. The
available metric is `multiclass_confusion_matrix`.

`predicted_field`::
(Required, string) The field in the `index` that contains the predicted value,
in other words the results of the {classanalysis}. The data type of this field
is string. You need to add `.keyword` to the predicted field name (the name you
put in the {classanalysis} object as `prediction_field_name`, or its default
value if you didn't specify one explicitly). For example,
`predicted_field`: `ml.animal_class_prediction.keyword`.

////
[[ml-evaluate-dfanalytics-results]]
==== {api-response-body-title}

`binary_soft_classification`::
(object) If you chose to do binary soft classification, the API returns the
following evaluation metrics:

`auc_roc`::: TBD

`confusion_matrix`::: TBD

`precision`::: TBD

`recall`::: TBD
////

[[ml-evaluate-dfanalytics-example]]
==== {api-examples-title}

[[ml-evaluate-binary-soft-class-example]]
===== Binary soft classification

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "my_analytics_dest_index",
  "evaluation": {
    "binary_soft_classification": {
      "actual_field": "is_outlier",
      "predicted_probability_field": "ml.outlier_score"
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

The API returns the following results:

[source,console-result]
----
{
  "binary_soft_classification": {
    "auc_roc": {
      "score": 0.92584757746414444
    },
    "confusion_matrix": {
      "0.25": {
        "tp": 5,
        "fp": 9,
        "tn": 204,
        "fn": 5
      },
      "0.5": {
        "tp": 1,
        "fp": 5,
        "tn": 208,
        "fn": 9
      },
      "0.75": {
        "tp": 0,
        "fp": 4,
        "tn": 209,
        "fn": 10
      }
    },
    "precision": {
      "0.25": 0.35714285714285715,
      "0.5": 0.16666666666666666,
      "0.75": 0
    },
    "recall": {
      "0.25": 0.5,
      "0.5": 0.1,
      "0.75": 0
    }
  }
}
----
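
At each threshold, the reported precision and recall follow directly from the
corresponding confusion matrix: precision is `tp / (tp + fp)` and recall is
`tp / (tp + fn)`. For example, at the `0.25` threshold above, precision is
`5 / (5 + 9) ≈ 0.357` and recall is `5 / (5 + 5) = 0.5`.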

[[ml-evaluate-regression-example]]
===== {regression-cap}

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "house_price_predictions", <1>
  "query": {
    "bool": {
      "filter": [
        { "term": { "ml.is_training": false } } <2>
      ]
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "price", <3>
      "predicted_field": "ml.price_prediction", <4>
      "metrics": {
        "r_squared": {},
        "mean_squared_error": {}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> The output destination index from a {dfanalytics} {reganalysis}.
<2> In this example, a test/train split (`training_percent`) was defined for
the {reganalysis}. This query limits evaluation to the test split only.
<3> The ground truth value for the actual house price. This is required in
order to evaluate results.
<4> The predicted value for house price calculated by the {reganalysis}.

The following example calculates the training error:

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "student_performance_mathematics_reg",
  "query": {
    "term": {
      "ml.is_training": {
        "value": true <1>
      }
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "G3", <2>
      "predicted_field": "ml.G3_prediction", <3>
      "metrics": {
        "r_squared": {},
        "mean_squared_error": {}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> In this example, a test/train split (`training_percent`) was defined for
the {reganalysis}. This query limits evaluation to the train split only, which
means that a training error is calculated.
<2> The field that contains the ground truth value for the actual student
performance. This is required in order to evaluate results.
<3> The field that contains the predicted value for student performance
calculated by the {reganalysis}.

The next example calculates the testing error. The only difference compared
with the previous example is that `ml.is_training` is set to `false` this time,
so the query excludes the train split from the evaluation.

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "student_performance_mathematics_reg",
  "query": {
    "term": {
      "ml.is_training": {
        "value": false <1>
      }
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "G3", <2>
      "predicted_field": "ml.G3_prediction", <3>
      "metrics": {
        "r_squared": {},
        "mean_squared_error": {}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> In this example, a test/train split (`training_percent`) was defined for
the {reganalysis}. This query limits evaluation to the test split only, which
means that a testing error is calculated.
<2> The field that contains the ground truth value for the actual student
performance. This is required in order to evaluate results.
<3> The field that contains the predicted value for student performance
calculated by the {reganalysis}.

[[ml-evaluate-classification-example]]
===== {classification-cap}

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "animal_classification",
  "evaluation": {
    "classification": { <1>
      "actual_field": "animal_class", <2>
      "predicted_field": "ml.animal_class_prediction.keyword", <3>
      "metrics": {
        "multiclass_confusion_matrix": {} <4>
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> The evaluation type.
<2> The field that contains the ground truth value for the actual animal
classification. This is required in order to evaluate results.
<3> The field that contains the predicted value for animal classification by
the {classanalysis}. Since the field storing the predicted class is dynamically
mapped as both `text` and `keyword`, you need to add the `.keyword` suffix to
the name.
<4> Specifies the metric for the evaluation.

The API returns the following result:

[source,console-result]
--------------------------------------------------
{
  "classification": {
    "multiclass_confusion_matrix": {
      "confusion_matrix": [
        {
          "actual_class": "cat", <1>
          "actual_class_doc_count": 12, <2>
          "predicted_classes": [ <3>
            {
              "predicted_class": "cat",
              "count": 12 <4>
            },
            {
              "predicted_class": "dog",
              "count": 0 <5>
            }
          ],
          "other_predicted_class_doc_count": 0 <6>
        },
        {
          "actual_class": "dog",
          "actual_class_doc_count": 11,
          "predicted_classes": [
            {
              "predicted_class": "dog",
              "count": 7
            },
            {
              "predicted_class": "cat",
              "count": 4
            }
          ],
          "other_predicted_class_doc_count": 0
        }
      ],
      "other_actual_class_count": 0
    }
  }
}
--------------------------------------------------

<1> The name of the actual class that the analysis tried to predict.
<2> The number of documents in the index that belong to the `actual_class`.
<3> This object contains the list of the predicted classes and the number of
predictions associated with the class.
<4> The number of cats in the dataset that are correctly identified as cats.
<5> The number of cats in the dataset that are incorrectly classified as dogs.
<6> The number of documents that are classified as a class that is not listed
as a `predicted_class`.