[role="xpack"]
[testenv="platinum"]
[[evaluate-dfanalytics]]
=== Evaluate {dfanalytics} API
[subs="attributes"]
++++
<titleabbrev>Evaluate {dfanalytics}</titleabbrev>
++++

Evaluates the {dfanalytics} for an annotated index.

experimental[]

[[ml-evaluate-dfanalytics-request]]
==== {api-request-title}

`POST _ml/data_frame/_evaluate`

[[ml-evaluate-dfanalytics-prereq]]
==== {api-prereq-title}

* You must have the `monitor_ml` privilege to use this API. For more
information, see <<security-privileges>> and <<built-in-roles>>.
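
For example, a role that grants this cluster privilege could be defined as
follows (a minimal sketch; the role name `dfa_evaluator` is illustrative):

[source,console]
--------------------------------------------------
PUT _security/role/dfa_evaluator
{
  "cluster": [ "monitor_ml" ]
}
--------------------------------------------------
// TEST[skip:TBD]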

[[ml-evaluate-dfanalytics-desc]]
==== {api-description-title}

The API packages together commonly used evaluation metrics for various types
of machine learning features. This has been designed for use on indexes
created by {dfanalytics}. Evaluation requires both a ground truth field and an
analytics result field to be present.

[[ml-evaluate-dfanalytics-request-body]]
==== {api-request-body-title}

`evaluation`::
(Required, object) Defines the type of evaluation you want to perform. The
value of this object can be different depending on the type of evaluation you
want to perform. See <<ml-evaluate-dfanalytics-resources>>.
+
--
Available evaluation types:

* `binary_soft_classification`
* `regression`
* `classification`
--

`index`::
(Required, object) Defines the `index` in which the evaluation will be
performed.

`query`::
(Optional, object) A query clause that retrieves a subset of data from the
source index. See <<query-dsl>>.

[[ml-evaluate-dfanalytics-resources]]
==== {dfanalytics-cap} evaluation resources

[[binary-sc-resources]]
===== Binary soft classification configuration objects

Binary soft classification evaluates the results of an analysis which outputs
the probability that each document belongs to a certain class. For example, in
the context of {oldetection}, the analysis outputs the probability that each
document is an outlier.

`actual_field`::
(Required, string) The field of the `index` which contains the ground truth.
The data type of this field can be boolean or integer. If the data type is
integer, the value has to be either `0` (false) or `1` (true).

`predicted_probability_field`::
(Required, string) The field of the `index` that contains the probability that
the item belongs to the class in question. This is the field that contains the
results of the analysis.

`metrics`::
(Optional, object) Specifies the metrics that are used for the evaluation.
Available metrics:

`auc_roc`:::
(Optional, object) The AUC ROC (area under the curve of the receiver operating
characteristic) score and optionally the curve. Default value is
`{"includes_curve": false}`.

`precision`:::
(Optional, object) Sets the thresholds of the {olscore} at which the metric is
calculated. Default value is `{"at": [0.25, 0.50, 0.75]}`.

`recall`:::
(Optional, object) Sets the thresholds of the {olscore} at which the metric is
calculated. Default value is `{"at": [0.25, 0.50, 0.75]}`.

`confusion_matrix`:::
(Optional, object) Sets the thresholds of the {olscore} at which the metrics
(`tp` - true positive, `fp` - false positive, `tn` - true negative, `fn` -
false negative) are calculated. Default value is `{"at": [0.25, 0.50, 0.75]}`.

[[regression-evaluation-resources]]
===== {regression-cap} evaluation objects

{regression-cap} evaluation evaluates the results of a {regression} analysis
which outputs a prediction of values.

`actual_field`::
(Required, string) The field of the `index` which contains the ground truth.
The data type of this field must be numerical.

`predicted_field`::
(Required, string) The field in the `index` that contains the predicted value,
in other words the results of the {regression} analysis.

`metrics`::
(Required, object) Specifies the metrics that are used for the evaluation.
Available metrics are `r_squared` and `mean_squared_error`.
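
Here `mean_squared_error` is the average of the squared differences between
the actual and the predicted values, and `r_squared` is the coefficient of
determination, that is, the proportion of the variance of the ground truth
that is explained by the predictions.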

[[classification-evaluation-resources]]
===== {classification-cap} evaluation objects

{classification-cap} evaluation evaluates the results of a {classanalysis}
which outputs a prediction that identifies to which of the classes each
document belongs.

`actual_field`::
(Required, string) The field of the `index` which contains the ground truth.
The data type of this field must be keyword.

`metrics`::
(Required, object) Specifies the metrics that are used for the evaluation. The
available metric is `multiclass_confusion_matrix`.

`predicted_field`::
(Required, string) The field in the `index` that contains the predicted value,
in other words the results of the {classanalysis}. The data type of this field
is string. You need to add `.keyword` to the predicted field name (the name
you put in the {classanalysis} object as `prediction_field_name`, or its
default value if you didn't specify it explicitly). For example,
`predicted_field`: `ml.animal_class_prediction.keyword`.

////
[[ml-evaluate-dfanalytics-results]]
==== {api-response-body-title}

`binary_soft_classification`::
(object) If you chose to do binary soft classification, the API returns the
following evaluation metrics:

`auc_roc`::: TBD
`confusion_matrix`::: TBD
`precision`::: TBD
`recall`::: TBD
////

[[ml-evaluate-dfanalytics-example]]
==== {api-examples-title}

[[ml-evaluate-binary-soft-class-example]]
===== Binary soft classification

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "my_analytics_dest_index",
  "evaluation": {
    "binary_soft_classification": {
      "actual_field": "is_outlier",
      "predicted_probability_field": "ml.outlier_score"
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

The API returns the following results:

[source,console-result]
----
{
  "binary_soft_classification": {
    "auc_roc": {
      "score": 0.92584757746414444
    },
    "confusion_matrix": {
      "0.25": {
        "tp": 5,
        "fp": 9,
        "tn": 204,
        "fn": 5
      },
      "0.5": {
        "tp": 1,
        "fp": 5,
        "tn": 208,
        "fn": 9
      },
      "0.75": {
        "tp": 0,
        "fp": 4,
        "tn": 209,
        "fn": 10
      }
    },
    "precision": {
      "0.25": 0.35714285714285715,
      "0.5": 0.16666666666666666,
      "0.75": 0
    },
    "recall": {
      "0.25": 0.5,
      "0.5": 0.1,
      "0.75": 0
    }
  }
}
----
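
These values can be read off the confusion matrix above: at the `0.25`
threshold, for example, precision is `tp / (tp + fp) = 5 / (5 + 9) ≈ 0.357`
and recall is `tp / (tp + fn) = 5 / (5 + 5) = 0.5`.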

[[ml-evaluate-regression-example]]
===== {regression-cap}

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "house_price_predictions", <1>
  "query": {
    "bool": {
      "filter": [
        { "term": { "ml.is_training": false } } <2>
      ]
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "price", <3>
      "predicted_field": "ml.price_prediction", <4>
      "metrics": {
        "r_squared": {},
        "mean_squared_error": {}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> The output destination index from a {dfanalytics} {reganalysis}.
<2> In this example, a test/train split (`training_percent`) was defined for
the {reganalysis}. This query limits evaluation to be performed on the test
split only.
<3> The ground truth value for the actual house price. This is required in
order to evaluate results.
<4> The predicted value for house price calculated by the {reganalysis}.

The following example calculates the training error:

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "student_performance_mathematics_reg",
  "query": {
    "term": {
      "ml.is_training": {
        "value": true <1>
      }
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "G3", <2>
      "predicted_field": "ml.G3_prediction", <3>
      "metrics": {
        "r_squared": {},
        "mean_squared_error": {}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> In this example, a test/train split (`training_percent`) was defined for
the {reganalysis}. This query limits evaluation to be performed on the train
split only, which means that a training error will be calculated.
<2> The field that contains the ground truth value for the actual student
performance. This is required in order to evaluate results.
<3> The field that contains the predicted value for student performance
calculated by the {reganalysis}.

The next example calculates the testing error. The only difference compared
with the previous example is that `ml.is_training` is set to `false` this
time, so the query excludes the train split from the evaluation.

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "student_performance_mathematics_reg",
  "query": {
    "term": {
      "ml.is_training": {
        "value": false <1>
      }
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "G3", <2>
      "predicted_field": "ml.G3_prediction", <3>
      "metrics": {
        "r_squared": {},
        "mean_squared_error": {}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> In this example, a test/train split (`training_percent`) was defined for
the {reganalysis}. This query limits evaluation to be performed on the test
split only, which means that a testing error will be calculated.
<2> The field that contains the ground truth value for the actual student
performance. This is required in order to evaluate results.
<3> The field that contains the predicted value for student performance
calculated by the {reganalysis}.

[[ml-evaluate-classification-example]]
===== {classification-cap}

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "animal_classification",
  "evaluation": {
    "classification": { <1>
      "actual_field": "animal_class", <2>
      "predicted_field": "ml.animal_class_prediction.keyword", <3>
      "metrics": {
        "multiclass_confusion_matrix": {} <4>
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> The evaluation type.
<2> The field that contains the ground truth value for the actual animal
classification. This is required in order to evaluate results.
<3> The field that contains the predicted value for animal classification by
the {classanalysis}. Since the field storing the predicted class is
dynamically mapped as `text` and `keyword`, you need to add the `.keyword`
suffix to the name.
<4> Specifies the metric for the evaluation.

The API returns the following result:

[source,console-result]
--------------------------------------------------
{
  "classification": {
    "multiclass_confusion_matrix": {
      "confusion_matrix": [
        {
          "actual_class": "cat", <1>
          "actual_class_doc_count": 12, <2>
          "predicted_classes": [ <3>
            {
              "predicted_class": "cat",
              "count": 12 <4>
            },
            {
              "predicted_class": "dog",
              "count": 0 <5>
            }
          ],
          "other_predicted_class_doc_count": 0 <6>
        },
        {
          "actual_class": "dog",
          "actual_class_doc_count": 11,
          "predicted_classes": [
            {
              "predicted_class": "dog",
              "count": 7
            },
            {
              "predicted_class": "cat",
              "count": 4
            }
          ],
          "other_predicted_class_doc_count": 0
        }
      ],
      "other_actual_class_count": 0
    }
  }
}
--------------------------------------------------

<1> The name of the actual class that the analysis tried to predict.
<2> The number of documents in the index that belong to the `actual_class`.
<3> This object contains the list of the predicted classes and the number of
predictions associated with the class.
<4> The number of cats in the dataset that are correctly identified as cats.
<5> The number of cats in the dataset that are incorrectly classified as dogs.
<6> The number of documents that are classified as a class that is not listed
as a `predicted_class`.
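
In this result, all 12 documents whose actual class is `cat` were predicted
correctly, while 7 of the 11 `dog` documents were predicted as `dog` and the
remaining 4 were misclassified as `cat`.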