
[role="xpack"]
[testenv="platinum"]
[[evaluate-dfanalytics]]
=== Evaluate {dfanalytics} API
[subs="attributes"]
++++
<titleabbrev>Evaluate {dfanalytics}</titleabbrev>
++++

Evaluates the {dfanalytics} for an annotated index.

experimental[]

[[ml-evaluate-dfanalytics-request]]
==== {api-request-title}

`POST _ml/data_frame/_evaluate`
[[ml-evaluate-dfanalytics-prereq]]
==== {api-prereq-title}

If the {es} {security-features} are enabled, you must have the following
privileges:

* cluster: `monitor_ml`

For more information, see <<security-privileges>> and <<built-in-roles>>.
[[ml-evaluate-dfanalytics-desc]]
==== {api-description-title}

The API packages together commonly used evaluation metrics for various types
of machine learning features. It is designed for use on indices created by
{dfanalytics}. Evaluation requires both a ground truth field and an analytics
result field to be present.
[[ml-evaluate-dfanalytics-request-body]]
==== {api-request-body-title}

`evaluation`::
(Required, object) Defines the type of evaluation you want to perform. The
value of this object can be different depending on the type of evaluation you
want to perform. See <<ml-evaluate-dfanalytics-resources>>.
+
--
Available evaluation types:

* `binary_soft_classification`
* `regression`
* `classification`
--

`index`::
(Required, object) Defines the `index` in which the evaluation will be
performed.

`query`::
(Optional, object) A query clause that retrieves a subset of data from the
source index. See <<query-dsl>>.
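For illustration, the three request-body fields fit together as shown in the
following Python sketch (the index name, field names, and query are made up
for the example, not taken from a real analysis):

[source,python]
----
# Hypothetical evaluate request body; "my_analytics_dest_index" and the
# field names are placeholders for illustration only.
import json

body = {
    "index": "my_analytics_dest_index",
    # Optional: restrict evaluation to a subset of the index.
    "query": {"term": {"ml.is_training": {"value": False}}},
    # Required: exactly one evaluation type with its configuration.
    "evaluation": {
        "regression": {
            "actual_field": "price",
            "predicted_field": "ml.price_prediction",
        }
    },
}
print(json.dumps(body, indent=2))
----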
[[ml-evaluate-dfanalytics-resources]]
==== {dfanalytics-cap} evaluation resources

[[binary-sc-resources]]
===== Binary soft classification configuration objects

Binary soft classification evaluates the results of an analysis which outputs
the probability that each document belongs to a certain class. For example, in
the context of {oldetection}, the analysis outputs the probability that each
document is an outlier.

`actual_field`::
(Required, string) The field of the `index` which contains the ground truth.
The data type of this field can be boolean or integer. If the data type is
integer, the value has to be either `0` (false) or `1` (true).

`predicted_probability_field`::
(Required, string) The field of the `index` that defines the probability of
whether the item belongs to the class in question or not. It's the field that
contains the results of the analysis.
`metrics`::
(Optional, object) Specifies the metrics that are used for the evaluation.
Available metrics:

`auc_roc`::
(Optional, object) The AUC ROC (area under the curve of the receiver
operating characteristic) score and optionally the curve. Default value is
`{"includes_curve": false}`.

`precision`::
(Optional, object) Sets the different thresholds of the {olscore} at which
the metric is calculated. Default value is `{"at": [0.25, 0.50, 0.75]}`.

`recall`::
(Optional, object) Sets the different thresholds of the {olscore} at which
the metric is calculated. Default value is `{"at": [0.25, 0.50, 0.75]}`.

`confusion_matrix`::
(Optional, object) Sets the different thresholds of the {olscore} at which
the metrics (`tp` - true positive, `fp` - false positive, `tn` - true
negative, `fn` - false negative) are calculated. Default value is
`{"at": [0.25, 0.50, 0.75]}`.
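For intuition, the thresholded metrics above can be sketched in Python. This
is a minimal illustration with made-up labels and scores, not the actual
implementation: documents with a score at or above the threshold count as
predicted positives, and precision and recall follow from the resulting
counts.

[source,python]
----
# Sketch of the thresholded confusion-matrix, precision, and recall metrics,
# using hypothetical ground-truth labels and outlier scores.
def confusion_matrix(actual, scores, threshold):
    """Count tp/fp/tn/fn, treating scores >= threshold as positive."""
    tp = fp = tn = fn = 0
    for y, s in zip(actual, scores):
        predicted = s >= threshold
        if predicted and y:
            tp += 1
        elif predicted and not y:
            fp += 1
        elif not predicted and not y:
            tn += 1
        else:
            fn += 1
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}

def precision(m):
    return m["tp"] / (m["tp"] + m["fp"]) if (m["tp"] + m["fp"]) else 0.0

def recall(m):
    return m["tp"] / (m["tp"] + m["fn"]) if (m["tp"] + m["fn"]) else 0.0

actual = [True, True, False, False, False]   # ground truth (is_outlier)
scores = [0.9, 0.4, 0.6, 0.2, 0.1]           # predicted probabilities
for at in [0.25, 0.50, 0.75]:                # the default thresholds
    m = confusion_matrix(actual, scores, at)
    print(at, m, precision(m), recall(m))
----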
[[regression-evaluation-resources]]
===== {regression-cap} evaluation objects

{regression-cap} evaluation evaluates the results of a {regression} analysis
which outputs a prediction of values.

`actual_field`::
(Required, string) The field of the `index` which contains the ground truth.
The data type of this field must be numerical.

`predicted_field`::
(Required, string) The field in the `index` that contains the predicted value,
in other words the results of the {regression} analysis.

`metrics`::
(Required, object) Specifies the metrics that are used for the evaluation.
Available metrics are `r_squared` and `mean_squared_error`.
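Both metrics have standard definitions. Assuming the usual formulas (mean
squared error, and R squared as the coefficient of determination), they can be
sketched as follows; the sample values are made up:

[source,python]
----
# Sketch of the two regression metrics under their standard definitions.
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

actual = [3.0, 5.0, 7.0, 9.0]       # hypothetical ground truth values
predicted = [2.8, 5.1, 7.2, 8.9]    # hypothetical predictions
print(mean_squared_error(actual, predicted))
print(r_squared(actual, predicted))
----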
[[classification-evaluation-resources]]
===== {classification-cap} evaluation objects

{classification-cap} evaluation evaluates the results of a {classanalysis}
which outputs a prediction that identifies to which of the classes each
document belongs.

`actual_field`::
(Required, string) The field of the `index` which contains the ground truth.
The data type of this field must be keyword.

`metrics`::
(Required, object) Specifies the metrics that are used for the evaluation.
The available metric is `multiclass_confusion_matrix`.

`predicted_field`::
(Required, string) The field in the `index` that contains the predicted value,
in other words the results of the {classanalysis}. The data type of this field
is string. You need to add `.keyword` to the predicted field name (the name
you put in the {classanalysis} object as `prediction_field_name`, or the
default value of the same field if you didn't specify it explicitly). For
example, `predicted_field`: `ml.animal_class_prediction.keyword`.
////
[[ml-evaluate-dfanalytics-results]]
==== {api-response-body-title}

`binary_soft_classification`::
(object) If you chose to do binary soft classification, the API returns the
following evaluation metrics:

`auc_roc`::: TBD

`confusion_matrix`::: TBD

`precision`::: TBD

`recall`::: TBD
////
[[ml-evaluate-dfanalytics-example]]
==== {api-examples-title}

[[ml-evaluate-binary-soft-class-example]]
===== Binary soft classification

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "my_analytics_dest_index",
  "evaluation": {
    "binary_soft_classification": {
      "actual_field": "is_outlier",
      "predicted_probability_field": "ml.outlier_score"
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]
The API returns the following results:

[source,console-result]
----
{
  "binary_soft_classification": {
    "auc_roc": {
      "score": 0.92584757746414444
    },
    "confusion_matrix": {
      "0.25": {
        "tp": 5,
        "fp": 9,
        "tn": 204,
        "fn": 5
      },
      "0.5": {
        "tp": 1,
        "fp": 5,
        "tn": 208,
        "fn": 9
      },
      "0.75": {
        "tp": 0,
        "fp": 4,
        "tn": 209,
        "fn": 10
      }
    },
    "precision": {
      "0.25": 0.35714285714285715,
      "0.5": 0.16666666666666666,
      "0.75": 0
    },
    "recall": {
      "0.25": 0.5,
      "0.5": 0.1,
      "0.75": 0
    }
  }
}
----
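As a sanity check, the reported `precision` and `recall` values follow
directly from the confusion-matrix counts at each threshold
(precision = `tp / (tp + fp)`, recall = `tp / (tp + fn)`):

[source,python]
----
# Recompute precision and recall from the confusion-matrix counts in the
# example response above.
matrix = {
    "0.25": {"tp": 5, "fp": 9, "tn": 204, "fn": 5},
    "0.5":  {"tp": 1, "fp": 5, "tn": 208, "fn": 9},
    "0.75": {"tp": 0, "fp": 4, "tn": 209, "fn": 10},
}
for at, m in matrix.items():
    precision = m["tp"] / (m["tp"] + m["fp"])
    recall = m["tp"] / (m["tp"] + m["fn"])
    print(at, precision, recall)
# At threshold 0.25: precision = 5 / 14 = 0.357..., recall = 5 / 10 = 0.5,
# matching the response.
----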
[[ml-evaluate-regression-example]]
===== {regression-cap}

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "house_price_predictions", <1>
  "query": {
    "bool": {
      "filter": [
        { "term": { "ml.is_training": false } } <2>
      ]
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "price", <3>
      "predicted_field": "ml.price_prediction", <4>
      "metrics": {
        "r_squared": {},
        "mean_squared_error": {}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]
<1> The output destination index from a {dfanalytics} {reganalysis}.
<2> In this example, a test/train split (`training_percent`) was defined for
the {reganalysis}. This query limits evaluation to the test split only.
<3> The ground truth value for the actual house price. This is required in
order to evaluate results.
<4> The predicted value for the house price, calculated by the {reganalysis}.
The following example calculates the training error:

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "student_performance_mathematics_reg",
  "query": {
    "term": {
      "ml.is_training": {
        "value": true <1>
      }
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "G3", <2>
      "predicted_field": "ml.G3_prediction", <3>
      "metrics": {
        "r_squared": {},
        "mean_squared_error": {}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]
<1> In this example, a test/train split (`training_percent`) was defined for
the {reganalysis}. This query limits evaluation to the train split only, so a
training error is calculated.
<2> The field that contains the ground truth value for the actual student
performance. This is required in order to evaluate results.
<3> The field that contains the predicted value for student performance,
calculated by the {reganalysis}.
The next example calculates the testing error. The only difference compared
with the previous example is that `ml.is_training` is set to `false` this
time, so the query excludes the train split from the evaluation.

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "student_performance_mathematics_reg",
  "query": {
    "term": {
      "ml.is_training": {
        "value": false <1>
      }
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "G3", <2>
      "predicted_field": "ml.G3_prediction", <3>
      "metrics": {
        "r_squared": {},
        "mean_squared_error": {}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]
<1> In this example, a test/train split (`training_percent`) was defined for
the {reganalysis}. This query limits evaluation to the test split only, so a
testing error is calculated.
<2> The field that contains the ground truth value for the actual student
performance. This is required in order to evaluate results.
<3> The field that contains the predicted value for student performance,
calculated by the {reganalysis}.
[[ml-evaluate-classification-example]]
===== {classification-cap}

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "animal_classification",
  "evaluation": {
    "classification": { <1>
      "actual_field": "animal_class", <2>
      "predicted_field": "ml.animal_class_prediction.keyword", <3>
      "metrics": {
        "multiclass_confusion_matrix" : {} <4>
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]
<1> The evaluation type.
<2> The field that contains the ground truth value for the actual animal
classification. This is required in order to evaluate results.
<3> The field that contains the predicted value for the animal classification
by the {classanalysis}. Since the field that stores the predicted class is
dynamically mapped as text and keyword, you need to add the `.keyword` suffix
to the name.
<4> Specifies the metric for the evaluation.
The API returns the following result:

[source,console-result]
--------------------------------------------------
{
  "classification" : {
    "multiclass_confusion_matrix" : {
      "confusion_matrix" : [
        {
          "actual_class" : "cat", <1>
          "actual_class_doc_count" : 12, <2>
          "predicted_classes" : [ <3>
            {
              "predicted_class" : "cat",
              "count" : 12 <4>
            },
            {
              "predicted_class" : "dog",
              "count" : 0 <5>
            }
          ],
          "other_predicted_class_doc_count" : 0 <6>
        },
        {
          "actual_class" : "dog",
          "actual_class_doc_count" : 11,
          "predicted_classes" : [
            {
              "predicted_class" : "dog",
              "count" : 7
            },
            {
              "predicted_class" : "cat",
              "count" : 4
            }
          ],
          "other_predicted_class_doc_count" : 0
        }
      ],
      "other_actual_class_count" : 0
    }
  }
}
--------------------------------------------------
<1> The name of the actual class that the analysis tried to predict.
<2> The number of documents in the index that belong to the `actual_class`.
<3> This object contains the list of the predicted classes and the number of
predictions associated with the class.
<4> The number of cats in the dataset that are correctly identified as cats.
<5> The number of cats in the dataset that are incorrectly classified as dogs.
<6> The number of documents that are classified as a class that is not listed
as a `predicted_class`.
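For intuition, a multiclass confusion matrix of the shape shown above can be
assembled from pairs of actual and predicted labels. The following sketch
uses made-up data and is not the API's implementation:

[source,python]
----
# Build a multiclass confusion matrix (same shape as the example response)
# from hypothetical (actual, predicted) label pairs.
from collections import Counter

def multiclass_confusion_matrix(actual, predicted):
    classes = sorted(set(actual))
    counts = Counter(zip(actual, predicted))
    matrix = []
    for a in classes:
        matrix.append({
            "actual_class": a,
            "actual_class_doc_count": sum(1 for x in actual if x == a),
            "predicted_classes": [
                {"predicted_class": p, "count": counts[(a, p)]}
                for p in classes
            ],
        })
    return matrix

actual    = ["cat", "cat", "dog", "dog", "dog"]
predicted = ["cat", "dog", "dog", "dog", "cat"]
print(multiclass_confusion_matrix(actual, predicted))
----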