
[DOCS] Adds classification type evaluation docs to the DFA evaluation API (#47657)

István Zoltán Szabó 6 years ago
commit fe92cd0a26

+ 88 - 1
docs/reference/ml/df-analytics/apis/evaluate-dfanalytics.asciidoc

@@ -17,12 +17,14 @@ experimental[]
 
 `POST _ml/data_frame/_evaluate`
 
+
 [[ml-evaluate-dfanalytics-prereq]]
 ==== {api-prereq-title}
 
 * You must have `monitor_ml` privilege to use this API. For more 
 information, see <<security-privileges>> and <<built-in-roles>>.
 
+
 [[ml-evaluate-dfanalytics-desc]]
 ==== {api-description-title}
 
@@ -52,6 +54,7 @@ Available evaluation types:
 
 * `binary_soft_classification`
 * `regression`
+* `classification`
 --
 
 
@@ -246,4 +249,88 @@ only. It means that a testing error will be calculated.
 <2> The field that contains the ground truth value for the actual student 
 performance. This is required in order to evaluate results.
 <3> The field that contains the predicted value for student performance 
-calculated by the {reganalysis}.
+calculated by the {reganalysis}.
+
+
+===== {classification-cap}
+
+
+[source,console]
+--------------------------------------------------
+POST _ml/data_frame/_evaluate
+{ 
+   "index": "animal_classification",
+   "evaluation": {
+      "classification": { <1>
+         "actual_field": "animal_class", <2>
+         "predicted_field": "ml.animal_class_prediction.keyword", <3>
+         "metrics": {  
+           "multiclass_confusion_matrix" : {} <4>
+         }
+      }
+   }
+}
+--------------------------------------------------
+// TEST[skip:TBD]
+
+<1> The evaluation type.
+<2> The field that contains the ground truth value for the actual animal 
+classification. This is required in order to evaluate results.
+<3> The field that contains the predicted value for animal classification by 
+the {classanalysis}. Since the field that stores the predicted class is 
+dynamically mapped as both `text` and `keyword`, you must add the `.keyword` 
+suffix to the field name.
+<4> Specifies the metric for the evaluation.
+
+
+The API returns the following result:
+
+[source,console-result]
+--------------------------------------------------
+{
+   "classification" : {
+      "multiclass_confusion_matrix" : {
+         "confusion_matrix" : [
+         {
+            "actual_class" : "cat", <1>
+            "actual_class_doc_count" : 12, <2>
+            "predicted_classes" : [ <3>
+              {
+                "predicted_class" : "cat",
+                "count" : 12 <4>
+              },
+              {
+                "predicted_class" : "dog",
+                "count" : 0 <5>
+              }
+            ],
+            "other_predicted_class_doc_count" : 0 <6>
+          },
+          {
+            "actual_class" : "dog",
+            "actual_class_doc_count" : 11,
+            "predicted_classes" : [
+              {
+                "predicted_class" : "dog",
+                "count" : 7
+              },
+              {
+                "predicted_class" : "cat",
+                "count" : 4
+              }
+            ],
+            "other_predicted_class_doc_count" : 0
+          }
+        ],
+        "other_actual_class_count" : 0
+      }
+    }
+  }
+--------------------------------------------------
+<1> The name of the actual class that the analysis tried to predict.
+<2> The number of documents in the index that belong to the `actual_class`.
+<3> This object contains the list of predicted classes and the number of 
+predictions associated with each class.
+<4> The number of cats in the dataset that are correctly identified as cats.
+<5> The number of cats in the dataset that are incorrectly classified as dogs.
+<6> The number of documents that are classified as a class that is not listed as 
+a `predicted_class`.
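+
+
+As with the {reganalysis} example above, you can calculate a testing error by 
+restricting the evaluation to the testing data. A minimal sketch, assuming the 
+destination index contains the `ml.is_training` field that {dfanalytics} adds 
+when a training percentage is configured:
+
+[source,console]
+--------------------------------------------------
+POST _ml/data_frame/_evaluate
+{
+   "index": "animal_classification",
+   "query": { "term": { "ml.is_training": { "value": false } } }, <1>
+   "evaluation": {
+      "classification": {
+         "actual_field": "animal_class",
+         "predicted_field": "ml.animal_class_prediction.keyword",
+         "metrics": {
+           "multiclass_confusion_matrix" : {}
+         }
+      }
+   }
+}
+--------------------------------------------------
+// TEST[skip:TBD]
+
+<1> Restricts the evaluation to documents that were not used for training.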

+ 31 - 1
docs/reference/ml/df-analytics/apis/evaluateresources.asciidoc

@@ -18,6 +18,7 @@ Evaluation configuration objects relate to the <<evaluate-dfanalytics>>.
 Available evaluation types:
 * `binary_soft_classification`
 * `regression`
+* `classification`
 --
   
 `query`::
@@ -95,4 +96,33 @@ which outputs a prediction of values.
   
 `metrics`::
   (object) Specifies the metrics that are used for the evaluation. Available 
-  metrics are `r_squared` and `mean_squared_error`.
+  metrics are `r_squared` and `mean_squared_error`.
+  
+  
+[[classification-evaluation-resources]]
+==== {classification-cap} evaluation objects
+
+{classification-cap} evaluation assesses the results of a {classanalysis}, 
+which outputs a prediction identifying the class to which each document 
+belongs.
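+
+For reference, a minimal sketch of an evaluation request body of this type; 
+the index and field names are the illustrative ones used in the 
+<<evaluate-dfanalytics,evaluate API>> example:
+
+[source,js]
+--------------------------------------------------
+{
+  "index": "animal_classification",
+  "evaluation": {
+    "classification": {
+      "actual_field": "animal_class",
+      "predicted_field": "ml.animal_class_prediction.keyword",
+      "metrics": {
+        "multiclass_confusion_matrix": {}
+      }
+    }
+  }
+}
+--------------------------------------------------
+// NOTCONSOLE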
+
+
+[discrete]
+[[classification-evaluation-resources-properties]]
+===== {api-definitions-title}
+
+`actual_field`::
+  (string) The field of the `index` that contains the ground truth. The data 
+  type of this field must be `keyword`.
+  
+`metrics`::
+  (object) Specifies the metrics that are used for the evaluation. The only 
+  available metric is `multiclass_confusion_matrix`.
+  
+`predicted_field`::
+  (string) The field in the `index` that contains the predicted value, in 
+  other words, the results of the {classanalysis}. The data type of this 
+  field is string. You need to add the `.keyword` suffix to the predicted 
+  field name (the name you set in the {classanalysis} object as 
+  `prediction_field_name`, or its default value if you didn't specify it 
+  explicitly). For example, `predicted_field` : `ml.animal_class_prediction.keyword`.
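+
+To verify that the `.keyword` sub-field exists before you run the evaluation, 
+you can check the mapping with the get field mapping API. A sketch that reuses 
+the illustrative index and field names from the example above:
+
+[source,console]
+--------------------------------------------------
+GET animal_classification/_mapping/field/ml.animal_class_prediction*
+--------------------------------------------------
+// TEST[skip:TBD]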