[role="xpack"]
[testenv="platinum"]
[[evaluate-dfanalytics]]
= Evaluate {dfanalytics} API

[subs="attributes"]
++++
<titleabbrev>Evaluate {dfanalytics}</titleabbrev>
++++

Evaluates the {dfanalytics} for an annotated index.

experimental[]


[[ml-evaluate-dfanalytics-request]]
== {api-request-title}

`POST _ml/data_frame/_evaluate`


[[ml-evaluate-dfanalytics-prereq]]
== {api-prereq-title}

If the {es} {security-features} are enabled, you must have the following 
privileges:

* cluster: `monitor_ml`
  
For more information, see <<security-privileges>> and 
{ml-docs-setup-privileges}.


[[ml-evaluate-dfanalytics-desc]]
== {api-description-title}

The API packages together commonly used evaluation metrics for various types of 
machine learning features. It is designed for use on indices created by 
{dfanalytics}. Evaluation requires both a ground truth field and an analytics 
result field to be present in the index.


[[ml-evaluate-dfanalytics-request-body]]
== {api-request-body-title}

`evaluation`::
(Required, object) Defines the type of evaluation you want to perform.
See <<ml-evaluate-dfanalytics-resources>>.
+
--
Available evaluation types:

* `outlier_detection`
* `regression`
* `classification`

--

`index`::
(Required, object) Defines the `index` in which the evaluation will be
performed.

`query`::
(Optional, object) A query clause that retrieves a subset of data from the
source index. See <<query-dsl>>.
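
As a rough illustration of how the three body parameters fit together, the 
following Python sketch assembles a request body as a plain dictionary. The 
helper name `build_evaluate_body` is hypothetical and is not part of any {es} 
client; the sketch only builds and prints the JSON and does not send it.

```python
import json

# The three evaluation types listed above.
EVALUATION_TYPES = {"outlier_detection", "regression", "classification"}


def build_evaluate_body(index, evaluation_type, config, query=None):
    """Assemble the JSON body for POST _ml/data_frame/_evaluate.

    Hypothetical helper for illustration; not part of any Elasticsearch client.
    """
    if evaluation_type not in EVALUATION_TYPES:
        raise ValueError(f"unknown evaluation type: {evaluation_type!r}")
    body = {"index": index, "evaluation": {evaluation_type: config}}
    if query is not None:
        # `query` is optional; omit the key entirely when no filter is wanted.
        body["query"] = query
    return body


body = build_evaluate_body(
    "my_analytics_dest_index",
    "outlier_detection",
    {
        "actual_field": "is_outlier",
        "predicted_probability_field": "ml.outlier_score",
    },
)
print(json.dumps(body, indent=2))
```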

[[ml-evaluate-dfanalytics-resources]]
== {dfanalytics-cap} evaluation resources

[[oldetection-resources]]
=== {oldetection-cap} evaluation objects

{oldetection-cap} evaluation evaluates the results of an {oldetection} analysis, 
which outputs the probability that each document is an outlier.

`actual_field`::
  (Required, string) The field of the `index` which contains the `ground truth`. 
  The data type of this field can be boolean or integer. If the data type is 
  integer, the value has to be either `0` (false) or `1` (true).

`predicted_probability_field`::
  (Required, string) The field of the `index` that defines the probability of 
  whether the item belongs to the class in question or not. It's the field that 
  contains the results of the analysis.

`metrics`::
  (Optional, object) Specifies the metrics that are used for the evaluation. 
  Available metrics:
  
  `auc_roc`:::
    (Optional, object) The AUC ROC (area under the curve of the receiver 
    operating characteristic) score and optionally the curve. Default value is 
    `{"include_curve": false}`.
    
  `confusion_matrix`:::
    (Optional, object) Sets the {olscore} thresholds at which the metrics
    (`tp` - true positive, `fp` - false positive, `tn` - true negative,
    `fn` - false negative) are calculated. Default value is
    `{"at": [0.25, 0.50, 0.75]}`.

  `precision`:::
    (Optional, object) Sets the {olscore} thresholds at which the metric is 
    calculated. Default value is `{"at": [0.25, 0.50, 0.75]}`.
  
  `recall`:::
    (Optional, object) Sets the {olscore} thresholds at which the metric is 
    calculated. Default value is `{"at": [0.25, 0.50, 0.75]}`.
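
To make the role of the thresholds concrete, the following Python sketch counts 
`tp`, `fp`, `tn`, and `fn` at each default threshold for a small, made-up data 
set. It assumes that a document is treated as an outlier when its score is 
greater than or equal to the threshold; that comparison is an assumption of this 
sketch, not something this page specifies.

```python
def confusion_at(actual, scores, threshold):
    """Count tp, fp, tn, fn when score >= threshold is called an outlier.

    Assumption: the comparison is inclusive (>=). Check the behavior of your
    Elasticsearch version before relying on it.
    """
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for is_outlier, score in zip(actual, scores):
        predicted = score >= threshold
        if predicted and is_outlier:
            counts["tp"] += 1
        elif predicted:
            counts["fp"] += 1
        elif is_outlier:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


# Made-up data: ground truth (0 or 1) and the outlier score of each document.
actual = [1, 0, 0, 1, 0, 1]
scores = [0.9, 0.3, 0.1, 0.6, 0.8, 0.2]

for threshold in (0.25, 0.50, 0.75):
    print(threshold, confusion_at(actual, scores, threshold))
```

Raising the threshold moves documents from the predicted-outlier side to the 
predicted-normal side, so `tp` and `fp` can only shrink while `tn` and `fn` can 
only grow.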

    
[[regression-evaluation-resources]]
=== {regression-cap} evaluation objects

{regression-cap} evaluation evaluates the results of a {regression} analysis 
which outputs a prediction of values.

`actual_field`::
  (Required, string) The field of the `index` which contains the `ground truth`. 
  The data type of this field must be numerical.
  
`predicted_field`::
  (Required, string) The field in the `index` that contains the predicted value, 
  in other words the results of the {regression} analysis.
  
`metrics`::
  (Optional, object) Specifies the metrics that are used for the evaluation. For 
  more information on `mse`, `msle`, and `huber`, consult 
  https://github.com/elastic/examples/tree/master/Machine%20Learning/Regression%20Loss%20Functions[the Jupyter notebook on regression loss functions].
  Available metrics:

  `mse`:::
    (Optional, object) Average squared difference between the predicted values 
    and the actual (`ground truth`) value. For more information, read 
    {wikipedia}/Mean_squared_error[this wiki article].

  `msle`:::
    (Optional, object) Average squared difference between the logarithm of the 
    predicted values and the logarithm of the actual (`ground truth`) value.
    
    `offset`::::
      (Optional, double) Defines the transition point at which you switch from 
      minimizing quadratic error to minimizing quadratic log error. Defaults to 
      `1`.

  `huber`:::
    (Optional, object) Pseudo Huber loss function. For more information, read 
    {wikipedia}/Huber_loss#Pseudo-Huber_loss_function[this wiki article].
    
    `delta`::::
      (Optional, double) Approximates 1/2 (prediction - actual)^2^ for values 
      much less than delta and approximates a straight line with slope delta for 
      values much larger than delta. Defaults to `1`. Delta needs to be greater 
      than `0`.

  `r_squared`:::
    (Optional, object) Proportion of the variance in the dependent variable that 
    is predictable from the independent variables. For more information, read 
    {wikipedia}/Coefficient_of_determination[this wiki article].
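
The four metrics can be sketched in Python from their textbook definitions. 
This is a minimal illustration on made-up values, assuming that `offset` is 
added to both values before taking the logarithm and that `huber` uses the 
Pseudo-Huber formula from the linked article. The exact computation in {es} may 
differ in detail.

```python
import math


def mse(actual, predicted):
    # Mean of the squared differences.
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def msle(actual, predicted, offset=1.0):
    # Mean squared difference of the logarithms; `offset` is added first
    # (assumed placement of the offset).
    return sum(
        (math.log(a + offset) - math.log(p + offset)) ** 2
        for a, p in zip(actual, predicted)
    ) / len(actual)


def pseudo_huber(actual, predicted, delta=1.0):
    # Roughly 1/2 * r^2 for |r| << delta, roughly delta * |r| for |r| >> delta.
    if delta <= 0:
        raise ValueError("delta must be greater than 0")
    return sum(
        delta ** 2 * (math.sqrt(1 + ((a - p) / delta) ** 2) - 1)
        for a, p in zip(actual, predicted)
    ) / len(actual)


def r_squared(actual, predicted):
    # 1 - (residual sum of squares / total sum of squares).
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up ground truth and predictions.
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.5, 5.0, 3.0, 8.0]

print("mse:", mse(actual, predicted))
print("msle:", msle(actual, predicted))
print("huber:", pseudo_huber(actual, predicted))
print("r_squared:", r_squared(actual, predicted))
```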


[[classification-evaluation-resources]]
=== {classification-cap} evaluation objects

{classification-cap} evaluation evaluates the results of a {classanalysis} which 
outputs a prediction that identifies to which of the classes each document 
belongs.

`actual_field`::
  (Required, string) The field of the `index` which contains the `ground truth`.
  The data type of this field must be categorical.
  
`predicted_field`::
  (Optional, string) The field in the `index` which contains the predicted value,
  in other words the results of the {classanalysis}.

`top_classes_field`::
  (Optional, string) The field of the `index` which is an array of documents
  of the form `{ "class_name": XXX, "class_probability": YYY }`.
  This field must be defined as `nested` in the mappings.

`metrics`::
  (Optional, object) Specifies the metrics that are used for the evaluation.
  Available metrics:

  `accuracy`:::
    (Optional, object) Accuracy of predictions (per-class and overall).

  `auc_roc`:::
    (Optional, object) The AUC ROC (area under the curve of the receiver
    operating characteristic) score and optionally the curve.
    It is calculated for a specific class (provided as `class_name`), which is 
    treated as positive.

    `class_name`::::
      (Required, string) Name of the only class that will be treated as
      positive during AUC ROC calculation. Other classes will be treated as
      negative ("one-vs-all" strategy). Documents which do not have `class_name`
      in the list of their top classes will not be taken into account for 
      evaluation. The number of documents taken into account is returned in the 
      evaluation result (`auc_roc.doc_count` field).

    `include_curve`::::
      (Optional, boolean) Whether or not the curve should be returned in
      addition to the score. Default value is false.

  `multiclass_confusion_matrix`:::
    (Optional, object) Multiclass confusion matrix.

  `precision`:::
    (Optional, object) Precision of predictions (per-class and average).

  `recall`:::
    (Optional, object) Recall of predictions (per-class and average).
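
The "one-vs-all" strategy described for `auc_roc` can be sketched in Python as 
follows. The document layout (`actual`, `top_classes`) and the function name are 
hypothetical, and the score is computed exactly by comparing every positive 
document with every negative one, which may not match the value the API returns 
if {es} uses a different or approximate method.

```python
def one_vs_all_auc(docs, class_name):
    """Return (score, doc_count) with `class_name` treated as positive.

    Each doc is a dict with a hypothetical layout:
      {"actual": <class>,
       "top_classes": [{"class_name": ..., "class_probability": ...}, ...]}
    """
    positives, negatives = [], []
    for doc in docs:
        probs = {
            c["class_name"]: c["class_probability"] for c in doc["top_classes"]
        }
        if class_name not in probs:
            continue  # skipped, so it does not contribute to doc_count
        if doc["actual"] == class_name:
            positives.append(probs[class_name])
        else:
            negatives.append(probs[class_name])
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative document")
    # Fraction of (positive, negative) pairs ranked correctly; ties count half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives)), len(positives) + len(negatives)


def top(**probabilities):
    # Small helper to build a `top_classes` array for the made-up documents.
    return [
        {"class_name": name, "class_probability": p}
        for name, p in probabilities.items()
    ]


docs = [
    {"actual": "cat", "top_classes": top(cat=0.9, dog=0.1)},
    {"actual": "cat", "top_classes": top(cat=0.6, dog=0.4)},
    {"actual": "dog", "top_classes": top(dog=0.7, cat=0.3)},
    {"actual": "dog", "top_classes": top(dog=0.3, cat=0.7)},
    {"actual": "dog", "top_classes": top(dog=0.8, bird=0.2)},  # no "cat" entry
]

score, doc_count = one_vs_all_auc(docs, "cat")
print(score, doc_count)
```

The last document has no `cat` entry among its top classes, so it is left out of 
both the score and the document count.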


////
[[ml-evaluate-dfanalytics-results]]
== {api-response-body-title}

`outlier_detection`::
  (object) If you chose to do outlier detection, the API returns the
  following evaluation metrics:
  
`auc_roc`::: TBD

`confusion_matrix`::: TBD
  
`precision`::: TBD

`recall`::: TBD
////


[[ml-evaluate-dfanalytics-example]]
== {api-examples-title}


[[ml-evaluate-oldetection-example]]
=== {oldetection-cap}

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "my_analytics_dest_index",
  "evaluation": {
    "outlier_detection": {
      "actual_field": "is_outlier",
      "predicted_probability_field": "ml.outlier_score"
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

The API returns the following results:

[source,console-result]
----
{
  "outlier_detection": {
    "auc_roc": {
      "score": 0.92584757746414444
    },
    "confusion_matrix": {
      "0.25": {
          "tp": 5,
          "fp": 9,
          "tn": 204,
          "fn": 5
      },
      "0.5": {
          "tp": 1,
          "fp": 5,
          "tn": 208,
          "fn": 9
      },
      "0.75": {
          "tp": 0,
          "fp": 4,
          "tn": 209,
          "fn": 10
      }
    },
    "precision": {
        "0.25": 0.35714285714285715,
        "0.5": 0.16666666666666666,
        "0.75": 0
    },
    "recall": {
        "0.25": 0.5,
        "0.5": 0.1,
        "0.75": 0
    }
  }
}
----


[[ml-evaluate-regression-example]]
=== {regression-cap}

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "house_price_predictions", <1>
  "query": {
      "bool": {
        "filter": [
          { "term":  { "ml.is_training": false } } <2>
        ]
      }
  },
  "evaluation": {
    "regression": { 
      "actual_field": "price", <3>
      "predicted_field": "ml.price_prediction", <4>
      "metrics": {  
        "r_squared": {},
        "mse": {}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> The output destination index from a {dfanalytics} {reganalysis}.
<2> In this example, a test/train split (`training_percent`) was defined for the 
{reganalysis}. This query restricts the evaluation to the test split.
<3> The ground truth value for the actual house price. This is required in order 
to evaluate results.
<4> The predicted value for house price calculated by the {reganalysis}.


The following example calculates the training error:

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "student_performance_mathematics_reg",
  "query": {
    "term": {
      "ml.is_training": {
        "value": true <1>
      }
    }
  },
  "evaluation": {
    "regression": { 
      "actual_field": "G3", <2>
      "predicted_field": "ml.G3_prediction", <3>
      "metrics": {  
        "r_squared": {},
        "mse": {}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> In this example, a test/train split (`training_percent`) was defined for the 
{reganalysis}. This query restricts the evaluation to the train split, so the 
result is the training error.
<2> The field that contains the ground truth value for the actual student 
performance. This is required in order to evaluate results.
<3> The field that contains the predicted value for student performance 
calculated by the {reganalysis}.


The next example calculates the testing error. The only difference compared with 
the previous example is that `ml.is_training` is set to `false` this time, so 
the query excludes the train split from the evaluation.

[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{
  "index": "student_performance_mathematics_reg",
  "query": {
    "term": {
      "ml.is_training": {
        "value": false <1>
      }
    }
  },
  "evaluation": {
    "regression": { 
      "actual_field": "G3", <2>
      "predicted_field": "ml.G3_prediction", <3>
      "metrics": {  
        "r_squared": {},
        "mse": {}
      }
    }
  }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> In this example, a test/train split (`training_percent`) was defined for the 
{reganalysis}. This query restricts the evaluation to the test split, so the 
result is the testing error.
<2> The field that contains the ground truth value for the actual student 
performance. This is required in order to evaluate results.
<3> The field that contains the predicted value for student performance 
calculated by the {reganalysis}.


[[ml-evaluate-classification-example]]
=== {classification-cap}


[source,console]
--------------------------------------------------
POST _ml/data_frame/_evaluate
{ 
   "index": "animal_classification",
   "evaluation": {
      "classification": { <1>
         "actual_field": "animal_class", <2>
         "predicted_field": "ml.animal_class_prediction", <3>
         "metrics": {  
           "multiclass_confusion_matrix" : {} <4>
         }
      }
   }
}
--------------------------------------------------
// TEST[skip:TBD]

<1> The evaluation type.
<2> The field that contains the ground truth value for the actual animal 
classification. This is required in order to evaluate results.
<3> The field that contains the predicted value for animal classification by 
the {classanalysis}.
<4> Specifies the metric for the evaluation.


The API returns the following result:

[source,console-result]
--------------------------------------------------
{
  "classification" : {
    "multiclass_confusion_matrix" : {
      "confusion_matrix" : [
        {
          "actual_class" : "cat", <1>
          "actual_class_doc_count" : 12, <2>
          "predicted_classes" : [ <3>
            {
              "predicted_class" : "cat",
              "count" : 12 <4>
            },
            {
              "predicted_class" : "dog",
              "count" : 0 <5>
            }
          ],
          "other_predicted_class_doc_count" : 0 <6>
        },
        {
          "actual_class" : "dog",
          "actual_class_doc_count" : 11,
          "predicted_classes" : [
            {
              "predicted_class" : "dog",
              "count" : 7
            },
            {
              "predicted_class" : "cat",
              "count" : 4
            }
          ],
          "other_predicted_class_doc_count" : 0
        }
      ],
      "other_actual_class_count" : 0
    }
  }
}
--------------------------------------------------
<1> The name of the actual class that the analysis tried to predict.
<2> The number of documents in the index that belong to the `actual_class`.
<3> This object contains the list of the predicted classes and the number of 
predictions associated with the class.
<4> The number of cats in the dataset that are correctly identified as cats.
<5> The number of cats in the dataset that are incorrectly classified as dogs.
<6> The number of documents that are classified as a class that is not listed as 
a `predicted_class`.