[DOCS] Adds outlier detection params to the data frame analytics resources (#46323)

* [DOCS] Adds outlier detection params to the data frame analytics resources.
Co-Authored-By: Tom Veasey <tveasey@users.noreply.github.com>
Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
István Zoltán Szabó 6 years ago
commit bd4d46c416
1 changed files with 25 additions and 9 deletions

+ 25 - 9
docs/reference/ml/df-analytics/apis/dfanalyticsresources.asciidoc

@@ -108,10 +108,13 @@ other types will be added, for example `regression`.
 
 An {oldetection} configuration object has the following properties:
 
-`n_neighbors`::
-  (integer) Defines the value for how many nearest neighbors each method of 
-  {oldetection} will use to calculate its {olscore}. When the value is 
-  not set, the system will dynamically detect an appropriate value.
+`compute_feature_influence`::
+  (boolean) If `true`, the feature influence calculation is enabled. Defaults to 
+  `true`.
+  
+`feature_influence_threshold`:: 
+  (double) The minimum {olscore} that a document needs to have in order to 
+  calculate its {fiscore}. Value range: 0-1 (`0.1` by default).
 
 `method`::
   (string) Sets the method that {oldetection} uses. If the method is not set 
@@ -119,8 +122,21 @@ An {oldetection} configuration object has the following properties:
   combines their individual {olscores} to obtain the overall {olscore}. We 
   recommend to use the ensemble method. Available methods are `lof`, `ldof`, 
   `distance_kth_nn`, `distance_knn`.
-
-`feature_influence_threshold`:: 
-  (double) The minimum {olscore} that a document needs to have in order to 
-  calculate its {fiscore}. 
-  Value range: 0-1 (`0.1` by default).
+  
+`n_neighbors`::
+  (integer) Defines the value for how many nearest neighbors each method of 
+  {oldetection} will use to calculate its {olscore}. When the value is not set, 
+  different values will be used for different ensemble members. This helps 
+  improve diversity in the ensemble. Therefore, only override this if you are 
+  confident that the value you choose is appropriate for the data set.
+  
+`outlier_fraction`::
+  (double) Sets the proportion of the data set that is assumed to be outlying prior to 
+  {oldetection}. For example, 0.05 means it is assumed that 5% of values are real outliers 
+  and 95% are inliers.
+  
+`standardize_columns`::
+  (boolean) If `true`, then the following operation is performed on the columns 
+  before computing outlier scores: (x_i - mean(x_i)) / sd(x_i). Defaults to 
+  `true`. For more information, see 
+  https://en.wikipedia.org/wiki/Feature_scaling#Standardization_(Z-score_Normalization)[this wiki page about standardization].
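
As a rough illustration of the properties documented in this change, the Python sketch below assembles an outlier detection configuration object and applies the standardization formula `(x_i - mean(x_i)) / sd(x_i)` to one column by hand. The helper names `build_outlier_detection_config` and `standardize_column` are hypothetical and not part of any Elastic client. The sketch assumes that `sd` means the population standard deviation, which the documentation text does not specify. Properties with no documented default (`method`, `n_neighbors`, `outlier_fraction`) are left out of the object unless set explicitly.

```python
import json
import statistics

# Methods listed in the documentation; leaving `method` unset selects the ensemble.
VALID_METHODS = {"lof", "ldof", "distance_kth_nn", "distance_knn"}


def build_outlier_detection_config(
    compute_feature_influence=True,   # documented default: true
    feature_influence_threshold=0.1,  # documented default: 0.1, range 0-1
    method=None,                      # unset -> ensemble of all methods
    n_neighbors=None,                 # unset -> varies per ensemble member
    outlier_fraction=None,            # no default stated in the text
    standardize_columns=True,         # documented default: true
):
    """Hypothetical helper: return the configuration object as a plain dict."""
    if not 0.0 <= feature_influence_threshold <= 1.0:
        raise ValueError("feature_influence_threshold must be in the range 0-1")
    if method is not None and method not in VALID_METHODS:
        raise ValueError(f"unknown method: {method!r}")

    config = {
        "compute_feature_influence": compute_feature_influence,
        "feature_influence_threshold": feature_influence_threshold,
        "standardize_columns": standardize_columns,
    }
    # Only include optional properties that were set explicitly.
    if method is not None:
        config["method"] = method
    if n_neighbors is not None:
        config["n_neighbors"] = n_neighbors
    if outlier_fraction is not None:
        config["outlier_fraction"] = outlier_fraction
    return config


def standardize_column(values):
    """Apply (x_i - mean(x_i)) / sd(x_i) to one column of numbers.

    Assumption: sd is the population standard deviation.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize a constant column")
    return [(x - mean) / sd for x in values]


if __name__ == "__main__":
    # 0.05 mirrors the example in the text: 5% of values assumed to be outliers.
    print(json.dumps(build_outlier_detection_config(outlier_fraction=0.05), indent=2))
    print(standardize_column([1.0, 2.0, 3.0]))
```

After standardization a column has mean 0 and standard deviation 1, so columns measured in very different units contribute on a comparable scale to distance-based methods.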