[DOCS] Amends data frame analytics resources, GET, and PUT API docs (#44806)

This PR addresses the feedback in https://github.com/elastic/ml-team/issues/175#issuecomment-512215731.

* Adds an example to `analyzed_fields`
* Includes `source` and `dest` objects inline in the resource page
* Lists `model_memory_limit` in the PUT API page
* Amends the `analysis` section in the resource page
* Removes Properties headings in subsections
István Zoltán Szabó 6 years ago
parent
commit
84793476ba

+ 44 - 47
docs/reference/ml/df-analytics/apis/dfanalyticsresources.asciidoc

@@ -18,10 +18,36 @@
   (object) You can specify both `includes` and/or `excludes` patterns. If 
   `analyzed_fields` is not set, only the relevant fields will be included. For 
   example all the numeric fields for {oldetection}.
+  
+[source,js]
+--------------------------------------------------
+PUT _ml/data_frame/analytics/loganalytics
+{
+  "source": {
+    "index": "logdata"
+  },
+  "dest": {
+    "index": "logdata_out"
+  },
+  "analysis": {
+    "outlier_detection": {
+    }
+  },
+  "analyzed_fields": {
+    "includes": [ "request.bytes", "response.counts.error" ],
+    "excludes": [ "source.geo" ]
+  }
+}
+--------------------------------------------------
+// CONSOLE
+// TEST[setup:setup_logdata]
 
 `dest`::
-  (object) The destination configuration of the analysis. For more information, 
-  see <<dfanalytics-dest-resources>>.
+  (object) The destination configuration of the analysis. The `index` property 
+  (string) is the name of the index in which to store the results of the 
+  {dfanalytics-job}. The `results_field` (string) property defines the name of 
+  the field in which to store the results of the analysis. The default value is 
+  `ml`.
 
 `id`::
   (string) The unique identifier for the {dfanalytics-job}. This identifier can 
@@ -38,25 +64,29 @@
   that setting. For more information, see <<ml-settings>>.
 
 `source`::
-  (object) The source configuration, consisting of `index` and optionally a 
-  `query`. For more information, see <<dfanalytics-source-resources>>.
+  (object) The source configuration, consisting of `index` (array), which is an 
+  array of index names on which to perform the analysis. It can be a single 
+  index or index pattern as well as an array of indices or patterns. Optionally, 
+  `source` can have a `query` (object) property, written in the {es} query 
+  domain-specific language (DSL). This value corresponds to the query object in 
+  an {es} search POST body. All the options that are supported by {es} can be 
+  used, as this object is passed verbatim to {es}. By default, this property has 
+  the following value: `{"match_all": {}}`.
 
 [[dfanalytics-types]]
 ==== Analysis objects
 
 {dfanalytics-cap} resources contain `analysis` objects. For example, when you
-create a {dfanalytics-job}, you must define the type of analysis it performs.
+create a {dfanalytics-job}, you must define the type of analysis it performs. 
+Currently, `outlier_detection` is the only available type of analysis; other 
+types, for example `regression`, will be added.
   
 [discrete]
 [[oldetection-resources]]
-===== {oldetection-cap} configuration objects 
+==== {oldetection-cap} configuration objects 
 
 An {oldetection} configuration object has the following properties:
 
-[discrete]
-[[oldetection-properties]]
-==== {api-definitions-title}
-
 `n_neighbors`::
   (integer) Defines the value for how many nearest neighbors each method of 
   {oldetection} will use to calculate its {olscore}. When the value is 
@@ -65,44 +95,11 @@ An {oldetection} configuration object has the following properties:
 `method`::
   (string) Sets the method that {oldetection} uses. If the method is not set 
   {oldetection} uses an ensemble of different methods and normalises and 
-  combines their individual {olscores} to obtain the overall {olscore}. 
-  Available methods are `lof`, `ldof`, `distance_kth_nn`, `distance_knn`.
+  combines their individual {olscores} to obtain the overall {olscore}. We 
+  recommend using the ensemble method. Available methods are `lof`, `ldof`, 
+  `distance_kth_nn`, `distance_knn`.
 
 `feature_influence_threshold`:: 
   (double) The minimum {olscore} that a document needs to have in order to 
   calculate its {fiscore}. 
-  Value range: 0-1 (`0.1` by default).
-  
-[[dfanalytics-dest-resources]]
-==== Dest configuration objects
-
-{dfanalytics-cap} resources contain `dest` objects. For example, when you
-create a {dfanalytics-job}, you must define its destination.
-
-[discrete]
-[[dfanalytics-dest-properties]]
-==== {api-definitions-title}
-
-`index`::
-  (string) The name of the index in which to store the results of the 
-  {dfanalytics-job}.
-
-`results_field`::
-  (string) The name of the field in which to store the results of the analysis. 
-  The default value is `ml`.
-
-[[dfanalytics-source-resources]]
-==== Source configuration objects
-
-The `source` configuration object has the following properties:
-
-`index`::
-  (array) An array of index names on which to perform the analysis. It can be a 
-  single index or index pattern as well as an array of indices or patterns.
-  
-`query`::
-  (object) The {es} query domain-specific language (DSL). This value
-  corresponds to the query object in an {es} search POST body. All the
-  options that are supported by {es} can be used, as this object is
-  passed verbatim to {es}. By default, this property has the following
-  value: `{"match_all": {}}`.
+  Value range: 0-1 (`0.1` by default).
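
Reviewer sketch (not part of the commit): the inlined `source` and `dest` descriptions above imply a few defaults that are easy to miss in prose. The Python below assembles a request body that follows them. The helper name `build_dfanalytics_config` and its parameters are hypothetical and belong to no client library; the index and field names are taken from the example added in this diff.

```python
from typing import Any, Dict, List, Optional, Union


def build_dfanalytics_config(
    source_index: Union[str, List[str]],
    dest_index: str,
    results_field: str = "ml",
    query: Optional[Dict[str, Any]] = None,
    includes: Optional[List[str]] = None,
    excludes: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Assemble a request body (hypothetical helper, for illustration only)."""
    # `index` is documented as an array; a single name or pattern is wrapped.
    indices = [source_index] if isinstance(source_index, str) else list(source_index)
    body: Dict[str, Any] = {
        "source": {
            "index": indices,
            # Documented default when no query is supplied.
            "query": query if query is not None else {"match_all": {}},
        },
        # `results_field` is documented to default to `ml`.
        "dest": {"index": dest_index, "results_field": results_field},
        # An empty object leaves every outlier detection property unset.
        "analysis": {"outlier_detection": {}},
    }
    # `analyzed_fields` is optional, so add it only when a pattern is given.
    if includes or excludes:
        body["analyzed_fields"] = {
            "includes": includes or [],
            "excludes": excludes or [],
        }
    return body


config = build_dfanalytics_config(
    "logdata",
    "logdata_out",
    includes=["request.bytes", "response.counts.error"],
    excludes=["source.geo"],
)
```

Writing the defaults out explicitly is a choice made for readability here; per the text above, omitting `query` or `results_field` from a real request should yield the same values.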

+ 19 - 3
docs/reference/ml/df-analytics/apis/get-dfanalytics-stats.asciidoc

@@ -43,9 +43,18 @@ information, see {stack-ov}/security-privileges.html[Security privileges] and
 ==== {api-query-parms-title}
 
 `allow_no_match`::
-  (Optional, boolean) If `false` and the `data_frame_analytics_id` does not
-  match any {dfanalytics-job} an error will be returned. The default value is
-  `true`.
+  (Optional, boolean) Specifies what to do when the request:
++
+--
+* Contains wildcard expressions and there are no {dfanalytics-jobs} that match.
+* Contains the `_all` string or no identifiers and there are no matches.
+* Contains wildcard expressions and there are only partial matches. 
+
+The default value is `true`, which returns an empty `data_frame_analytics` array 
+when there are no matches and the subset of results when there are partial 
+matches. If this parameter is `false`, the request returns a `404` status code 
+when there are no matches or only partial matches.
+--
 
 `from`::
   (Optional, integer) Skips the specified number of {dfanalytics-jobs}. The
@@ -64,6 +73,13 @@ The API returns the following information:
   (array) An array of statistics objects for {dfanalytics-jobs}, which are
   sorted by the `id` value in ascending order.
 
+[[ml-get-dfanalytics-stats-response-codes]]
+==== {api-response-codes-title}
+
+`404` (Missing resources)::
+  If `allow_no_match` is `false`, this code indicates that there are no
+  resources that match the request or only partial matches for the request.
+
 [[ml-get-dfanalytics-stats-example]]
 ==== {api-examples-title}
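
Reviewer sketch (not part of the commit): the new `allow_no_match` wording can be read as a small decision rule. The Python below models that rule only as the documentation states it. It is not how {es} implements it, and the function name `resolve_jobs` is hypothetical. One assumption goes beyond the text: every comma-separated token is treated alike, whereas the documentation only describes wildcard expressions, `_all`, and an omitted identifier.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple


def resolve_jobs(
    expression: str,
    existing_ids: List[str],
    allow_no_match: bool = True,
) -> Tuple[int, List[str]]:
    """Return (status_code, matching_ids) for a job ID expression."""
    # An omitted identifier is treated like `_all`: every job is requested.
    tokens = [token for token in expression.split(",") if token] or ["*"]
    matched = set()
    any_unmatched = False
    for token in tokens:
        pattern = "*" if token == "_all" else token
        # fnmatchcase stands in for wildcard matching (a simplification).
        hits = [job for job in existing_ids if fnmatchcase(job, pattern)]
        if hits:
            matched.update(hits)
        else:
            any_unmatched = True
    if any_unmatched and not allow_no_match:
        # No matches, or only partial matches, with allow_no_match=false.
        return 404, []
    # Default behaviour: an empty array or the subset that did match.
    return 200, sorted(matched)
```

With jobs `loganalytics` and `weblogs`, the expression `log*,foo*` yields the single match under the default and a `404` when `allow_no_match` is `false`.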
 

+ 25 - 14
docs/reference/ml/df-analytics/apis/get-dfanalytics.asciidoc

@@ -33,10 +33,7 @@ information, see {stack-ov}/security-privileges.html[Security privileges] and
 ==== {api-description-title}
 
 You can get information for multiple {dfanalytics-jobs} in a single API request 
-by using a comma-separated list of {dfanalytics-jobs} or a wildcard expression. 
-You can get information for all {dfanalytics-jobs} by using _all, by specifying 
-`*` as the `<data_frame_analytics_id>`, or by omitting the
-`<data_frame_analytics_id>`.
+by using a comma-separated list of {dfanalytics-jobs} or a wildcard expression.
 
 [[ml-get-dfanalytics-path-params]]
 ==== {api-path-parms-title}
@@ -44,27 +41,34 @@ You can get information for all {dfanalytics-jobs} by using _all, by specifying
 `<data_frame_analytics_id>`::
   (Optional, string) Identifier for the {dfanalytics-job}. If you do not specify
   one of these options, the API returns information for the first hundred
-  {dfanalytics-jobs}.
-  
-`allow_no_match` (Optional)::
-  (boolean) If `false` and the `data_frame_analytics_id` does not match any 
-  {dfanalytics-job} an error will be returned. The default value is `true`.
+  {dfanalytics-jobs}. You can get information for all {dfanalytics-jobs} by 
+  using _all, by specifying `*` as the `<data_frame_analytics_id>`, or by 
+  omitting the `<data_frame_analytics_id>`.
 
 [[ml-get-dfanalytics-query-params]]
 ==== {api-query-parms-title}
 
 `allow_no_match`::
-  (Optional, boolean) If `false` and the `data_frame_analytics_id` does not
-  match any {dfanalytics-job} an error will be returned. The default value is
-  `true`.
+  (Optional, boolean) Specifies what to do when the request:
++
+--
+* Contains wildcard expressions and there are no {dfanalytics-jobs} that match.
+* Contains the `_all` string or no identifiers and there are no matches.
+* Contains wildcard expressions and there are only partial matches. 
+
+The default value is `true`, which returns an empty `data_frame_analytics` array 
+when there are no matches and the subset of results when there are partial 
+matches. If this parameter is `false`, the request returns a `404` status code 
+when there are no matches or only partial matches.
+--
 
 `from`::
   (Optional, integer) Skips the specified number of {dfanalytics-jobs}. The
   default value is `0`.
 
 `size`::
-  (Optional, integer) Specifies the maximum number of {dfanalytics-jobs} to obtain. The 
-  default value is `100`.
+  (Optional, integer) Specifies the maximum number of {dfanalytics-jobs} to 
+  obtain. The default value is `100`.
   
 [[ml-get-dfanalytics-results]]
 ==== {api-response-body-title}
@@ -73,6 +77,13 @@ You can get information for all {dfanalytics-jobs} by using _all, by specifying
   (array) An array of {dfanalytics-job} resources. For more information, see
   <<ml-dfanalytics-resources>>.
 
+[[ml-get-dfanalytics-response-codes]]
+==== {api-response-codes-title}
+
+`404` (Missing resources)::
+  If `allow_no_match` is `false`, this code indicates that there are no
+  resources that match the request or only partial matches for the request.
+
 [[ml-get-dfanalytics-example]]
 ==== {api-examples-title}
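
Reviewer sketch (not part of the commit): `from` and `size` together select a window of jobs, which is why a request with no identifier returns only the first hundred. The Python below illustrates the documented defaults. The name `page_jobs` is hypothetical, and the ascending sort by `id` is an assumption borrowed from the stats API description, since this page does not state an order.

```python
from typing import List


def page_jobs(job_ids: List[str], from_: int = 0, size: int = 100) -> List[str]:
    """Return the window of job IDs selected by `from` and `size`."""
    if from_ < 0 or size < 0:
        raise ValueError("from and size must be non-negative")
    # Assumed ordering: ascending by id, as described for the stats API.
    ordered = sorted(job_ids)
    # Skip `from_` jobs, then keep at most `size` of the remainder.
    return ordered[from_:from_ + size]
```

Under these defaults, listing every job in a cluster with more than a hundred jobs takes several requests with increasing `from` values.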
 

+ 14 - 4
docs/reference/ml/df-analytics/apis/put-dfanalytics.asciidoc

@@ -67,12 +67,22 @@ and mappings.
   example, all the numeric fields for {oldetection}.
   
 `dest`::
-  (Required, object) The destination configuration, consisting of `index` and optionally 
-  `results_field` (`ml` by default). See <<dfanalytics-dest-resources>>.
+  (Required, object) The destination configuration, consisting of `index` and 
+  optionally `results_field` (`ml` by default). See 
+  <<ml-dfanalytics-properties,{dfanalytics} properties>>.
+  
+`model_memory_limit`::
+  (Optional, string) The approximate maximum amount of memory resources that are 
+  permitted for analytical processing. The default value for {dfanalytics-jobs} 
+  is `1gb`. If your `elasticsearch.yml` file contains an 
+  `xpack.ml.max_model_memory_limit` setting, an error occurs when you try to 
+  create {dfanalytics-jobs} that have `model_memory_limit` values greater than 
+  that setting. For more information, see <<ml-settings>>.
   
 `source`::
-  (Required, object) The source configuration, consisting of `index` and optionally a 
-  `query`. See <<dfanalytics-source-resources>>. 
+  (Required, object) The source configuration, consisting of `index` and 
+  optionally a `query`. See 
+  <<ml-dfanalytics-properties,{dfanalytics} properties>>. 
 
 [[ml-put-dfanalytics-example]]
 ==== {api-examples-title}
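
Reviewer sketch (not part of the commit): the `model_memory_limit` entry describes a comparison between the job's limit and the optional `xpack.ml.max_model_memory_limit` setting. The Python below models that comparison on the client side. Both function names are hypothetical, the parser accepts only whole numbers with a unit suffix, and the real check is performed by {es} when the job is created.

```python
import re
from typing import Optional

# Byte size units as powers of 1024.
_UNITS = {
    "b": 1,
    "kb": 1024,
    "mb": 1024 ** 2,
    "gb": 1024 ** 3,
    "tb": 1024 ** 4,
    "pb": 1024 ** 5,
}


def parse_byte_size(value: str) -> int:
    """Convert a string such as `1gb` into a number of bytes."""
    match = re.fullmatch(r"(\d+)(b|kb|mb|gb|tb|pb)", value.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised byte size: {value!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def check_model_memory_limit(
    model_memory_limit: str = "1gb",
    max_model_memory_limit: Optional[str] = None,
) -> None:
    """Raise ValueError if the job limit exceeds the cluster-wide maximum."""
    if max_model_memory_limit is None:
        # No maximum configured, so any job limit passes this check.
        return
    if parse_byte_size(model_memory_limit) > parse_byte_size(max_model_memory_limit):
        raise ValueError(
            f"model_memory_limit [{model_memory_limit}] is greater than "
            f"xpack.ml.max_model_memory_limit [{max_model_memory_limit}]"
        )
```

The `1gb` default mirrors the documented default for {dfanalytics-jobs}; the error message wording is illustrative and does not reproduce any actual server response.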