@@ -106,12 +106,12 @@ don't support missing values therefore fields that have data types other than
 numeric or boolean are ignored. Documents where included fields contain missing
 values, null values, or an array are also ignored. Therefore the `dest` index
 may contain documents that don't have an {olscore}.
-* {regression-cap} supports fields that are numeric, `boolean`, `text`, `keyword`,
-and `ip`. It is also tolerant of missing values. Fields that are supported are
-included in the analysis, other fields are ignored. Documents where included
-fields contain an array with two or more values are also ignored. Documents in
-the `dest` index that don’t contain a results field are not included in the
- {reganalysis}.
+* {regression-cap} supports fields that are numeric, `boolean`, `text`,
+`keyword`, and `ip`. It is also tolerant of missing values. Fields that are
+supported are included in the analysis, other fields are ignored. Documents
+where included fields contain an array with two or more values are also
+ignored. Documents in the `dest` index that don’t contain a results field are
+not included in the {reganalysis}.
 * {classification-cap} supports fields that are numeric, `boolean`, `text`,
 `keyword`, and `ip`. It is also tolerant of missing values. Fields that are
 supported are included in the analysis, other fields are ignored. Documents
@@ -144,7 +144,8 @@ of a node to run the job.
 end::assignment-explanation-anomaly-jobs[]
 
 tag::assignment-explanation-datafeeds[]
-For started {dfeeds} only, contains messages relating to the selection of a node.
+For started {dfeeds} only, contains messages relating to the selection of a
+node.
 end::assignment-explanation-datafeeds[]
 
 tag::assignment-explanation-dfanalytics[]
@@ -323,10 +324,10 @@ If `true`, the feature influence calculation is enabled. Defaults to `true`.
 end::compute-feature-influence[]
 
 tag::chunking-config[]
-{dfeeds-cap} might be required to search over long time periods, for several months
-or years. This search is split into time chunks in order to ensure the load
-on {es} is managed. Chunking configuration controls how the size of these time
-chunks are calculated and is an advanced configuration option.
+{dfeeds-cap} might be required to search over long time periods, for several
+months or years. This search is split into time chunks in order to ensure the
+load on {es} is managed. Chunking configuration controls how the size of these
+time chunks are calculated and is an advanced configuration option.
 A chunking configuration object has the following properties:
 
 `chunking_config`.`mode`:::
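As a sketch of the chunking configuration described in this hunk, a manually chunked {dfeed} might carry an object like the following (the `3h` span is an arbitrary illustrative value, not a recommendation; `time_span` only applies when `mode` is `manual`):

```json
{
  "chunking_config": {
    "mode": "manual",
    "time_span": "3h"
  }
}
```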
@@ -381,7 +382,8 @@ end::custom-rules-scope-filter-type[]
 tag::custom-rules-conditions[]
 An optional array of numeric conditions when the rule applies. A rule must
 either have a non-empty scope or at least one condition. Multiple conditions are
-combined together with a logical `AND`. A condition has the following properties:
+combined together with a logical `AND`. A condition has the following
+properties:
 end::custom-rules-conditions[]
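A minimal sketch of how such conditions combine under a logical `AND` (the action and threshold values here are hypothetical):

```json
{
  "custom_rules": [
    {
      "actions": ["skip_result"],
      "conditions": [
        {"applies_to": "actual", "operator": "gt", "value": 0},
        {"applies_to": "actual", "operator": "lt", "value": 10}
      ]
    }
  ]
}
```

Because both conditions must hold, results are skipped only when the actual value lies between 0 and 10.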
 
 tag::custom-rules-conditions-applies-to[]
@@ -393,7 +395,8 @@ end::custom-rules-conditions-applies-to[]
 
 tag::custom-rules-conditions-operator[]
 Specifies the condition operator. The available options are `gt` (greater than),
-`gte` (greater than or equals), `lt` (less than) and `lte` (less than or equals).
+`gte` (greater than or equals), `lt` (less than) and `lte` (less than or
+equals).
 end::custom-rules-conditions-operator[]
 
 tag::custom-rules-conditions-value[]
@@ -432,97 +435,91 @@ tag::data-frame-analytics[]
 An array of {dfanalytics-job} resources, which are sorted by the `id` value in
 ascending order.
 
+`analysis`:::
+(object) The type of analysis that is performed on the `source`.
+
+`analyzed_fields`:::
+(object) Contains `includes` and/or `excludes` patterns that select which fields
+are included in the analysis.
+
+`analyzed_fields`.`excludes`:::
+(Optional, array) An array of strings that defines the fields that are excluded
+from the analysis.
+
+`analyzed_fields`.`includes`:::
+(Optional, array) An array of strings that defines the fields that are included
+in the analysis.
+
+`dest`:::
+(object) The destination configuration of the analysis.
+
+`dest`.`index`:::
+(string) The _destination index_ that stores the results of the
+{dfanalytics-job}.
+
+`dest`.`results_field`:::
+(string) The name of the field that stores the results of the analysis. Defaults
+to `ml`.
+
 `id`:::
 (string) The unique identifier of the {dfanalytics-job}.
 
+`model_memory_limit`:::
+(string) The `model_memory_limit` that has been set for the {dfanalytics-job}.
+
 `source`:::
 (object) The configuration of how the analysis data is sourced. It has an
 `index` parameter and optionally a `query` and a `_source`.
 
-`index`::::
+`source`.`index`:::
 (array) Index or indices on which to perform the analysis. It can be a single
 index or index pattern as well as an array of indices or patterns.
 
-`query`::::
+`source`.`query`:::
 (object) The query that has been specified for the {dfanalytics-job}. The {es}
 query domain-specific language (<<query-dsl,DSL>>). This value corresponds to
 the query object in an {es} search POST body. By default, this property has the
 following value: `{"match_all": {}}`.
 
-`_source`::::
+`source`.`_source`:::
 (object) Contains the specified `includes` and/or `excludes` patterns that
 select which fields are present in the destination. Fields that are excluded
 cannot be included in the analysis.
+
+`source`.`_source`.`excludes`:::
+(array) An array of strings that defines the fields that are excluded from the
+destination.
 
-`includes`:::::
+`source`.`_source`.`includes`:::
 (array) An array of strings that defines the fields that are included in the
 destination.
-
-`excludes`:::::
-(array) An array of strings that defines the fields that are excluded from the
-destination.
-
-`dest`:::
-(string) The destination configuration of the analysis.
-
-`index`::::
-(string) The _destination index_ that stores the results of the
-{dfanalytics-job}.
-
-`results_field`::::
-(string) The name of the field that stores the results of the analysis. Defaults
-to `ml`.
-
-`analysis`:::
-(object) The type of analysis that is performed on the `source`.
-
-`analyzed_fields`:::
-(object) Contains `includes` and/or `excludes` patterns that select which fields
-are included in the analysis.
-
-`includes`::::
-(Optional, array) An array of strings that defines the fields that are included
-in the analysis.
-
-`excludes`::::
-(Optional, array) An array of strings that defines the fields that are excluded
-from the analysis.
-
-`model_memory_limit`:::
-(string) The `model_memory_limit` that has been set to the {dfanalytics-job}.
 end::data-frame-analytics[]
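Put together, the properties documented above describe a resource shaped roughly like the following sketch (the job `id`, index names, analysis type, and excluded field are hypothetical examples, not values from this patch):

```json
{
  "id": "my-analytics-job",
  "source": {
    "index": ["customer-data"],
    "query": {"match_all": {}}
  },
  "dest": {
    "index": "customer-data-results",
    "results_field": "ml"
  },
  "analysis": {"outlier_detection": {}},
  "analyzed_fields": {"excludes": ["customer_id"]},
  "model_memory_limit": "1gb"
}
```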
 
 tag::data-frame-analytics-stats[]
 An array of statistics objects for {dfanalytics-jobs}, which are
 sorted by the `id` value in ascending order.
 
+`assignment_explanation`:::
+(string)
+For running jobs only, contains messages relating to the selection of a node to
+run the job.
+
 `id`:::
-(string) The unique identifier of the {dfanalytics-job}.
-
-`state`:::
-(string) Current state of the {dfanalytics-job}.
-
-`progress`:::
-(array) The progress report of the {dfanalytics-job} by phase.
-
-`phase`::::
-(string) Defines the phase of the {dfanalytics-job}. Possible phases:
-`reindexing`, `loading_data`, `analyzing`, and `writing_results`.
-
-`progress_percent`::::
-(integer) The progress that the {dfanalytics-job} has made expressed in
-percentage.
+(string)
+The unique identifier of the {dfanalytics-job}.
 
 `memory_usage`:::
-(Optional, Object) An object describing memory usage of the analytics.
-It will be present only after the job has started and memory usage has
-been reported.
+(Optional, object)
+An object describing memory usage of the analytics. It is present only after the
+job is started and memory usage is reported.
 
-`timestamp`::::
-(date) The timestamp when memory usage was calculated.
+`memory_usage`.`peak_usage_bytes`:::
+(long)
+The number of bytes used at the highest peak of memory usage.
 
-`peak_usage_bytes`::::
-(long) The number of bytes used at the highest peak of memory usage.
+`memory_usage`.`timestamp`:::
+(date)
+The timestamp when memory usage was calculated.
 
 `node`:::
 (object)
@@ -550,10 +547,19 @@ The node name.
 (string)
 The host and port where transport HTTP connections are accepted.
 
-`assignment_explanation`:::
-(string)
-For running jobs only, contains messages relating to the selection of a node to
-run the job.
+`progress`:::
+(array) The progress report of the {dfanalytics-job} by phase.
+
+`progress`.`phase`:::
+(string) Defines the phase of the {dfanalytics-job}. Possible phases:
+`reindexing`, `loading_data`, `analyzing`, and `writing_results`.
+
+`progress`.`progress_percent`:::
+(integer) The progress that the {dfanalytics-job} has made expressed in
+percentage.
+
+`state`:::
+(string) Current state of the {dfanalytics-job}.
 end::data-frame-analytics-stats[]
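Under the reordered property list above, a stats object might look roughly like this sketch (all values are illustrative, not output from a real job):

```json
{
  "id": "my-analytics-job",
  "state": "started",
  "progress": [
    {"phase": "reindexing", "progress_percent": 100},
    {"phase": "loading_data", "progress_percent": 100},
    {"phase": "analyzing", "progress_percent": 43},
    {"phase": "writing_results", "progress_percent": 0}
  ],
  "memory_usage": {
    "timestamp": 1575402236000,
    "peak_usage_bytes": 204800
  }
}
```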
 
 tag::datafeed-id[]
@@ -576,8 +582,8 @@ prior training.)
 end::dead-category-count[]
 
 tag::decompress-definition[]
-Specifies whether the included model definition should be returned as a JSON map (`true`) or
-in a custom compressed format (`false`). Defaults to `true`.
+Specifies whether the included model definition should be returned as a JSON map
+(`true`) or in a custom compressed format (`false`). Defaults to `true`.
 end::decompress-definition[]
 
 tag::delayed-data-check-config[]
@@ -586,10 +592,10 @@ window. For example: `{"enabled": true, "check_window": "1h"}`.
 +
 --
 The {dfeed} can optionally search over indices that have already been read in
-an effort to determine whether any data has subsequently been added to the index.
-If missing data is found, it is a good indication that the `query_delay` option
-is set too low and the data is being indexed after the {dfeed} has passed that
-moment in time. See
+an effort to determine whether any data has subsequently been added to the
+index. If missing data is found, it is a good indication that the `query_delay`
+option is set too low and the data is being indexed after the {dfeed} has passed
+that moment in time. See
 {ml-docs}/ml-delayed-data-detection.html[Working with delayed data].
 
 This check runs only on real-time {dfeeds}.
@@ -812,7 +818,8 @@ A comma separated list of influencer field names. Typically these can be the by,
 over, or partition fields that are used in the detector configuration. You might
 also want to use a field name that is not specifically named in a detector, but
 is available as part of the input data. When you use multiple detectors, the use
-of influencers is recommended as it aggregates results for each influencer
+entity.
 end::influencers[]
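As a fragmentary sketch of the comma separated list described above, an analysis config naming two influencer fields might include (field names are hypothetical):

```json
{
  "analysis_config": {
    "influencers": ["clientip", "user_agent"]
  }
}
```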
 
 tag::input-bytes[]
@@ -933,9 +940,10 @@ tag::max-empty-searches[]
 If a real-time {dfeed} has never seen any data (including during any initial
 training period) then it will automatically stop itself and close its associated
 job after this many real-time searches that return no documents. In other words,
-it will stop after `frequency` times `max_empty_searches` of real-time operation.
-If not set then a {dfeed} with no end time that sees no data will remain started
-until it is explicitly stopped. By default this setting is not set.
+it will stop after `frequency` times `max_empty_searches` of real-time
+operation. If not set then a {dfeed} with no end time that sees no data will
+remain started until it is explicitly stopped. By default this setting is not
+set.
 end::max-empty-searches[]
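As a concrete reading of "`frequency` times `max_empty_searches`", a {dfeed} configured as in this sketch would stop itself after roughly one hour (6 searches, 10 minutes apart) of empty results; the values are illustrative only:

```json
{
  "frequency": "10m",
  "max_empty_searches": 6
}
```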
 
 tag::maximum-number-trees[]
@@ -1092,10 +1100,10 @@ example, `1575402236000 `.
 end::model-snapshot-id[]
 
 tag::model-snapshot-retention-days[]
-Advanced configuration option. The period of time (in days) that model snapshots are retained.
-Age is calculated relative to the timestamp of the newest model snapshot.
-The default value is `1`, which means snapshots that are one day (twenty-four hours)
-older than the newest snapshot are deleted.
+Advanced configuration option. The period of time (in days) that model snapshots
+are retained. Age is calculated relative to the timestamp of the newest model
+snapshot. The default value is `1`, which means snapshots that are one day
+(twenty-four hours) older than the newest snapshot are deleted.
 end::model-snapshot-retention-days[]
 
 tag::model-timestamp[]
@@ -1250,10 +1258,10 @@ is `shared`, which generates an index named `.ml-anomalies-shared`.
 end::results-index-name[]
 
 tag::results-retention-days[]
-Advanced configuration option. The period of time (in days) that results are retained.
-Age is calculated relative to the timestamp of the latest bucket result.
-If this property has a non-null value, once per day at 00:30 (server time),
-results that are the specified number of days older than the latest
+Advanced configuration option. The period of time (in days) that results are
+retained. Age is calculated relative to the timestamp of the latest bucket
+result. If this property has a non-null value, once per day at 00:30 (server
+time), results that are the specified number of days older than the latest
 bucket result are deleted from {es}. The default value is null, which means all
 results are retained.
 end::results-retention-days[]
@@ -1353,11 +1361,11 @@ job must be opened before it can accept further data.
 * `closing`: The job close action is in progress and has not yet completed. A
 closing job cannot accept further data.
 * `failed`: The job did not finish successfully due to an error. This situation
-can occur due to invalid input data, a fatal error occurring during the analysis,
-or an external interaction such as the process being killed by the Linux out of
-memory (OOM) killer. If the job had irrevocably failed, it must be force closed
-and then deleted. If the {dfeed} can be corrected, the job can be closed and
-then re-opened.
+can occur due to invalid input data, a fatal error occurring during the
+analysis, or an external interaction such as the process being killed by the
+Linux out of memory (OOM) killer. If the job had irrevocably failed, it must be
+force closed and then deleted. If the {dfeed} can be corrected, the job can be
+closed and then re-opened.
 * `opened`: The job is available to receive and process data.
 * `opening`: The job open action is in progress and has not yet completed.
 --