- [role="xpack"]
- [testenv="platinum"]
- [[put-dfanalytics]]
- === Create {dfanalytics-jobs} API
- [subs="attributes"]
- ++++
- <titleabbrev>Create {dfanalytics-jobs}</titleabbrev>
- ++++
- Instantiates a {dfanalytics-job}.
- experimental[]
- [[ml-put-dfanalytics-request]]
- ==== {api-request-title}
- `PUT _ml/data_frame/analytics/<data_frame_analytics_id>`
- [[ml-put-dfanalytics-prereq]]
- ==== {api-prereq-title}
- * You must have the `machine_learning_admin` built-in role to use this API. You must
- also have `read` and `view_index_metadata` privileges on the source index and
- `read`, `create_index`, and `index` privileges on the destination index. For
- more information, see <<security-privileges>> and <<built-in-roles>>.
- [[ml-put-dfanalytics-desc]]
- ==== {api-description-title}
- This API creates a {dfanalytics-job} that performs an analysis on the source
- index and stores the outcome in a destination index.
- The destination index is created automatically if it does not exist. The
- `index.number_of_shards` and `index.number_of_replicas` settings of the source
- index are copied to the destination index. When the source matches
- multiple indices, these settings are set to the maximum values found in the
- source indices.
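- As a rough illustration of the rule above (not the actual {es} implementation),
- the following Python sketch takes the maximum of each setting across the source
- indices. The function name `merged_dest_settings` and the shape of its input
- are assumptions made for this example.

```python
def merged_dest_settings(source_settings):
    """Return the settings a newly created destination index would inherit.

    `source_settings` is a hypothetical list of dicts, one per source index,
    each holding `number_of_shards` and `number_of_replicas` as integers.
    """
    if not source_settings:
        raise ValueError("at least one source index is required")
    # Each setting is resolved independently: take the largest value seen.
    return {
        key: max(settings[key] for settings in source_settings)
        for key in ("number_of_shards", "number_of_replicas")
    }


sources = [
    {"number_of_shards": 1, "number_of_replicas": 2},
    {"number_of_shards": 3, "number_of_replicas": 1},
]
print(merged_dest_settings(sources))
# {'number_of_shards': 3, 'number_of_replicas': 2}
```

- Note that the two maxima can come from different source indices, as in the
- sample data here.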
- The API also attempts to copy the mappings of the source indices to the
- destination index. However, if the mapping of any field differs among the
- source indices, the attempt fails with an error message.
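- The mapping check described above can be pictured with a small Python sketch.
- It is a simplification under stated assumptions: mappings are reduced to a
- hypothetical `{field: type}` dict per index, and the helper
- `merge_source_mappings` is not part of any {es} client.

```python
def merge_source_mappings(mappings_by_index):
    """Merge field mappings from several source indices.

    `mappings_by_index` is a hypothetical dict of index name -> {field: type}.
    Raises ValueError when the same field has different types in two indices.
    """
    merged = {}
    for index_name, fields in mappings_by_index.items():
        for field, field_type in fields.items():
            if field in merged and merged[field] != field_type:
                raise ValueError(
                    f"field [{field}] is mapped as [{merged[field]}] and "
                    f"[{field_type}] (conflict found in index [{index_name}])"
                )
            merged[field] = field_type
    return merged


compatible = {
    "logs-1": {"bytes": "long", "status": "keyword"},
    "logs-2": {"bytes": "long", "host": "ip"},
}
print(merge_source_mappings(compatible))
# {'bytes': 'long', 'status': 'keyword', 'host': 'ip'}
```

- Passing two indices that map the same field to different types makes the
- sketch raise `ValueError`, mirroring the failure case in the text.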
- If the destination index already exists, then it will be used as is. This makes
- it possible to set up the destination index in advance with custom settings
- and mappings.
- [[ml-put-dfanalytics-supported-fields]]
- ===== Supported fields
- ====== {oldetection-cap}
- {oldetection-cap} requires numeric or boolean data to analyze. The algorithms
- don't support missing values; therefore, fields that have data types other than
- numeric or boolean are ignored. Documents where included fields contain missing
- values, null values, or an array are also ignored. Therefore, the `dest` index
- may contain documents that don't have an {olscore}.
- ====== {regression-cap}
- {regression-cap} supports fields that are numeric, boolean, text, keyword, and ip. It
- is also tolerant of missing values. Fields that are supported are included in
- the analysis; other fields are ignored. Documents where included fields contain
- an array with two or more values are also ignored. Documents in the `dest` index
- that don't contain a results field are not included in the {reganalysis}.
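- The document-level rules for the two analysis types differ in how they treat
- missing values and arrays. The Python sketch below contrasts them. It assumes
- documents are flat dicts and that the list of included fields is already
- known. Both function names are hypothetical.

```python
def scored_by_outlier_detection(doc, included_fields):
    """False if any included field is missing, null, or an array."""
    for field in included_fields:
        value = doc.get(field)  # None covers both missing and null values
        if value is None or isinstance(value, list):
            return False
    return True


def used_by_regression(doc, included_fields):
    """False only if an included field holds an array of two or more values."""
    for field in included_fields:
        value = doc.get(field)
        if isinstance(value, list) and len(value) >= 2:
            return False
    return True


fields = ["bytes", "response_time"]
docs = [
    {"bytes": 120, "response_time": 0.4},     # complete document
    {"bytes": 120},                           # missing value
    {"bytes": [1, 2], "response_time": 0.4},  # multi-valued array
]
print([scored_by_outlier_detection(d, fields) for d in docs])
# [True, False, False]
print([used_by_regression(d, fields) for d in docs])
# [True, True, False]
```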
- [[ml-put-dfanalytics-path-params]]
- ==== {api-path-parms-title}
- `<data_frame_analytics_id>`::
- (Required, string) A string that uniquely identifies the
- {dfanalytics-job}. This identifier can contain lowercase alphanumeric
- characters (a-z and 0-9), hyphens, and underscores. It must start and end with
- alphanumeric characters.
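- A client could pre-check an identifier against the rule as written above. The
- following Python sketch encodes only that wording. The server's own validation
- may differ (for instance, one example on this page uses an ID that contains a
- period), so treat `is_valid_job_id` as a hypothetical helper rather than the
- authoritative check.

```python
import re

# Lowercase alphanumerics, hyphens, and underscores; the first and last
# characters must be alphanumeric. A single alphanumeric character is allowed.
_ID_PATTERN = re.compile(r"[a-z0-9](?:[a-z0-9_-]*[a-z0-9])?")


def is_valid_job_id(job_id):
    """Return True if `job_id` satisfies the documented character rules."""
    return _ID_PATTERN.fullmatch(job_id) is not None


for candidate in ("loganalytics", "house_price-1", "_leading", "UpperCase"):
    print(candidate, is_valid_job_id(candidate))
# loganalytics True
# house_price-1 True
# _leading False
# UpperCase False
```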
- [[ml-put-dfanalytics-request-body]]
- ==== {api-request-body-title}
- `analysis`::
- (Required, object) Defines the type of {dfanalytics} you want to perform on
- your source index. For example: `outlier_detection`. See
- <<dfanalytics-types>>.
-
- `analyzed_fields`::
- (Optional, object) You can specify `includes` patterns, `excludes` patterns, or both.
- If `analyzed_fields` is not set, only the relevant fields will be included.
- For example, all the numeric fields for {oldetection}. For the supported field
- types, see <<ml-put-dfanalytics-supported-fields>>.
-
- `includes`:::
- (Optional, array) An array of strings that defines the fields that will be
- included in the analysis.
-
- `excludes`:::
- (Optional, array) An array of strings that defines the fields that will be
- excluded from the analysis.
- `description`::
- (Optional, string) A description of the job.
- `dest`::
- (Required, object) The destination configuration, consisting of `index` and
- optionally `results_field` (`ml` by default).
-
- `index`:::
- (Required, string) Defines the _destination index_ to store the results of
- the {dfanalytics-job}.
-
- `results_field`:::
- (Optional, string) Defines the name of the field in which to store the
- results of the analysis. Defaults to `ml`.
-
- `model_memory_limit`::
- (Optional, string) The approximate maximum amount of memory resources that are
- permitted for analytical processing. The default value for {dfanalytics-jobs}
- is `1gb`. If your `elasticsearch.yml` file contains an
- `xpack.ml.max_model_memory_limit` setting, an error occurs when you try to
- create {dfanalytics-jobs} that have `model_memory_limit` values greater than
- that setting. For more information, see <<ml-settings>>.
-
- `source`::
- (Required, object) The source configuration, consisting of `index` and
- optionally a `query`.
-
- `index`:::
- (Required, string or array) Index or indices on which to perform the
- analysis. It can be a single index or index pattern as well as an array of
- indices or patterns.
-
- `query`:::
- (Optional, object) The {es} query domain-specific language
- (<<query-dsl,DSL>>). This value corresponds to the query object in an {es}
- search POST body. All the options that are supported by {es} can be used,
- as this object is passed verbatim to {es}. By default, this property has
- the following value: `{"match_all": {}}`.
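- To show how the request body properties fit together, here is a minimal Python
- sketch that assembles a body from the required properties and adds optional
- ones only when they are supplied, leaving the server to apply the defaults
- described above. The helper `build_job_config` and the field names `bytes` and
- `internal_id` are hypothetical. The sketch only builds JSON and does not send
- a request.

```python
import json


def build_job_config(source_index, dest_index, analysis, *,
                     query=None, results_field=None,
                     includes=None, excludes=None,
                     model_memory_limit=None, description=None):
    """Assemble a request body, omitting optional properties that are unset."""
    body = {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "analysis": analysis,
    }
    if query is not None:
        body["source"]["query"] = query
    if results_field is not None:
        body["dest"]["results_field"] = results_field
    # `analyzed_fields` is added only if at least one pattern list is given.
    analyzed_fields = {}
    if includes:
        analyzed_fields["includes"] = list(includes)
    if excludes:
        analyzed_fields["excludes"] = list(excludes)
    if analyzed_fields:
        body["analyzed_fields"] = analyzed_fields
    if model_memory_limit is not None:
        body["model_memory_limit"] = model_memory_limit
    if description is not None:
        body["description"] = description
    return body


config = build_job_config(
    ["logdata-*"],                                   # index pattern in an array
    "logdata_out",
    {"outlier_detection": {"outlier_fraction": 0.05}},
    query={"range": {"bytes": {"gte": 0}}},          # hypothetical field
    excludes=["internal_id"],                        # hypothetical field
    model_memory_limit="512mb",
)
print(json.dumps(config, indent=2))
```

- The printed JSON could then be used as the body of a
- `PUT _ml/data_frame/analytics/<data_frame_analytics_id>` request.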
- [[ml-put-dfanalytics-example]]
- ==== {api-examples-title}
- [[ml-put-dfanalytics-example-od]]
- ===== {oldetection-cap} example
- The following example creates the `loganalytics` {dfanalytics-job} with the
- `outlier_detection` analysis type:
- [source,console]
- --------------------------------------------------
- PUT _ml/data_frame/analytics/loganalytics
- {
- "description": "Outlier detection on log data",
- "source": {
- "index": "logdata"
- },
- "dest": {
- "index": "logdata_out"
- },
- "analysis": {
- "outlier_detection": {
- "compute_feature_influence": true,
- "outlier_fraction": 0.05,
- "standardization_enabled": true
- }
- }
- }
- --------------------------------------------------
- // TEST[setup:setup_logdata]
- The API returns the following result:
- [source,console-result]
- ----
- {
- "id": "loganalytics",
- "description": "Outlier detection on log data",
- "source": {
- "index": ["logdata"],
- "query": {
- "match_all": {}
- }
- },
- "dest": {
- "index": "logdata_out",
- "results_field": "ml"
- },
- "analysis": {
- "outlier_detection": {
- "compute_feature_influence": true,
- "outlier_fraction": 0.05,
- "standardization_enabled": true
- }
- },
- "model_memory_limit": "1gb",
- "create_time": 1562265491319,
- "version": "8.0.0"
- }
- ----
- // TESTRESPONSE[s/1562265491319/$body.$_path/]
- // TESTRESPONSE[s/"version": "8.0.0"/"version": $body.version/]
- [[ml-put-dfanalytics-example-r]]
- ===== {regression-cap} examples
- The following example creates the `house_price_regression_analysis`
- {dfanalytics-job} with the `regression` analysis type:
- [source,console]
- --------------------------------------------------
- PUT _ml/data_frame/analytics/house_price_regression_analysis
- {
- "source": {
- "index": "houses_sold_last_10_yrs"
- },
- "dest": {
- "index": "house_price_predictions"
- },
- "analysis":
- {
- "regression": {
- "dependent_variable": "price"
- }
- }
- }
- --------------------------------------------------
- // TEST[skip:TBD]
- The API returns the following result:
- [source,console-result]
- ----
- {
- "id" : "house_price_regression_analysis",
- "source" : {
- "index" : [
- "houses_sold_last_10_yrs"
- ],
- "query" : {
- "match_all" : { }
- }
- },
- "dest" : {
- "index" : "house_price_predictions",
- "results_field" : "ml"
- },
- "analysis" : {
- "regression" : {
- "dependent_variable" : "price",
- "training_percent" : 100
- }
- },
- "model_memory_limit" : "1gb",
- "create_time" : 1567168659127,
- "version" : "8.0.0"
- }
- ----
- // TESTRESPONSE[s/1567168659127/$body.$_path/]
- // TESTRESPONSE[s/"version" : "8.0.0"/"version" : $body.version/]
- The following example creates a job and specifies a training percent:
- [source,console]
- --------------------------------------------------
- PUT _ml/data_frame/analytics/student_performance_mathematics_0.3
- {
- "source": {
- "index": "student_performance_mathematics"
- },
- "dest": {
- "index":"student_performance_mathematics_reg"
- },
- "analysis":
- {
- "regression": {
- "dependent_variable": "G3",
- "training_percent": 70 <1>
- }
- }
- }
- --------------------------------------------------
- // TEST[skip:TBD]
- <1> The `training_percent` defines the percentage of the data set that will be used
- for training the model.