
[DOCS] Adds custom feature processors description to PUT DFA API (#67424)

Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
István Zoltán Szabó, 4 years ago
Parent
Current commit
addb5cbd3a
2 files changed, with 185 insertions and 1 deletion
  1. 105 0
      docs/reference/ml/df-analytics/apis/put-dfanalytics.asciidoc
  2. 80 1
      docs/reference/ml/ml-shared.asciidoc

+ 105 - 0
docs/reference/ml/df-analytics/apis/put-dfanalytics.asciidoc

@@ -120,6 +120,111 @@ include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=feature-bag-fraction]
 `feature_processors`::::
 (Optional, list)
 include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors]
++
+.Properties of `feature_processors`
+[%collapsible%open]
+======
+`frequency_encoding`::::
+(object)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-frequency]
++
+.Properties of `frequency_encoding`
+[%collapsible%open]
+=======
+`feature_name`::::
+(Required, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-feat-name]
+
+`field`::::
+(Required, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-field]
+
+`frequency_map`::::
+(Required, object)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-frequency-map]
+=======
+
+`multi_encoding`::::
+(object)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-multi]
++
+.Properties of `multi_encoding`
+[%collapsible%open]
+=======
+`processors`::::
+(Required, array)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-multi-proc]
+=======
+
+`ngram_encoding`::::
+(object)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-ngram]
++
+.Properties of `ngram_encoding`
+[%collapsible%open]
+=======
+`feature_prefix`::::
+(Optional, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-ngram-feat-pref]
+
+`field`::::
+(Required, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-ngram-field]
+
+`length`::::
+(Optional, integer)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-ngram-length]
+
+`n_grams`::::
+(Required, array)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-ngram-ngrams]
+
+`start`::::
+(Optional, integer)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-ngram-start]
+=======
+
+`one_hot_encoding`::::
+(object)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-one-hot]
++
+.Properties of `one_hot_encoding`
+[%collapsible%open]
+=======
+`field`::::
+(Required, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-field]
+
+`hot_map`::::
+(Required, object)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-one-hot-map]
+=======
+
+`target_mean_encoding`::::
+(object)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-target-mean]
++
+.Properties of `target_mean_encoding`
+[%collapsible%open]
+=======
+`default_value`::::
+(Required, integer)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-target-mean-default]
+
+`feature_name`::::
+(Required, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-feat-name]
+
+`field`::::
+(Required, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-field]
+
+`target_map`::::
+(Required, object)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-target-mean-map]
+=======
+
+======
 
 `gamma`::::
 (Optional, double)
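To make the property list above easier to picture, here is a minimal sketch of a `feature_processors` list built as a Python structure, with a small check for the properties the diff marks as Required. All field names and values (`animal`, `color`, `city`, `title`, and so on) are hypothetical, and `missing_properties` is an illustrative helper, not part of any Elasticsearch client.

```python
# Required properties per processor type, as listed in the diff above.
REQUIRED = {
    "frequency_encoding": {"feature_name", "field", "frequency_map"},
    "multi_encoding": {"processors"},
    "ngram_encoding": {"field", "n_grams"},
    "one_hot_encoding": {"field", "hot_map"},
    "target_mean_encoding": {"default_value", "feature_name", "field", "target_map"},
}

# Hypothetical configuration; the field names and maps are made up.
feature_processors = [
    {"frequency_encoding": {
        "feature_name": "animal_frequency",
        "field": "animal",
        "frequency_map": {"cat": 0.75, "dog": 0.25},
    }},
    {"one_hot_encoding": {
        "field": "color",
        "hot_map": {"red": "color_red", "blue": "color_blue"},
    }},
    {"target_mean_encoding": {
        "feature_name": "city_target_mean",
        "field": "city",
        "target_map": {"Budapest": 1.5, "London": 2.0},
        "default_value": 1,
    }},
    {"ngram_encoding": {
        "field": "title",
        "n_grams": [1, 2],
        "start": 0,
        "length": 10,
        "feature_prefix": "f",
    }},
]


def missing_properties(processor):
    """Return the sorted Required properties absent from one processor entry."""
    (kind, config), = processor.items()  # each entry holds exactly one processor type
    return sorted(REQUIRED[kind] - set(config))


for entry in feature_processors:
    print(next(iter(entry)), missing_properties(entry))
```

A list like this would sit under the analysis object of the request body; the exact placement is described in the full API page rather than in this hunk.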

+ 80 - 1
docs/reference/ml/ml-shared.asciidoc

@@ -559,9 +559,88 @@ A collection of feature preprocessors that modify one or more included fields.
 The analysis uses the resulting one or more features instead of the
 original document field. Multiple `feature_processors` entries can refer to the
 same document fields.
-Note, automatic categorical {ml-docs}/ml-feature-encoding.html[feature encoding] still occurs.
+Note that automatic categorical {ml-docs}/ml-feature-encoding.html[feature encoding]
+still occurs.
 end::dfas-feature-processors[]
 
+tag::dfas-feature-processors-feat-name[]
+The resulting feature name.
+end::dfas-feature-processors-feat-name[]
+
+tag::dfas-feature-processors-field[]
+The name of the field to encode.
+end::dfas-feature-processors-field[]
+
+tag::dfas-feature-processors-frequency[]
+The configuration information necessary to perform frequency encoding.
+end::dfas-feature-processors-frequency[]
+
+tag::dfas-feature-processors-frequency-map[]
+The resulting frequency map for the field value. If the field value is missing 
+from the `frequency_map`, the resulting value is `0`.
+end::dfas-feature-processors-frequency-map[]
+
+tag::dfas-feature-processors-multi[]
+The configuration information necessary to perform multi encoding. It allows
+multiple processors to be chained together, so that the output of one processor
+can be passed to the next as input.
+end::dfas-feature-processors-multi[]
+
+tag::dfas-feature-processors-multi-proc[]
+The ordered array of custom processors to execute. Must contain more than one
+processor.
+end::dfas-feature-processors-multi-proc[]
+
+tag::dfas-feature-processors-ngram[]
+The configuration information necessary to perform ngram encoding. Features
+written out by this encoder have the following name format:
+`<feature_prefix>.<ngram><string position>`. For example, if the
+`feature_prefix` is `f`, the feature name for the second unigram in a string is
+`f.11`: the ngram length `1` followed by the zero-based string position `1`.
+end::dfas-feature-processors-ngram[]
+
+tag::dfas-feature-processors-ngram-feat-pref[]
+The feature name prefix. Defaults to `ngram_<start>_<length>`.
+end::dfas-feature-processors-ngram-feat-pref[]
+
+tag::dfas-feature-processors-ngram-field[]
+The name of the text field to encode.
+end::dfas-feature-processors-ngram-field[]
+
+tag::dfas-feature-processors-ngram-length[]
+Specifies the length of the ngram substring. Defaults to `50`. Must be greater 
+than `0`.
+end::dfas-feature-processors-ngram-length[]
+
+tag::dfas-feature-processors-ngram-ngrams[]
+Specifies which ngrams to gather. It is an array of integer values where the
+minimum value is 1 and the maximum value is 5.
+end::dfas-feature-processors-ngram-ngrams[]
+
+tag::dfas-feature-processors-ngram-start[]
+Specifies the zero-indexed start of the ngram substring. Negative values are
+allowed for encoding ngrams of string suffixes. Defaults to `0`.
+end::dfas-feature-processors-ngram-start[]
+
+tag::dfas-feature-processors-one-hot[]
+The configuration information necessary to perform one hot encoding.
+end::dfas-feature-processors-one-hot[]
+
+tag::dfas-feature-processors-one-hot-map[]
+The one hot map that maps each field value to a column name.
+end::dfas-feature-processors-one-hot-map[]
+
+tag::dfas-feature-processors-target-mean[]
+The configuration information necessary to perform target mean encoding.
+end::dfas-feature-processors-target-mean[]
+
+tag::dfas-feature-processors-target-mean-default[]
+The default value to use if the field value is not found in the `target_map`.
+end::dfas-feature-processors-target-mean-default[]
+
+tag::dfas-feature-processors-target-mean-map[]
+The map from field values to target mean values.
+end::dfas-feature-processors-target-mean-map[]
+
 tag::dfas-iteration[]
 The number of iterations on the analysis.
 end::dfas-iteration[]
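The encoder descriptions added in this file can be illustrated with a short Python sketch. It is not the Elasticsearch implementation: the function names are hypothetical, and the handling of a negative `start` (counting back from the end of the string) and of strings shorter than the window are assumptions made for the sake of the example. Only the fallback values, the default prefix, and the feature name format are taken from the text above.

```python
def frequency_encode(value, frequency_map):
    # A value missing from frequency_map encodes to 0, as the description states.
    return frequency_map.get(value, 0)


def target_mean_encode(value, target_map, default_value):
    # default_value is used when the value is not found in target_map.
    return target_map.get(value, default_value)


def one_hot_encode(value, hot_map):
    # hot_map maps field values to column names; emit one 0/1 column per entry.
    return {column: int(value == key) for key, column in hot_map.items()}


def ngram_encode(text, n_grams, start=0, length=50, feature_prefix=None):
    # Default prefix follows the documented pattern ngram_<start>_<length>.
    if feature_prefix is None:
        feature_prefix = f"ngram_{start}_{length}"
    # Assumption: a negative start counts back from the end of the string.
    begin = start if start >= 0 else max(len(text) + start, 0)
    window = text[begin:begin + length]
    features = {}
    for n in n_grams:
        for position in range(len(window) - n + 1):
            # Name format: <feature_prefix>.<ngram><string position>
            features[f"{feature_prefix}.{n}{position}"] = window[position:position + n]
    return features


unigrams = ngram_encode("abc", [1], feature_prefix="f")
print(unigrams["f.11"])  # the second unigram of "abc"
```

With `feature_prefix` set to `f`, the second unigram of `abc` lands under the name `f.11`, matching the naming example in the ngram description.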