[DOCS] Adds custom feature processors description to PUT DFA API (#67424)

Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
István Zoltán Szabó 4 years ago
parent
commit
addb5cbd3a

+ 105 - 0
docs/reference/ml/df-analytics/apis/put-dfanalytics.asciidoc

@@ -120,6 +120,111 @@ include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=feature-bag-fraction]
 `feature_processors`::::
 (Optional, list)
 include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors]
++
+.Properties of `feature_processors`
+[%collapsible%open]
+======
+`frequency_encoding`::::
+(object)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-frequency]
++
+.Properties of `frequency_encoding`
+[%collapsible%open]
+=======
+`feature_name`::::
+(Required, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-feat-name]
+
+`field`::::
+(Required, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-field]
+
+`frequency_map`::::
+(Required, object)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-frequency-map]
+=======
+
+`multi_encoding`::::
+(object)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-multi]
++
+.Properties of `multi_encoding`
+[%collapsible%open]
+=======
+`processors`::::
+(Required, array)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-multi-proc]
+=======
+
+`ngram_encoding`::::
+(object)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-ngram]
++
+.Properties of `ngram_encoding`
+[%collapsible%open]
+=======
+`feature_prefix`::::
+(Optional, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-ngram-feat-pref]
+
+`field`::::
+(Required, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-ngram-field]
+
+`length`::::
+(Optional, integer)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-ngram-length]
+
+`n_grams`::::
+(Required, array)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-ngram-ngrams]
+
+`start`::::
+(Optional, integer)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-ngram-start]
+=======
+
+`one_hot_encoding`::::
+(object)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-one-hot]
++
+.Properties of `one_hot_encoding`
+[%collapsible%open]
+=======
+`field`::::
+(Required, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-field]
+
+`hot_map`::::
+(Required, object)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-one-hot-map]
+=======
+
+`target_mean_encoding`::::
+(object)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-target-mean]
++
+.Properties of `target_mean_encoding`
+[%collapsible%open]
+=======
+`default_value`::::
+(Required, integer)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-target-mean-default]
+
+`feature_name`::::
+(Required, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-feat-name]
+
+`field`::::
+(Required, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-field]
+
+`target_map`::::
+(Required, object)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=dfas-feature-processors-target-mean-map]
+=======
+
+======
 
 `gamma`::::
 (Optional, double)
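
To make the property list above concrete, here is a minimal sketch of what a `feature_processors` list could look like, written as a Python dict and printed as JSON. It assumes each list entry is an object keyed by a single processor type, which the property layout suggests but this diff does not state outright. All field names, feature names, and map values (`city`, `color`, `comment`, and so on) are hypothetical, and where the list nests inside the full PUT request body is not shown in this diff.

```python
import json

# Hypothetical document fields and map values, for illustration only.
# Assumption: each entry is an object with exactly one processor-type key.
feature_processors = [
    {
        "frequency_encoding": {
            "feature_name": "city_frequency",   # resulting feature name
            "field": "city",                    # field to encode
            "frequency_map": {"london": 0.6, "paris": 0.4},
        }
    },
    {
        "one_hot_encoding": {
            "field": "color",
            # maps each field value to a column name
            "hot_map": {"red": "color_red", "blue": "color_blue"},
        }
    },
    {
        "target_mean_encoding": {
            "feature_name": "city_target_mean",
            "field": "city",
            "target_map": {"london": 12.5, "paris": 9.0},
            "default_value": 10,                # used for unmapped values
        }
    },
    {
        "ngram_encoding": {
            "field": "comment",
            "n_grams": [1, 2],                  # each value between 1 and 5
            "start": 0,
            "length": 50,
            "feature_prefix": "f",
        }
    },
]

print(json.dumps({"feature_processors": feature_processors}, indent=2))
```

The optional `ngram_encoding` properties (`start`, `length`, `feature_prefix`) are spelled out here only for readability; per the descriptions in this commit they have defaults and could be omitted.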

+ 80 - 1
docs/reference/ml/ml-shared.asciidoc

@@ -559,9 +559,88 @@ A collection of feature preprocessors that modify one or more included fields.
 The analysis uses the resulting one or more features instead of the
 original document field. Multiple `feature_processors` entries can refer to the
 same document fields.
-Note, automatic categorical {ml-docs}/ml-feature-encoding.html[feature encoding] still occurs.
+Note, automatic categorical {ml-docs}/ml-feature-encoding.html[feature encoding] 
+still occurs.
 end::dfas-feature-processors[]
 
+tag::dfas-feature-processors-feat-name[]
+The resulting feature name.
+end::dfas-feature-processors-feat-name[]
+
+tag::dfas-feature-processors-field[]
+The name of the field to encode.
+end::dfas-feature-processors-field[]
+
+tag::dfas-feature-processors-frequency[]
+The configuration information necessary to perform frequency encoding.
+end::dfas-feature-processors-frequency[]
+
+tag::dfas-feature-processors-frequency-map[]
+The resulting frequency map for the field value. If the field value is missing 
+from the `frequency_map`, the resulting value is `0`.
+end::dfas-feature-processors-frequency-map[]
+
+tag::dfas-feature-processors-multi[]
+The configuration information necessary to perform multi encoding. It allows 
+multiple processors to be chained together, so that the output of one processor 
+can be passed to the next as an input.
+end::dfas-feature-processors-multi[]
+
+tag::dfas-feature-processors-multi-proc[]
+The ordered array of custom processors to execute. Must have more than 1 entry.
+end::dfas-feature-processors-multi-proc[]
+
+tag::dfas-feature-processors-ngram[]
+The configuration information necessary to perform ngram encoding. Features 
+written out by this encoder have the following name format: 
+`<feature_prefix>.<ngram><string position>`. For example, if the 
+`feature_prefix` is `f`, the feature name for the second unigram in a string is 
+`f.11`.
+end::dfas-feature-processors-ngram[]
+
+tag::dfas-feature-processors-ngram-feat-pref[]
+The feature name prefix. Defaults to `ngram_<start>_<length>`.
+end::dfas-feature-processors-ngram-feat-pref[]
+
+tag::dfas-feature-processors-ngram-field[]
+The name of the text field to encode.
+end::dfas-feature-processors-ngram-field[]
+
+tag::dfas-feature-processors-ngram-length[]
+Specifies the length of the ngram substring. Defaults to `50`. Must be greater 
+than `0`.
+end::dfas-feature-processors-ngram-length[]
+
+tag::dfas-feature-processors-ngram-ngrams[]
+Specifies which ngrams to gather. It is an array of integer values where the 
+minimum value is 1 and the maximum value is 5.
+end::dfas-feature-processors-ngram-ngrams[]
+
+tag::dfas-feature-processors-ngram-start[]
+Specifies the zero-indexed start of the ngram substring. Negative values are 
+allowed for encoding ngrams of string suffixes. Defaults to `0`.
+end::dfas-feature-processors-ngram-start[]
+
+tag::dfas-feature-processors-one-hot[]
+The configuration information necessary to perform one hot encoding.
+end::dfas-feature-processors-one-hot[]
+
+tag::dfas-feature-processors-one-hot-map[]
+The one hot map that maps each field value to its column name.
+end::dfas-feature-processors-one-hot-map[]
+
+tag::dfas-feature-processors-target-mean[]
+The configuration information necessary to perform target mean encoding.
+end::dfas-feature-processors-target-mean[]
+
+tag::dfas-feature-processors-target-mean-default[]
+The default value to use if the field value is not found in the `target_map`.
+end::dfas-feature-processors-target-mean-default[]
+
+tag::dfas-feature-processors-target-mean-map[]
+The map from each field value to its target mean.
+end::dfas-feature-processors-target-mean-map[]
+
 tag::dfas-iteration[]
 The number of iterations on the analysis.
 end::dfas-iteration[]
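
The tag descriptions added to `ml-shared.asciidoc` imply some simple lookup and naming rules. The following Python sketch restates them under stated assumptions; it is not the Elasticsearch implementation. The function names are hypothetical, the 1/0 output of the one hot helper is an assumption, and the handling of negative `start` values (counting back from the end of the string) is one reading of "encoding ngrams of string suffixes".

```python
def frequency_encode(value, frequency_map):
    # A value missing from the map encodes to 0, as the description states.
    return frequency_map.get(value, 0)


def target_mean_encode(value, target_map, default_value):
    # A value missing from the map falls back to default_value.
    return target_map.get(value, default_value)


def one_hot_encode(value, hot_map):
    # Assumption: one column per mapped value, 1 on a match and 0 otherwise.
    return {column: int(value == key) for key, column in hot_map.items()}


def ngram_feature_names(text, n_grams, start=0, length=50, feature_prefix=None):
    # Default prefix follows the documented pattern ngram_<start>_<length>.
    if feature_prefix is None:
        feature_prefix = f"ngram_{start}_{length}"
    # Assumption: a negative start counts back from the end of the string.
    begin = start if start >= 0 else max(len(text) + start, 0)
    window = text[begin:begin + length]
    features = {}
    for n in n_grams:
        # Name format: <feature_prefix>.<ngram><string position>
        for pos in range(len(window) - n + 1):
            features[f"{feature_prefix}.{n}{pos}"] = window[pos:pos + n]
    return features


names = ngram_feature_names("abc", n_grams=[1], feature_prefix="f")
print(names)
```

With `feature_prefix` set to `f`, the second unigram of the string gets the name `f.11`, matching the example in the `dfas-feature-processors-ngram` tag: `1` for the ngram size followed by the zero-indexed position `1`.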