@@ -6,7 +6,8 @@
++++
-Uses a pre-trained {dfanalytics} model to infer against the data that is being
+Uses a pre-trained {dfanalytics} model or a model deployed for natural
+language processing tasks to infer against the data that is being
ingested in the pipeline.
@@ -18,7 +19,7 @@ ingested in the pipeline.
| `model_id` | yes | - | (String) The ID or alias for the trained model.
| `target_field` | no | `ml.inference.<processor_tag>` | (String) Field added to incoming documents to contain results objects.
| `field_map` | no | If defined the model's default field map | (Object) Maps the document field names to the known field names of the model. This mapping takes precedence over any default mappings provided in the model configuration.
-| `inference_config` | no | The default settings defined in the model | (Object) Contains the inference type and its options. There are two types: <<inference-processor-regression-opt,`regression`>> and <<inference-processor-classification-opt,`classification`>>.
+| `inference_config` | no | The default settings defined in the model | (Object) Contains the inference type and its options.
include::common-options.asciidoc[]
|======
@@ -38,6 +39,119 @@ include::common-options.asciidoc[]
--------------------------------------------------
// NOTCONSOLE

+[discrete]
+[[inference-processor-classification-opt]]
+==== {classification-cap} configuration options
+
+Classification configuration for inference.
+
+`num_top_classes`::
+(Optional, integer)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-num-top-classes]
+
+`num_top_feature_importance_values`::
+(Optional, integer)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-num-top-feature-importance-values]
+
+`results_field`::
+(Optional, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field-processor]
+
+`top_classes_results_field`::
+(Optional, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-top-classes-results-field]
+
+`prediction_field_type`::
+(Optional, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-prediction-field-type]
+
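+For illustration, a classification configuration that returns the top three
+classes and forces the prediction to be mapped as a string might look like the
+following sketch. The model ID and field names are placeholders.
+
+[source,js]
+--------------------------------------------------
+"inference": {
+  "model_id": "my_classification_model",
+  "inference_config": {
+    "classification": {
+      "num_top_classes": 3,
+      "top_classes_results_field": "probabilities",
+      "prediction_field_type": "string"
+    }
+  }
+}
+--------------------------------------------------
+// NOTCONSOLE
+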
+[discrete]
+[[inference-processor-fill-mask-opt]]
+==== Fill mask configuration options
+
+`num_top_classes`::
+(Optional, integer)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-num-top-classes]
+
+`results_field`::
+(Optional, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field-processor]
+
+`tokenization`::
+(Optional, object)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization]
++
+.Properties of tokenization
+[%collapsible%open]
+=====
+`bert`::::
+(Optional, object)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert]
++
+.Properties of bert
+[%collapsible%open]
+=======
+
+`truncate`::::
+(Optional, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-truncate]
+=======
+
+`mpnet`::::
+(Optional, object)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-mpnet]
++
+.Properties of mpnet
+[%collapsible%open]
+=======
+
+`truncate`::::
+(Optional, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-truncate]
+=======
+=====
+
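+For illustration, a fill mask configuration that reports the top three
+predicted tokens might look like the following sketch. The model ID and
+results field name are placeholders.
+
+[source,js]
+--------------------------------------------------
+"inference": {
+  "model_id": "my_fill_mask_model",
+  "inference_config": {
+    "fill_mask": {
+      "num_top_classes": 3,
+      "results_field": "predicted_token"
+    }
+  }
+}
+--------------------------------------------------
+// NOTCONSOLE
+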
+[discrete]
+[[inference-processor-ner-opt]]
+==== NER configuration options
+
+`results_field`::
+(Optional, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field-processor]
+
+`tokenization`::
+(Optional, object)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization]
++
+.Properties of tokenization
+[%collapsible%open]
+=====
+`bert`::::
+(Optional, object)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert]
++
+.Properties of bert
+[%collapsible%open]
+=======
+
+`truncate`::::
+(Optional, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-truncate]
+=======
+
+`mpnet`::::
+(Optional, object)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-mpnet]
++
+.Properties of mpnet
+[%collapsible%open]
+=======
+
+`truncate`::::
+(Optional, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-truncate]
+=======
+=====
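+
+For illustration, an NER configuration that also sets the BERT truncation
+option might look like the following sketch. The model ID and results field
+name are placeholders.
+
+[source,js]
+--------------------------------------------------
+"inference": {
+  "model_id": "my_ner_model",
+  "inference_config": {
+    "ner": {
+      "results_field": "entities",
+      "tokenization": {
+        "bert": {
+          "truncate": "first"
+        }
+      }
+    }
+  }
+}
+--------------------------------------------------
+// NOTCONSOLE
+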
[discrete]
[[inference-processor-regression-opt]]
@@ -53,41 +167,157 @@ include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field-
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-regression-num-top-feature-importance-values]

-
[discrete]
-[[inference-processor-classification-opt]]
-==== {classification-cap} configuration options
+[[inference-processor-text-classification-opt]]
+==== Text classification configuration options

-Classification configuration for inference.
+`classification_labels`::
+(Optional, array) An array of classification labels.
`num_top_classes`::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-num-top-classes]

-`num_top_feature_importance_values`::
-(Optional, integer)
-include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-num-top-feature-importance-values]
+`results_field`::
+(Optional, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field-processor]
+
+`tokenization`::
+(Optional, object)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization]
++
+.Properties of tokenization
+[%collapsible%open]
+=====
+`bert`::::
+(Optional, object)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert]
++
+.Properties of bert
+[%collapsible%open]
+=======
+
+`truncate`::::
+(Optional, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-truncate]
+=======
+
+`mpnet`::::
+(Optional, object)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-mpnet]
++
+.Properties of mpnet
+[%collapsible%open]
+=======
+
+`truncate`::::
+(Optional, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-truncate]
+=======
+=====
+
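+For illustration, a text classification configuration with explicit labels
+might look like the following sketch. The model ID and labels are
+placeholders.
+
+[source,js]
+--------------------------------------------------
+"inference": {
+  "model_id": "my_text_classification_model",
+  "inference_config": {
+    "text_classification": {
+      "classification_labels": ["negative", "positive"],
+      "num_top_classes": 2
+    }
+  }
+}
+--------------------------------------------------
+// NOTCONSOLE
+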
+[discrete]
+[[inference-processor-text-embedding-opt]]
+==== Text embedding configuration options

`results_field`::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field-processor]

-`top_classes_results_field`::
+`tokenization`::
+(Optional, object)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization]
++
+.Properties of tokenization
+[%collapsible%open]
+=====
+`bert`::::
+(Optional, object)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert]
++
+.Properties of bert
+[%collapsible%open]
+=======
+
+`truncate`::::
(Optional, string)
-include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-top-classes-results-field]
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-truncate]
+=======
+
+`mpnet`::::
+(Optional, object)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-mpnet]
++
+.Properties of mpnet
+[%collapsible%open]
+=======
+
+`truncate`::::
+(Optional, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-truncate]
+=======
+=====
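+
+For illustration, a text embedding configuration might look like the following
+sketch. The model ID and results field name are placeholders.
+
+[source,js]
+--------------------------------------------------
+"inference": {
+  "model_id": "my_text_embedding_model",
+  "inference_config": {
+    "text_embedding": {
+      "results_field": "text_embedding"
+    }
+  }
+}
+--------------------------------------------------
+// NOTCONSOLE
+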
-`prediction_field_type`::
+[discrete]
+[[inference-processor-zero-shot-opt]]
+==== Zero shot classification configuration options
+
+`labels`::
+(Optional, array)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-zero-shot-classification-labels]
+
+`multi_label`::
+(Optional, boolean)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-zero-shot-classification-multi-label]
+
+`results_field`::
(Optional, string)
-include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-prediction-field-type]
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field-processor]
+
+`tokenization`::
+(Optional, object)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization]
++
+.Properties of tokenization
+[%collapsible%open]
+=====
+`bert`::::
+(Optional, object)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert]
++
+.Properties of bert
+[%collapsible%open]
+=======
+
+`truncate`::::
+(Optional, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-truncate]
+=======
+
+`mpnet`::::
+(Optional, object)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-mpnet]
++
+.Properties of mpnet
+[%collapsible%open]
+=======
+
+`truncate`::::
+(Optional, string)
+include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert-truncate]
+=======
+=====
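+
+For illustration, a zero shot classification configuration that supplies the
+candidate labels at ingest time might look like the following sketch. The
+model ID and labels are placeholders.
+
+[source,js]
+--------------------------------------------------
+"inference": {
+  "model_id": "my_zero_shot_model",
+  "inference_config": {
+    "zero_shot_classification": {
+      "labels": ["sports", "economy", "politics"],
+      "multi_label": false
+    }
+  }
+}
+--------------------------------------------------
+// NOTCONSOLE
+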
[discrete]
[[inference-processor-config-example]]
-==== `inference_config` examples
-
+==== {infer-cap} processor examples
[source,js]
--------------------------------------------------
"inference":{
-  "model_id":"my_model_id"
+  "model_id": "my_model_id",
+  "field_map": {
+    "original_fieldname": "expected_fieldname"
+  },
  "inference_config": {
    "regression": {
      "results_field": "my_regression"
@@ -99,7 +329,8 @@ include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification
This configuration specifies a `regression` inference and the results are
written to the `my_regression` field contained in the `target_field` results
-object.
+object. The `field_map` configuration maps the field `original_fieldname` from
+the source document to the field expected by the model.

[source,js]
@@ -123,7 +354,7 @@ categories for which the predicted probabilities are reported is 2
classes to the `probabilities` field. Both fields are contained in the
`target_field` results object.

-For an example that uses {nlp} trained models, refer to
+For an example that uses {nlp} trained models, refer to
{ml-docs}/ml-nlp-inference.html[Add NLP inference to ingest pipelines].

[discrete]
@@ -131,7 +362,7 @@ For an example that uses {nlp} trained models, refer to
==== {feat-imp-cap} object mapping

To get the full benefit of aggregating and searching for
-{ml-docs}/ml-feature-importance.html[{feat-imp}], update your index mapping of
+{ml-docs}/ml-feature-importance.html[{feat-imp}], update your index mapping of
the {feat-imp} result field as you can see below:

[source,js]
@@ -151,13 +382,13 @@ the {feat-imp} result field as you can see below:
--------------------------------------------------
// NOTCONSOLE

-The mapping field name for {feat-imp} (in the example above, it is
+The mapping field name for {feat-imp} (in the example above, it is
`ml.inference.feature_importance`) is compounded as follows:

`<ml.inference.target_field>`.`<inference.tag>`.`feature_importance`

* `<ml.inference.target_field>`: defaults to `ml.inference`.
-* `<inference.tag>`: if is not provided in the processor definition, then it is
+* `<inference.tag>`: if it is not provided in the processor definition, then it is
not part of the field path.

For example, if you provide a tag `foo` in the definition as you can see below:
@@ -172,7 +403,7 @@ For example, if you provide a tag `foo` in the definition as you can see below:
// NOTCONSOLE

-Then, the {feat-imp} value is written to the
+Then, the {feat-imp} value is written to the
`ml.inference.foo.feature_importance` field.

You can also specify the target field as follows: