[role="xpack"]
[[inference-processor]]
=== {infer-cap} processor
++++
<titleabbrev>{infer-cap}</titleabbrev>
++++

Uses a pre-trained {dfanalytics} model or a model deployed for natural
language processing tasks to infer against the data that is being
ingested in the pipeline.

[[inference-options]]
.{infer-cap} Options
[options="header"]
|======
| Name               | Required  | Default                         | Description
| `model_id`         | yes       | -                               | (String) The ID or alias for the trained model, or the ID of the deployment.
| `target_field`     | no        | `ml.inference.<processor_tag>`  | (String) Field added to incoming documents to contain results objects.
| `field_map`        | no        | If defined the model's default field map | (Object) Maps the document field names to the known field names of the model. This mapping takes precedence over any default mappings provided in the model configuration.
| `inference_config` | no        | The default settings defined in the model | (Object) Contains the inference type and its options.
include::common-options.asciidoc[]
|======
[source,js]
--------------------------------------------------
{
  "inference": {
    "model_id": "model_deployment_for_inference",
    "target_field": "FlightDelayMin_prediction_infer",
    "field_map": {
      "your_field": "my_field"
    },
    "inference_config": { "regression": {} }
  }
}
--------------------------------------------------
// NOTCONSOLE
[discrete]
[[inference-processor-classification-opt]]
==== {classification-cap} configuration options

Classification configuration for inference.

`num_top_classes`::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-num-top-classes]

`num_top_feature_importance_values`::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-num-top-feature-importance-values]

`results_field`::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field-processor]

`top_classes_results_field`::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-top-classes-results-field]

`prediction_field_type`::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-prediction-field-type]
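For instance, the options above can be combined in the processor's `inference_config`. The following is a sketch only; the model ID and result field names are placeholders, and the option values are illustrative:

[source,js]
--------------------------------------------------
{
  "inference": {
    "model_id": "my_classification_model",
    "inference_config": {
      "classification": {
        "num_top_classes": 3,
        "results_field": "predicted_class",
        "top_classes_results_field": "top_classes"
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE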
[discrete]
[[inference-processor-fill-mask-opt]]
==== Fill mask configuration options

`num_top_classes`::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-num-top-classes]

`results_field`::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field-processor]

`tokenization`::
(Optional, object)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization]
+
.Properties of tokenization
[%collapsible%open]
=====
`bert`::::
(Optional, object)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert]
+
.Properties of bert
[%collapsible%open]
=======
`truncate`::::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======

`roberta`::::
(Optional, object)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-roberta]
+
.Properties of roberta
[%collapsible%open]
=======
`truncate`::::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======

`mpnet`::::
(Optional, object)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-mpnet]
+
.Properties of mpnet
[%collapsible%open]
=======
`truncate`::::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======
=====
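Putting these options together, a fill mask processor might look like the following sketch. The model ID is a placeholder and the values are illustrative:

[source,js]
--------------------------------------------------
{
  "inference": {
    "model_id": "my_fill_mask_model",
    "inference_config": {
      "fill_mask": {
        "num_top_classes": 5,
        "tokenization": {
          "bert": {
            "truncate": "first"
          }
        }
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE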
[discrete]
[[inference-processor-ner-opt]]
==== NER configuration options

`results_field`::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field-processor]

`tokenization`::
(Optional, object)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization]
+
.Properties of tokenization
[%collapsible%open]
=====
`bert`::::
(Optional, object)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert]
+
.Properties of bert
[%collapsible%open]
=======
`truncate`::::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======

`roberta`::::
(Optional, object)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-roberta]
+
.Properties of roberta
[%collapsible%open]
=======
`truncate`::::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======

`mpnet`::::
(Optional, object)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-mpnet]
+
.Properties of mpnet
[%collapsible%open]
=======
`truncate`::::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======
=====
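As a sketch, an NER processor configuration could look like this. The model ID and the result field name are placeholders:

[source,js]
--------------------------------------------------
{
  "inference": {
    "model_id": "my_ner_model",
    "inference_config": {
      "ner": {
        "results_field": "entities",
        "tokenization": {
          "bert": {
            "truncate": "none"
          }
        }
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE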
[discrete]
[[inference-processor-regression-opt]]
==== {regression-cap} configuration options

Regression configuration for inference.

`results_field`::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field-processor]

`num_top_feature_importance_values`::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-regression-num-top-feature-importance-values]
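For example, a regression configuration using both options might look like the following sketch, where the model ID and result field name are placeholders:

[source,js]
--------------------------------------------------
{
  "inference": {
    "model_id": "my_regression_model",
    "inference_config": {
      "regression": {
        "results_field": "price_prediction",
        "num_top_feature_importance_values": 2
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE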
[discrete]
[[inference-processor-text-classification-opt]]
==== Text classification configuration options

`classification_labels`::
(Optional, string) An array of classification labels.

`num_top_classes`::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-num-top-classes]

`results_field`::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field-processor]

`tokenization`::
(Optional, object)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization]
+
.Properties of tokenization
[%collapsible%open]
=====
`bert`::::
(Optional, object)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert]
+
.Properties of bert
[%collapsible%open]
=======
`span`::::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-span]

`truncate`::::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======

`roberta`::::
(Optional, object)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-roberta]
+
.Properties of roberta
[%collapsible%open]
=======
`span`::::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-span]

`truncate`::::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======

`mpnet`::::
(Optional, object)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-mpnet]
+
.Properties of mpnet
[%collapsible%open]
=======
`truncate`::::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======
=====
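A text classification processor combining these options might be sketched as follows. The model ID is a placeholder and the labels and option values are purely illustrative:

[source,js]
--------------------------------------------------
{
  "inference": {
    "model_id": "my_sentiment_model",
    "inference_config": {
      "text_classification": {
        "classification_labels": ["negative", "positive"],
        "num_top_classes": 2,
        "tokenization": {
          "bert": {
            "truncate": "first"
          }
        }
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE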
[discrete]
[[inference-processor-text-embedding-opt]]
==== Text embedding configuration options

`results_field`::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field-processor]

`tokenization`::
(Optional, object)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization]
+
.Properties of tokenization
[%collapsible%open]
=====
`bert`::::
(Optional, object)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert]
+
.Properties of bert
[%collapsible%open]
=======
`truncate`::::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======

`roberta`::::
(Optional, object)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-roberta]
+
.Properties of roberta
[%collapsible%open]
=======
`truncate`::::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======

`mpnet`::::
(Optional, object)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-mpnet]
+
.Properties of mpnet
[%collapsible%open]
=======
`truncate`::::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======
=====
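A text embedding processor using these options could be sketched as follows; the model ID and result field name are placeholders:

[source,js]
--------------------------------------------------
{
  "inference": {
    "model_id": "my_embedding_model",
    "inference_config": {
      "text_embedding": {
        "results_field": "text_vector",
        "tokenization": {
          "mpnet": {
            "truncate": "first"
          }
        }
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE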
[discrete]
[[inference-processor-zero-shot-opt]]
==== Zero shot classification configuration options

`labels`::
(Optional, array)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-zero-shot-classification-labels]

`multi_label`::
(Optional, boolean)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-zero-shot-classification-multi-label]

`results_field`::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field-processor]

`tokenization`::
(Optional, object)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization]
+
.Properties of tokenization
[%collapsible%open]
=====
`bert`::::
(Optional, object)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-bert]
+
.Properties of bert
[%collapsible%open]
=======
`truncate`::::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======

`roberta`::::
(Optional, object)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-roberta]
+
.Properties of roberta
[%collapsible%open]
=======
`truncate`::::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======

`mpnet`::::
(Optional, object)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-mpnet]
+
.Properties of mpnet
[%collapsible%open]
=======
`truncate`::::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-nlp-tokenization-truncate]
=======
=====
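For example, a zero shot classification processor might be sketched as follows. The model ID is a placeholder and the labels are purely illustrative:

[source,js]
--------------------------------------------------
{
  "inference": {
    "model_id": "my_zero_shot_model",
    "inference_config": {
      "zero_shot_classification": {
        "labels": ["billing", "shipping", "refunds"],
        "multi_label": false,
        "results_field": "topic"
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE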
[discrete]
[[inference-processor-config-example]]
==== {infer-cap} processor examples

[source,js]
--------------------------------------------------
"inference":{
  "model_id": "my_model_id",
  "field_map": {
    "original_fieldname": "expected_fieldname"
  },
  "inference_config": {
    "regression": {
      "results_field": "my_regression"
    }
  }
}
--------------------------------------------------
// NOTCONSOLE

This configuration specifies a `regression` inference and the results are
written to the `my_regression` field contained in the `target_field` results
object. The `field_map` configuration maps the field `original_fieldname` from
the source document to the field expected by the model.
[source,js]
--------------------------------------------------
"inference":{
  "model_id": "my_model_id",
  "inference_config": {
    "classification": {
      "num_top_classes": 2,
      "results_field": "prediction",
      "top_classes_results_field": "probabilities"
    }
  }
}
--------------------------------------------------
// NOTCONSOLE

This configuration specifies a `classification` inference. The number of
categories for which the predicted probabilities are reported is 2
(`num_top_classes`). The result is written to the `prediction` field and the top
classes to the `probabilities` field. Both fields are contained in the
`target_field` results object.

For an example that uses {nlp} trained models, refer to
{ml-docs}/ml-nlp-inference.html[Add NLP inference to ingest pipelines].
[discrete]
[[inference-processor-feature-importance]]
==== {feat-imp-cap} object mapping

To get the full benefit of aggregating and searching for
{ml-docs}/ml-feature-importance.html[{feat-imp}], update your index mapping of
the {feat-imp} result field as follows:

[source,js]
--------------------------------------------------
"ml.inference.feature_importance": {
  "type": "nested",
  "dynamic": true,
  "properties": {
    "feature_name": {
      "type": "keyword"
    },
    "importance": {
      "type": "double"
    }
  }
}
--------------------------------------------------
// NOTCONSOLE

The mapping field name for {feat-imp} (in the example above, it is
`ml.inference.feature_importance`) is compounded as follows:

`<ml.inference.target_field>`.`<inference.tag>`.`feature_importance`

* `<ml.inference.target_field>`: defaults to `ml.inference`.
* `<inference.tag>`: if the tag is not provided in the processor definition, it
is not part of the field path.
For example, if you provide the tag `foo` in the definition as follows:

[source,js]
--------------------------------------------------
{
  "tag": "foo",
  ...
}
--------------------------------------------------
// NOTCONSOLE

Then, the {feat-imp} value is written to the
`ml.inference.foo.feature_importance` field.

You can also specify the target field as follows:

[source,js]
--------------------------------------------------
{
  "tag": "foo",
  "target_field": "my_field"
}
--------------------------------------------------
// NOTCONSOLE

In this case, {feat-imp} is exposed in the
`my_field.foo.feature_importance` field.