123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184 |
- [role="xpack"]
- [[search-aggregations-pipeline-inference-bucket-aggregation]]
- === {infer-cap} bucket aggregation
- ++++
- <titleabbrev>{infer-cap} bucket</titleabbrev>
- ++++
- A parent pipeline aggregation which loads a pre-trained model and performs
- {infer} on the collated result fields from the parent bucket aggregation.
- To use the {infer} bucket aggregation, you need to have the same security
- privileges that are required for using the
- <<get-trained-models,get trained models API>>.
- [[inference-bucket-agg-syntax]]
- ==== Syntax
- A `inference` aggregation looks like this in isolation:
- [source,js]
- --------------------------------------------------
- {
- "inference": {
- "model_id": "a_model_for_inference", <1>
- "inference_config": { <2>
- "regression_config": {
- "num_top_feature_importance_values": 2
- }
- },
- "buckets_path": {
- "avg_cost": "avg_agg", <3>
- "max_cost": "max_agg"
- }
- }
- }
- --------------------------------------------------
- // NOTCONSOLE
- <1> The unique identifier or alias for the trained model.
- <2> The optional inference config which overrides the model's default settings
- <3> Map the value of `avg_agg` to the model's input field `avg_cost`
- [[inference-bucket-params]]
- .`inference` Parameters
- [options="header"]
- |===
- |Parameter Name |Description |Required |Default Value
- | `model_id` | The ID or alias for the trained model. | Required | -
- | `inference_config` | Contains the inference type and its options. There are two types: <<inference-agg-regression-opt,`regression`>> and <<inference-agg-classification-opt,`classification`>> | Optional | -
- | `buckets_path` | Defines the paths to the input aggregations and maps the aggregation names to the field names expected by the model.
- See <<buckets-path-syntax>> for more details | Required | -
- |===
- ==== Configuration options for {infer} models
- The `inference_config` setting is optional and usually isn't required as the
- pre-trained models come equipped with sensible defaults. In the context of
- aggregations some options can be overridden for each of the two types of model.
- [discrete]
- [[inference-agg-regression-opt]]
- ===== Configuration options for {regression} models
- `num_top_feature_importance_values`::
- (Optional, integer)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-regression-num-top-feature-importance-values]
- [discrete]
- [[inference-agg-classification-opt]]
- ===== Configuration options for {classification} models
- `num_top_classes`::
- (Optional, integer)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-num-top-classes]
- `num_top_feature_importance_values`::
- (Optional, integer)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-num-top-feature-importance-values]
- `prediction_field_type`::
- (Optional, string)
- include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-prediction-field-type]
- [[inference-bucket-agg-example]]
- ==== Example
- The following snippet aggregates a web log by `client_ip` and extracts a number
- of features via metric and bucket sub-aggregations as input to the {infer}
- aggregation configured with a model trained to identify suspicious client IPs:
- [source,console]
- -------------------------------------------------
- GET kibana_sample_data_logs/_search
- {
- "size": 0,
- "aggs": {
- "client_ip": { <1>
- "composite": {
- "sources": [
- {
- "client_ip": {
- "terms": {
- "field": "clientip"
- }
- }
- }
- ]
- },
- "aggs": { <2>
- "url_dc": {
- "cardinality": {
- "field": "url.keyword"
- }
- },
- "bytes_sum": {
- "sum": {
- "field": "bytes"
- }
- },
- "geo_src_dc": {
- "cardinality": {
- "field": "geo.src"
- }
- },
- "geo_dest_dc": {
- "cardinality": {
- "field": "geo.dest"
- }
- },
- "responses_total": {
- "value_count": {
- "field": "timestamp"
- }
- },
- "success": {
- "filter": {
- "term": {
- "response": "200"
- }
- }
- },
- "error404": {
- "filter": {
- "term": {
- "response": "404"
- }
- }
- },
- "error503": {
- "filter": {
- "term": {
- "response": "503"
- }
- }
- },
- "malicious_client_ip": { <3>
- "inference": {
- "model_id": "malicious_clients_model",
- "buckets_path": {
- "response_count": "responses_total",
- "url_dc": "url_dc",
- "bytes_sum": "bytes_sum",
- "geo_src_dc": "geo_src_dc",
- "geo_dest_dc": "geo_dest_dc",
- "success": "success._count",
- "error404": "error404._count",
- "error503": "error503._count"
- }
- }
- }
- }
- }
- }
- }
- -------------------------------------------------
- // TEST[skip:setup kibana sample data]
- <1> A composite bucket aggregation that aggregates the data by `client_ip`.
- <2> A series of metrics and bucket sub-aggregations.
- <3> {infer-cap} bucket aggregation that specifies the trained model and maps the
- aggregation names to the model's input fields.
|