[role="xpack"]
[[search-aggregations-pipeline-inference-bucket-aggregation]]
=== {infer-cap} bucket aggregation
++++
<titleabbrev>{infer-cap} bucket</titleabbrev>
++++

A parent pipeline aggregation which loads a pre-trained model and performs
{infer} on the collated result fields from the parent bucket aggregation.

To use the {infer} bucket aggregation, you need to have the same security
privileges that are required for using the
<<get-trained-models,get trained models API>>.

[[inference-bucket-agg-syntax]]
==== Syntax

An `inference` aggregation looks like this in isolation:

[source,js]
--------------------------------------------------
{
  "inference": {
    "model_id": "a_model_for_inference", <1>
    "inference_config": { <2>
      "regression_config": {
        "num_top_feature_importance_values": 2
      }
    },
    "buckets_path": {
      "avg_cost": "avg_agg", <3>
      "max_cost": "max_agg"
    }
  }
}
--------------------------------------------------
// NOTCONSOLE
<1> The unique identifier or alias for the trained model.
<2> The optional inference config, which overrides the model's default settings.
<3> Maps the value of `avg_agg` to the model's input field `avg_cost`.

[[inference-bucket-params]]
.`inference` Parameters
[options="header"]
|===
|Parameter Name |Description |Required |Default Value
| `model_id` | The ID or alias for the trained model. | Required | -
| `inference_config` | Contains the inference type and its options. There are two types: <<inference-agg-regression-opt,`regression`>> and <<inference-agg-classification-opt,`classification`>>. | Optional | -
| `buckets_path` | Defines the paths to the input aggregations and maps the aggregation names to the field names expected by the model.
See <<buckets-path-syntax>> for more details. | Required | -
|===

==== Configuration options for {infer} models

The `inference_config` setting is optional and usually isn't required, because
the pre-trained models come equipped with sensible defaults. In the context of
aggregations, some options can be overridden for each of the two types of model.

[discrete]
[[inference-agg-regression-opt]]
===== Configuration options for {regression} models

`num_top_feature_importance_values`::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-regression-num-top-feature-importance-values]

[discrete]
[[inference-agg-classification-opt]]
===== Configuration options for {classification} models

`num_top_classes`::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-num-top-classes]

`num_top_feature_importance_values`::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-num-top-feature-importance-values]

`prediction_field_type`::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-prediction-field-type]

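
As an illustration of the {classification} options listed above, the following
fragment is a minimal sketch of an `inference` aggregation that overrides all
three of them. The model ID `a_classification_model` and the aggregation names
in `buckets_path` are hypothetical. The wrapper key `classification` follows the
type name used in the parameter table and is an assumption here; the Syntax
example uses the `regression_config` form for {regression} models, so verify the
accepted key against your cluster version before relying on it.

[source,js]
--------------------------------------------------
{
  "inference": {
    "model_id": "a_classification_model",
    "inference_config": {
      "classification": {
        "num_top_classes": 2,
        "num_top_feature_importance_values": 3,
        "prediction_field_type": "string"
      }
    },
    "buckets_path": {
      "avg_cost": "avg_agg",
      "max_cost": "max_agg"
    }
  }
}
--------------------------------------------------
// NOTCONSOLE

Any option that is left out keeps the default stored with the trained model, so
in practice only the settings that need to differ for the aggregation have to be
listed.
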
[[inference-bucket-agg-example]]
==== Example

The following snippet aggregates a web log by `client_ip` and extracts a number
of features via metric and bucket sub-aggregations as input to the {infer}
aggregation configured with a model trained to identify suspicious client IPs:

[source,console]
-------------------------------------------------
GET kibana_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "client_ip": { <1>
      "composite": {
        "sources": [
          {
            "client_ip": {
              "terms": {
                "field": "clientip"
              }
            }
          }
        ]
      },
      "aggs": { <2>
        "url_dc": {
          "cardinality": {
            "field": "url.keyword"
          }
        },
        "bytes_sum": {
          "sum": {
            "field": "bytes"
          }
        },
        "geo_src_dc": {
          "cardinality": {
            "field": "geo.src"
          }
        },
        "geo_dest_dc": {
          "cardinality": {
            "field": "geo.dest"
          }
        },
        "responses_total": {
          "value_count": {
            "field": "timestamp"
          }
        },
        "success": {
          "filter": {
            "term": {
              "response": "200"
            }
          }
        },
        "error404": {
          "filter": {
            "term": {
              "response": "404"
            }
          }
        },
        "error503": {
          "filter": {
            "term": {
              "response": "503"
            }
          }
        },
        "malicious_client_ip": { <3>
          "inference": {
            "model_id": "malicious_clients_model",
            "buckets_path": {
              "response_count": "responses_total",
              "url_dc": "url_dc",
              "bytes_sum": "bytes_sum",
              "geo_src_dc": "geo_src_dc",
              "geo_dest_dc": "geo_dest_dc",
              "success": "success._count",
              "error404": "error404._count",
              "error503": "error503._count"
            }
          }
        }
      }
    }
  }
}
-------------------------------------------------
// TEST[skip:setup kibana sample data]
<1> A composite bucket aggregation that aggregates the data by `client_ip`.
<2> A series of metrics and bucket sub-aggregations.
<3> {infer-cap} bucket aggregation that specifies the trained model and maps the
aggregation names to the model's input fields.