[role="xpack"]
[testenv="basic"]
[[search-aggregations-pipeline-inference-bucket-aggregation]]
=== {infer-cap} Bucket Aggregation

A parent pipeline aggregation which loads a pre-trained model and performs
{infer} on the collated result fields from the parent bucket aggregation.

To use the {infer} bucket aggregation, you need to have the same security
privileges that are required for using the <<get-inference>>.

[[inference-bucket-agg-syntax]]
==== Syntax

An `inference` aggregation looks like this in isolation:

[source,js]
--------------------------------------------------
{
  "inference": {
    "model_id": "a_model_for_inference", <1>
    "inference_config": { <2>
      "regression_config": {
        "num_top_feature_importance_values": 2
      }
    },
    "buckets_path": {
      "avg_cost": "avg_agg", <3>
      "max_cost": "max_agg"
    }
  }
}
--------------------------------------------------
// NOTCONSOLE
<1> The ID of the model to use.
<2> The optional inference config, which overrides the model's default settings.
<3> Maps the value of `avg_agg` to the model's input field `avg_cost`.

[[inference-bucket-params]]
.`inference` Parameters
[options="header"]
|===
|Parameter Name |Description |Required |Default Value
| `model_id` | The ID of the model to load and infer against. | Required | -
| `inference_config` | Contains the inference type and its options. There are two types: <<inference-agg-regression-opt,`regression`>> and <<inference-agg-classification-opt,`classification`>>. | Optional | -
| `buckets_path` | Defines the paths to the input aggregations and maps the aggregation names to the field names expected by the model.
See <<buckets-path-syntax>> for more details. | Required | -
|===
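
As an illustration of how `buckets_path` pairs model input fields with
aggregation results, the following Python sketch resolves each path against a
single bucket of a search response. It is not how Elasticsearch implements the
feature: the `resolve_path` and `model_input` helpers and the sample bucket are
hypothetical, and only the simple `agg_name` and `agg_name._count` path shapes
used on this page are handled.

```python
def resolve_path(bucket, path):
    """Resolve a simple buckets_path such as "avg_agg" or "success._count".

    Hypothetical helper for illustration; it handles only the path shapes
    used on this page, not the full buckets_path syntax.
    """
    agg_name, _, metric = path.partition(".")
    agg = bucket[agg_name]
    if metric == "_count":
        # Bucket aggregations (for example `filter`) expose their document count.
        return agg["doc_count"]
    if metric:
        return agg[metric]
    # Single-value metric aggregations report their result under "value".
    return agg["value"]


def model_input(bucket, buckets_path):
    """Build the mapping of model input field to value for one bucket."""
    return {field: resolve_path(bucket, path) for field, path in buckets_path.items()}


# Made-up bucket, shaped like one entry of a search response.
bucket = {
    "avg_agg": {"value": 12.5},
    "max_agg": {"value": 40.0},
    "success": {"doc_count": 7},
}
buckets_path = {
    "avg_cost": "avg_agg",
    "max_cost": "max_agg",
    "success": "success._count",
}

print(model_input(bucket, buckets_path))
# → {'avg_cost': 12.5, 'max_cost': 40.0, 'success': 7}
```

The keys of `buckets_path` are the names the model expects, while the values
say where in the bucket each number comes from.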

==== Configuration options for {infer} models

The `inference_config` setting is optional and usually isn't required, as the
pre-trained models come equipped with sensible defaults. In the context of
aggregations, some options can be overridden for each of the two types of model.

[discrete]
[[inference-agg-regression-opt]]
===== Configuration options for {regression} models

`num_top_feature_importance_values`::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-regression-num-top-feature-importance-values]

[discrete]
[[inference-agg-classification-opt]]
===== Configuration options for {classification} models

`num_top_classes`::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-num-top-classes]

`num_top_feature_importance_values`::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-num-top-feature-importance-values]

`prediction_field_type`::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-prediction-field-type]
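
The options listed for the two model types can be summarised as a small lookup.
The Python sketch below checks a set of overrides against the options
documented on this page for each model type. The `ALLOWED_OPTIONS` table and
the `unsupported_options` function are hypothetical names used for
illustration and are not part of any Elasticsearch client.

```python
# Options documented on this page for each model type.
ALLOWED_OPTIONS = {
    "regression": {"num_top_feature_importance_values"},
    "classification": {
        "num_top_classes",
        "num_top_feature_importance_values",
        "prediction_field_type",
    },
}


def unsupported_options(model_type, overrides):
    """Return the override names not documented for the model type, sorted."""
    if model_type not in ALLOWED_OPTIONS:
        raise ValueError(f"unknown model type: {model_type!r}")
    return sorted(set(overrides) - ALLOWED_OPTIONS[model_type])


print(unsupported_options("classification", {"num_top_classes": 3}))
# → []
print(unsupported_options("regression", {"num_top_classes": 3}))
# → ['num_top_classes']
```

A check like this could run before a request is sent, so that an option meant
for a {classification} model is not passed to a {regression} model by mistake.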

[[inference-bucket-agg-example]]
==== Example

The following snippet aggregates a web log by `client_ip` and extracts a number
of features via metric and bucket sub-aggregations as input to the {infer}
aggregation configured with a model trained to identify suspicious client IPs:

[source,console]
-------------------------------------------------
GET kibana_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "client_ip": { <1>
      "composite": {
        "sources": [
          {
            "client_ip": {
              "terms": {
                "field": "clientip"
              }
            }
          }
        ]
      },
      "aggs": { <2>
        "url_dc": {
          "cardinality": {
            "field": "url.keyword"
          }
        },
        "bytes_sum": {
          "sum": {
            "field": "bytes"
          }
        },
        "geo_src_dc": {
          "cardinality": {
            "field": "geo.src"
          }
        },
        "geo_dest_dc": {
          "cardinality": {
            "field": "geo.dest"
          }
        },
        "responses_total": {
          "value_count": {
            "field": "timestamp"
          }
        },
        "success": {
          "filter": {
            "term": {
              "response": "200"
            }
          }
        },
        "error404": {
          "filter": {
            "term": {
              "response": "404"
            }
          }
        },
        "error503": {
          "filter": {
            "term": {
              "response": "503"
            }
          }
        },
        "malicious_client_ip": { <3>
          "inference": {
            "model_id": "malicious_clients_model",
            "buckets_path": {
              "response_count": "responses_total",
              "url_dc": "url_dc",
              "bytes_sum": "bytes_sum",
              "geo_src_dc": "geo_src_dc",
              "geo_dest_dc": "geo_dest_dc",
              "success": "success._count",
              "error404": "error404._count",
              "error503": "error503._count"
            }
          }
        }
      }
    }
  }
}
-------------------------------------------------
// TEST[skip:setup kibana sample data]
<1> A composite bucket aggregation that aggregates the data by `client_ip`.
<2> A series of metrics and bucket sub-aggregations.
<3> {infer-cap} bucket aggregation that contains the model ID and maps the
aggregation names to the model's input fields.
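
When many response codes are tracked, the repeated `filter` sub-aggregations
and their `buckets_path` entries can be generated rather than written by hand.
The Python sketch below builds the same request body as the example using only
the standard library. The `build_request` function and the `RESPONSE_FILTERS`
mapping are hypothetical names, and sending the body to Elasticsearch is left
out.

```python
import json

# Aggregation name -> HTTP response code, as used in the example request.
RESPONSE_FILTERS = {"success": "200", "error404": "404", "error503": "503"}


def build_request(model_id):
    """Build the search body from the example as a plain dict."""
    sub_aggs = {
        "url_dc": {"cardinality": {"field": "url.keyword"}},
        "bytes_sum": {"sum": {"field": "bytes"}},
        "geo_src_dc": {"cardinality": {"field": "geo.src"}},
        "geo_dest_dc": {"cardinality": {"field": "geo.dest"}},
        "responses_total": {"value_count": {"field": "timestamp"}},
    }
    # Metric aggregations map by name; the model field for the total
    # response count is called `response_count` in the example.
    buckets_path = {name: name for name in sub_aggs if name != "responses_total"}
    buckets_path["response_count"] = "responses_total"

    for name, code in RESPONSE_FILTERS.items():
        sub_aggs[name] = {"filter": {"term": {"response": code}}}
        # Filter aggregations are buckets, so the model reads their `_count`.
        buckets_path[name] = f"{name}._count"

    sub_aggs["malicious_client_ip"] = {
        "inference": {"model_id": model_id, "buckets_path": buckets_path}
    }
    return {
        "size": 0,
        "aggs": {
            "client_ip": {
                "composite": {
                    "sources": [{"client_ip": {"terms": {"field": "clientip"}}}]
                },
                "aggs": sub_aggs,
            }
        },
    }


body = build_request("malicious_clients_model")
print(json.dumps(body, indent=2))
```

Generating both halves from one mapping keeps each `filter` aggregation and its
`._count` path in step, which is where hand-edited requests tend to drift.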