[role="xpack"]
[testenv="basic"]
[[search-aggregations-pipeline-inference-bucket-aggregation]]
=== {infer-cap} Bucket Aggregation

experimental::[]

A parent pipeline aggregation that loads a pre-trained model and performs
{infer} on the collated result fields from the parent bucket aggregation.

To use the {infer} bucket aggregation, you need the same security privileges
that are required for using the <<get-inference>>.

[[inference-bucket-agg-syntax]]
==== Syntax

An `inference` aggregation looks like this in isolation:

[source,js]
--------------------------------------------------
{
  "inference": {
    "model_id": "a_model_for_inference", <1>
    "inference_config": { <2>
      "regression_config": {
        "num_top_feature_importance_values": 2
      }
    },
    "buckets_path": {
      "avg_cost": "avg_agg", <3>
      "max_cost": "max_agg"
    }
  }
}
--------------------------------------------------
// NOTCONSOLE
<1> The ID of the model to use.
<2> The optional inference config, which overrides the model's default settings.
<3> Maps the value of `avg_agg` to the model's input field `avg_cost`.

[[inference-bucket-params]]
.`inference` Parameters
[options="header"]
|===
|Parameter Name |Description |Required |Default Value
| `model_id` | The ID of the model to load and infer against | Required | -
| `inference_config` | Contains the inference type and its options. There are two types: <<inference-agg-regression-opt,`regression`>> and <<inference-agg-classification-opt,`classification`>> | Optional | -
| `buckets_path` | Defines the paths to the input aggregations and maps the aggregation names to the field names expected by the model.
See <<buckets-path-syntax>> for more details | Required | -
|===
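
The parameters above can be assembled programmatically. The following Python
sketch is only an illustration: the helper name `inference_agg` is hypothetical
and not part of any Elasticsearch client. It builds the JSON body shown in the
syntax example and rejects a missing `model_id` or `buckets_path`, the two
required parameters. The `inference_config` value is passed through unchanged.

```python
import json


def inference_agg(model_id, buckets_path, inference_config=None):
    """Build the body of an `inference` pipeline aggregation.

    Hypothetical helper for illustration; it only builds a dict and does
    not contact Elasticsearch.
    """
    if not model_id:
        raise ValueError("model_id is required")
    if not buckets_path:
        raise ValueError("buckets_path is required")

    body = {"model_id": model_id, "buckets_path": dict(buckets_path)}
    # inference_config is optional; omit the key entirely when not given
    # so that the model's default settings apply.
    if inference_config is not None:
        body["inference_config"] = inference_config
    return {"inference": body}


# Reproduces the syntax example from this section.
agg = inference_agg(
    "a_model_for_inference",
    {"avg_cost": "avg_agg", "max_cost": "max_agg"},
    inference_config={
        "regression_config": {"num_top_feature_importance_values": 2}
    },
)
print(json.dumps(agg, indent=2))
```

Leaving `inference_config` out of the body, rather than sending an empty
object, keeps the request minimal when the defaults are sufficient.
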

==== Configuration options for {infer} models

The `inference_config` setting is optional and usually isn't needed because the
pre-trained models come with sensible defaults. In the context of
aggregations, some options can be overridden for each of the two model types.

[discrete]
[[inference-agg-regression-opt]]
===== Configuration options for {regression} models

`num_top_feature_importance_values`::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-regression-num-top-feature-importance-values]

[discrete]
[[inference-agg-classification-opt]]
===== Configuration options for {classification} models

`num_top_classes`::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-num-top-classes]

`num_top_feature_importance_values`::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-num-top-feature-importance-values]

`prediction_field_type`::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-prediction-field-type]
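
Because each model type accepts a different set of overrides, a client may want
to catch a misplaced option before sending the request. The sketch below is a
client-side convenience only, under the assumption that the options listed in
this section are the complete set; `ALLOWED_OPTIONS` and `check_options` are
hypothetical names, and Elasticsearch performs its own validation regardless.

```python
# Overridable options per model type, taken from the lists in this section.
ALLOWED_OPTIONS = {
    "regression": {"num_top_feature_importance_values"},
    "classification": {
        "num_top_classes",
        "num_top_feature_importance_values",
        "prediction_field_type",
    },
}


def check_options(model_type, options):
    """Return a copy of `options` if every key is listed for `model_type`.

    Raises ValueError for an unknown model type or an unlisted option.
    """
    allowed = ALLOWED_OPTIONS.get(model_type)
    if allowed is None:
        raise ValueError(f"unknown model type: {model_type!r}")
    unknown = sorted(set(options) - allowed)
    if unknown:
        raise ValueError(f"unsupported {model_type} options: {unknown}")
    return dict(options)


classification_options = check_options("classification", {"num_top_classes": 3})
print(classification_options)
```

For example, `num_top_classes` passes for a classification model but would be
rejected for a regression model, which lists only
`num_top_feature_importance_values`.
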

[[inference-bucket-agg-example]]
==== Example

The following snippet aggregates a web log by client IP and extracts a number
of features via metric and bucket sub-aggregations. These features are the
input to the {infer} aggregation, which is configured with a model trained to
identify suspicious client IPs:

[source,console]
-------------------------------------------------
GET kibana_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "client_ip": { <1>
      "composite": {
        "sources": [
          {
            "client_ip": {
              "terms": {
                "field": "clientip"
              }
            }
          }
        ]
      },
      "aggs": { <2>
        "url_dc": {
          "cardinality": {
            "field": "url.keyword"
          }
        },
        "bytes_sum": {
          "sum": {
            "field": "bytes"
          }
        },
        "geo_src_dc": {
          "cardinality": {
            "field": "geo.src"
          }
        },
        "geo_dest_dc": {
          "cardinality": {
            "field": "geo.dest"
          }
        },
        "responses_total": {
          "value_count": {
            "field": "timestamp"
          }
        },
        "success": {
          "filter": {
            "term": {
              "response": "200"
            }
          }
        },
        "error404": {
          "filter": {
            "term": {
              "response": "404"
            }
          }
        },
        "error503": {
          "filter": {
            "term": {
              "response": "503"
            }
          }
        },
        "malicious_client_ip": { <3>
          "inference": {
            "model_id": "malicious_clients_model",
            "buckets_path": {
              "response_count": "responses_total",
              "url_dc": "url_dc",
              "bytes_sum": "bytes_sum",
              "geo_src_dc": "geo_src_dc",
              "geo_dest_dc": "geo_dest_dc",
              "success": "success._count",
              "error404": "error404._count",
              "error503": "error503._count"
            }
          }
        }
      }
    }
  }
}
-------------------------------------------------
// TEST[skip:setup kibana sample data]
<1> A composite bucket aggregation that aggregates the data by `client_ip`.
<2> A series of metric and bucket sub-aggregations.
<3> An {infer} bucket aggregation that contains the model ID and maps the
aggregation names to the model's input fields.
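
In the example, metric sub-aggregations are referenced by name in
`buckets_path`, while `filter` sub-aggregations are referenced through their
document count with the `._count` suffix. The following Python sketch derives
such a mapping from the sub-aggregation definitions. The function name
`buckets_path_for` and its `rename` argument are hypothetical, and the logic
assumes only the aggregation types used in this example: single-value metrics
and `filter`. Multi-value metrics would need an explicit metric name in the
path.

```python
def buckets_path_for(sub_aggs, rename=None):
    """Map model input field names to bucket paths.

    `sub_aggs` is the "aggs" object of the parent bucket aggregation.
    `rename` optionally maps an aggregation name to a different model
    input field name.
    """
    rename = rename or {}
    paths = {}
    for name, definition in sub_aggs.items():
        if "inference" in definition:
            continue  # the inference aggregation is not its own input
        # A filter aggregation yields a bucket, so its document count is
        # addressed as "<name>._count". The single-value metrics used in
        # the example are addressed by name alone.
        path = f"{name}._count" if "filter" in definition else name
        paths[rename.get(name, name)] = path
    return paths


# A subset of the sub-aggregations from the example request.
sub_aggs = {
    "url_dc": {"cardinality": {"field": "url.keyword"}},
    "bytes_sum": {"sum": {"field": "bytes"}},
    "responses_total": {"value_count": {"field": "timestamp"}},
    "success": {"filter": {"term": {"response": "200"}}},
    "error404": {"filter": {"term": {"response": "404"}}},
}
paths = buckets_path_for(sub_aggs, rename={"responses_total": "response_count"})
print(paths)
```

The `rename` argument covers the case in the example where the model expects
`response_count` but the sub-aggregation is named `responses_total`.
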