
[role="xpack"]
[testenv="basic"]
[[search-aggregations-pipeline-inference-bucket-aggregation]]
=== {infer-cap} bucket aggregation
++++
<titleabbrev>{infer-cap} bucket</titleabbrev>
++++

beta::[]

A parent pipeline aggregation that loads a pre-trained model and performs
{infer} on the collated result fields from the parent bucket aggregation.

To use the {infer} bucket aggregation, you need the same security privileges
that are required to use the <<get-trained-models,get trained models API>>.

[[inference-bucket-agg-syntax]]
==== Syntax

An `inference` aggregation looks like this in isolation:

[source,js]
--------------------------------------------------
{
  "inference": {
    "model_id": "a_model_for_inference", <1>
    "inference_config": { <2>
      "regression_config": {
        "num_top_feature_importance_values": 2
      }
    },
    "buckets_path": {
      "avg_cost": "avg_agg", <3>
      "max_cost": "max_agg"
    }
  }
}
--------------------------------------------------
// NOTCONSOLE
<1> The unique identifier or alias for the trained model.
<2> The optional inference config, which overrides the model's default settings.
<3> Maps the value of `avg_agg` to the model's input field `avg_cost`.
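
As an illustration, the same structure can be assembled programmatically. The following Python sketch is not part of any client library: `build_inference_agg` is a hypothetical helper that only mirrors the syntax shown in this section and does not contact a cluster.

```python
import json


def build_inference_agg(model_id, buckets_path, inference_config=None):
    """Return the body of an `inference` aggregation as a plain dict.

    Hypothetical helper for illustration only; it mirrors the documented
    syntax and performs no request.
    """
    if not model_id:
        raise ValueError("model_id is required")
    if not buckets_path:
        raise ValueError("buckets_path is required")

    body = {"model_id": model_id, "buckets_path": dict(buckets_path)}
    # inference_config is optional; omit it to keep the model's defaults.
    if inference_config is not None:
        body["inference_config"] = inference_config
    return {"inference": body}


agg = build_inference_agg(
    model_id="a_model_for_inference",
    buckets_path={"avg_cost": "avg_agg", "max_cost": "max_agg"},
    inference_config={
        "regression_config": {"num_top_feature_importance_values": 2}
    },
)
print(json.dumps(agg, indent=2))
```

Calling the helper without `inference_config` yields a body that contains only `model_id` and `buckets_path`, which matches the two required parameters.
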

[[inference-bucket-params]]
.`inference` Parameters
[options="header"]
|===
|Parameter Name |Description |Required |Default Value
| `model_id` | The ID or alias for the trained model. | Required | -
| `inference_config` | Contains the inference type and its options. There are two types: <<inference-agg-regression-opt,`regression`>> and <<inference-agg-classification-opt,`classification`>>. | Optional | -
| `buckets_path` | Defines the paths to the input aggregations and maps the aggregation names to the field names expected by the model.
See <<buckets-path-syntax>> for more details. | Required | -
|===

==== Configuration options for {infer} models

The `inference_config` setting is optional and is usually unnecessary because
pre-trained models come with sensible defaults. In the context of aggregations,
some options can be overridden for each of the two model types.

[discrete]
[[inference-agg-regression-opt]]
===== Configuration options for {regression} models

`num_top_feature_importance_values`::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-regression-num-top-feature-importance-values]

[discrete]
[[inference-agg-classification-opt]]
===== Configuration options for {classification} models

`num_top_classes`::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-num-top-classes]

`num_top_feature_importance_values`::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-num-top-feature-importance-values]

`prediction_field_type`::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-prediction-field-type]
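
Because each model type accepts a different set of overrides, it can help to check option names before sending a request. The following Python sketch is one possible client-side check, not something the aggregation provides. The option names come from the two lists above; the `ALLOWED_OPTIONS` table, its `regression` and `classification` labels, and the `unknown_options` function are hypothetical names used only here.

```python
# Option names copied from the lists above. The keys "regression" and
# "classification" are labels for this sketch, not request syntax.
ALLOWED_OPTIONS = {
    "regression": {"num_top_feature_importance_values"},
    "classification": {
        "num_top_classes",
        "num_top_feature_importance_values",
        "prediction_field_type",
    },
}


def unknown_options(model_type, options):
    """Return the sorted option names that are not listed for model_type."""
    if model_type not in ALLOWED_OPTIONS:
        raise ValueError(f"unknown model type: {model_type!r}")
    return sorted(set(options) - ALLOWED_OPTIONS[model_type])


# num_top_classes is listed for classification models only.
print(unknown_options("classification", {"num_top_classes": 2}))  # prints []
print(unknown_options("regression", {"num_top_classes": 2}))  # prints ['num_top_classes']
```
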

[[inference-bucket-agg-example]]
==== Example

The following snippet aggregates a web log by `client_ip` and uses metric and
bucket sub-aggregations to extract a number of features. These features are the
input to the {infer} aggregation, which is configured with a model trained to
identify suspicious client IPs:

[source,console]
-------------------------------------------------
GET kibana_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "client_ip": { <1>
      "composite": {
        "sources": [
          {
            "client_ip": {
              "terms": {
                "field": "clientip"
              }
            }
          }
        ]
      },
      "aggs": { <2>
        "url_dc": {
          "cardinality": {
            "field": "url.keyword"
          }
        },
        "bytes_sum": {
          "sum": {
            "field": "bytes"
          }
        },
        "geo_src_dc": {
          "cardinality": {
            "field": "geo.src"
          }
        },
        "geo_dest_dc": {
          "cardinality": {
            "field": "geo.dest"
          }
        },
        "responses_total": {
          "value_count": {
            "field": "timestamp"
          }
        },
        "success": {
          "filter": {
            "term": {
              "response": "200"
            }
          }
        },
        "error404": {
          "filter": {
            "term": {
              "response": "404"
            }
          }
        },
        "error503": {
          "filter": {
            "term": {
              "response": "503"
            }
          }
        },
        "malicious_client_ip": { <3>
          "inference": {
            "model_id": "malicious_clients_model",
            "buckets_path": {
              "response_count": "responses_total",
              "url_dc": "url_dc",
              "bytes_sum": "bytes_sum",
              "geo_src_dc": "geo_src_dc",
              "geo_dest_dc": "geo_dest_dc",
              "success": "success._count",
              "error404": "error404._count",
              "error503": "error503._count"
            }
          }
        }
      }
    }
  }
}
-------------------------------------------------
// TEST[skip:setup kibana sample data]
<1> A composite bucket aggregation that aggregates the data by `client_ip`.
<2> A series of metric and bucket sub-aggregations.
<3> {infer-cap} bucket aggregation that specifies the trained model and maps the
aggregation names to the model's input fields.
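
In the example, metric aggregations are referenced by name, while the `filter` aggregations are referenced through their `_count`. The following Python sketch shows one way such a `buckets_path` map could be derived from the sub-aggregation definitions. `derive_buckets_path`, `SINGLE_BUCKET_TYPES`, and the `renames` parameter are hypothetical names, and the sketch assumes that the first key of each definition is the aggregation type.

```python
# Aggregation types that are referenced through `_count` in the example.
# This set is an assumption of the sketch, not an exhaustive list.
SINGLE_BUCKET_TYPES = {"filter"}


def derive_buckets_path(sub_aggs, renames=None):
    """Map model input fields to buckets_path entries.

    sub_aggs: sibling sub-aggregations, keyed by aggregation name.
    renames:  optional {aggregation name: model input field} overrides.
    """
    renames = renames or {}
    paths = {}
    for name, definition in sub_aggs.items():
        # Assumes the first key of the definition is the aggregation type.
        agg_type = next(iter(definition))
        if agg_type == "inference":
            continue  # the inference aggregation is not its own input
        path = f"{name}._count" if agg_type in SINGLE_BUCKET_TYPES else name
        paths[renames.get(name, name)] = path
    return paths


sub_aggs = {
    "bytes_sum": {"sum": {"field": "bytes"}},
    "responses_total": {"value_count": {"field": "timestamp"}},
    "error404": {"filter": {"term": {"response": "404"}}},
}
buckets_path = derive_buckets_path(
    sub_aggs, renames={"responses_total": "response_count"}
)
print(buckets_path)
```

The `renames` override covers cases such as `response_count`, where the model's input field name differs from the name of the aggregation that feeds it.
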