[role="xpack"]
[[get-trained-models-stats]]
= Get trained models statistics API
[subs="attributes"]
++++
<titleabbrev>Get trained models stats</titleabbrev>
++++

Retrieves usage information for trained models.

[[ml-get-trained-models-stats-request]]
== {api-request-title}

`GET _ml/trained_models/_stats` +
`GET _ml/trained_models/_all/_stats` +
`GET _ml/trained_models/<model_id>/_stats` +
`GET _ml/trained_models/<model_id>,<model_id_2>/_stats` +
`GET _ml/trained_models/<model_id_pattern*>,<model_id_2>/_stats`
[[ml-get-trained-models-stats-prereq]]
== {api-prereq-title}

Requires the `monitor_ml` cluster privilege. This privilege is included in the
`machine_learning_user` built-in role.

[[ml-get-trained-models-stats-desc]]
== {api-description-title}

You can get usage information for multiple trained models in a single API
request by using a comma-separated list of model IDs or a wildcard expression.
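
For example, the following hypothetical request combines both forms: it retrieves
stats for one explicit model ID together with any model whose ID matches a
wildcard pattern (the IDs shown are placeholders, not required names):

[source,console]
----
GET _ml/trained_models/regression-job-one-1574775307356,flight-delay-*/_stats
----
// TEST[skip:hypothetical model IDs]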
[[ml-get-trained-models-stats-path-params]]
== {api-path-parms-title}

`<model_id>`::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=model-id-or-alias]

[[ml-get-trained-models-stats-query-params]]
== {api-query-parms-title}

`allow_no_match`::
(Optional, Boolean)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=allow-no-match-models]

`from`::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=from-models]

`size`::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=size-models]
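
The `from` and `size` parameters can be combined to page through a large number
of models. For example, the following hypothetical request (the values are
arbitrary) skips the first 10 models and returns stats for the next 5:

[source,console]
----
GET _ml/trained_models/_all/_stats?from=10&size=5
----
// TEST[skip:pagination example]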
[role="child_attributes"]
[[ml-get-trained-models-stats-results]]
== {api-response-body-title}

`count`::
(integer)
The total number of trained model statistics that matched the requested ID
patterns. This value can be higher than the number of items in the
`trained_model_stats` array, as the size of the array is restricted by the
supplied `size` parameter.

`trained_model_stats`::
(array)
An array of trained model statistics, which are sorted by the `model_id` value
in ascending order.
+
.Properties of trained model stats
[%collapsible%open]
====
`deployment_stats`:::
(list)
A collection of deployment stats if one of the provided `model_id` values
is deployed.
+
.Properties of deployment stats
[%collapsible%open]
=====
`allocation_status`:::
(object)
The detailed allocation status given the deployment configuration.
+
.Properties of allocation stats
[%collapsible%open]
======
`allocation_count`:::
(integer)
The current number of nodes where the model is allocated.

`state`:::
(string)
The detailed allocation state related to the nodes.
+
--
* `starting`: Allocations are being attempted but no node currently has the model allocated.
* `started`: At least one node has the model allocated.
* `fully_allocated`: The deployment is fully allocated and satisfies the `target_allocation_count`.
--

`target_allocation_count`:::
(integer)
The desired number of nodes for model allocation.
======
`model_id`:::
(string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=model-id]

`model_size`:::
(<<byte-units,byte value>>)
The size of the loaded model in bytes.

`nodes`:::
(array of objects)
The deployment stats for each node that currently has the model allocated.
+
.Properties of node stats
[%collapsible%open]
======
`average_inference_time_ms`:::
(double)
The average time for each inference call to complete on this node.

`inference_count`:::
(integer)
The total number of inference calls made against this node for this model.

`last_access`:::
(long)
The epoch time stamp of the last inference call for the model on this node.

`node`:::
(object)
Information pertaining to the node.
+
.Properties of node
[%collapsible%open]
========
`attributes`:::
(object)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=node-attributes]

`ephemeral_id`:::
(string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=node-ephemeral-id]

`id`:::
(string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=node-id]

`name`:::
(string) The node name.

`transport_address`:::
(string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=node-transport-address]
========
`reason`:::
(string)
The reason for the current state. It is usually only populated when the
`routing_state` is `failed`.

`routing_state`:::
(object)
The current routing state for this allocation, and the reason for it.
+
--
* `starting`: The model is attempting to allocate on this node; inference calls are not yet accepted.
* `started`: The model is allocated and ready to accept inference requests.
* `stopping`: The model is being deallocated from this node.
* `stopped`: The model is fully deallocated from this node.
* `failed`: The allocation attempt failed; see the `reason` field for the potential cause.
--

`start_time`:::
(long)
The epoch timestamp when the allocation started.
======

`start_time`:::
(long)
The epoch timestamp when the deployment started.

`state`:::
(string)
The overall state of the deployment. The values may be:
+
--
* `starting`: The deployment has recently started but is not yet usable as the model is not allocated on any nodes.
* `started`: The deployment is usable as at least one node has the model allocated.
* `stopping`: The deployment is preparing to stop and deallocate the model from the relevant nodes.
--
=====
`inference_stats`:::
(object)
A collection of inference stats fields.
+
.Properties of inference stats
[%collapsible%open]
=====
`missing_all_fields_count`:::
(integer)
The number of inference calls where all the training features for the model
were missing.

`inference_count`:::
(integer)
The total number of times the model has been called for inference.
This is across all inference contexts, including all pipelines.

`cache_miss_count`:::
(integer)
The number of times the model was loaded for inference and was not retrieved
from the cache. If this number is close to `inference_count`, the cache is not
being used effectively. This can be addressed by increasing the cache size or
its time-to-live (TTL). See <<general-ml-settings>> for the appropriate
settings.

`failure_count`:::
(integer)
The number of failures when using the model for inference.

`timestamp`:::
(<<time-units,time units>>)
The time when the statistics were last updated.
=====

`ingest`:::
(object)
A collection of ingest stats for the model across all nodes. The values are
summations of the individual node statistics. The format matches the `ingest`
section in <<cluster-nodes-stats>>.

`model_id`:::
(string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=model-id]

`pipeline_count`:::
(integer)
The number of ingest pipelines that currently refer to the model.
====
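
As a rough illustration of how the `inference_stats` fields above can be
combined, the following Python sketch (illustrative only, not part of the API;
the field names follow the response body documented above) flags models whose
cache miss ratio suggests the cache is too small or expiring too quickly:

[source,python]
----
# Sketch: flag models whose inference cache appears ineffective,
# based on the `inference_stats` fields documented above.

def cache_miss_ratio(inference_stats):
    """Return cache misses as a fraction of all inference calls."""
    count = inference_stats.get("inference_count", 0)
    misses = inference_stats.get("cache_miss_count", 0)
    if count == 0:
        return 0.0
    return misses / count

def models_with_cold_cache(stats_response, threshold=0.5):
    """Yield IDs of models whose cache miss ratio exceeds `threshold`.

    `stats_response` is a parsed stats response body (a dict), and
    `threshold` is an arbitrary cut-off chosen for this sketch.
    """
    for model in stats_response.get("trained_model_stats", []):
        stats = model.get("inference_stats")
        if stats and cache_miss_ratio(stats) > threshold:
            yield model["model_id"]
----

A model flagged this way may benefit from a larger cache or a longer TTL, as
described for `cache_miss_count` above.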
[[ml-get-trained-models-stats-response-codes]]
== {api-response-codes-title}

`404` (Missing resources)::
If `allow_no_match` is `false`, this code indicates that there are no
resources that match the request or only partial matches for the request.

[[ml-get-trained-models-stats-example]]
== {api-examples-title}

The following example gets usage information for all the trained models:

[source,console]
--------------------------------------------------
GET _ml/trained_models/_stats
--------------------------------------------------
// TEST[skip:TBD]
The API returns the following results:

[source,console-result]
----
{
  "count": 2,
  "trained_model_stats": [
    {
      "model_id": "flight-delay-prediction-1574775339910",
      "pipeline_count": 0,
      "inference_stats": {
        "failure_count": 0,
        "inference_count": 4,
        "cache_miss_count": 3,
        "missing_all_fields_count": 0,
        "timestamp": 1592399986979
      }
    },
    {
      "model_id": "regression-job-one-1574775307356",
      "pipeline_count": 1,
      "inference_stats": {
        "failure_count": 0,
        "inference_count": 178,
        "cache_miss_count": 3,
        "missing_all_fields_count": 0,
        "timestamp": 1592399986979
      },
      "ingest": {
        "total": {
          "count": 178,
          "time_in_millis": 8,
          "current": 0,
          "failed": 0
        },
        "pipelines": {
          "flight-delay": {
            "count": 178,
            "time_in_millis": 8,
            "current": 0,
            "failed": 0,
            "processors": [
              {
                "inference": {
                  "type": "inference",
                  "stats": {
                    "count": 178,
                    "time_in_millis": 7,
                    "current": 0,
                    "failed": 0
                  }
                }
              }
            ]
          }
        }
      }
    }
  ]
}
----
// NOTCONSOLE
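
A response like the one above can be post-processed with any JSON tooling. As a
minimal sketch (the helper below is illustrative, not part of {es}), this
Python snippet totals inference calls across all models and lists the models
that are currently referenced by ingest pipelines:

[source,python]
----
import json

def summarize_stats(response_json):
    """Summarize a trained models stats response body (a JSON string):
    returns the total inference count across all models and the IDs of
    models referenced by at least one ingest pipeline."""
    body = json.loads(response_json)
    total_inferences = 0
    in_pipelines = []
    for model in body.get("trained_model_stats", []):
        # `inference_stats` is absent for models that have never been used.
        total_inferences += model.get("inference_stats", {}).get("inference_count", 0)
        if model.get("pipeline_count", 0) > 0:
            in_pipelines.append(model["model_id"])
    return total_inferences, in_pipelines
----

Applied to the example response above, this would report 182 total inference
calls, with only `regression-job-one-1574775307356` in use by a pipeline.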