[role="xpack"]
[testenv="basic"]
[[put-trained-models]]
= Create trained models API
[subs="attributes"]
++++
<titleabbrev>Create trained models</titleabbrev>
++++

Creates a trained model.

WARNING: Models created in version 7.8.0 are not backwards compatible
with older node versions. In a mixed-version cluster, all nodes must be
at least version 7.8.0 to use a model stored by a 7.8.0 node.

[[ml-put-trained-models-request]]
== {api-request-title}

`PUT _ml/trained_models/<model_id>`

[[ml-put-trained-models-prereq]]
== {api-prereq-title}

Requires the `manage_ml` cluster privilege. This privilege is included in the
`machine_learning_admin` built-in role.

[[ml-put-trained-models-desc]]
== {api-description-title}

The create trained models API enables you to supply a trained model that is not
created by {dfanalytics}.

[[ml-put-trained-models-path-params]]
== {api-path-parms-title}

`<model_id>`::
(Required, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=model-id]

[role="child_attributes"]
[[ml-put-trained-models-request-body]]
== {api-request-body-title}

`compressed_definition`::
(Required, string)
The compressed (GZipped and Base64 encoded) {infer} definition of the model.
If `compressed_definition` is specified, then `definition` cannot be specified.

//Begin definition
`definition`::
(Required, object)
The {infer} definition for the model. If `definition` is specified, then
`compressed_definition` cannot be specified.
+
.Properties of `definition`
[%collapsible%open]
====
//Begin preprocessors
`preprocessors`::
(Optional, object)
Collection of preprocessors. See <<ml-put-trained-models-preprocessor-example>>.
+
.Properties of `preprocessors`
[%collapsible%open]
=====
//Begin frequency encoding
`frequency_encoding`::
(Required, object)
Defines a frequency encoding for a field.
+
.Properties of `frequency_encoding`
[%collapsible%open]
======
`feature_name`::
(Required, string)
The name of the resulting feature.

`field`::
(Required, string)
The field name to encode.

`frequency_map`::
(Required, object map of string:double)
Object that maps the field value to the frequency encoded value.

`custom`::
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=custom-preprocessor]

======
//End frequency encoding

//Begin one hot encoding
`one_hot_encoding`::
(Required, object)
Defines a one hot encoding map for a field.
+
.Properties of `one_hot_encoding`
[%collapsible%open]
======
`field`::
(Required, string)
The field name to encode.

`hot_map`::
(Required, object map of strings)
String map of "field_value: one_hot_column_name".

`custom`::
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=custom-preprocessor]

======
//End one hot encoding

//Begin target mean encoding
`target_mean_encoding`::
(Required, object)
Defines a target mean encoding for a field.
+
.Properties of `target_mean_encoding`
[%collapsible%open]
======
`default_value`:::
(Required, double)
The feature value if the field value is not in the `target_map`.

`feature_name`:::
(Required, string)
The name of the resulting feature.

`field`:::
(Required, string)
The field name to encode.

`target_map`:::
(Required, object map of string:double)
Object that maps the field value to the target mean value.

`custom`:::
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=custom-preprocessor]

======
//End target mean encoding
=====
//End preprocessors

//Begin trained model
`trained_model`::
(Required, object)
The definition of the trained model.
+
.Properties of `trained_model`
[%collapsible%open]
=====
//Begin tree
`tree`::
(Required, object)
The definition for a binary decision tree.
+
.Properties of `tree`
[%collapsible%open]
======
`classification_labels`:::
(Optional, string) An array of classification labels (used for
`classification`).

`feature_names`:::
(Required, string)
Features expected by the tree, in their expected order.

`target_type`:::
(Required, string)
String indicating the model target type; `regression` or `classification`.

`tree_structure`:::
(Required, object)
An array of `tree_node` objects. The nodes must be in ordinal order by their
`tree_node.node_index` value.
======
//End tree

//Begin tree node
`tree_node`::
(Required, object)
The definition of a node in a tree.
+
--
There are two major types of nodes: leaf nodes and non-leaf nodes.

* Leaf nodes only need `node_index` and `leaf_value` defined.
* All other nodes need `split_feature`, `left_child`, `right_child`,
  `threshold`, `decision_type`, and `default_left` defined.
--
+
.Properties of `tree_node`
[%collapsible%open]
======
`decision_type`::
(Optional, string)
The comparison that decides when to choose the left child. Supported values are
`lt`, `lte`, `gt`, and `gte`. Defaults to `lte`.

`default_left`::
(Optional, Boolean)
Indicates whether to default to the left when the feature is missing. Defaults
to `true`.

`leaf_value`::
(Optional, double)
The leaf value of the node, if the node is a leaf (in other words, it has no
children).

`left_child`::
(Optional, integer)
The index of the left child.

`node_index`::
(Required, integer)
The index of the current node.

`right_child`::
(Optional, integer)
The index of the right child.

`split_feature`::
(Optional, integer)
The index of the feature value in the feature array.

`split_gain`::
(Optional, double) The information gain from the split.

`threshold`::
(Optional, double)
The decision threshold with which to compare the feature value.
======
//End tree node

//Begin ensemble
`ensemble`::
(Optional, object)
The definition for an ensemble model. See <<ml-put-trained-models-model-example>>.
+
.Properties of `ensemble`
[%collapsible%open]
======
//Begin aggregate output
`aggregate_output`::
(Required, object)
An aggregated output object that defines how to aggregate the outputs of the
`trained_models`. Supported objects are `weighted_mode`, `weighted_sum`,
`logistic_regression`, and `exponent`. See
<<ml-put-trained-models-aggregated-output-example>>.
+
.Properties of `aggregate_output`
[%collapsible%open]
=======
//Begin logistic regression
`logistic_regression`::
(Optional, object)
This `aggregate_output` type works with binary classification (classification
for values [0, 1]). It multiplies the outputs (in the case of the `ensemble`
model, the inference model values) by the supplied `weights`. The resulting
vector is summed and passed to a
{wikipedia}/Sigmoid_function[`sigmoid` function]. The result
of the `sigmoid` function is considered the probability of class 1 (`P_1`);
consequently, the probability of class 0 is `1 - P_1`. The class with the
highest probability (either 0 or 1) is then returned. For more information about
logistic regression, see
{wikipedia}/Logistic_regression[this wiki article].
+
.Properties of `logistic_regression`
[%collapsible%open]
========
`weights`:::
(Required, double)
The weights to multiply by the input values (the inference values of the trained
models).
========
//End logistic regression

//Begin weighted sum
`weighted_sum`::
(Optional, object)
This `aggregate_output` type works with regression. It returns the weighted sum
of the input values.
+
.Properties of `weighted_sum`
[%collapsible%open]
========
`weights`:::
(Required, double)
The weights to multiply by the input values (the inference values of the trained
models).
========
//End weighted sum

//Begin weighted mode
`weighted_mode`::
(Optional, object)
This `aggregate_output` type works with regression or classification. It takes
a weighted vote of the input values. The most common input value (taking the
weights into account) is returned.
+
.Properties of `weighted_mode`
[%collapsible%open]
========
`weights`:::
(Required, double)
The weights to multiply by the input values (the inference values of the trained
models).
========
//End weighted mode

//Begin exponent
`exponent`::
(Optional, object)
This `aggregate_output` type works with regression. It takes a weighted sum of
the input values and passes the result to an exponent function
(`e^x` where `x` is the sum of the weighted values).
+
.Properties of `exponent`
[%collapsible%open]
========
`weights`:::
(Required, double)
The weights to multiply by the input values (the inference values of the trained
models).
========
//End exponent
=======
//End aggregate output

`classification_labels`::
(Optional, string)
An array of classification labels.

`feature_names`::
(Optional, string)
Features expected by the ensemble, in their expected order.

`target_type`::
(Required, string)
String indicating the model target type; `regression` or `classification`.

`trained_models`::
(Required, object)
An array of `trained_model` objects. Supported trained models are `tree` and
`ensemble`.
======
//End ensemble

=====
//End trained model

====
//End definition

`description`::
(Optional, string)
A human-readable description of the {infer} trained model.

//Begin inference_config
`inference_config`::
(Required, object)
The default configuration for inference. This can be either a `regression`
or `classification` configuration. It must match the underlying
`definition.trained_model`'s `target_type`.
+
.Properties of `inference_config`
[%collapsible%open]
====
`regression`:::
(Optional, object)
Regression configuration for inference.
+
.Properties of regression inference
[%collapsible%open]
=====
`num_top_feature_importance_values`::::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-regression-num-top-feature-importance-values]

`results_field`::::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field]
=====

`classification`:::
(Optional, object)
Classification configuration for inference.
+
.Properties of classification inference
[%collapsible%open]
=====
`num_top_classes`::::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-num-top-classes]

`num_top_feature_importance_values`::::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-num-top-feature-importance-values]

`prediction_field_type`::::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-prediction-field-type]

`results_field`::::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field]

`top_classes_results_field`::::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-top-classes-results-field]
=====
====
//End of inference_config

//Begin input
`input`::
(Required, object)
The input field names for the model definition.
+
.Properties of `input`
[%collapsible%open]
====
`field_names`:::
(Required, string)
An array of input field names for the model.
====
//End input

`metadata`::
(Optional, object)
An object map that contains metadata about the model.

`tags`::
(Optional, string)
An array of tags to organize the model.
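
As a rough illustration of the `compressed_definition` format described above
(GZipped, then Base64 encoded), the following Python sketch compresses a small
model definition and assembles a request body around it. The helper names
`compress_definition` and `decompress_definition`, the description text, and
the single-leaf tree are hypothetical and exist only for this example; they are
not part of any client library.

[source,python]
----------------------------------
import base64
import gzip
import json


def compress_definition(definition: dict) -> str:
    """Serialize to JSON, GZip the bytes, then Base64 encode them."""
    raw = json.dumps(definition).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")


def decompress_definition(compressed: str) -> dict:
    """Reverse compress_definition; handy for checking the round trip."""
    raw = gzip.decompress(base64.b64decode(compressed))
    return json.loads(raw.decode("utf-8"))


# Hypothetical single-leaf regression tree, used only for illustration.
definition = {
    "trained_model": {
        "tree": {
            "feature_names": ["FlightTimeMin"],
            "tree_structure": [{"node_index": 0, "leaf_value": 47.6}],
            "target_type": "regression",
        }
    }
}

# Body for PUT _ml/trained_models/<model_id>. Because
# `compressed_definition` is supplied, `definition` is left out.
body = {
    "description": "Example model (hypothetical)",
    "compressed_definition": compress_definition(definition),
    "inference_config": {"regression": {}},
    "input": {"field_names": ["FlightTimeMin"]},
}
----------------------------------

The `inference_config` type (`regression` here) is chosen to match the
`target_type` of the tree, as the parameter description requires.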

[[ml-put-trained-models-example]]
== {api-examples-title}

[[ml-put-trained-models-preprocessor-example]]
=== Preprocessor examples

The example below shows a `frequency_encoding` preprocessor object:

[source,js]
----------------------------------
{
  "frequency_encoding":{
    "field":"FlightDelayType",
    "feature_name":"FlightDelayType_frequency",
    "frequency_map":{
      "Carrier Delay":0.6007414737092798,
      "NAS Delay":0.6007414737092798,
      "Weather Delay":0.024573576178086153,
      "Security Delay":0.02476631010889467,
      "No Delay":0.6007414737092798,
      "Late Aircraft Delay":0.6007414737092798
    }
  }
}
----------------------------------
//NOTCONSOLE

The next example shows a `one_hot_encoding` preprocessor object:

[source,js]
----------------------------------
{
  "one_hot_encoding":{
    "field":"FlightDelayType",
    "hot_map":{
      "Carrier Delay":"FlightDelayType_Carrier Delay",
      "NAS Delay":"FlightDelayType_NAS Delay",
      "No Delay":"FlightDelayType_No Delay",
      "Late Aircraft Delay":"FlightDelayType_Late Aircraft Delay"
    }
  }
}
----------------------------------
//NOTCONSOLE

This example shows a `target_mean_encoding` preprocessor object:

[source,js]
----------------------------------
{
  "target_mean_encoding":{
    "field":"FlightDelayType",
    "feature_name":"FlightDelayType_targetmean",
    "target_map":{
      "Carrier Delay":39.97465788139886,
      "NAS Delay":39.97465788139886,
      "Security Delay":203.171206225681,
      "Weather Delay":187.64705882352948,
      "No Delay":39.97465788139886,
      "Late Aircraft Delay":39.97465788139886
    },
    "default_value":158.17995752420433
  }
}
----------------------------------
//NOTCONSOLE
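
To make the effect of the three preprocessors above more concrete, the
following Python sketch applies each one to a single document. It illustrates
the mappings as the parameter descriptions state them and is not the server
implementation. The function names are hypothetical, the configuration maps are
abridged from the examples above, and the `0.0` fallback for values missing
from `frequency_map` is an assumption, because this page does not state one.

[source,python]
----------------------------------
def frequency_encode(doc: dict, cfg: dict, unseen: float = 0.0) -> dict:
    """Add `feature_name` holding the mapped frequency of `field`."""
    out = dict(doc)
    # Assumption: values absent from frequency_map fall back to `unseen`.
    out[cfg["feature_name"]] = cfg["frequency_map"].get(doc.get(cfg["field"]), unseen)
    return out


def one_hot_encode(doc: dict, cfg: dict) -> dict:
    """Add one 0/1 column per hot_map entry; 1 only where the value matches."""
    out = dict(doc)
    value = doc.get(cfg["field"])
    for field_value, column_name in cfg["hot_map"].items():
        out[column_name] = 1 if value == field_value else 0
    return out


def target_mean_encode(doc: dict, cfg: dict) -> dict:
    """Add `feature_name`; values not in target_map use default_value."""
    out = dict(doc)
    out[cfg["feature_name"]] = cfg["target_map"].get(
        doc.get(cfg["field"]), cfg["default_value"]
    )
    return out


# Configurations abridged from the preprocessor examples.
frequency_cfg = {
    "field": "FlightDelayType",
    "feature_name": "FlightDelayType_frequency",
    "frequency_map": {"Weather Delay": 0.024573576178086153},
}
one_hot_cfg = {
    "field": "FlightDelayType",
    "hot_map": {
        "NAS Delay": "FlightDelayType_NAS Delay",
        "No Delay": "FlightDelayType_No Delay",
    },
}
target_mean_cfg = {
    "field": "FlightDelayType",
    "feature_name": "FlightDelayType_targetmean",
    "target_map": {"Weather Delay": 187.64705882352948},
    "default_value": 158.17995752420433,
}

weather = {"FlightDelayType": "Weather Delay"}
nas = {"FlightDelayType": "NAS Delay"}
----------------------------------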

[[ml-put-trained-models-model-example]]
=== Model examples

The first example shows a `trained_model` object:

[source,js]
----------------------------------
{
  "tree":{
    "feature_names":[
      "DistanceKilometers",
      "FlightTimeMin",
      "FlightDelayType_NAS Delay",
      "Origin_targetmean",
      "DestRegion_targetmean",
      "DestCityName_targetmean",
      "OriginAirportID_targetmean",
      "OriginCityName_frequency",
      "DistanceMiles",
      "FlightDelayType_Late Aircraft Delay"
    ],
    "tree_structure":[
      {
        "decision_type":"lt",
        "threshold":9069.33437193022,
        "split_feature":0,
        "split_gain":4112.094574306927,
        "node_index":0,
        "default_left":true,
        "left_child":1,
        "right_child":2
      },
      ...
      {
        "node_index":9,
        "leaf_value":-27.68987349695448
      },
      ...
    ],
    "target_type":"regression"
  }
}
----------------------------------
//NOTCONSOLE
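
The `tree_node` properties suggest how a `tree_structure` array is traversed.
The following Python sketch walks such an array to a leaf. It assumes that node
indices start at 0 and are contiguous, so that a node's position in the array
equals its `node_index`. The function name `evaluate_tree` and the three-node
tree are hypothetical; only the root node's threshold is taken from the example
above, and the leaf values are made up.

[source,python]
----------------------------------
import operator

# Comparison that decides whether to follow the left child.
DECISIONS = {
    "lt": operator.lt,
    "lte": operator.le,
    "gt": operator.gt,
    "gte": operator.ge,
}


def evaluate_tree(tree_structure, features):
    """Walk tree_node dicts ordered by node_index and return the leaf value.

    `features` is aligned with the tree's `feature_names`; None marks a
    missing feature value.
    """
    node = tree_structure[0]
    while "leaf_value" not in node:
        value = features[node["split_feature"]]
        if value is None:
            # Missing feature: follow default_left (documented default: true).
            go_left = node.get("default_left", True)
        else:
            compare = DECISIONS[node.get("decision_type", "lte")]
            go_left = compare(value, node["threshold"])
        next_index = node["left_child"] if go_left else node["right_child"]
        node = tree_structure[next_index]
    return node["leaf_value"]


# Hypothetical three-node tree; the leaf values are illustrative only.
example_tree = [
    {
        "node_index": 0,
        "decision_type": "lt",
        "threshold": 9069.33437193022,
        "split_feature": 0,
        "default_left": True,
        "left_child": 1,
        "right_child": 2,
    },
    {"node_index": 1, "leaf_value": -27.5},
    {"node_index": 2, "leaf_value": 12.0},
]
----------------------------------

With `decision_type` set to `lt`, a feature value equal to the threshold does
not satisfy the comparison, so the traversal moves to the right child.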

The following example shows an `ensemble` model object:

[source,js]
----------------------------------
"ensemble":{
  "feature_names":[
    ...
  ],
  "trained_models":[
    {
      "tree":{
        "feature_names":[],
        "tree_structure":[
          {
            "decision_type":"lte",
            "node_index":0,
            "leaf_value":47.64069875778043,
            "default_left":false
          }
        ],
        "target_type":"regression"
      }
    },
    ...
  ],
  "aggregate_output":{
    "weighted_sum":{
      "weights":[
        ...
      ]
    }
  },
  "target_type":"regression"
}
----------------------------------
//NOTCONSOLE

[[ml-put-trained-models-aggregated-output-example]]
=== Aggregated output example

Example of a `logistic_regression` object:

[source,js]
----------------------------------
"aggregate_output" : {
  "logistic_regression" : {
    "weights" : [2.0, 1.0, 0.5, -1.0, 5.0, 1.0, 1.0]
  }
}
----------------------------------
//NOTCONSOLE
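
The `logistic_regression` aggregation described in the request body section
can be sketched in a few lines of Python: multiply each model output by its
weight, sum the products, and pass the sum through a sigmoid to get `P_1`. The
function name is hypothetical, the input values are made up, and returning
class 1 when `P_1` is exactly 0.5 is an assumption, because this page does not
define tie-breaking.

[source,python]
----------------------------------
import math


def logistic_regression_output(values, weights):
    """Return (predicted_class, p_1) for per-model values and weights."""
    weighted_sum = sum(v * w for v, w in zip(values, weights))
    p_1 = 1.0 / (1.0 + math.exp(-weighted_sum))  # sigmoid of the weighted sum
    # Assumption: a tie at exactly 0.5 resolves to class 1.
    predicted_class = 1 if p_1 >= 0.5 else 0
    return predicted_class, p_1


# Weights from the example above; the model outputs are hypothetical.
weights = [2.0, 1.0, 0.5, -1.0, 5.0, 1.0, 1.0]
only_first_model = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
only_fourth_model = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]

positive_class, positive_p1 = logistic_regression_output(only_first_model, weights)
negative_class, negative_p1 = logistic_regression_output(only_fourth_model, weights)
----------------------------------

The probability of class 0 is `1 - p_1` in both cases, matching the description
of the aggregation.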

Example of a `weighted_sum` object:

[source,js]
----------------------------------
"aggregate_output" : {
  "weighted_sum" : {
    "weights" : [1.0, -1.0, 0.5, 1.0, 5.0]
  }
}
----------------------------------
//NOTCONSOLE

Example of a `weighted_mode` object:

[source,js]
----------------------------------
"aggregate_output" : {
  "weighted_mode" : {
    "weights" : [1.0, 1.0, 1.0, 1.0, 1.0]
  }
}
----------------------------------
//NOTCONSOLE

Example of an `exponent` object:

[source,js]
----------------------------------
"aggregate_output" : {
  "exponent" : {
    "weights" : [1.0, 1.0, 1.0, 1.0, 1.0]
  }
}
----------------------------------
//NOTCONSOLE
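
The remaining aggregation types can be sketched the same way. The Python
functions below follow the descriptions in the request body section; they are
an illustration, not the server implementation. The function names and input
values are hypothetical. When two values tie in `weighted_mode`, this sketch
returns the one seen first, which is an assumption.

[source,python]
----------------------------------
import math
from collections import defaultdict


def weighted_sum(values, weights):
    """Sum of each input value multiplied by its weight."""
    return sum(v * w for v, w in zip(values, weights))


def weighted_mode(values, weights):
    """Weighted vote: return the value with the largest total weight."""
    totals = defaultdict(float)
    for v, w in zip(values, weights):
        totals[v] += w
    # Assumption: on a tie, the value that appeared first wins.
    return max(totals, key=totals.get)


def exponent(values, weights):
    """e^x, where x is the weighted sum of the input values."""
    return math.exp(weighted_sum(values, weights))


# Weights from the examples above; the model outputs are hypothetical.
sum_result = weighted_sum([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, -1.0, 0.5, 1.0, 5.0])
mode_result = weighted_mode([0, 1, 1, 0, 1], [1.0, 1.0, 1.0, 1.0, 1.0])
exp_result = exponent([0.0, 0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0, 1.0])
----------------------------------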

[[ml-put-trained-models-json-schema]]
=== Trained models JSON schema

For the full JSON schema of trained models, see
https://github.com/elastic/ml-json-schemas[the `ml-json-schemas` repository].