[role="xpack"]
[testenv="basic"]
[[put-trained-models]]
= Create trained models API
[subs="attributes"]
++++
<titleabbrev>Create trained models</titleabbrev>
++++

Creates a trained model.

WARNING: Models created in version 7.8.0 are not backwards compatible
with older node versions. In a mixed cluster environment, all nodes must be
at least 7.8.0 to use a model stored by a 7.8.0 node.

[[ml-put-trained-models-request]]
== {api-request-title}

`PUT _ml/trained_models/<model_id>`

[[ml-put-trained-models-prereq]]
== {api-prereq-title}

If the {es} {security-features} are enabled, you must have the following
built-in roles or equivalent privileges:

* `machine_learning_admin`

For more information, see <<built-in-roles>> and {ml-docs-setup-privileges}.

[[ml-put-trained-models-desc]]
== {api-description-title}

The create trained model API enables you to supply a trained model that is not
created by {dfanalytics}.
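
As a minimal sketch of supplying such a model from Python with only the standard library, the code below builds a request body for a regression tree with a single leaf node and sends it with `urllib.request`. The cluster address, the field name `distance`, the leaf value, and the helper names are hypothetical, and a cluster with the {security-features} enabled also needs credentials, which this sketch omits.

```python
import json
import urllib.request

# Hypothetical cluster address; a secured cluster also needs authentication.
ES_URL = "http://localhost:9200"


def build_body():
    """Return a minimal create-trained-model request body.

    The model is a regression tree with a single leaf node, so every document
    gets the same prediction. "distance" is an illustrative field name.
    """
    return {
        "description": "Single-leaf regression tree (illustrative)",
        "input": {"field_names": ["distance"]},
        "inference_config": {"regression": {}},
        "definition": {
            "trained_model": {
                "tree": {
                    "feature_names": ["distance"],
                    "tree_structure": [{"node_index": 0, "leaf_value": 42.0}],
                    "target_type": "regression",
                }
            }
        },
    }


def put_trained_model(model_id, body):
    """Send PUT _ml/trained_models/<model_id> and return the parsed JSON response."""
    request = urllib.request.Request(
        f"{ES_URL}/_ml/trained_models/{model_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a cluster running, `put_trained_model("my-single-leaf-model", build_body())` would send the request; the model ID is hypothetical.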

[[ml-put-trained-models-path-params]]
== {api-path-parms-title}

`<model_id>`::
(Required, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=model-id]

[role="child_attributes"]
[[ml-put-trained-models-request-body]]
== {api-request-body-title}

`compressed_definition`::
(Required, string)
The compressed (GZipped and Base64 encoded) {infer} definition of the model.
If `compressed_definition` is specified, then `definition` cannot be specified.

//Begin definition
`definition`::
(Required, object)
The {infer} definition for the model. If `definition` is specified, then
`compressed_definition` cannot be specified.
+
.Properties of `definition`
[%collapsible%open]
====
//Begin preprocessors
`preprocessors`::
(Optional, object)
Collection of preprocessors. See <<ml-put-trained-models-preprocessor-example>>.
+
.Properties of `preprocessors`
[%collapsible%open]
=====
//Begin frequency encoding
`frequency_encoding`::
(Required, object)
Defines a frequency encoding for a field.
+
.Properties of `frequency_encoding`
[%collapsible%open]
======
`feature_name`::
(Required, string)
The name of the resulting feature.

`field`::
(Required, string)
The field name to encode.

`frequency_map`::
(Required, object map of string:double)
Object that maps the field value to the frequency encoded value.

`custom`::
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=custom-preprocessor]
======
//End frequency encoding

//Begin one hot encoding
`one_hot_encoding`::
(Required, object)
Defines a one hot encoding map for a field.
+
.Properties of `one_hot_encoding`
[%collapsible%open]
======
`field`::
(Required, string)
The field name to encode.

`hot_map`::
(Required, object map of strings)
Object that maps each field value to the name of its one hot column, in the
form `"field_value": "one_hot_column_name"`.

`custom`::
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=custom-preprocessor]
======
//End one hot encoding

//Begin target mean encoding
`target_mean_encoding`::
(Required, object)
Defines a target mean encoding for a field.
+
.Properties of `target_mean_encoding`
[%collapsible%open]
======
`default_value`:::
(Required, double)
The feature value if the field value is not in the `target_map`.

`feature_name`:::
(Required, string)
The name of the resulting feature.

`field`:::
(Required, string)
The field name to encode.

`target_map`:::
(Required, object map of string:double)
Object that maps the field value to the target mean value.

`custom`:::
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=custom-preprocessor]
======
//End target mean encoding
=====
//End preprocessors

//Begin trained model
`trained_model`::
(Required, object)
The definition of the trained model.
+
.Properties of `trained_model`
[%collapsible%open]
=====
//Begin tree
`tree`::
(Required, object)
The definition for a binary decision tree.
+
.Properties of `tree`
[%collapsible%open]
======
`classification_labels`:::
(Optional, string)
An array of classification labels (used for `classification`).

`feature_names`:::
(Required, string)
Features expected by the tree, in their expected order.

`target_type`:::
(Required, string)
String indicating the model target type; `regression` or `classification`.

`tree_structure`:::
(Required, object)
An array of `tree_node` objects. The nodes must be in ordinal order by their
`tree_node.node_index` value.
======
//End tree

//Begin tree node
`tree_node`::
(Required, object)
The definition of a node in a tree.
+
--
There are two major types of nodes: leaf nodes and non-leaf nodes.

* Leaf nodes only need `node_index` and `leaf_value` defined.
* All other nodes need `split_feature`, `left_child`, `right_child`,
`threshold`, `decision_type`, and `default_left` defined.
--
+
.Properties of `tree_node`
[%collapsible%open]
======
`decision_type`::
(Optional, string)
The decision type that determines when to choose the left child node: the left
node is chosen when comparing the feature value against `threshold` with this
operator is true. Supported values are `lt`, `lte`, `gt`, and `gte`. Defaults
to `lte`.

`default_left`::
(Optional, Boolean)
Indicates whether to default to the left when the feature is missing. Defaults
to `true`.

`leaf_value`::
(Optional, double)
The leaf value of the node, if the node is a leaf (in other words, has no
children).

`left_child`::
(Optional, integer)
The index of the left child.

`node_index`::
(Required, integer)
The index of the current node.

`right_child`::
(Optional, integer)
The index of the right child.

`split_feature`::
(Optional, integer)
The index of the feature value in the feature array.

`split_gain`::
(Optional, double)
The information gain from the split.

`threshold`::
(Optional, double)
The decision threshold with which to compare the feature value.
======
//End tree node

//Begin ensemble
`ensemble`::
(Optional, object)
The definition for an ensemble model. See <<ml-put-trained-models-model-example>>.
+
.Properties of `ensemble`
[%collapsible%open]
======
//Begin aggregate output
`aggregate_output`::
(Required, object)
An aggregated output object that defines how to aggregate the outputs of the
`trained_models`. Supported objects are `weighted_mode`, `weighted_sum`,
`logistic_regression`, and `exponent`. See
<<ml-put-trained-models-aggregated-output-example>>.
+
.Properties of `aggregate_output`
[%collapsible%open]
=======
//Begin logistic regression
`logistic_regression`::
(Optional, object)
This `aggregate_output` type works with binary classification (classification
for values [0, 1]). It multiplies the outputs (in the case of the `ensemble`
model, the inference model values) by the supplied `weights`. The resulting
vector is summed and passed to a
{wikipedia}/Sigmoid_function[`sigmoid` function]. The result
of the `sigmoid` function is considered the probability of class 1 (`P_1`);
consequently, the probability of class 0 is `1 - P_1`. The class with the
highest probability (either 0 or 1) is then returned. For more information about
logistic regression, see
{wikipedia}/Logistic_regression[this wiki article].
+
.Properties of `logistic_regression`
[%collapsible%open]
========
`weights`:::
(Required, double)
The weights to multiply by the input values (the inference values of the trained
models).
========
//End logistic regression

//Begin weighted sum
`weighted_sum`::
(Optional, object)
This `aggregate_output` type works with regression. It returns the weighted sum
of the input values.
+
.Properties of `weighted_sum`
[%collapsible%open]
========
`weights`:::
(Required, double)
The weights to multiply by the input values (the inference values of the trained
models).
========
//End weighted sum

//Begin weighted mode
`weighted_mode`::
(Optional, object)
This `aggregate_output` type works with regression or classification. It takes
a weighted vote of the input values. The most common input value (taking the
weights into account) is returned.
+
.Properties of `weighted_mode`
[%collapsible%open]
========
`weights`:::
(Required, double)
The weights to multiply by the input values (the inference values of the trained
models).
========
//End weighted mode

//Begin exponent
`exponent`::
(Optional, object)
This `aggregate_output` type works with regression. It takes a weighted sum of
the input values and passes the result to an exponent function
(`e^x` where `x` is the sum of the weighted values).
+
.Properties of `exponent`
[%collapsible%open]
========
`weights`:::
(Required, double)
The weights to multiply by the input values (the inference values of the trained
models).
========
//End exponent
=======
//End aggregate output

`classification_labels`::
(Optional, string)
An array of classification labels.

`feature_names`::
(Optional, string)
Features expected by the ensemble, in their expected order.

`target_type`::
(Required, string)
String indicating the model target type; `regression` or `classification`.

`trained_models`::
(Required, object)
An array of `trained_model` objects. Supported trained models are `tree` and
`ensemble`.
======
//End ensemble
=====
//End trained model
====
//End definition

`description`::
(Optional, string)
A human-readable description of the {infer} trained model.

//Begin inference_config
`inference_config`::
(Required, object)
The default configuration for inference. This can be either a `regression`
or `classification` configuration. It must match the underlying
`definition.trained_model`'s `target_type`.
+
.Properties of `inference_config`
[%collapsible%open]
====
`regression`:::
(Optional, object)
Regression configuration for inference.
+
.Properties of regression inference
[%collapsible%open]
=====
`num_top_feature_importance_values`::::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-regression-num-top-feature-importance-values]

`results_field`::::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field]
=====

`classification`:::
(Optional, object)
Classification configuration for inference.
+
.Properties of classification inference
[%collapsible%open]
=====
`num_top_classes`::::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-num-top-classes]

`num_top_feature_importance_values`::::
(Optional, integer)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-num-top-feature-importance-values]

`prediction_field_type`::::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-prediction-field-type]

`results_field`::::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-results-field]

`top_classes_results_field`::::
(Optional, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=inference-config-classification-top-classes-results-field]
=====
====
//End of inference_config

//Begin input
`input`::
(Required, object)
The input field names for the model definition.
+
.Properties of `input`
[%collapsible%open]
====
`field_names`:::
(Required, string)
An array of input field names for the model.
====
//End input

`metadata`::
(Optional, object)
An object map that contains metadata about the model.

`tags`::
(Optional, string)
An array of tags to organize the model.
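
A `compressed_definition` string can be produced from a `definition` object by serializing it to JSON, GZipping the bytes, and Base64-encoding the result. The following Python sketch assumes that the compressed payload is the UTF-8 JSON serialization of the same object you would otherwise pass as `definition`; the function names are hypothetical.

```python
import base64
import gzip
import json


def compress_definition(definition):
    """Serialize `definition` to JSON, GZip it, and Base64-encode the result."""
    raw = json.dumps(definition).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")


def decompress_definition(compressed):
    """Reverse compress_definition: Base64-decode, decompress, and parse the JSON."""
    return json.loads(gzip.decompress(base64.b64decode(compressed)))
```

A request body would then carry `"compressed_definition": compress_definition(definition)` in place of `"definition": definition`, since the two fields cannot be combined.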

[[ml-put-trained-models-example]]
== {api-examples-title}

[[ml-put-trained-models-preprocessor-example]]
=== Preprocessor examples

The example below shows a `frequency_encoding` preprocessor object:

[source,js]
----------------------------------
{
  "frequency_encoding":{
    "field":"FlightDelayType",
    "feature_name":"FlightDelayType_frequency",
    "frequency_map":{
      "Carrier Delay":0.6007414737092798,
      "NAS Delay":0.6007414737092798,
      "Weather Delay":0.024573576178086153,
      "Security Delay":0.02476631010889467,
      "No Delay":0.6007414737092798,
      "Late Aircraft Delay":0.6007414737092798
    }
  }
}
----------------------------------
//NOTCONSOLE
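
The following minimal Python sketch shows what a `frequency_encoding` object like the one above does to a single document: it looks up the value of `field` in `frequency_map` and stores the result under `feature_name`. It is an illustration, not the {es} implementation; the helper name is hypothetical, and encoding values that are missing from `frequency_map` as `0.0` is an assumption.

```python
# A trimmed copy of the `frequency_encoding` example above.
FREQUENCY_ENCODING = {
    "frequency_encoding": {
        "field": "FlightDelayType",
        "feature_name": "FlightDelayType_frequency",
        "frequency_map": {
            "Weather Delay": 0.024573576178086153,
            "Security Delay": 0.02476631010889467,
            "No Delay": 0.6007414737092798,
        },
    }
}


def frequency_encode(doc, preprocessor):
    """Return a copy of `doc` with the frequency-encoded feature added.

    Assumption: a value that is missing from `frequency_map` encodes as 0.0.
    """
    spec = preprocessor["frequency_encoding"]
    value = doc.get(spec["field"])
    encoded = dict(doc)
    encoded[spec["feature_name"]] = spec["frequency_map"].get(value, 0.0)
    return encoded
```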

The next example shows a `one_hot_encoding` preprocessor object:

[source,js]
----------------------------------
{
  "one_hot_encoding":{
    "field":"FlightDelayType",
    "hot_map":{
      "Carrier Delay":"FlightDelayType_Carrier Delay",
      "NAS Delay":"FlightDelayType_NAS Delay",
      "No Delay":"FlightDelayType_No Delay",
      "Late Aircraft Delay":"FlightDelayType_Late Aircraft Delay"
    }
  }
}
----------------------------------
//NOTCONSOLE
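
To show how `hot_map` turns one field into several columns, here is a minimal Python sketch: each `hot_map` entry produces a column set to 1 when the field value equals that entry's key and 0 otherwise. The helper name and the 0/1 integer representation are assumptions for illustration. Because the example above has no entry for `Weather Delay` or `Security Delay`, those values produce all zeros in this sketch.

```python
# The `one_hot_encoding` example above.
ONE_HOT_ENCODING = {
    "one_hot_encoding": {
        "field": "FlightDelayType",
        "hot_map": {
            "Carrier Delay": "FlightDelayType_Carrier Delay",
            "NAS Delay": "FlightDelayType_NAS Delay",
            "No Delay": "FlightDelayType_No Delay",
            "Late Aircraft Delay": "FlightDelayType_Late Aircraft Delay",
        },
    }
}


def one_hot_encode(doc, preprocessor):
    """Return a copy of `doc` with one 0/1 column per `hot_map` entry.

    The column whose key equals the field value gets 1; every other column gets 0.
    """
    spec = preprocessor["one_hot_encoding"]
    value = doc.get(spec["field"])
    encoded = dict(doc)
    for field_value, column_name in spec["hot_map"].items():
        encoded[column_name] = 1 if value == field_value else 0
    return encoded
```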

This example shows a `target_mean_encoding` preprocessor object:

[source,js]
----------------------------------
{
  "target_mean_encoding":{
    "field":"FlightDelayType",
    "feature_name":"FlightDelayType_targetmean",
    "target_map":{
      "Carrier Delay":39.97465788139886,
      "NAS Delay":39.97465788139886,
      "Security Delay":203.171206225681,
      "Weather Delay":187.64705882352948,
      "No Delay":39.97465788139886,
      "Late Aircraft Delay":39.97465788139886
    },
    "default_value":158.17995752420433
  }
}
----------------------------------
//NOTCONSOLE
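
A `target_mean_encoding` object like the one above can be read as a lookup with a fallback: the field value is looked up in `target_map`, and `default_value` is used when the value is not there. A minimal Python sketch of that lookup follows; the helper name is hypothetical and the code is not the {es} implementation.

```python
# A trimmed copy of the `target_mean_encoding` example above.
TARGET_MEAN_ENCODING = {
    "target_mean_encoding": {
        "field": "FlightDelayType",
        "feature_name": "FlightDelayType_targetmean",
        "target_map": {
            "Security Delay": 203.171206225681,
            "Weather Delay": 187.64705882352948,
        },
        "default_value": 158.17995752420433,
    }
}


def target_mean_encode(doc, preprocessor):
    """Return a copy of `doc` with the target-mean-encoded feature added."""
    spec = preprocessor["target_mean_encoding"]
    value = doc.get(spec["field"])
    encoded = dict(doc)
    # Values that are not in `target_map` fall back to `default_value`.
    encoded[spec["feature_name"]] = spec["target_map"].get(value, spec["default_value"])
    return encoded
```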

[[ml-put-trained-models-model-example]]
=== Model examples

The first example shows a `trained_model` object:

[source,js]
----------------------------------
{
  "tree":{
    "feature_names":[
      "DistanceKilometers",
      "FlightTimeMin",
      "FlightDelayType_NAS Delay",
      "Origin_targetmean",
      "DestRegion_targetmean",
      "DestCityName_targetmean",
      "OriginAirportID_targetmean",
      "OriginCityName_frequency",
      "DistanceMiles",
      "FlightDelayType_Late Aircraft Delay"
    ],
    "tree_structure":[
      {
        "decision_type":"lt",
        "threshold":9069.33437193022,
        "split_feature":0,
        "split_gain":4112.094574306927,
        "node_index":0,
        "default_left":true,
        "left_child":1,
        "right_child":2
      },
      ...
      {
        "node_index":9,
        "leaf_value":-27.68987349695448
      },
      ...
    ],
    "target_type":"regression"
  }
}
----------------------------------
//NOTCONSOLE
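
A tree like the one above is evaluated by starting at node 0 and following `left_child` or `right_child` until a leaf is reached. The Python sketch below follows the `tree_node` property descriptions as we read them: the left child is chosen when the `decision_type` comparison of the feature value against `threshold` is true, and a missing feature follows `default_left`. The three-node tree is hypothetical (its root is modeled on the example's root node, and the leaf values are made up), and the function is an illustration rather than the {es} implementation.

```python
import operator

# Comparison for each `decision_type`; a true result selects the left child.
DECISIONS = {"lt": operator.lt, "lte": operator.le, "gt": operator.gt, "gte": operator.ge}

# Hypothetical tree: a root split modeled on the example above, plus two made-up leaves.
TREE = {
    "feature_names": ["DistanceKilometers"],
    "tree_structure": [
        {"decision_type": "lt", "threshold": 9069.33437193022, "split_feature": 0,
         "node_index": 0, "default_left": True, "left_child": 1, "right_child": 2},
        {"node_index": 1, "leaf_value": 10.0},
        {"node_index": 2, "leaf_value": 20.0},
    ],
    "target_type": "regression",
}


def evaluate_tree(tree, features):
    """Walk from node 0 to a leaf and return its `leaf_value`.

    `features` is a list ordered like `feature_names`; None marks a missing value.
    Nodes are looked up by list position, relying on the ordinal `node_index` order.
    """
    nodes = tree["tree_structure"]
    node = nodes[0]
    while "left_child" in node:  # only non-leaf nodes define children
        value = features[node["split_feature"]]
        if value is None:
            go_left = node.get("default_left", True)
        else:
            compare = DECISIONS[node.get("decision_type", "lte")]
            go_left = compare(value, node["threshold"])
        node = nodes[node["left_child"] if go_left else node["right_child"]]
    return node["leaf_value"]
```

For example, `evaluate_tree(TREE, [5000.0])` reaches node 1, while a missing value (`[None]`) also goes left because `default_left` is `true`.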

The following example shows an `ensemble` model object:

[source,js]
----------------------------------
"ensemble":{
  "feature_names":[
    ...
  ],
  "trained_models":[
    {
      "tree":{
        "feature_names":[],
        "tree_structure":[
          {
            "decision_type":"lte",
            "node_index":0,
            "leaf_value":47.64069875778043,
            "default_left":false
          }
        ],
        "target_type":"regression"
      }
    },
    ...
  ],
  "aggregate_output":{
    "weighted_sum":{
      "weights":[
        ...
      ]
    }
  },
  "target_type":"regression"
}
----------------------------------
//NOTCONSOLE

[[ml-put-trained-models-aggregated-output-example]]
=== Aggregated output example

Example of a `logistic_regression` object:

[source,js]
----------------------------------
"aggregate_output" : {
  "logistic_regression" : {
    "weights" : [2.0, 1.0, 0.5, -1.0, 5.0, 1.0, 1.0]
  }
}
----------------------------------
//NOTCONSOLE
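
The `logistic_regression` aggregation described in the request body section can be sketched in Python: multiply each model output by its weight, sum, apply the sigmoid to get `P_1`, and return the more probable class. The function name, the sample outputs, and the tie rule at exactly 0.5 are assumptions of this sketch, not documented behavior.

```python
import math


def logistic_regression(values, weights):
    """Return (predicted_class, p1) for a binary classification ensemble.

    `values` holds the trained models' outputs; `weights` has one entry per model.
    """
    if len(values) != len(weights):
        raise ValueError("expected one weight per input value")
    # Multiply each output by its weight and sum the resulting vector.
    total = sum(w * v for w, v in zip(weights, values))
    # Sigmoid, written in a numerically stable form: the probability of class 1.
    if total >= 0:
        p1 = 1.0 / (1.0 + math.exp(-total))
    else:
        z = math.exp(total)
        p1 = z / (1.0 + z)
    # The probability of class 0 is 1 - p1; return the more probable class.
    # Sending an exact tie (p1 == 0.5) to class 1 is an assumption of this sketch.
    return (1 if p1 >= 0.5 else 0), p1


WEIGHTS = [2.0, 1.0, 0.5, -1.0, 5.0, 1.0, 1.0]  # from the example above
```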

Example of a `weighted_sum` object:

[source,js]
----------------------------------
"aggregate_output" : {
  "weighted_sum" : {
    "weights" : [1.0, -1.0, 0.5, 1.0, 5.0]
  }
}
----------------------------------
//NOTCONSOLE
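
With the `weights` from the example above, the `weighted_sum` aggregation reduces to a dot product of the weights and the model outputs. A minimal Python sketch follows; the function name and the sample outputs in the usage note are hypothetical.

```python
def weighted_sum(values, weights):
    """Return the sum of weights[i] * values[i], the `weighted_sum` aggregate."""
    if len(values) != len(weights):
        raise ValueError("expected one weight per input value")
    return sum(w * v for w, v in zip(weights, values))


WEIGHTS = [1.0, -1.0, 0.5, 1.0, 5.0]  # from the example above
```

For hypothetical model outputs `[10.0, 4.0, 2.0, 0.0, 1.0]`, this gives `10.0 - 4.0 + 1.0 + 0.0 + 5.0 = 12.0`.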

Example of a `weighted_mode` object:

[source,js]
----------------------------------
"aggregate_output" : {
  "weighted_mode" : {
    "weights" : [1.0, 1.0, 1.0, 1.0, 1.0]
  }
}
----------------------------------
//NOTCONSOLE
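
The weighted vote behind `weighted_mode` can be sketched in Python by adding each model's weight to the total for the value that model produced and returning the value with the largest total. This sketch treats outputs as discrete labels compared for exact equality; how {es} handles non-integer outputs and ties is not covered here, and the function name is hypothetical.

```python
from collections import defaultdict


def weighted_mode(values, weights):
    """Return the input value with the largest total weight (a weighted vote)."""
    if len(values) != len(weights):
        raise ValueError("expected one weight per input value")
    votes = defaultdict(float)
    for value, weight in zip(values, weights):
        votes[value] += weight  # each model votes for its output with its weight
    # On a tie, max() keeps the value seen first; this tie rule is an assumption.
    return max(votes, key=votes.get)


WEIGHTS = [1.0, 1.0, 1.0, 1.0, 1.0]  # from the example above
```

With equal weights, as in the example, this is a plain majority vote; unequal weights let a single model outvote several others.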

Example of an `exponent` object:

[source,js]
----------------------------------
"aggregate_output" : {
  "exponent" : {
    "weights" : [1.0, 1.0, 1.0, 1.0, 1.0]
  }
}
----------------------------------
//NOTCONSOLE
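
The `exponent` aggregation computes `e^x`, where `x` is the weighted sum of the model outputs. A minimal Python sketch, with a hypothetical function name:

```python
import math


def exponent(values, weights):
    """Return e**x, where x is the weighted sum of the input values."""
    if len(values) != len(weights):
        raise ValueError("expected one weight per input value")
    x = sum(w * v for w, v in zip(weights, values))
    return math.exp(x)


WEIGHTS = [1.0, 1.0, 1.0, 1.0, 1.0]  # from the example above
```

When every model output is `0.0`, the weighted sum is `0.0` and the result is `e^0 = 1.0`.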

[[ml-put-trained-models-json-schema]]
=== Trained models JSON schema

For the full JSON schema of trained models, see the
https://github.com/elastic/ml-json-schemas[ml-json-schemas repository].