[role="xpack"]
[testenv="basic"]
[[put-data-frame-transform]]
=== Create {dataframe-transforms} API

[subs="attributes"]
++++
<titleabbrev>Create {dataframe-transforms}</titleabbrev>
++++

Instantiates a {dataframe-transform}.

beta[]

[[put-data-frame-transform-request]]
==== {api-request-title}

`PUT _data_frame/transforms/<data_frame_transform_id>`

[[put-data-frame-transform-prereqs]]
==== {api-prereq-title}

* If the {es} {security-features} are enabled, you must have
`manage_data_frame_transforms` cluster privileges to use this API. The built-in
`data_frame_transforms_admin` role has these privileges. You must also
have `read` and `view_index_metadata` privileges on the source index and `read`,
`create_index`, and `index` privileges on the destination index. For more
information, see {stack-ov}/security-privileges.html[Security privileges] and
{stack-ov}/built-in-roles.html[Built-in roles].
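
The privileges listed above could be granted through a custom role. The
following request is a minimal sketch that uses the create role API. The role
name `ecommerce_transform_admin` is hypothetical, and the index names are
borrowed from the example at the end of this page. Adjust both to match your
own source and destination indices.

[source,js]
--------------------------------------------------
POST _security/role/ecommerce_transform_admin
{
  "cluster": [ "manage_data_frame_transforms" ],
  "indices": [
    {
      "names": [ "kibana_sample_data_ecommerce" ],
      "privileges": [ "read", "view_index_metadata" ]
    },
    {
      "names": [ "kibana_sample_data_ecommerce_transform" ],
      "privileges": [ "read", "create_index", "index" ]
    }
  ]
}
--------------------------------------------------
// NOTCONSOLE

The first `indices` entry covers the source index and the second covers the
destination index.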

[[put-data-frame-transform-desc]]
==== {api-description-title}

This API defines a {dataframe-transform}, which copies data from source indices,
transforms it, and persists it into an entity-centric destination index. The
entities are defined by the set of `group_by` fields in the `pivot` object. You
can also think of the destination index as a two-dimensional tabular data
structure (known as a {dataframe}). The ID for each document in the
{dataframe} is generated from a hash of the entity, so there is a unique row
per entity. For more information, see
{stack-ov}/ml-dataframes.html[{dataframe-transforms-cap}].

When the {dataframe-transform} is created, a series of validations occurs to
ensure its success. For example, there is a check for the existence of the
source indices and a check that the destination index is not part of the source
index pattern. You can use the `defer_validation` parameter to skip these
checks.

Deferred validations are always run when the {dataframe-transform} is started,
with the exception of privilege checks. When {es} {security-features} are
enabled, the {dataframe-transform} remembers which roles the user who created
it had at the time of creation and uses those same roles. If those roles do not
have the required privileges on the source and destination indices, the
{dataframe-transform} fails when it attempts unauthorized operations.

IMPORTANT: You must use {kib} or this API to create a {dataframe-transform}.
           Do not put a {dataframe-transform} directly into any
           `.data-frame-internal*` indices using the {es} index API.
           If {es} {security-features} are enabled, do not give users any
           privileges on `.data-frame-internal*` indices.

[[put-data-frame-transform-path-parms]]
==== {api-path-parms-title}

`<data_frame_transform_id>`::
  (Required, string) Identifier for the {dataframe-transform}. This identifier
  can contain lowercase alphanumeric characters (a-z and 0-9), hyphens, and
  underscores. It must start and end with alphanumeric characters.

[[put-data-frame-transform-query-parms]]
==== {api-query-parms-title}

`defer_validation`::
  (Optional, boolean) When `true`, deferrable validations are not run. This
  behavior may be desired if the source index does not exist until after the
  {dataframe-transform} is created.
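
As an illustration of `defer_validation`, the following sketch creates a
{dataframe-transform} before its source index exists. The transform ID, the
index names, and the field names are hypothetical and chosen only for this
example.

[source,js]
--------------------------------------------------
PUT _data_frame/transforms/future_orders_transform?defer_validation=true
{
  "source": {
    "index": "future_orders"
  },
  "pivot": {
    "group_by": {
      "customer_id": {
        "terms": {
          "field": "customer_id"
        }
      }
    },
    "aggregations": {
      "max_price": {
        "max": {
          "field": "price"
        }
      }
    }
  },
  "dest": {
    "index": "future_orders_by_customer"
  }
}
--------------------------------------------------
// NOTCONSOLE

The deferred checks still run when the {dataframe-transform} is started, so
the source index must exist by then.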

[[put-data-frame-transform-request-body]]
==== {api-request-body-title}

`description`::
  (Optional, string) Free text description of the {dataframe-transform}.

`dest`::
  (Required, object) The destination configuration, which has the
  following properties:

  `index`:::
    (Required, string) The _destination index_ for the {dataframe-transform}.

  `pipeline`:::
    (Optional, string) The unique identifier for a <<pipeline,pipeline>>.

`frequency`::
  (Optional, time units) The interval between checks for changes in the source
  indices when the {dataframe-transform} is running continuously. Also
  determines the retry interval in the event of transient failures while the
  {dataframe-transform} is searching or indexing. The minimum value is `1s` and
  the maximum is `1h`. The default value is `1m`.

`pivot`::
  (Required, object) Defines the pivot function: the `group_by` fields and the
  aggregations used to reduce the data. See <<data-frame-transform-pivot>>.

`source`::
  (Required, object) The source configuration, which has the following
  properties:

  `index`:::
    (Required, string or array) The _source indices_ for the
    {dataframe-transform}. It can be a single index, an index pattern (for
    example, `"myindex*"`), or an array of indices (for example,
    `["index1", "index2"]`).

  `query`:::
    (Optional, object) A query clause that retrieves a subset of data from the
    source index. See <<query-dsl>>.

`sync`::
  (Optional, object) Defines the properties required for the
  {dataframe-transform} to run continuously.

  `time`:::
    (Required, object) Specifies that the {dataframe-transform} uses a time
    field to synchronize the source and destination indices.

    `field`::::
      (Required, string) The date field that is used to identify new documents
      in the source.
+
--
TIP: In general, it’s a good idea to use a field that contains the
<<accessing-ingest-metadata,ingest timestamp>>. If you use a different field,
you might need to set the `delay` such that it accounts for data transmission
delays.

--
    `delay`::::
      (Optional, time units) The time delay between the current time and the
      latest input data time. The default value is `60s`.
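
The `dest.pipeline` property refers to an ingest pipeline by its identifier;
the pipeline itself is defined with the ingest APIs. The following request is a
sketch of one possible definition. The pipeline name `add_timestamp_pipeline`
matches the example on this page, but the target field `@timestamp` and the
use of a `set` processor are assumptions made for illustration.

[source,js]
--------------------------------------------------
PUT _ingest/pipeline/add_timestamp_pipeline
{
  "description": "Adds the ingest timestamp to each document",
  "processors": [
    {
      "set": {
        "field": "@timestamp",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}
--------------------------------------------------
// NOTCONSOLE

With a pipeline like this, each document written to the destination index
records the time at which it was ingested.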

[[put-data-frame-transform-example]]
==== {api-examples-title}

[source,js]
--------------------------------------------------
PUT _data_frame/transforms/ecommerce_transform
{
  "source": {
    "index": "kibana_sample_data_ecommerce",
    "query": {
      "term": {
        "geoip.continent_name": {
          "value": "Asia"
        }
      }
    }
  },
  "pivot": {
    "group_by": {
      "customer_id": {
        "terms": {
          "field": "customer_id"
        }
      }
    },
    "aggregations": {
      "max_price": {
        "max": {
          "field": "taxful_total_price"
        }
      }
    }
  },
  "description": "Maximum priced ecommerce data by customer_id in Asia",
  "dest": {
    "index": "kibana_sample_data_ecommerce_transform",
    "pipeline": "add_timestamp_pipeline"
  },
  "frequency": "5m",
  "sync": {
    "time": {
      "field": "order_date",
      "delay": "60s"
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[setup:kibana_sample_data_ecommerce]

When the transform is created, you receive the following results:

[source,js]
----
{
  "acknowledged" : true
}
----
// TESTRESPONSE
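
For comparison, the following sketch supplies only the required properties
(`source`, `pivot`, and `dest`) and passes the source as an array of indices.
The transform ID and the destination index name are hypothetical. Because the
request has no `sync` object, it does not define the properties needed to run
continuously.

[source,js]
--------------------------------------------------
PUT _data_frame/transforms/ecommerce_transform_batch
{
  "source": {
    "index": [ "kibana_sample_data_ecommerce" ]
  },
  "pivot": {
    "group_by": {
      "customer_id": {
        "terms": {
          "field": "customer_id"
        }
      }
    },
    "aggregations": {
      "max_price": {
        "max": {
          "field": "taxful_total_price"
        }
      }
    }
  },
  "dest": {
    "index": "ecommerce_max_price_by_customer"
  }
}
--------------------------------------------------
// NOTCONSOLE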