- [role="xpack"]
- [testenv="basic"]
- [[put-data-frame-transform]]
- === Create {dataframe-transforms} API
- [subs="attributes"]
- ++++
- <titleabbrev>Create {dataframe-transforms}</titleabbrev>
- ++++
- Instantiates a {dataframe-transform}.
- beta[]
- [[put-data-frame-transform-request]]
- ==== {api-request-title}
- `PUT _data_frame/transforms/<data_frame_transform_id>`
- [[put-data-frame-transform-prereqs]]
- ==== {api-prereq-title}
- * If the {es} {security-features} are enabled, you must have
- `manage_data_frame_transforms` cluster privileges to use this API. The built-in
- `data_frame_transforms_admin` role has these privileges. You must also
- have `read` and `view_index_metadata` privileges on the source index and `read`,
- `create_index`, and `index` privileges on the destination index. For more
- information, see {stack-ov}/security-privileges.html[Security privileges] and
- {stack-ov}/built-in-roles.html[Built-in roles].
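- As a minimal sketch, a custom role that grants the privileges listed above
- might look like the following. The role name and both index names are
- hypothetical placeholders, not values defined by this API:
- [source,js]
- --------------------------------------------------
- PUT _security/role/transform_creator_example
- {
-   "cluster": [ "manage_data_frame_transforms" ],
-   "indices": [
-     {
-       "names": [ "source-index-example" ],
-       "privileges": [ "read", "view_index_metadata" ]
-     },
-     {
-       "names": [ "dest-index-example" ],
-       "privileges": [ "read", "create_index", "index" ]
-     }
-   ]
- }
- --------------------------------------------------
- // NOTCONSOLE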
- [[put-data-frame-transform-desc]]
- ==== {api-description-title}
- This API defines a {dataframe-transform}, which copies data from source indices,
- transforms it, and persists it into an entity-centric destination index. The
- entities are defined by the set of `group_by` fields in the `pivot` object. You
- can also think of the destination index as a two-dimensional tabular data
- structure (known as a {dataframe}). The ID for each document in the
- {dataframe} is generated from a hash of the entity, so there is a unique row
- per entity. For more information, see
- {stack-ov}/ml-dataframes.html[{dataframe-transforms-cap}].
- When the {dataframe-transform} is created, a series of validations occurs to
- ensure its success. For example, there is a check for the existence of the
- source indices and a check that the destination index is not part of the source
- index pattern. You can use the `defer_validation` parameter to skip these
- checks.
- Deferred validations are always run when the {dataframe-transform} is started,
- with the exception of privilege checks. When {es} {security-features} are
- enabled, the {dataframe-transform} remembers which roles the user that created
- it had at the time of creation and uses those same roles. If those roles do not
- have the required privileges on the source and destination indices, the
- {dataframe-transform} fails when it attempts unauthorized operations.
- IMPORTANT: You must use {kib} or this API to create a {dataframe-transform}.
- Do not put a {dataframe-transform} directly into any
- `.data-frame-internal*` indices using the {es} index API.
- If {es} {security-features} are enabled, do not give users any
- privileges on `.data-frame-internal*` indices.
- [[put-data-frame-transform-path-parms]]
- ==== {api-path-parms-title}
- `<data_frame_transform_id>`::
- (Required, string) Identifier for the {dataframe-transform}. This identifier
- can contain lowercase alphanumeric characters (a-z and 0-9), hyphens, and
- underscores. It must start and end with alphanumeric characters.
- [[put-data-frame-transform-query-parms]]
- ==== {api-query-parms-title}
- `defer_validation`::
- (Optional, boolean) When `true`, deferrable validations are not run. This
- behavior may be desired if the source index does not exist until after the
- {dataframe-transform} is created.
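- For illustration, a request that skips the deferrable validations might look
- like the following sketch. The transform ID and both index names are
- hypothetical; the source index is assumed not to exist yet:
- [source,js]
- --------------------------------------------------
- PUT _data_frame/transforms/deferred_transform_example?defer_validation=true
- {
-   "source": {
-     "index": "index-created-later-example"
-   },
-   "pivot": {
-     "group_by": {
-       "customer_id": {
-         "terms": {
-           "field": "customer_id"
-         }
-       }
-     },
-     "aggregations": {
-       "max_price": {
-         "max": {
-           "field": "taxful_total_price"
-         }
-       }
-     }
-   },
-   "dest": {
-     "index": "deferred_transform_dest_example"
-   }
- }
- --------------------------------------------------
- // NOTCONSOLE
- The skipped checks still run when the {dataframe-transform} is started, so the
- source index must exist by then.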
- [[put-data-frame-transform-request-body]]
- ==== {api-request-body-title}
- `description`::
- (Optional, string) Free text description of the {dataframe-transform}.
- `dest`::
- (Required, object) The destination configuration, which has the
- following properties:
-
- `index`:::
- (Required, string) The _destination index_ for the {dataframe-transform}.
- `pipeline`:::
- (Optional, string) The unique identifier for a <<pipeline,pipeline>>.
- `frequency`::
- (Optional, time units) The interval between checks for changes in the source
- indices when the {dataframe-transform} is running continuously. It also
- determines the retry interval in the event of transient failures while the
- {dataframe-transform} is searching or indexing. The minimum value is `1s` and
- the maximum is `1h`. The default value is `1m`.
- `pivot`::
- (Required, object) Defines the `group_by` fields of the pivot function and the
- aggregations that reduce the data. See <<data-frame-transform-pivot>>.
- `source`::
- (Required, object) The source configuration, which has the following
- properties:
-
- `index`:::
- (Required, string or array) The _source indices_ for the
- {dataframe-transform}. It can be a single index, an index pattern (for
- example, `"myindex*"`), or an array of indices (for example,
- `["index1", "index2"]`).
-
- `query`:::
- (Optional, object) A query clause that retrieves a subset of data from the
- source index. See <<query-dsl>>.
-
- `sync`::
- (Optional, object) Defines the properties that the {dataframe-transform}
- requires to run continuously.
- `time`:::
- (Required, object) Specifies that the {dataframe-transform} uses a time
- field to synchronize the source and destination indices.
- `field`::::
- (Required, string) The date field that is used to identify new documents
- in the source.
- +
- --
- TIP: In general, it’s a good idea to use a field that contains the
- <<accessing-ingest-metadata,ingest timestamp>>. If you use a different field,
- you might need to set the `delay` such that it accounts for data transmission
- delays.
- --
- `delay`::::
- (Optional, time units) The time delay between the current time and the
- latest input data time. The default value is `60s`.
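- As a sketch of how an ingest timestamp can be added to documents, the
- following ingest pipeline uses a `set` processor to copy the ingest time into
- a field. The pipeline name matches the one referenced by `dest.pipeline` in
- the example in this topic, but its definition here and the `@timestamp` field
- name are assumptions made for illustration:
- [source,js]
- --------------------------------------------------
- PUT _ingest/pipeline/add_timestamp_pipeline
- {
-   "description": "Hypothetical pipeline that stamps each document with its ingest time",
-   "processors": [
-     {
-       "set": {
-         "field": "@timestamp",
-         "value": "{{_ingest.timestamp}}"
-       }
-     }
-   ]
- }
- --------------------------------------------------
- // NOTCONSOLE
- A pipeline like this one applied when indexing the source data would produce
- a field that is suitable for `sync.time.field`. When it is used as
- `dest.pipeline`, it stamps the documents in the destination index instead.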
- [[put-data-frame-transform-example]]
- ==== {api-examples-title}
- [source,js]
- --------------------------------------------------
- PUT _data_frame/transforms/ecommerce_transform
- {
-   "source": {
-     "index": "kibana_sample_data_ecommerce",
-     "query": {
-       "term": {
-         "geoip.continent_name": {
-           "value": "Asia"
-         }
-       }
-     }
-   },
-   "pivot": {
-     "group_by": {
-       "customer_id": {
-         "terms": {
-           "field": "customer_id"
-         }
-       }
-     },
-     "aggregations": {
-       "max_price": {
-         "max": {
-           "field": "taxful_total_price"
-         }
-       }
-     }
-   },
-   "description": "Maximum priced ecommerce data by customer_id in Asia",
-   "dest": {
-     "index": "kibana_sample_data_ecommerce_transform",
-     "pipeline": "add_timestamp_pipeline"
-   },
-   "frequency": "5m",
-   "sync": {
-     "time": {
-       "field": "order_date",
-       "delay": "60s"
-     }
-   }
- }
- --------------------------------------------------
- // CONSOLE
- // TEST[setup:kibana_sample_data_ecommerce]
- When the {dataframe-transform} is created, you receive the following results:
- [source,js]
- ----
- {
- "acknowledged" : true
- }
- ----
- // TESTRESPONSE
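- Because `sync`, `frequency`, `description`, and `source.query` are all
- optional, a smaller request is also possible. The following sketch reads from
- an array of source indices and omits `sync`. The transform ID, index names,
- and field names are hypothetical:
- [source,js]
- --------------------------------------------------
- PUT _data_frame/transforms/minimal_transform_example
- {
-   "source": {
-     "index": ["index1", "index2"]
-   },
-   "pivot": {
-     "group_by": {
-       "customer_id": {
-         "terms": {
-           "field": "customer_id"
-         }
-       }
-     },
-     "aggregations": {
-       "avg_price": {
-         "avg": {
-           "field": "taxful_total_price"
-         }
-       }
-     }
-   },
-   "dest": {
-     "index": "minimal_transform_dest_example"
-   }
- }
- --------------------------------------------------
- // NOTCONSOLE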