- [role="xpack"]
- [[transform-overview]]
- === {transform-cap} overview
- ++++
- <titleabbrev>Overview</titleabbrev>
- ++++
- beta[]
- A _{dataframe}_ is a two-dimensional tabular data structure. In the context of
- the {stack}, it is a transformation of data that is indexed in {es}. For
- example, you can use {dataframes} to _pivot_ your data into a new entity-centric
- index. By transforming and summarizing your data, it becomes possible to
- visualize and analyze it in alternative and interesting ways.
- Many {es} indices are organized as a stream of events: each event is an
- individual document, for example a single item purchase. {dataframes-cap} enable
- you to summarize this data, bringing it into an organized, more
- analysis-friendly format. For example, you can summarize all the purchases of a
- single customer.
- You can create {dataframes} by using {transforms}.
- {transforms-cap} enable you to define a pivot, which is a set of
- features that transform the index into a different, more digestible format.
- Pivoting results in a summary of your data, which is the {dataframe}.
- To define a pivot, first you select one or more fields that you will use to
- group your data. You can select categorical fields (terms) and numerical fields
- for grouping. If you use numerical fields, the field values are bucketed using
- an interval that you specify.
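The grouping step can be pictured as deriving one group key per document. The following is a minimal Python sketch, not the {es} implementation: the helper names, the sample document, and the fixed-width bucketing rule (round the value down to a multiple of the interval) are assumptions made for illustration.

```python
import math

def histogram_bucket(value, interval):
    """Return the lower bound of the fixed-width bucket that contains value."""
    return math.floor(value / interval) * interval

def group_key(doc, terms_field, numeric_field, interval):
    """Combine a categorical value and a numeric bucket into one group key."""
    return (doc[terms_field], histogram_bucket(doc[numeric_field], interval))

# Hypothetical order document, used only for illustration.
order = {"category": "shoes", "price": 37.5}
key = group_key(order, "category", "price", 10)
```

Every document that yields the same key lands in the same group, so a smaller interval produces more, finer-grained groups.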
- The second step is deciding how you want to aggregate the grouped data. When
- you use aggregations, you are in effect asking questions about the index. There are
- different types of aggregations, each with its own purpose and output. To learn
- more about the supported aggregations and group-by fields, see
- {ref}/transform-resource.html[{transform-cap} resources].
- As an optional step, you can also add a query to further limit the scope of the
- aggregation.
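Taken together, the group-by fields, the aggregations, and the optional query might be assembled into a single request body. The sketch below builds such a body as a Python dictionary. It assumes the `source`, `dest`, and `pivot` layout of the create {transform} API, and the index names, field names, and query are hypothetical. Check the API reference for your version before relying on it.

```python
import json

# Hypothetical index and field names; adjust them to your own data.
transform_body = {
    "source": {
        "index": "orders",
        # Optional query that limits which documents are aggregated.
        "query": {"range": {"order_date": {"gte": "now-1y"}}},
    },
    "dest": {"index": "orders-by-customer"},
    "pivot": {
        # Step 1: the fields used to group the data.
        "group_by": {
            "customer": {"terms": {"field": "customer_id"}},
        },
        # Step 2: how each group is summarized.
        "aggregations": {
            "total_spent": {"sum": {"field": "price"}},
        },
    },
}

# Serialize the body so that it can be sent with any HTTP client.
request_json = json.dumps(transform_body, indent=2)
```

Keeping the body as plain data makes it easy to review or store in version control before it is submitted.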
- The {transform} performs a composite aggregation that
- paginates through all the data defined by the source index query. The output of
- the aggregation is stored in a destination index. Each time the
- {transform} queries the source index, it creates a _checkpoint_. You
- can decide whether you want the {transform} to run once (batch
- {transform}) or continuously ({ctransform}). A batch
- {transform} is a single operation that has a single checkpoint.
- {ctransforms-cap} continually increment and process checkpoints as new
- source data is ingested.
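The paging and checkpoint behavior can be illustrated with a small simulation. This is a conceptual Python sketch rather than the actual {es} logic: the function names, the `customer` and `price` fields, and the page size are assumptions, and it recomputes every group on each run, which a real {transform} may not do.

```python
from collections import defaultdict

def composite_pages(docs, group_field, sum_field, page_size):
    """Yield pages of (group key, summed value) pairs, ordered by key."""
    totals = defaultdict(int)
    for doc in docs:
        totals[doc[group_field]] += doc[sum_field]
    keys = sorted(totals)
    after = None
    while True:
        # Resume strictly after the last key of the previous page.
        page = [k for k in keys if after is None or k > after][:page_size]
        if not page:
            return
        yield [(k, totals[k]) for k in page]
        after = page[-1]

def run_checkpoint(docs, dest, checkpoint, page_size=2):
    """Page through all groups, write them to dest, and advance the checkpoint."""
    for page in composite_pages(docs, "customer", "price", page_size):
        for key, total in page:
            dest[key] = total
    return checkpoint + 1

# Hypothetical source documents and an in-memory stand-in for the destination index.
source = [
    {"customer": "ann", "price": 10},
    {"customer": "bob", "price": 5},
    {"customer": "cy", "price": 7},
]
dest = {}
checkpoint = run_checkpoint(source, dest, 0)  # a batch run stops after this
source.append({"customer": "ann", "price": 4})  # new data is ingested
checkpoint = run_checkpoint(source, dest, checkpoint)  # a continuous run goes on
```

In this sketch, a batch run corresponds to calling `run_checkpoint` once, while a continuous run keeps calling it as new documents arrive.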
- .Example
- Imagine that you run a webshop that sells clothes. Every order creates a document
- that contains a unique order ID, the name and the category of the ordered product,
- its price, the ordered quantity, the exact date of the order, and some customer
- information (name, gender, location, etc.). Your dataset contains all the transactions
- from last year.
- If you want to check the sales in the different categories in your last fiscal
- year, define a {transform} that groups the data by the product
- categories (women's shoes, men's clothing, etc.) and the order date. Use a
- one-year interval for the order date. Then add a sum aggregation on the
- ordered quantity. The result is a {dataframe} that shows the number of items
- sold in every product category in the last year.
- [role="screenshot"]
- image::images/ml-dataframepivot.jpg["Example of a data frame pivot in {kib}"]
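The webshop example can also be reproduced on a handful of in-memory records to show what the resulting summary contains. The sample orders, field names, and function name below are invented for illustration. A real {transform} would read from the source index and write to a destination index instead.

```python
from collections import defaultdict
from datetime import date

# Made-up sample orders; only the fields needed for the pivot are included.
orders = [
    {"category": "women's shoes", "order_date": date(2019, 3, 14), "quantity": 2},
    {"category": "men's clothing", "order_date": date(2019, 7, 1), "quantity": 1},
    {"category": "women's shoes", "order_date": date(2019, 11, 30), "quantity": 3},
]

def items_sold_per_category_and_year(orders):
    """Group by (category, order year) and sum the ordered quantity."""
    totals = defaultdict(int)
    for order in orders:
        key = (order["category"], order["order_date"].year)
        totals[key] += order["quantity"]
    return dict(totals)

summary = items_sold_per_category_and_year(orders)
```

Each key in `summary` corresponds to one row of the pivoted data, and the `orders` list itself is left unchanged.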
- IMPORTANT: The {transform} leaves your source index intact. It
- creates a new index that is dedicated to the {dataframe}.