[role="xpack"]
[[transform-overview]]
= {transform-cap} overview
++++
<titleabbrev>Overview</titleabbrev>
++++

You can choose either of the following methods to transform your data:
<<pivot-transform-overview,pivot>> or <<latest-transform-overview,latest>>.

IMPORTANT: All {transforms} leave your source index intact. They create a new
index that is dedicated to the transformed data.

{transforms-cap} are persistent tasks that are stored in cluster state, which
makes them resilient to node failures. Refer to <<transform-checkpoints>> and
<<ml-transform-checkpoint-errors>> to learn more about the machinery behind
{transforms}.

[[pivot-transform-overview]]
== Pivot {transforms}

You can use {transforms} to _pivot_ your data into a new entity-centric index.
Transforming and summarizing your data makes it possible to visualize and
analyze it in alternative and interesting ways.

Many {es} indices are organized as a stream of events: each event is an
individual document, for example a single item purchase. {transforms-cap} enable
you to summarize this data, bringing it into an organized, more
analysis-friendly format. For example, you can summarize all the purchases of a
single customer.

{transforms-cap} enable you to define a pivot, which is a set of features that
transform the index into a different, more digestible format. Pivoting results
in a summary of your data in a new index.

To define a pivot, first you select one or more fields that you will use to
group your data. You can select categorical fields (terms) and numerical fields
for grouping. If you use numerical fields, the field values are bucketed using
an interval that you specify.

The second step is deciding how you want to aggregate the grouped data.
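The two steps can be modelled in plain Python to show what a pivot produces.
This is only a conceptual, in-memory sketch, not how {es} executes a
{transform} (which pages through the source data with a composite aggregation).
The sample orders, the field names `category`, `price`, and `quantity`, and the
`pivot` helper are all hypothetical.

```python
from collections import defaultdict
from math import floor

# Hypothetical event-style source documents: one document per order.
orders = [
    {"category": "shoes", "price": 80.0, "quantity": 1},
    {"category": "shoes", "price": 95.0, "quantity": 2},
    {"category": "shoes", "price": 120.0, "quantity": 1},
    {"category": "shirts", "price": 30.0, "quantity": 3},
]


def pivot(docs, terms_field, histogram_field, interval, sum_field):
    """Group by a terms field and a numeric histogram bucket, then sum a field.

    Hypothetical helper that only illustrates the idea of a pivot.
    """
    totals = defaultdict(int)
    for doc in docs:
        # Step 1a: categorical (terms) grouping uses the field value as-is.
        term = doc[terms_field]
        # Step 1b: numerical grouping buckets the value by the chosen interval,
        # keyed by the lower bound of its bucket.
        bucket = floor(doc[histogram_field] / interval) * interval
        # Step 2: aggregate each group; here, a sum.
        totals[(term, bucket)] += doc[sum_field]
    # One summary document per unique combination of group-by values.
    return [
        {terms_field: term, histogram_field: bucket, f"{sum_field}_sum": total}
        for (term, bucket), total in sorted(totals.items())
    ]


for row in pivot(orders, "category", "price", 50, "quantity"):
    print(row)
```

With an interval of 50, the four orders collapse into three summary rows:
`shirts` in price bucket 0 with 3 items, `shoes` in bucket 50 with 3 items, and
`shoes` in bucket 100 with 1 item.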
When using aggregations, you are essentially asking questions about the index.
There are different types of aggregations, each with its own purpose and output.
To learn more about the supported aggregations and group-by fields, see
<<put-transform>>.

As an optional step, you can also add a query to further limit the scope of the
aggregation.

The {transform} performs a composite aggregation that paginates through all the
data defined by the source index query. The output of the aggregation is stored
in a _destination index_. Each time the {transform} queries the source index, it
creates a _checkpoint_. You can decide whether you want the {transform} to run
once or continuously. A _batch {transform}_ is a single operation that has a
single checkpoint. _{ctransforms-cap}_ continually increment and process
checkpoints as new source data is ingested.

Imagine that you run a webshop that sells clothes. Every order creates a
document that contains a unique order ID, the name and the category of the
ordered product, its price, the ordered quantity, the exact date of the order,
and some customer information (name, gender, location, and so on). Your data set
contains all the transactions from last year.

If you want to check the sales in the different categories in your last fiscal
year, define a {transform} that groups the data by the product categories
(women's shoes, men's clothing, and so on) and the order date. Use one year as
the interval for the order date. Then add a sum aggregation on the ordered
quantity. The result is an entity-centric index that shows the number of sold
items in every product category in the last year.

[role="screenshot"]
image::images/pivot-preview.png["Example of a pivot {transform} preview in {kib}"]

[[latest-transform-overview]]
== Latest {transforms}

You can use the `latest` type of {transform} to copy the most recent documents
into a new index.
You must identify one or more fields as the unique key for grouping your data,
as well as a date field that sorts the data chronologically. For example, you
can use this type of {transform} to keep track of the latest purchase for each
customer or the latest event for each host.

[role="screenshot"]
image::images/latest-preview.png["Example of a latest {transform} preview in {kib}"]

As in the case of a pivot, a latest {transform} can run once or continuously. It
performs a composite aggregation on the data in the source index and stores the
output in the destination index. If the {transform} runs continuously, new unique
key values are automatically added to the destination index and the most recent
documents for existing key values are automatically updated at each checkpoint.

[[transform-performance]]
== Performance considerations

{transforms-cap} perform search aggregations on the source indices, then index
the results into the destination index. Therefore, a {transform} never takes
less time or uses fewer resources than the aggregation and indexing processes.

If your {transform} must process a lot of historical data, it has high resource
usage initially, particularly during the first checkpoint.

For better performance, make sure that your search aggregations and queries are
optimized and that your {transform} is processing only necessary data. Consider
whether you can apply a source query to the {transform} to reduce the scope of
data it processes. Also consider whether the cluster has sufficient resources in
place to support both the composite aggregation search and the indexing of its
results.

If you prefer to spread out the impact on your cluster (at the cost of a slower
{transform}), you can throttle the rate at which it performs search and index
requests. Set the `docs_per_second` limit when you <<put-transform,create>> or
<<update-transform,update>> your {transform}.
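The throttle is set in the `settings` object of the request body. The following
Python sketch only builds and prints request bodies; it does not call {es}. The
index names, field names, and the limits 500 and 250 are hypothetical, loosely
modelled on the webshop example. The create body also applies a source query so
that the {transform} processes only the previous year's orders.

```python
import json

# Hypothetical body for a create request (PUT _transform/<transform_id>)
# that implements the webshop pivot and throttles the transform.
create_body = {
    "source": {
        "index": "webshop-orders",  # hypothetical source index
        # Source query: restrict processing to the previous calendar year.
        "query": {"range": {"order_date": {"gte": "now-1y/y", "lt": "now/y"}}},
    },
    "dest": {"index": "sales-by-category"},  # hypothetical destination index
    "pivot": {
        "group_by": {
            "category": {"terms": {"field": "category"}},
            "order_date": {
                "date_histogram": {"field": "order_date", "calendar_interval": "1y"}
            },
        },
        "aggregations": {"quantity_sum": {"sum": {"field": "quantity"}}},
    },
    # Throttle: limit on the number of input documents per second.
    "settings": {"docs_per_second": 500},
}

# Hypothetical body for an update request (POST _transform/<transform_id>/_update)
# that only lowers the throttle.
update_body = {"settings": {"docs_per_second": 250}}

print(json.dumps(create_body, indent=2))
print(json.dumps(update_body))
```

Keeping the throttle in a separate update body reflects that the limit can be
tuned after creation without redefining the pivot.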
To calculate the current rate in documents per second, use the following values
from the {ref}/get-transform-stats.html[get {transform} stats API]:

```
documents_processed / search_time_in_ms * 1000
```
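As a minimal sketch, the formula can be applied to the `stats` object that the
get {transform} stats API returns for each {transform} (`transforms[].stats` in
the response). The sample response below is hypothetical and trimmed to the two
fields the formula needs; `current_rate` is a hypothetical helper name.

```python
def current_rate(stats: dict) -> float:
    """Documents processed per second of search time, from a transform's stats."""
    search_ms = stats["search_time_in_ms"]
    if search_ms == 0:
        # Nothing has been searched yet, so there is no rate to report.
        return 0.0
    return stats["documents_processed"] / search_ms * 1000


# Hypothetical, trimmed response of GET _transform/<transform_id>/_stats.
response = {
    "count": 1,
    "transforms": [
        {
            "id": "sales-by-category",
            "stats": {"documents_processed": 120000, "search_time_in_ms": 60000},
        }
    ],
}

for transform in response["transforms"]:
    rate = current_rate(transform["stats"])
    print(f"{transform['id']}: {rate:.0f} docs/s")  # prints "sales-by-category: 2000 docs/s"
```

The result can serve as a starting point when choosing a `docs_per_second`
value: a limit well below the current rate spreads the load further, at the
cost of a slower {transform}.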