[role="xpack"]
[[transform-overview]]
= {transform-cap} overview
++++
<titleabbrev>Overview</titleabbrev>
++++

You can choose either of the following methods to transform your data:
<<pivot-transform-overview,pivot>> or <<latest-transform-overview,latest>>.

[IMPORTANT]
====
* All {transforms} leave your source index intact. They create a new
index that is dedicated to the transformed data.
* {transforms-cap} might have more configuration options provided by the APIs
than the options available in {kib}. For all the {transform} configuration
options, refer to the <<transform-apis,API documentation>>.
====

{transforms-cap} are persistent tasks; they are stored in the cluster state,
which makes them resilient to node failures. Refer to <<transform-checkpoints>>
and <<ml-transform-checkpoint-errors>> to learn more about the machinery behind
{transforms}.

[[pivot-transform-overview]]
== Pivot {transforms}

You can use {transforms} to _pivot_ your data into a new entity-centric index.
By transforming and summarizing your data, it becomes possible to visualize and
analyze it in alternative and interesting ways.

Many {es} indices are organized as a stream of events: each event is an
individual document, for example a single item purchase. {transforms-cap} enable
you to summarize this data, bringing it into an organized, more
analysis-friendly format. For example, you can summarize all the purchases of a
single customer.

{transforms-cap} enable you to define a pivot, which is a set of
features that transform the index into a different, more digestible format.
Pivoting results in a summary of your data in a new index.

To define a pivot, first you select one or more fields that you will use to
group your data. You can select categorical fields (terms) and numerical fields
for grouping. If you use numerical fields, the field values are bucketed using
an interval that you specify.

The second step is deciding how you want to aggregate the grouped data. When
you use aggregations, you are in effect asking questions about the index. There
are different types of aggregations, each with its own purpose and output. To
learn more about the supported aggregations and group-by fields, see
<<put-transform>>.

As an optional step, you can also add a query to further limit the scope of the
aggregation.
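
The three steps above map onto the body of a create {transform} request. The
following Python sketch assembles such a body as a plain dictionary. The index
names (`orders`, `customer-summary`), the field names (`customer_id`, `price`,
`quantity`, `status`), and the helper `build_pivot_body` are hypothetical and
only illustrate the shape of the request; refer to <<put-transform>> for the
authoritative list of options.

```python
import json


def build_pivot_body(source_index, dest_index, group_by, aggregations, query=None):
    """Assemble a pivot transform request body as a plain dictionary.

    Illustrative helper only; it is not part of any client library.
    """
    source = {"index": source_index}
    if query is not None:
        # Optional step: limit the documents that the aggregation sees.
        source["query"] = query
    return {
        "source": source,
        "dest": {"index": dest_index},
        "pivot": {"group_by": group_by, "aggregations": aggregations},
    }


body = build_pivot_body(
    source_index="orders",          # hypothetical source index
    dest_index="customer-summary",  # hypothetical destination index
    group_by={
        # Step 1: a categorical field is grouped with `terms` ...
        "customer": {"terms": {"field": "customer_id"}},
        # ... and a numerical field is bucketed with an interval you specify.
        "price_band": {"histogram": {"field": "price", "interval": 50}},
    },
    aggregations={
        # Step 2: decide how to aggregate each group.
        "total_quantity": {"sum": {"field": "quantity"}},
        "average_price": {"avg": {"field": "price"}},
    },
    # Optional step: only consider completed orders (hypothetical field).
    query={"term": {"status": "completed"}},
)

print(json.dumps(body, indent=2))
```

Leaving out the `query` argument produces a body whose `source` contains only
the index name, so the {transform} would consider every document in that index.
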

The {transform} performs a composite aggregation that paginates through all the
data defined by the source index query. The output of the aggregation is stored
in a _destination index_. Each time the {transform} queries the source index, it
creates a _checkpoint_. You can decide whether you want the {transform} to run
once or continuously. A _batch {transform}_ is a single operation that has a
single checkpoint. _{ctransforms-cap}_ continually increment and process
checkpoints as new source data is ingested.
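
In a request body, the difference between the two modes comes down to whether a
`sync` section is present. The sketch below, which assumes a hypothetical
`orders` index with an `order_date` timestamp field, derives a continuous
configuration from a batch one. The helper `make_continuous` is illustrative,
and the `delay` and `frequency` values are example choices, not
recommendations.

```python
import copy


def make_continuous(batch_body, time_field, delay="60s", frequency="1m"):
    """Return a copy of a batch transform body with continuous settings added.

    Illustrative helper only; it is not part of any client library.
    """
    body = copy.deepcopy(batch_body)
    # `sync.time.field` names the date field used to detect new source data.
    body["sync"] = {"time": {"field": time_field, "delay": delay}}
    # `frequency` controls how often the source index is checked for changes.
    body["frequency"] = frequency
    return body


# A batch configuration: no `sync` section, so it runs once.
batch = {
    "source": {"index": "orders"},          # hypothetical index
    "dest": {"index": "customer-summary"},  # hypothetical index
    "pivot": {
        "group_by": {"customer": {"terms": {"field": "customer_id"}}},
        "aggregations": {"total_quantity": {"sum": {"field": "quantity"}}},
    },
}

continuous = make_continuous(batch, time_field="order_date")

print("sync" in batch)
print(continuous["sync"])
```
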

Imagine that you run a webshop that sells clothes. Every order creates a
document that contains a unique order ID, the name and the category of the
ordered product, its price, the ordered quantity, the exact date of the order,
and some customer information (name, gender, location, and so on). Your data
set contains all the transactions from last year.

If you want to check the sales in the different categories in your last fiscal
year, define a {transform} that groups the data by the product categories
(women's shoes, men's clothing, and so on) and the order date. Use one year as
the interval for the order date. Then add a sum aggregation on the ordered
quantity. The result is an entity-centric index that shows the number of sold
items in every product category in the last year.

[role="screenshot"]
image::images/pivot-preview.png["Example of a pivot {transform} preview in {kib}"]
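
To make the webshop example concrete, the following Python sketch shows one
possible `pivot` definition together with a small in-memory imitation of the
summary it would produce. The field names (`category`, `order_date`,
`quantity`) and the sample orders are invented for illustration, and the
`calendar_interval` of `1y` is an assumed reading of the yearly interval. The
loop only mimics the shape of the result; it is not how {es} executes the
aggregation.

```python
from collections import defaultdict
from datetime import date

# One possible pivot definition for the example (field names are hypothetical).
pivot = {
    "group_by": {
        "category": {"terms": {"field": "category"}},
        "year": {"date_histogram": {"field": "order_date", "calendar_interval": "1y"}},
    },
    "aggregations": {"quantity_sum": {"sum": {"field": "quantity"}}},
}

# Invented sample orders, one document per purchase event.
orders = [
    {"category": "women's shoes", "order_date": date(2023, 2, 14), "quantity": 2},
    {"category": "men's clothing", "order_date": date(2023, 3, 1), "quantity": 1},
    {"category": "women's shoes", "order_date": date(2023, 11, 30), "quantity": 3},
]

# Group by (category, year) and sum the ordered quantity per group.
totals = defaultdict(int)
for order in orders:
    key = (order["category"], order["order_date"].year)
    totals[key] += order["quantity"]

# One row per group, similar to one document per entity in the destination index.
summary = [
    {"category": category, "year": year, "quantity_sum": quantity}
    for (category, year), quantity in sorted(totals.items())
]

for row in summary:
    print(row)
```
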

[[latest-transform-overview]]
== Latest {transforms}

You can use the `latest` type of {transform} to copy the most recent documents
into a new index. You must identify one or more fields as the unique key for
grouping your data, as well as a date field that sorts the data chronologically.
For example, you can use this type of {transform} to keep track of the latest
purchase for each customer or the latest event for each host.

[role="screenshot"]
image::images/latest-preview.png["Example of a latest {transform} preview in {kib}"]

As in the case of a pivot, a latest {transform} can run once or continuously. It
performs a composite aggregation on the data in the source index and stores the
output in the destination index. If the {transform} runs continuously, new unique
key values are automatically added to the destination index and the most recent
documents for existing key values are automatically updated at each checkpoint.
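
The behavior described above can be sketched in a few lines of Python. The
`latest_config` dictionary shows the `latest` section of a request body for the
"latest event for each host" case, and `latest_by_key` imitates the outcome in
memory. The field names (`host`, `@timestamp`, `status`), the sample events,
and the helper itself are hypothetical; the helper does not reflect how {es}
implements the {transform}.

```python
from datetime import datetime

# The `latest` section of a transform body: unique key plus a date sort field.
latest_config = {"latest": {"unique_key": ["host"], "sort": "@timestamp"}}

# Invented sample events, one document per event.
events = [
    {"host": "web-1", "@timestamp": datetime(2024, 1, 1, 10, 0), "status": "ok"},
    {"host": "web-2", "@timestamp": datetime(2024, 1, 1, 10, 5), "status": "ok"},
    {"host": "web-1", "@timestamp": datetime(2024, 1, 1, 10, 10), "status": "degraded"},
]


def latest_by_key(docs, unique_key, sort_field):
    """Keep only the most recent document for each unique key value."""
    latest = {}
    for doc in docs:
        key = tuple(doc[field] for field in unique_key)
        # Replace the stored document when a newer one arrives for the same key.
        if key not in latest or doc[sort_field] > latest[key][sort_field]:
            latest[key] = doc
    return latest


result = latest_by_key(
    events,
    unique_key=latest_config["latest"]["unique_key"],
    sort_field=latest_config["latest"]["sort"],
)

for key, doc in sorted(result.items()):
    print(key, doc["status"])
```
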

[[transform-performance]]
== Performance considerations

{transforms-cap} perform search aggregations on the source indices and then
index the results into the destination index. Therefore, a {transform} never
takes less time or uses fewer resources than the aggregation and indexing
processes.

If your {transform} must process a large amount of historical data, it has high
resource usage initially, particularly during the first checkpoint.

For better performance, make sure that your search aggregations and queries are
optimized and that your {transform} is processing only necessary data. Consider
whether you can apply a source query to the {transform} to reduce the scope of
data it processes. Also consider whether the cluster has sufficient resources in
place to support both the composite aggregation search and the indexing of its
results.

If you prefer to spread out the impact on your cluster (at the cost of a slower
{transform}), you can throttle the rate at which it performs search and index
requests. Set the `docs_per_second` limit when you <<put-transform,create>> or
<<update-transform,update>> your {transform}. If you want to calculate the
current rate, use the following information from the
{ref}/get-transform-stats.html[get {transform} stats API]:

```
documents_processed / search_time_in_ms * 1000
```
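
As a worked example of the formula, the following Python sketch computes the
current rate from the two statistics and derives a throttled `docs_per_second`
value from it. The sample numbers are invented, the helper
`current_docs_per_second` is hypothetical, and halving the rate is an arbitrary
choice for illustration. The sketch also assumes that you have already
extracted the `stats` object of one {transform} from the API response.

```python
def current_docs_per_second(stats):
    """Apply documents_processed / search_time_in_ms * 1000 to a stats object."""
    processed = stats["documents_processed"]
    search_ms = stats["search_time_in_ms"]
    if search_ms == 0:
        # No search time recorded yet, so no meaningful rate can be computed.
        return 0.0
    return processed / search_ms * 1000


# Invented sample values standing in for the `stats` object of one transform.
sample_stats = {"documents_processed": 120000, "search_time_in_ms": 48000}

rate = current_docs_per_second(sample_stats)
print(rate)  # 2500.0

# Example throttle: cap the transform at half of its current rate.
update_body = {"settings": {"docs_per_second": rate / 2}}
print(update_body)
```
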