[role="xpack"]
[[transform-overview]]
=== {transform-cap} overview
++++
<titleabbrev>Overview</titleabbrev>
++++

beta[]

A _{dataframe}_ is a two-dimensional tabular data structure. In the context of
the {stack}, it is a transformation of data that is indexed in {es}. For
example, you can use {dataframes} to _pivot_ your data into a new entity-centric
index. By transforming and summarizing your data, you can visualize and analyze
it in new ways.

Many {es} indices are organized as a stream of events: each event is an
individual document, for example a single item purchase. {dataframes-cap} enable
you to summarize this data and bring it into an organized, more
analysis-friendly format. For example, you can summarize all the purchases of a
single customer.

You can create {dataframes} by using {transforms}.
{transforms-cap} enable you to define a pivot, which is a set of
features that transform the index into a different, more digestible format.
Pivoting results in a summary of your data, which is the {dataframe}.

To define a pivot, first select one or more fields to group your data by. You
can group by categorical fields (terms) and by numerical fields. If you use
numerical fields, the field values are bucketed using an interval that you
specify.

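
As a rough illustration of numerical grouping, the following Python sketch
assigns each value to a bucket by rounding it down to the nearest multiple of
the interval. The function names and the sample prices are hypothetical, and
this is a simplified model of interval bucketing, not the {es} implementation.

```python
import math
from collections import defaultdict


def bucket_key(value, interval):
    """Return the lower bound of the interval-wide bucket that holds value."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return math.floor(value / interval) * interval


def group_by_interval(values, interval):
    """Group values into buckets keyed by each bucket's lower bound."""
    buckets = defaultdict(list)
    for value in values:
        buckets[bucket_key(value, interval)].append(value)
    return dict(buckets)


# Hypothetical prices, grouped with an interval of 20.
prices = [4.99, 12.50, 19.99, 20.00, 47.25]
groups = group_by_interval(prices, 20)
print(groups)
```

With an interval of 20, a value of 19.99 falls into the bucket that starts at 0,
while 20.00 opens the next bucket.
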
The second step is deciding how you want to aggregate the grouped data. When
you use aggregations, you are effectively asking questions about the index.
There are different types of aggregations, each with its own purpose and output.
To learn more about the supported aggregations and group-by fields, see
{ref}/transform-resource.html[{transform-cap} resources].

As an optional step, you can also add a query to further limit the scope of the
aggregation.

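
To make the three parts of a pivot concrete, the sketch below assembles a
{transform} configuration as a Python dictionary and prints it as JSON. The
index names, field names, and the `build_pivot_config` helper are hypothetical.
The overall shape (a `source` with an optional `query`, a `pivot` with
`group_by` and `aggregations`, and a `dest`) is assumed to match the create
{transform} API, so verify it against the API reference for your version.

```python
import json


def build_pivot_config(source_index, dest_index, group_by, aggregations, query=None):
    """Assemble a transform request body from its building blocks."""
    source = {"index": source_index}
    if query is not None:
        # Optional step: restrict which source documents are aggregated.
        source["query"] = query
    return {
        "source": source,
        "pivot": {"group_by": group_by, "aggregations": aggregations},
        "dest": {"index": dest_index},
    }


config = build_pivot_config(
    source_index="orders",            # hypothetical source index
    dest_index="orders_by_customer",  # hypothetical destination index
    group_by={
        # Categorical grouping on a terms field.
        "customer": {"terms": {"field": "customer_id"}},
        # Numerical grouping, bucketed with an interval of 50.
        "price_range": {"histogram": {"field": "price", "interval": 50}},
    },
    aggregations={"total_quantity": {"sum": {"field": "quantity"}}},
    query={"term": {"category": "shoes"}},
)
print(json.dumps(config, indent=2))
```
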
The {transform} performs a composite aggregation that
paginates through all the data defined by the source index query. The output of
the aggregation is stored in a destination index. Each time the
{transform} queries the source index, it creates a _checkpoint_. You
can decide whether you want the {transform} to run once (batch
{transform}) or continuously ({ctransform}). A batch
{transform} is a single operation that has a single checkpoint.
{ctransforms-cap} continually increment and process checkpoints as new
source data is ingested.

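
The paging and checkpoint behavior described above can be modeled in a few lines
of Python. This is an in-memory sketch only: `composite_page`, `run_checkpoint`,
and the sample documents are hypothetical, and the sketch recomputes every
bucket on each pass for simplicity, which is an assumption of the example and
not a description of how {transforms} detect changed data.

```python
from collections import defaultdict


def composite_page(docs, group_field, sum_field, size, after=None):
    """Return one page of (key, sum) buckets sorted by key, plus the next after-key."""
    totals = defaultdict(int)
    for doc in docs:
        totals[doc[group_field]] += doc[sum_field]
    keys = sorted(k for k in totals if after is None or k > after)
    page = [(k, totals[k]) for k in keys[:size]]
    # The last key on the page tells the next request where to resume.
    after_key = page[-1][0] if len(keys) > size else None
    return page, after_key


def run_checkpoint(docs, dest, page_size=2):
    """Page through every bucket once and store the results in dest."""
    after = None
    while True:
        page, after = composite_page(docs, "customer", "quantity", page_size, after)
        dest.update(page)
        if after is None:
            break


source = [
    {"customer": "alice", "quantity": 2},
    {"customer": "bob", "quantity": 1},
    {"customer": "carol", "quantity": 5},
    {"customer": "alice", "quantity": 3},
]
dest = {}
checkpoint = 0

# Batch mode: one pass over the source, one checkpoint.
run_checkpoint(source, dest)
checkpoint += 1

# Continuous mode: new data arrives, so another pass creates a new checkpoint.
source.append({"customer": "bob", "quantity": 4})
run_checkpoint(source, dest)
checkpoint += 1

print(dest, checkpoint)
```

A batch run would stop after the first pass. The second pass stands in for a
continuous run that picks up newly ingested source data.
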
.Example
Imagine that you run a webshop that sells clothes. Every order creates a document
that contains a unique order ID, the name and the category of the ordered product,
its price, the ordered quantity, the exact date of the order, and some customer
information (name, gender, location, etc.). Your dataset contains all the
transactions from last year.

If you want to check the sales in the different categories in your last fiscal
year, define a {transform} that groups the data by the product
categories (women's shoes, men's clothing, etc.) and the order date. Use the
last year as the interval for the order date. Then add a sum aggregation on the
ordered quantity. The result is a {dataframe} that shows the number of items
sold in every product category in the last year.

[role="screenshot"]
image::images/ml-dataframepivot.jpg["Example of a data frame pivot in {kib}"]

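
The webshop pivot can also be worked through by hand. The following Python
sketch groups a few made-up orders by product category and order year, then sums
the ordered quantity. The sample orders and the `pivot_sales` function are
hypothetical and only mimic the result of the {transform}; they do not call {es}.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order documents, one per purchase event.
orders = [
    {"order_id": "A1", "category": "Women's Shoes", "quantity": 2, "order_date": date(2019, 1, 15)},
    {"order_id": "A2", "category": "Men's Clothing", "quantity": 1, "order_date": date(2019, 3, 2)},
    {"order_id": "A3", "category": "Women's Shoes", "quantity": 3, "order_date": date(2019, 7, 21)},
    {"order_id": "A4", "category": "Men's Clothing", "quantity": 4, "order_date": date(2019, 11, 30)},
]


def pivot_sales(orders):
    """Sum ordered quantities per (category, order year)."""
    totals = defaultdict(int)
    for order in orders:
        # Group by category and by the year of the order date.
        key = (order["category"], order["order_date"].year)
        totals[key] += order["quantity"]
    return dict(totals)


summary = pivot_sales(orders)
for (category, year), quantity in sorted(summary.items()):
    print(f"{category} {year}: {quantity}")
```

The function builds a new summary and does not modify the `orders` list.
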
IMPORTANT: The {transform} leaves your source index intact. It
creates a new index that is dedicated to the {dataframe}.