[role="xpack"]
[[transform-checkpoints]]
=== How {transform} checkpoints work
++++
<titleabbrev>How checkpoints work</titleabbrev>
++++

Each time a {transform} examines the source indices and creates or
updates the destination index, it generates a _checkpoint_.

If your {transform} runs only once, there is logically only one
checkpoint. If your {transform} runs continuously, however, it creates
checkpoints as it ingests and transforms new source data.

To create a checkpoint, the {ctransform}:

. Checks for changes to source indices.
+
Using a simple periodic timer, the {transform} checks for changes to
the source indices. This check is done based on the interval defined in the
{transform}'s `frequency` property.
+
If the source indices remain unchanged or if a checkpoint is already in
progress, the {transform} waits for the next timer.

. Identifies which entities have changed.
+
The {transform} searches to see which entities have changed since the
last time it checked. The `sync` configuration object in the {transform}
identifies a time field in the source indices. The {transform} uses the values
in that field to synchronize the source and destination indices.

. Updates the destination index (the {dataframe}) with the changed entities.
+
--
The {transform} applies changes related to either new or changed
entities to the destination index. The set of changed entities is paginated. For
each page, the {transform} performs a composite aggregation using a
`terms` query. After all the pages of changes have been applied, the checkpoint
is complete.
--
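
The three steps above can be illustrated with a small in-memory simulation.
This is only a sketch under stated assumptions, not how {transforms} are
implemented: the `Doc` class, the function names, the integer timestamps, and
the `page_size` parameter are all hypothetical, and a plain Python sum stands
in for the composite aggregation.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass
class Doc:
    entity: str     # value of the group-by field (hypothetical)
    timestamp: int  # value of the time field that `sync` points at
    value: float


def changed_entities(source: List[Doc], last_ts: int, now_ts: int) -> List[str]:
    """Step 2: entities with documents newer than the last checkpoint."""
    return sorted({d.entity for d in source if last_ts < d.timestamp <= now_ts})


def pages(items: List[str], page_size: int) -> Iterator[List[str]]:
    """Split the changed entities into fixed-size pages."""
    for start in range(0, len(items), page_size):
        yield items[start:start + page_size]


def run_checkpoint(source: List[Doc], dest: Dict[str, float],
                   last_ts: int, now_ts: int, page_size: int = 2) -> int:
    """Run one timer tick and return the timestamp of the latest checkpoint."""
    changed = changed_entities(source, last_ts, now_ts)
    if not changed:
        # Step 1: nothing changed, so wait for the next timer.
        return last_ts
    for page in pages(changed, page_size):
        # Step 3: re-aggregate only the entities on this page and
        # overwrite their summaries in the destination.
        for entity in page:
            dest[entity] = sum(d.value for d in source
                               if d.entity == entity and d.timestamp <= now_ts)
    return now_ts
```

Calling `run_checkpoint` once per timer tick with the previously returned
timestamp mimics a continuous run: only entities that received new documents
since the last checkpoint are recomputed.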

This checkpoint process involves both search and indexing activity on the
cluster. We have attempted to favor control over performance while developing
{transforms}. We decided it was preferable for the
{transform} to take longer to complete, rather than to finish quickly
and take precedence in resource consumption. That being said, the cluster still
requires enough resources to support both the composite aggregation search and
the indexing of its results.

TIP: If the cluster experiences unacceptable performance degradation due to the
{transform}, stop the {transform} and refer to <<transform-performance>>.

[discrete]
[[ml-transform-checkpoint-errors]]
==== Error handling

Failures in {transforms} tend to be related to searching or indexing.
To increase the resiliency of {transforms}, the cursor positions of
the aggregated search and the changed entities search are tracked in memory and
persisted periodically.

Checkpoint failures can be categorized as follows:

* Temporary failures: The checkpoint is retried. If 10 consecutive failures
occur, the {transform} has a failed status. For example, this
situation might occur when there are shard failures and queries return only
partial results.
* Irrecoverable failures: The {transform} immediately fails. For
example, this situation occurs when the source index is not found.
* Adjustment failures: The {transform} retries with adjusted settings.
For example, if parent circuit breaker memory errors occur during the
composite aggregation, the {transform} receives partial results. The aggregated
search is retried with a smaller number of buckets. This retry is performed at
the interval defined in the `frequency` property for the {transform}. If the
search is retried to the point where it reaches a minimal number of buckets, an
irrecoverable failure occurs.
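
The three failure categories can be sketched as a retry policy. The exception
classes, the `run_with_retries` function, the halving of the page size, and
the minimum of 1 bucket are assumptions made for illustration; only the limit
of 10 consecutive temporary failures comes from the description above. The
sketch also retries immediately, whereas a real {transform} waits for the
interval defined in `frequency`.

```python
from typing import Callable, Tuple


class TemporaryFailure(Exception):
    """For example, shard failures that return partial results."""


class IrrecoverableFailure(Exception):
    """For example, the source index is not found."""


class AdjustmentFailure(Exception):
    """For example, a circuit breaker trips during the aggregation."""


MAX_CONSECUTIVE_FAILURES = 10  # limit stated in the text
MIN_PAGE_SIZE = 1              # hypothetical lower bound on buckets


def run_with_retries(search: Callable[[int], None],
                     page_size: int) -> Tuple[str, int]:
    """Call search(page_size) until it succeeds or the transform fails.

    Returns the final status and the page size in effect at that point.
    """
    consecutive_failures = 0
    while True:
        try:
            search(page_size)
            return "completed", page_size
        except IrrecoverableFailure:
            # No retry: the transform fails immediately.
            return "failed", page_size
        except AdjustmentFailure:
            if page_size <= MIN_PAGE_SIZE:
                # Already at the minimum: treated as irrecoverable.
                return "failed", page_size
            # Assumed adjustment: halve the number of buckets and retry.
            page_size = max(MIN_PAGE_SIZE, page_size // 2)
        except TemporaryFailure:
            consecutive_failures += 1
            if consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
                return "failed", page_size
```

In this sketch, adjustment failures do not count toward the limit of 10
consecutive failures. That is a simplification of the sketch, not a documented
behavior.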

If the node running the {transforms} fails, the {transform} restarts
from the most recent persisted cursor position. This recovery process might
repeat some of the work the {transform} had already done, but it ensures data
consistency.
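
The effect of restarting from a persisted cursor can be shown with a minimal
simulation. Everything here is hypothetical: the `run` function, the `state`
dictionary standing in for persisted state, the `persist_every` interval, and
the `crash_at` switch that simulates a node failure. The sketch also assumes
that writes to the destination are keyed by entity, so that repeating a write
overwrites the same entry rather than duplicating it.

```python
from typing import Dict, List, Optional


def run(items: List[str], state: Dict[str, int], dest: Dict[str, str],
        work_log: List[str], persist_every: int = 3,
        crash_at: Optional[int] = None) -> None:
    """Process items starting from the last persisted cursor position."""
    position = state.get("cursor", 0)  # 0 when nothing was persisted yet
    while position < len(items):
        if position == crash_at:
            raise RuntimeError("simulated node failure")
        key = items[position]
        dest[key] = f"summary of {key}"  # keyed write: repeating is harmless
        work_log.append(key)             # records every unit of work done
        position += 1
        if position % persist_every == 0:
            # The cursor is persisted periodically, not after every item.
            state["cursor"] = position
    state["cursor"] = position


items = list("abcdefg")
state, dest, work_log = {}, {}, []
try:
    run(items, state, dest, work_log, crash_at=5)
except RuntimeError:
    pass  # the "node" failed after items a to e; cursor 3 was persisted
run(items, state, dest, work_log)  # restart from the persisted cursor
```

After the restart, `work_log` shows that items `d` and `e` were processed
twice, while `dest` still holds exactly one entry per item.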