[role="xpack"]
[[transform-checkpoints]]
= How {transform} checkpoints work
++++
<titleabbrev>How checkpoints work</titleabbrev>
++++

Each time a {transform} examines the source indices and creates or updates the
destination index, it generates a _checkpoint_.

If your {transform} runs only once, there is logically only one checkpoint. If
your {transform} runs continuously, however, it creates checkpoints as it
ingests and transforms new source data.
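
As a rough illustration of the difference, the following Python sketch shows
the shape of a configuration body for a continuous {transform}. The index
names, field names, and the `is_continuous` helper are hypothetical
placeholders. The sketch assumes that the presence of a `sync` object is what
distinguishes a continuous {transform} from one that runs once.

```python
# Hypothetical configuration body for a continuous transform.
# Index and field names are placeholders, not taken from a real cluster.
transform_config = {
    "source": {"index": "orders"},
    "dest": {"index": "orders-by-customer"},
    "pivot": {
        "group_by": {"customer": {"terms": {"field": "customer_id"}}},
        "aggregations": {"total": {"sum": {"field": "price"}}},
    },
    # How often the transform checks the source indices for changes.
    "frequency": "5m",
    # The time field used to synchronize source and destination indices.
    "sync": {"time": {"field": "order_date", "delay": "60s"}},
}


def is_continuous(config: dict) -> bool:
    """Illustrative helper: treat a config with a `sync` object as continuous."""
    return "sync" in config


# The same configuration without `sync` and `frequency` runs only once.
batch_config = {
    key: value
    for key, value in transform_config.items()
    if key not in ("sync", "frequency")
}

print(is_continuous(transform_config))
print(is_continuous(batch_config))
```

The first call reports a continuous configuration and the second a one-time
configuration, which produces a single checkpoint.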

To create a checkpoint, the {ctransform}:

. Checks for changes to source indices.
+
Using a simple periodic timer, the {transform} checks for changes to the source
indices. The check runs at the interval defined in the {transform}'s
`frequency` property.
+
If the source indices remain unchanged or if a checkpoint is already in
progress, the {transform} waits for the next timer interval.

. Identifies which entities have changed.
+
The {transform} searches to see which entities have changed since the last time
it checked. The `sync` configuration object in the {transform} identifies a time
field in the source indices. The {transform} uses the values in that field to
synchronize the source and destination indices.

. Updates the destination index (the {dataframe}) with the changed entities.
+
--
The {transform} applies changes related to either new or changed entities to the
destination index. The set of changed entities is paginated. For each page, the
{transform} performs a composite aggregation using a `terms` query. After all
the pages of changes have been applied, the checkpoint is complete.
--
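
The three steps above can be pictured with a small in-memory model. This is a
minimal sketch, not the actual implementation: `Doc`, `ToyTransform`, and all
of their fields are hypothetical, and a plain Python sum stands in for the
composite aggregation and `terms` query.

```python
from dataclasses import dataclass, field
from itertools import islice


@dataclass
class Doc:
    entity: str      # value of the group-by field
    timestamp: int   # value of the time field named in `sync`
    amount: float


@dataclass
class ToyTransform:
    page_size: int = 2
    last_checkpoint_time: int = -1
    checkpoint: int = 0
    dest: dict = field(default_factory=dict)

    def changed_entities(self, source, now):
        # Step 2: entities with documents newer than the last checkpoint.
        return sorted({
            d.entity for d in source
            if self.last_checkpoint_time < d.timestamp <= now
        })

    def run_checkpoint(self, source, now):
        changed = self.changed_entities(source, now)
        if not changed:
            return False  # Step 1: nothing changed, wait for the next timer.
        remaining = iter(changed)
        # Step 3: apply the changes one page of entities at a time.
        while page := list(islice(remaining, self.page_size)):
            for entity in page:
                # Recompute the entity from all of its source documents.
                self.dest[entity] = sum(
                    d.amount for d in source
                    if d.entity == entity and d.timestamp <= now
                )
        self.last_checkpoint_time = now
        self.checkpoint += 1
        return True


source = [Doc("a", 1, 10.0), Doc("b", 2, 5.0)]
t = ToyTransform()
t.run_checkpoint(source, now=2)   # first checkpoint covers "a" and "b"
source.append(Doc("a", 3, 2.5))
t.run_checkpoint(source, now=3)   # only "a" is recomputed
t.run_checkpoint(source, now=4)   # no changes, so no new checkpoint
print(t.checkpoint, t.dest)       # 2 {'a': 12.5, 'b': 5.0}
```

In this model, the third timer tick finds no newer documents, so the
checkpoint counter stays at 2.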

This checkpoint process involves both search and indexing activity on the
cluster. We have attempted to favor control over performance while developing
{transforms}. We decided it was preferable for the {transform} to take longer to
complete, rather than to finish quickly and take precedence in resource
consumption. That being said, the cluster still requires enough resources to
support both the composite aggregation search and the indexing of its results.

TIP: If the cluster experiences unacceptable performance degradation due to the
{transform}, stop the {transform} and refer to <<transform-performance>>.

[discrete]
[[ml-transform-checkpoint-errors]]
== Error handling

Failures in {transforms} tend to be related to searching or indexing.
To increase the resiliency of {transforms}, the cursor positions of
the aggregated search and the changed entities search are tracked in memory and
persisted periodically.

Checkpoint failures can be categorized as follows:

* Temporary failures: The checkpoint is retried. If 10 consecutive failures
occur, the {transform} has a failed status. For example, this situation might
occur when there are shard failures and queries return only partial results.
* Irrecoverable failures: The {transform} immediately fails. For example, this
situation occurs when the source index is not found.
* Adjustment failures: The {transform} retries with adjusted settings. For
example, if parent circuit breaker memory errors occur during the composite
aggregation, the {transform} receives partial results. The aggregated search is
retried with a smaller number of buckets. This retry is performed at the
interval defined in the `frequency` property for the {transform}. If the search
is retried to the point where it reaches a minimal number of buckets, an
irrecoverable failure occurs.
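
The three failure categories can be sketched as a retry loop. The exception
classes, `run_with_error_handling`, and `MIN_PAGE_SIZE` are hypothetical names.
Halving the page size is an assumption, because the text only says that the
number of buckets becomes smaller. The sketch also retries immediately rather
than waiting for the `frequency` interval.

```python
class TemporaryFailure(Exception):
    """For example, shard failures that return partial results."""


class IrrecoverableFailure(Exception):
    """For example, the source index is not found."""


class AdjustmentFailure(Exception):
    """For example, a circuit breaker error during the aggregation."""


MAX_CONSECUTIVE_FAILURES = 10  # limit stated in the list above
MIN_PAGE_SIZE = 1              # assumed; the real minimum is not stated here


def run_with_error_handling(search, page_size=500):
    """Call `search(page_size)` until it succeeds or the transform fails.

    Returns ("ok", result) or ("failed", reason).
    """
    consecutive_failures = 0
    while True:
        try:
            return "ok", search(page_size)
        except IrrecoverableFailure as exc:
            return "failed", f"irrecoverable: {exc}"
        except TemporaryFailure as exc:
            consecutive_failures += 1
            if consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
                return "failed", f"{consecutive_failures} consecutive failures: {exc}"
        except AdjustmentFailure:
            if page_size <= MIN_PAGE_SIZE:
                return "failed", "irrecoverable: minimum page size reached"
            # Assumed adjustment strategy: halve the number of buckets.
            page_size = max(MIN_PAGE_SIZE, page_size // 2)


def flaky_search(page_size):
    # Simulated search that only fits in memory with 100 buckets or fewer.
    if page_size > 100:
        raise AdjustmentFailure("circuit breaker tripped")
    return f"page of up to {page_size} buckets"


def always_temporary(page_size):
    raise TemporaryFailure("shard failure")


print(run_with_error_handling(flaky_search))      # ('ok', 'page of up to 62 buckets')
print(run_with_error_handling(always_temporary))  # fails after 10 attempts
```

Starting from 500 buckets, the simulated search succeeds after the page size
has been reduced to 250, 125, and then 62.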

If the node running the {transforms} fails, the {transform} restarts from the
most recent persisted cursor position. This recovery process might repeat some
of the work the {transform} had already done, but it ensures data consistency.
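
The following sketch illustrates why a restart can repeat work. It is a
simplified model under stated assumptions: the cursor is a list offset stored
in a JSON file, and `process`, `persist_every`, and `crash_after` are
hypothetical names that do not correspond to real {transform} settings.

```python
import json
import os
import tempfile


def process(entities, state_path, persist_every=2, crash_after=None):
    """Process `entities` in order, resuming from a persisted cursor if present."""
    cursor = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            cursor = json.load(f)["cursor"]
    processed = []
    for i in range(cursor, len(entities)):
        if crash_after is not None and len(processed) == crash_after:
            raise RuntimeError("simulated node failure")
        # Stands in for an idempotent write to the destination index.
        processed.append(entities[i])
        # The cursor is persisted periodically, not after every entity.
        if (i + 1) % persist_every == 0:
            with open(state_path, "w") as f:
                json.dump({"cursor": i + 1}, f)
    return processed


entities = ["a", "b", "c", "d", "e"]
with tempfile.TemporaryDirectory() as tmp:
    state = os.path.join(tmp, "cursor.json")
    try:
        process(entities, state, crash_after=3)  # handles a, b, c, then fails
    except RuntimeError:
        pass
    resumed = process(entities, state)           # restarts from cursor 2

print(resumed)  # ['c', 'd', 'e']
```

Entity `c` is handled twice because the failure happened after it was
processed but before the next cursor was persisted. Repeating it is harmless
as long as the write to the destination is idempotent.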