[role="xpack"]
[[transform-checkpoints]]
= How {transform} checkpoints work
++++
<titleabbrev>How checkpoints work</titleabbrev>
++++

Each time a {transform} examines the source indices and creates or
updates the destination index, it generates a _checkpoint_.

If your {transform} runs only once, there is logically only one
checkpoint. If your {transform} runs continuously, however, it creates
checkpoints as it ingests and transforms new source data.

To create a checkpoint, the {ctransform}:

. Checks for changes to source indices.
+
Using a simple periodic timer, the {transform} checks for changes to
the source indices. This check is done based on the interval defined in the
{transform}'s `frequency` property.
+
If the source indices remain unchanged or if a checkpoint is already in
progress, the {transform} waits for the next timer.

. Identifies which entities have changed.
+
The {transform} searches to see which entities have changed since the
last time it checked. The `sync` configuration object in the {transform}
identifies a time field in the source indices. The {transform} uses the values
in that field to synchronize the source and destination indices.

. Updates the destination index (the {dataframe}) with the changed entities.
+
--
The {transform} applies changes related to either new or changed
entities to the destination index. The set of changed entities is paginated. For
each page, the {transform} performs a composite aggregation using a
`terms` query. After all the pages of changes have been applied, the checkpoint
is complete.
--

This checkpoint process involves both search and indexing activity on the
cluster. We have attempted to favor control over performance while developing
{transforms}. We decided it was preferable for the
{transform} to take longer to complete, rather than to finish quickly
and take precedence in resource consumption. That being said, the cluster still
requires enough resources to support both the composite aggregation search and
the indexing of its results.

TIP: If the cluster experiences unsuitable performance degradation due to the
{transform}, stop the {transform} and refer to <<transform-performance>>.

[discrete]
[[ml-transform-checkpoint-errors]]
== Error handling

Failures in {transforms} tend to be related to searching or indexing.
To increase the resiliency of {transforms}, the cursor positions of
the aggregated search and the changed entities search are tracked in memory and
persisted periodically.

Checkpoint failures can be categorized as follows:

* Temporary failures: The checkpoint is retried. If 10 consecutive failures
occur, the {transform} has a failed status. For example, this
situation might occur when there are shard failures and queries return only
partial results.
* Irrecoverable failures: The {transform} immediately fails. For
example, this situation occurs when the source index is not found.
* Adjustment failures: The {transform} retries with adjusted settings.
For example, if parent circuit breaker memory errors occur during the
composite aggregation, the {transform} receives partial results. The aggregated
search is retried with a smaller number of buckets. This retry is performed at
the interval defined in the `frequency` property for the {transform}. If the
search is retried to the point where it reaches a minimal number of buckets, an
irrecoverable failure occurs.

If the node running the {transforms} fails, the {transform} restarts
from the most recent persisted cursor position. This recovery process might
repeat some of the work the {transform} had already done, but it ensures data
consistency.
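
The three failure categories can be pictured as a small state update that runs
after each failed checkpoint attempt. The following Python sketch is
illustrative only and is not the {transform} implementation. The names
`FailureKind`, `TransformState`, `handle_failure`, and `handle_success` are
hypothetical, as are the starting page size of 500, the floor of 10 buckets,
and the halving strategy. Only the limit of 10 consecutive temporary failures
comes from the description above.

```python
from dataclasses import dataclass
from enum import Enum


class FailureKind(Enum):
    TEMPORARY = "temporary"          # e.g. shard failures, partial results
    IRRECOVERABLE = "irrecoverable"  # e.g. source index not found
    ADJUSTMENT = "adjustment"        # e.g. circuit breaker memory errors


MAX_CONSECUTIVE_FAILURES = 10  # stated in the text above
MIN_PAGE_SIZE = 10             # assumed floor; the real minimum is not stated


@dataclass
class TransformState:
    page_size: int = 500           # assumed initial number of buckets per page
    consecutive_failures: int = 0
    status: str = "started"


def handle_failure(state: TransformState, kind: FailureKind) -> TransformState:
    """Update the state after one failed checkpoint attempt."""
    if kind is FailureKind.IRRECOVERABLE:
        # No retry: fail immediately.
        state.status = "failed"
    elif kind is FailureKind.TEMPORARY:
        # Retry, but give up after too many failures in a row.
        state.consecutive_failures += 1
        if state.consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
            state.status = "failed"
    elif kind is FailureKind.ADJUSTMENT:
        if state.page_size <= MIN_PAGE_SIZE:
            # Already at the minimum: treat as irrecoverable.
            state.status = "failed"
        else:
            # Assumed strategy: halve the bucket count, never below the floor.
            state.page_size = max(MIN_PAGE_SIZE, state.page_size // 2)
    return state


def handle_success(state: TransformState) -> TransformState:
    """A successful attempt resets the consecutive-failure counter."""
    state.consecutive_failures = 0
    return state
```

In this sketch, a successful attempt resets the failure counter, so only an
unbroken run of temporary failures leads to a failed status. An adjustment
failure shrinks the page until it reaches the assumed floor, and one more
failure at the floor is treated as irrecoverable.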