12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788 |
- [role="xpack"]
- [[ml-transform-checkpoints]]
- == How {dataframe-transform} checkpoints work
- ++++
- <titleabbrev>How checkpoints work</titleabbrev>
- ++++
- beta[]
- Each time a {dataframe-transform} examines the source indices and creates or
- updates the destination index, it generates a _checkpoint_.
- If your {dataframe-transform} runs only once, there is logically only one
- checkpoint. If your {dataframe-transform} runs continuously, however, it creates
- checkpoints as it ingests and transforms new source data.
- To create a checkpoint, the {cdataframe-transform}:
- . Checks for changes to source indices.
- +
- Using a simple periodic timer, the {dataframe-transform} checks for changes to
- the source indices. This check is done based on the interval defined in the
- transform's `frequency` property.
- +
- If the source indices remain unchanged or if a checkpoint is already in progress
- then it waits for the next timer.
- . Identifies which entities have changed.
- +
- The {dataframe-transform} searches to see which entities have changed since the
- last time it checked. The transform's `sync` configuration object identifies a
- time field in the source indices. The transform uses the values in that field to
- synchronize the source and destination indices.
-
- . Updates the destination index (the {dataframe}) with the changed entities.
- +
- --
- The {dataframe-transform} applies changes related to either new or changed
- entities to the destination index. The set of changed entities is paginated. For
- each page, the {dataframe-transform} performs a composite aggregation using a
- `terms` query. After all the pages of changes have been applied, the checkpoint
- is complete.
- --
- This checkpoint process involves both search and indexing activity on the
- cluster. We have attempted to favor control over performance while developing
- {dataframe-transforms}. We decided it was preferable for the
- {dataframe-transform} to take longer to complete, rather than to finish quickly
- and take precedence in resource consumption. That being said, the cluster still
- requires enough resources to support both the composite aggregation search and
- the indexing of its results.
- TIP: If the cluster experiences unsuitable performance degradation due to the
- {dataframe-transform}, stop the transform. Consider whether you can apply a
- source query to the {dataframe-transform} to reduce the scope of data it
- processes. Also consider whether the cluster has sufficient resources in place
- to support both the composite aggregation search and the indexing of its
- results.
- [discrete]
- [[ml-transform-checkpoint-errors]]
- ==== Error handling
- Failures in {dataframe-transforms} tend to be related to searching or indexing.
- To increase the resiliency of {dataframe-transforms}, the cursor positions of
- the aggregated search and the changed entities search are tracked in memory and
- persisted periodically.
- Checkpoint failures can be categorized as follows:
- * Temporary failures: The checkpoint is retried. If 10 consecutive failures
- occur, the {dataframe-transform} has a failed status. For example, this
- situation might occur when there are shard failures and queries return only
- partial results.
- * Irrecoverable failures: The {dataframe-transform} immediately fails. For
- example, this situation occurs when the source index is not found.
- * Adjustment failures: The {dataframe-transform} retries with adjusted settings.
- For example, if a parent circuit breaker memory errors occur during the
- composite aggregation, the transform receives partial results. The aggregated
- search is retried with a smaller number of buckets. This retry is performed at
- the interval defined in the transform's `frequency` property. If the search
- is retried to the point where it reaches a minimal number of buckets, an
- irrecoverable failure occurs.
- If the node running the {dataframe-transforms} fails, the transform restarts
- from the most recent persisted cursor position. This recovery process might
- repeat some of the work the transform had already done, but it ensures data
- consistency.
|