
[DOCS] Expands transforms docs with persistent tasks and related links. (#68582)

István Zoltán Szabó 4 years ago
commit
05e16428fe

+ 33 - 37
docs/reference/transform/checkpoints.asciidoc

@@ -5,48 +5,46 @@
 <titleabbrev>How checkpoints work</titleabbrev>
 ++++
 
-Each time a {transform} examines the source indices and creates or
-updates the destination index, it generates a _checkpoint_.
+Each time a {transform} examines the source indices and creates or updates the 
+destination index, it generates a _checkpoint_.
 
-If your {transform} runs only once, there is logically only one
-checkpoint. If your {transform} runs continuously, however, it creates
-checkpoints as it ingests and transforms new source data.
+If your {transform} runs only once, there is logically only one checkpoint. If 
+your {transform} runs continuously, however, it creates checkpoints as it 
+ingests and transforms new source data.
 
 To create a checkpoint, the {ctransform}:
 
 . Checks for changes to source indices.
 +
-Using a simple periodic timer, the {transform} checks for changes to
-the source indices. This check is done based on the interval defined in the
-transform's `frequency` property.
+Using a simple periodic timer, the {transform} checks for changes to the source 
+indices. This check is done based on the interval defined in the transform's 
+`frequency` property.
 +
 If the source indices remain unchanged or if a checkpoint is already in progress
 then it waits for the next timer.
 
 . Identifies which entities have changed.
 +
-The {transform} searches to see which entities have changed since the
-last time it checked. The `sync` configuration object in the {transform}
-identifies a time field in the source indices. The {transform} uses the values
-in that field to synchronize the source and destination indices.
+The {transform} searches to see which entities have changed since the last time 
+it checked. The `sync` configuration object in the {transform} identifies a time 
+field in the source indices. The {transform} uses the values in that field to 
+synchronize the source and destination indices.
  
 . Updates the destination index (the {dataframe}) with the changed entities.
 +
 --
-The {transform} applies changes related to either new or changed
-entities to the destination index. The set of changed entities is paginated. For
-each page, the {transform} performs a composite aggregation using a
-`terms` query. After all the pages of changes have been applied, the checkpoint
-is complete.
+The {transform} applies changes related to either new or changed entities to the 
+destination index. The set of changed entities is paginated. For each page, the 
+{transform} performs a composite aggregation using a `terms` query. After all 
+the pages of changes have been applied, the checkpoint is complete.
 --
 
 This checkpoint process involves both search and indexing activity on the
 cluster. We have attempted to favor control over performance while developing
-{transforms}. We decided it was preferable for the
-{transform} to take longer to complete, rather than to finish quickly
-and take precedence in resource consumption. That being said, the cluster still
-requires enough resources to support both the composite aggregation search and
-the indexing of its results. 
+{transforms}. We decided it was preferable for the {transform} to take longer to 
+complete, rather than to finish quickly and take precedence in resource 
+consumption. That being said, the cluster still requires enough resources to 
+support both the composite aggregation search and the indexing of its results. 
 
 TIP: If the cluster experiences unsuitable performance degradation due to the
 {transform}, stop the {transform} and refer to <<transform-performance>>.
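The three checkpoint steps can be sketched as a small Python simulation. This is a toy model under stated assumptions, not the transform implementation. The source index is a list of hypothetical `Doc` records, the destination index is a dict, and the pivot is a single sum per entity. `ToyTransform`, `page_size`, and `run_checkpoint` are names invented for illustration. The caller is assumed to invoke `run_checkpoint` once per `frequency` tick, and the "checkpoint already in progress" check is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    entity: str      # value of the hypothetical group_by field
    value: float     # value summed by the hypothetical aggregation
    timestamp: int   # the time field named in the `sync` object


@dataclass
class ToyTransform:
    """Hypothetical in-memory model of a continuous pivot transform."""
    page_size: int = 2               # changed entities per page (assumption)
    last_checkpoint_time: int = -1   # upper time bound of the previous checkpoint
    checkpoint: int = 0
    dest: dict = field(default_factory=dict)  # entity -> sum of values

    def run_checkpoint(self, source, now):
        """Called on each `frequency` tick; returns False if nothing changed."""
        # 1. Check for changes: is any document newer than the last checkpoint?
        new_docs = [d for d in source
                    if self.last_checkpoint_time < d.timestamp <= now]
        if not new_docs:
            return False  # wait for the next timer
        # 2. Identify which entities changed, using the sync time field.
        changed = sorted({d.entity for d in new_docs})
        # 3. Update the destination page by page. Each page recomputes the
        #    aggregation only for its own entities (in Elasticsearch, a
        #    composite aggregation restricted by a `terms` query).
        for start in range(0, len(changed), self.page_size):
            for entity in changed[start:start + self.page_size]:
                self.dest[entity] = sum(
                    d.value for d in source
                    if d.entity == entity and d.timestamp <= now)
        self.last_checkpoint_time = now
        self.checkpoint += 1
        return True


source = [Doc("a", 1.0, 1), Doc("b", 2.0, 2)]
t = ToyTransform()
t.run_checkpoint(source, now=5)    # checkpoint 1 touches "a" and "b"
source.append(Doc("a", 3.0, 6))
t.run_checkpoint(source, now=10)   # checkpoint 2 touches only "a"
t.run_checkpoint(source, now=15)   # no new data: waits, no checkpoint
print(t.checkpoint, t.dest)        # 2 {'a': 4.0, 'b': 2.0}
```

Only entities that changed since the previous checkpoint are recomputed, which is what keeps each continuous checkpoint cheaper than rebuilding the whole destination index.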
@@ -63,20 +61,18 @@ persisted periodically.
 Checkpoint failures can be categorized as follows:
 
 * Temporary failures: The checkpoint is retried. If 10 consecutive failures
-occur, the {transform} has a failed status. For example, this
-situation might occur when there are shard failures and queries return only
-partial results.
-* Irrecoverable failures: The {transform} immediately fails. For
-example, this situation occurs when the source index is not found.
-* Adjustment failures: The {transform} retries with adjusted settings.
-For example, if a parent circuit breaker memory errors occur during the
-composite aggregation, the {transform} receives partial results. The aggregated
-search is retried with a smaller number of buckets. This retry is performed at
-the interval defined in the `frequency` property for the {transform}. If the
-search is retried to the point where it reaches a minimal number of buckets, an
+occur, the {transform} has a failed status. For example, this situation might 
+occur when there are shard failures and queries return only partial results.
+* Irrecoverable failures: The {transform} immediately fails. For example, this 
+situation occurs when the source index is not found.
+* Adjustment failures: The {transform} retries with adjusted settings. For 
+example, if parent circuit breaker memory errors occur during the composite 
+aggregation, the {transform} receives partial results. The aggregated search is 
+retried with a smaller number of buckets. This retry is performed at the 
+interval defined in the `frequency` property for the {transform}. If the search 
+is retried to the point where it reaches a minimal number of buckets, an 
 irrecoverable failure occurs.
 
-If the node running the {transforms} fails, the {transform} restarts
-from the most recent persisted cursor position. This recovery process might
-repeat some of the work the {transform} had already done, but it ensures data
-consistency.
+If the node running the {transforms} fails, the {transform} restarts from the 
+most recent persisted cursor position. This recovery process might repeat some 
+of the work the {transform} had already done, but it ensures data consistency.
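The failure categories above can be modeled as a small state machine. This is a hedged sketch, not Elasticsearch code: the exception classes, `FailureHandler`, the initial page size of 500, the floor of 10 buckets, and the halving policy are all assumptions chosen for illustration. Only the limit of 10 consecutive temporary failures comes from the text.

```python
class TemporaryFailure(Exception):
    """E.g. shard failures where queries return only partial results."""


class IrrecoverableFailure(Exception):
    """E.g. the source index is not found."""


class CircuitBreakerFailure(Exception):
    """E.g. parent circuit breaker memory errors during the composite aggregation."""


MAX_CONSECUTIVE_FAILURES = 10  # from the docs
MIN_PAGE_SIZE = 10             # assumption: the "minimal number of buckets"


class FailureHandler:
    """Hypothetical model of the three checkpoint failure categories."""

    def __init__(self, page_size=500):  # initial page size is an assumption
        self.page_size = page_size
        self.consecutive_failures = 0
        self.state = "started"

    def on_success(self):
        self.consecutive_failures = 0

    def on_failure(self, error):
        """Classify a failed attempt; any retry happens at the next `frequency` tick."""
        if isinstance(error, IrrecoverableFailure):
            self.state = "failed"
        elif isinstance(error, CircuitBreakerFailure):
            if self.page_size <= MIN_PAGE_SIZE:
                # Already at the minimum number of buckets: irrecoverable.
                self.state = "failed"
            else:
                # Retry with fewer buckets (halving is an assumed policy).
                self.page_size = max(MIN_PAGE_SIZE, self.page_size // 2)
        elif isinstance(error, TemporaryFailure):
            self.consecutive_failures += 1
            if self.consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
                self.state = "failed"
        else:
            raise error
        return self.state


h = FailureHandler()
sizes = []
while h.on_failure(CircuitBreakerFailure()) != "failed":
    sizes.append(h.page_size)
print(sizes)  # [250, 125, 62, 31, 15, 10]
```

A successful checkpoint resets the temporary-failure counter, so only an unbroken run of 10 temporary failures moves the transform to the failed state.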

+ 6 - 0
docs/reference/transform/overview.asciidoc

@@ -11,6 +11,12 @@ You can choose either of the following methods to transform your data:
 IMPORTANT: All {transforms} leave your source index intact. They create a new
 index that is dedicated to the transformed data.
 
+{transforms-cap} are persistent tasks; they are stored in the cluster state, which 
+makes them resilient to node failures. Refer to <<transform-checkpoints>> and 
+<<ml-transform-checkpoint-errors>> to learn more about the machinery behind 
+{transforms}.
+
+
 [[pivot-transform-overview]]
 == Pivot {transforms}
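The recovery behavior described in both files (a persistent task kept in cluster state that restarts from the most recent persisted cursor position) can be illustrated with a toy simulation. Everything here is hypothetical: `PersistentTask`, `run_on_node`, the page list, and the persist-every-two-pages policy are invented for illustration, and a plain dict stands in for cluster state. The run shows why recovery can repeat work: node-1 processes a page after the last persisted cursor, so node-2 processes it again.

```python
from dataclasses import dataclass


class NodeFailure(Exception):
    pass


@dataclass
class PersistentTask:
    """Hypothetical stand-in for a transform task stored in cluster state."""
    transform_id: str
    assigned_node: str
    cursor: int = 0  # next page to process; persisted only periodically


def run_on_node(node, transform_id, pages, cluster_state,
                persist_every=2, fail_after=None):
    """Process pages from the persisted cursor; return the pages this node handled."""
    task = cluster_state[transform_id]
    task.assigned_node = node       # reassignment after a node failure
    position = task.cursor          # resume from the last persisted position
    processed = []
    while position < len(pages):
        if fail_after is not None and len(processed) == fail_after:
            raise NodeFailure(node)  # simulated crash; unpersisted progress is lost
        processed.append(pages[position])
        position += 1
        if position % persist_every == 0:
            task.cursor = position  # periodic persistence
    task.cursor = position
    return processed


cluster_state = {"my-transform": PersistentTask("my-transform", "node-1")}
pages = ["p0", "p1", "p2", "p3", "p4"]
try:
    run_on_node("node-1", "my-transform", pages, cluster_state, fail_after=3)
except NodeFailure:
    pass  # node-1 handled p0..p2, but only cursor 2 was persisted
print(run_on_node("node-2", "my-transform", pages, cluster_state))
# ['p2', 'p3', 'p4']  (p2 is processed twice)
```

Persisting the cursor less often reduces overhead but widens the window of work that may be repeated after a failure; the task state itself survives because it does not live on the failed node.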