Snapshot process details docs improvements (#70012)

- adds a bit more overview on the process, including noting that it
works in terms of files

- notes that the snapshot is a point-in-time view of each shard, and not
necessarily exactly at the start of the snapshot process

- documents the `snapshot.max_concurrent_operations` setting

Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
David Turner 4 years ago
parent
commit
3aa8ded328
1 changed file with 49 additions and 25 deletions

+ 49 - 25
docs/reference/snapshot-restore/take-snapshot.asciidoc

@@ -64,31 +64,55 @@ metadata or its other backing indices.
 [discrete]
 [[create-snapshot-process-details]]
 === Snapshot process details
-The snapshot process is incremental. In the process of making the snapshot, {es} analyses
-the list of the data stream and index files that are already stored in the repository and copies only files that were created or
-changed since the last snapshot. This process allows multiple snapshots to be preserved in the repository in a compact form.
-
-The snapshot process is executed in non-blocking fashion. All indexing and searching operations can continue to run against the data stream or index
-that is being snapshotted. However, a snapshot represents a point-in-time view
-at the moment when snapshot was created, so no records that were added to the data stream or index after the snapshot process was started
-will be included in the snapshot.
-
-The snapshot process starts immediately for the primary shards that have been started and are not relocating at the moment. {es} waits for
-relocation or initialization of shards to complete before snapshotting them.
-
-Besides creating a copy of each data stream and index, the snapshot process can also store global cluster metadata, which includes persistent
-cluster settings, templates, and data stored in system indices, such as Watches and task records, regardless of whether those system
-indices are named in the `indices` section of the request. The <<create-snapshot-api-feature-states,`feature_states` field>> can be used to
-select a subset of system indices to be included in the snapshot. The transient settings and registered snapshot repositories are not stored
-as part of the snapshot.
-
-While a snapshot of a particular shard is being
-created, this shard cannot be moved to another node, which can interfere with rebalancing and allocation
-filtering. {es} can only move a shard to another node (according to the current allocation
-filtering settings and rebalancing algorithm) after the snapshot process
-is finished.
-
-After a snapshot is created, use the <<get-snapshot-api,Get snapshot API>> to retrieve information about a snapshot. See <<snapshots-monitor-snapshot-restore,Monitor snapshot and restore progress>> to learn more about retrieving snapshot status.
+
+The snapshot process works by taking a byte-for-byte copy of the files that
+make up each index or data stream and placing these copies in the repository.
+These files are mostly written by Lucene and contain a compact representation
+of all the data in each index or data stream in a form that is designed to be
+searched efficiently. This means that when you restore an index or data stream
+from a snapshot there is no need to rebuild these search-focused data
+structures. It also means that you can use <<searchable-snapshots>> to directly
+search the data in the repository.
+
+The snapshot process is incremental: {es} compares the files that make up the
+index or data stream against the files that already exist in the repository
+and only copies files that were created or changed
+since the last snapshot. Snapshots are very space-efficient since they reuse
+any files copied to the repository by earlier snapshots.
+
+Snapshotting does not interfere with ongoing indexing or searching operations.
+A snapshot captures a view of each shard at some point in time between the
+start and end of the snapshotting process. The snapshot may not include
+documents added to a data stream or index after the snapshot process starts.
+
+You can start multiple snapshot operations at the same time. Concurrent snapshot
+operations are limited by the `snapshot.max_concurrent_operations` cluster
+setting, which defaults to `1000`. This limit applies in total to all ongoing snapshot
+creation, cloning, and deletion operations. {es} will reject any operations
+that would exceed this limit.
+
+The snapshot process starts immediately for the primary shards that have been
+started and are not relocating at the moment. {es} waits for relocation or
+initialization of shards to complete before snapshotting them.
+
+Besides creating a copy of each data stream and index, the snapshot process can
+also store global cluster metadata, which includes persistent cluster settings,
+templates, and data stored in system indices, such as Watches and task records,
+regardless of whether those system indices are named in the `indices` section
+of the request. You can also use the create snapshot
+API's <<create-snapshot-api-feature-states,`feature_states`>> parameter to
+include only a subset of system indices in the snapshot. Snapshots do not
+store transient settings or registered snapshot repositories.
+
+While a snapshot of a particular shard is being created, the shard cannot be
+moved to another node, which can interfere with rebalancing and allocation
+filtering. {es} can only move the shard to another node (according to the current
+allocation filtering settings and rebalancing algorithm) after the snapshot
+process is finished.
+
+You can use the <<get-snapshot-api,Get snapshot API>> to retrieve information
+about ongoing and completed snapshots. See
+<<snapshots-monitor-snapshot-restore,Monitor snapshot and restore progress>>.
 
 [discrete]
 [[create-snapshot-options]]
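
Reading aid for the incremental, file-based behaviour described in the added text above (not part of the commit): a minimal Python sketch, assuming a toy repository modelled as a dictionary and files identified by name plus content hash. The names `file_key` and `take_snapshot`, the hashing scheme, and the example file names are hypothetical and chosen for illustration only; they are not how {es} identifies or stores files.

```python
from __future__ import annotations

import hashlib


def file_key(name: str, content: bytes) -> str:
    """Identify a file by its name plus a hash of its bytes (hypothetical scheme)."""
    return f"{name}:{hashlib.sha256(content).hexdigest()}"


def take_snapshot(
    shard_files: dict[str, bytes], repository: dict[str, bytes]
) -> tuple[list[str], int]:
    """Copy only the files the repository lacks.

    Returns the snapshot's manifest (the keys of every file it references)
    and the number of files actually copied by this snapshot.
    """
    manifest: list[str] = []
    copied = 0
    for name, content in sorted(shard_files.items()):
        key = file_key(name, content)
        if key not in repository:
            repository[key] = content  # byte-for-byte copy of a new or changed file
            copied += 1
        manifest.append(key)  # unchanged files are referenced, not copied again
    return manifest, copied


repo: dict[str, bytes] = {}
first, copied_first = take_snapshot({"_0.cfs": b"segment-0"}, repo)
second, copied_second = take_snapshot(
    {"_0.cfs": b"segment-0", "_1.cfs": b"segment-1"}, repo
)
print(copied_first, copied_second)  # 1 1
```

The second snapshot references both files but copies only the new one, which is why later snapshots in this toy model add little to the repository.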