
[DOCS] Snapshot/restore: add more information about snapshot and restore monitoring

Igor Motov 11 years ago
parent
commit
56a264cf6d
1 changed files with 54 additions and 0 deletions
  1. 54 0
      docs/reference/modules/snapshots.asciidoc

+ 54 - 0
docs/reference/modules/snapshots.asciidoc

@@ -242,3 +242,57 @@ Multiple ids are also supported:
 -----------------------------------
 $ curl -XGET "localhost:9200/_snapshot/my_backup/snapshot_1,snapshot_2/_status"
 -----------------------------------
+
+[float]
+=== Monitoring snapshot/restore progress
+
+There are several ways to monitor the progress of snapshot and restore processes while they are running. Both
+operations support the `wait_for_completion` parameter, which blocks the client until the operation is completed. This
+is the simplest way to get notified about operation completion.
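As a minimal sketch of the blocking approach, the two wrappers below append `wait_for_completion=true` to the snapshot and restore requests. The function names and the `ES_HOST` variable are illustrative assumptions, not part of Elasticsearch; `my_backup` and `snapshot_1` are the example names used elsewhere on this page.

```shell
# Assumed helper variable: host and port of the cluster.
ES_HOST="${ES_HOST:-localhost:9200}"

# Create a snapshot and block until it has finished.
# $1 = repository name, $2 = snapshot name
snapshot_and_wait() {
  curl -s -XPUT "$ES_HOST/_snapshot/$1/$2?wait_for_completion=true"
}

# Restore a snapshot and block until the restore has finished.
# $1 = repository name, $2 = snapshot name
restore_and_wait() {
  curl -s -XPOST "$ES_HOST/_snapshot/$1/$2/_restore?wait_for_completion=true"
}
```

A call such as `snapshot_and_wait my_backup snapshot_1` returns only when the snapshot is done, so the client connection has to stay open for the whole duration.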
+
+The snapshot operation can also be monitored by periodically calling the snapshot info operation:
+
+[source,shell]
+-----------------------------------
+$ curl -XGET "localhost:9200/_snapshot/my_backup/snapshot_1"
+-----------------------------------
+
+Please note that the snapshot info operation uses the same resources and thread pool as the snapshot operation. As a
+result, executing the snapshot info operation while large shards are being snapshotted can cause it to wait for
+available resources before returning the result. On very large shards the wait time can be significant.
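The periodic calls described above could be scripted roughly as follows. This is a sketch under a few assumptions: the helper names, `ES_HOST`, and `POLL_INTERVAL` are made up for illustration, and the state is extracted with `grep` and `sed` on the assumption that the response contains a `"state"` field whose value is `IN_PROGRESS` while the snapshot is running. A real script would likely use a proper JSON parser.

```shell
# Assumed helper variable: host and port of the cluster.
ES_HOST="${ES_HOST:-localhost:9200}"

# Read a snapshot info response on stdin and print the first "state" value.
extract_state() {
  grep -o '"state" *: *"[A-Z_]*"' | head -n 1 | sed 's/.*"\([A-Z_]*\)"/\1/'
}

# Poll snapshot info until the snapshot is no longer IN_PROGRESS.
# $1 = repository name, $2 = snapshot name
wait_for_snapshot() {
  while :; do
    state=$(curl -s -XGET "$ES_HOST/_snapshot/$1/$2" | extract_state)
    echo "snapshot $2: ${state:-unknown}"
    # Stop on any state other than IN_PROGRESS (including an empty one).
    [ "$state" = "IN_PROGRESS" ] || break
    sleep "${POLL_INTERVAL:-10}"
  done
}
```

Because each poll competes with the running snapshot for the same thread pool, a generous polling interval is probably the safer choice.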
+
+To get more immediate and complete information about snapshots, the snapshot status command can be used instead:
+
+[source,shell]
+-----------------------------------
+$ curl -XGET "localhost:9200/_snapshot/my_backup/snapshot_1/_status"
+-----------------------------------
+
+While the snapshot info operation returns only basic information about the snapshot in progress, the snapshot status
+operation returns a complete breakdown of the current state of each shard participating in the snapshot.
+
+The restore process piggybacks on the standard recovery mechanism of Elasticsearch. As a result, standard recovery
+monitoring services can be used to monitor the state of a restore. When a restore operation is executed, the cluster
+typically goes into the `red` state. This happens because the restore operation starts by "recovering" the primary
+shards of the restored indices. During this operation the primary shards are unavailable, which manifests itself in the
+`red` cluster state. Once recovery of the primary shards is completed, Elasticsearch switches to the standard
+replication process, which creates the required number of replicas. At this point the cluster switches to the `yellow`
+state. Once all required replicas are created, the cluster switches to the `green` state.
+
+The cluster health operation provides only a high-level status of the restore process. It’s possible to get more
+detailed insight into the current state of the recovery process by using the <<indices-recovery, indices recovery>> and
+<<cat-recovery, cat recovery>> APIs.
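One possible way to follow a restore from the command line is sketched below: wait on cluster health for the coarse `red`/`yellow`/`green` picture, and consult the cat recovery API for per-shard detail. The function names, `ES_HOST`, and the default timeout are assumptions chosen for illustration.

```shell
# Assumed helper variable: host and port of the cluster.
ES_HOST="${ES_HOST:-localhost:9200}"

# Block until the cluster reaches the given status or the timeout expires.
# $1 = status to wait for (default: green), $2 = timeout (default: 50s)
wait_for_cluster_status() {
  curl -s -XGET "$ES_HOST/_cluster/health?wait_for_status=${1:-green}&timeout=${2:-50s}"
}

# Print per-shard recovery progress in tabular form.
show_recovery() {
  curl -s -XGET "$ES_HOST/_cat/recovery?v"
}
```

For example, `wait_for_cluster_status yellow` would return once the primary shards have been recovered, and `show_recovery` can be run at any point to see how far individual shards have progressed.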
+
+[float]
+=== Stopping currently running snapshot and restore operations
+
+The snapshot and restore framework allows running only one snapshot or one restore operation at a time. If a currently
+running snapshot was started by mistake or takes unusually long, it can be terminated using the snapshot delete
+operation. The snapshot delete operation checks whether the snapshot being deleted is currently running and, if it is,
+stops that snapshot before deleting the snapshot data from the repository.
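Stopping a running snapshot therefore comes down to a single delete request. The wrapper below is a sketch; the function name and `ES_HOST` are assumptions, and the argument check is there only so that an empty name never produces a delete request against a broader path.

```shell
# Assumed helper variable: host and port of the cluster.
ES_HOST="${ES_HOST:-localhost:9200}"

# Stop (if running) and delete a snapshot.
# $1 = repository name, $2 = snapshot name
stop_snapshot() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: stop_snapshot <repository> <snapshot>" >&2
    return 1
  fi
  curl -s -XDELETE "$ES_HOST/_snapshot/$1/$2"
}
```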
+
+The restore operation uses the standard shard recovery mechanism. Therefore, any currently running restore operation
+can be canceled by deleting the indices that are being restored. Please note that data for all deleted indices will be
+removed from the cluster as a result of this operation.
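Canceling a restore can be sketched as a delete of the index being restored. The function name, `ES_HOST`, and the index name `index_1` in the usage note are hypothetical. The guard against an empty argument is deliberate, since a delete request without an index name would target far more than the restored index.

```shell
# Assumed helper variable: host and port of the cluster.
ES_HOST="${ES_HOST:-localhost:9200}"

# Cancel a running restore by deleting the index being restored.
# WARNING: this removes all data of that index from the cluster.
# $1 = index name
cancel_restore() {
  if [ -z "$1" ]; then
    echo "usage: cancel_restore <index>" >&2
    return 1
  fi
  curl -s -XDELETE "$ES_HOST/$1"
}
```

For instance, `cancel_restore index_1` would stop the restore of a (hypothetical) index named `index_1` and discard whatever had been recovered so far.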
+
+