[DOCS] Add searchable snapshots topic. (#63040)

* [DOCS] Add searchable snapshots topic.

* [DOCS] Add definitions & remove fully-remote storage.

* [DOCS] Fixed duplicate anchor.

* Expand conceptual docs for searchable snapshots

* Rewordings

* Glossary tidy-up

* Beta

* Reword

* Move performance idea to a TIP

* use -> manage

* red -> not green

* Missing space?

* Update docs/reference/glossary.asciidoc

* Fix beta label

* Use more attributes, fix link titles

* Apply suggestions from code review

Co-authored-by: debadair <debadair@elastic.co>

* Reformat

* Minor rewordings

* More minor rewordings

* Address Henning's comments

Co-authored-by: David Turner <david.turner@elastic.co>
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
debadair committed 5 years ago
commit b95d9c4aa4

+ 21 - 3
docs/reference/glossary.asciidoc

@@ -450,6 +450,22 @@ in the <<glossary-mapping,mapping>>.
 // end::routing-def[]
 --
 
+[[glossary-searchable-snapshot]] searchable snapshot ::
+// tag::searchable-snapshot-def[]
+A <<glossary-snapshot, snapshot>> of an index that has been mounted as a
+<<glossary-searchable-snapshot-index, searchable snapshot index>> and can be
+searched as if it were a regular index.
+// end::searchable-snapshot-def[]
+
+[[glossary-searchable-snapshot-index]] searchable snapshot index ::
+// tag::searchable-snapshot-index-def[]
+An <<glossary-index, index>> whose data is stored in a <<glossary-snapshot,
+snapshot>> that resides in a separate <<glossary-snapshot-repository,snapshot
+repository>> such as AWS S3. Searchable snapshot indices do not need
+<<glossary-replica-shard,replica>> shards for resilience, since their data is
+reliably stored outside the cluster.
+// end::searchable-snapshot-index-def[]
+
 [[glossary-shard]] shard ::
 +
 --
@@ -486,9 +502,11 @@ See the {ref}/indices-shrink-index.html[shrink index API].
 
 [[glossary-snapshot]] snapshot ::
 // tag::snapshot-def[]
-A backup taken from a running {es} cluster.
-A snapshot can include backups of an entire cluster or only data streams and
-indices you specify.
+Captures the state of the whole cluster, or of specific indices or data
+streams, at a particular point in time. Snapshots provide a backup of a
+running cluster, ensuring that you can restore your data in the event of a
+failure. You can also mount indices or data streams from snapshots as
+read-only
+{ref}/glossary.html#glossary-searchable-snapshot-index[searchable snapshots].
 // end::snapshot-def[]
 
 [[glossary-snapshot-lifecycle-policy]] snapshot lifecycle policy ::

+ 99 - 0
docs/reference/searchable-snapshots/index.asciidoc

@@ -0,0 +1,99 @@
+[[searchable-snapshots]]
+== {search-snaps-cap}
+
+beta::[]
+
+{search-snaps-cap} let you reduce your operating costs by using
+<<snapshot-restore, snapshots>> for resiliency rather than maintaining
+<<scalability,replica shards>> within a cluster. When you mount an index from a
+snapshot as a {search-snap}, {es} copies the index shards to local storage
+within the cluster. This ensures that search performance is comparable to
+searching any other index, and minimizes the need to access the snapshot
+repository. Should a node fail, shards of a {search-snap} index are
+automatically recovered from the snapshot repository.
+
+This can result in significant cost savings. With {search-snaps}, you may be
+able to halve your cluster size without increasing the risk of data loss or
+reducing the amount of data you can search. Because {search-snaps} rely on the
+same snapshot mechanism you use for backups, they have a minimal impact on your
+snapshot repository storage costs.
+
+[discrete]
+[[using-searchable-snapshots]]
+=== Using {search-snaps}
+
+Searching a {search-snap} index is the same as searching any other index.
+Search performance is comparable to regular indices because the shard data is
+copied onto nodes in the cluster when the {search-snap} is mounted.
+
+By default, {search-snap} indices have no replicas. The underlying snapshot
+provides resilience and the query volume is expected to be low enough that a
+single shard copy will be sufficient. However, if you need to support a higher
+query volume, you can add replicas by adjusting the `index.number_of_replicas`
+index setting.
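+
+For example, you could raise the replica count of a mounted index with the
+update index settings API. The index name here is a placeholder:
+
+[source,console]
+----
+PUT /my-mounted-index/_settings
+{
+  "index": {
+    "number_of_replicas": 1
+  }
+}
+----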
+
+If a node fails and {search-snap} shards must be restored from the snapshot,
+there is a brief window while {es} allocates the shards to other nodes, during
+which the cluster health will not be `green`. Searches that hit these shards
+may fail or return partial results until the shards are reallocated.
+
+You typically manage {search-snaps} through {ilm-init}. The
+<<ilm-searchable-snapshot, searchable snapshots>> action automatically converts
+an index to a {search-snap} when it reaches the `cold` phase. You can also make
+indices in existing snapshots searchable by manually mounting them as
+{search-snaps} with the <<searchable-snapshots-api-mount-snapshot, mount
+snapshot>> API.
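+
+As a sketch, a manual mount request might look like the following, where the
+repository, snapshot, and index names are all placeholders:
+
+[source,console]
+----
+POST /_snapshot/my_repository/my_snapshot/_mount?wait_for_completion=true
+{
+  "index": "my-index"
+}
+----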
+
+To mount an index from a snapshot that contains multiple indices, we recommend
+creating a <<clone-snapshot-api, clone>> of the snapshot that contains only the
+index you want to search, and mounting the clone. You cannot delete a snapshot
+if it has any mounted indices, so creating a clone enables you to manage the
+lifecycle of the backup snapshot independently of any {search-snaps}.
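+
+For example, you might clone a single index out of a multi-index snapshot
+before mounting it. The repository, snapshot, and index names below are
+placeholders:
+
+[source,console]
+----
+PUT /_snapshot/my_repository/my_snapshot/_clone/my_snapshot_clone
+{
+  "indices": "my-index"
+}
+----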
+
+You can control the allocation of the shards of {search-snap} indices using the
+same mechanisms as for regular indices. For example, you could use
+<<shard-allocation-filtering>> to restrict {search-snap} shards to a subset of
+your nodes.
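+
+As an illustration, the following settings update restricts a mounted index's
+shards to nodes tagged with a custom `box_type` attribute; the index name,
+attribute name, and value are assumptions for this sketch:
+
+[source,console]
+----
+PUT /my-mounted-index/_settings
+{
+  "index.routing.allocation.require.box_type": "cold"
+}
+----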
+
+We recommend that you <<indices-forcemerge, force-merge>> indices to a single
+segment per shard before taking a snapshot that will be mounted as a
+{search-snap} index. Each read from a snapshot repository takes time and costs
+money, and the fewer segments there are, the fewer reads are needed to restore
+the snapshot.
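+
+For example, before taking the snapshot you might force-merge the index down
+to a single segment per shard with the force merge API (the index name is a
+placeholder):
+
+[source,console]
+----
+POST /my-index/_forcemerge?max_num_segments=1
+----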
+
+[TIP]
+====
+{search-snaps-cap} are ideal for managing a large archive of historical data.
+Historical information is typically searched less frequently than recent data,
+so it may not need the performance benefits of replicas.
+
+For more complex or time-consuming searches, you can use <<async-search>> with
+{search-snaps}.
+====
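+
+As a sketch of the TIP above, an async search against a mounted index could be
+submitted like this; the index name and timeout value are assumptions:
+
+[source,console]
+----
+POST /my-mounted-index/_async_search?wait_for_completion_timeout=2s
+{
+  "query": {
+    "match_all": {}
+  }
+}
+----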
+
+[discrete]
+[[how-searchable-snapshots-work]]
+=== How {search-snaps} work
+
+When an index is mounted from a snapshot, {es} allocates its shards to data
+nodes within the cluster. The data nodes then automatically restore the shard
+data from the repository onto local storage. Once the restore process
+completes, these shards respond to searches using the data held in local
+storage and do not need to access the repository. This avoids incurring the
+cost or performance penalty associated with reading data from the repository.
+
+If a node holding one of these shards fails, {es} automatically allocates it to
+another node, and that node restores the shard data from the repository. No
+replicas are needed, and no complicated monitoring or orchestration is
+necessary to restore lost shards.
+
+{es} restores {search-snap} shards in the background and you can search them
+even if they have not been fully restored. If a search hits a {search-snap}
+shard before it has been fully restored, {es} eagerly retrieves the data needed
+for the search. If a shard is freshly allocated to a node and still warming
+up, some searches will be slower. However, searches usually access only a
+small fraction of the total shard data, so the performance penalty is
+typically small.
+
+Replicas of {search-snap} shards are restored by copying data from the
+snapshot repository. In contrast, replicas of regular indices are restored by
+copying data from the primary.

+ 2 - 0
docs/reference/snapshot-restore/index.asciidoc

@@ -112,3 +112,5 @@ include::restore-snapshot.asciidoc[]
 include::monitor-snapshot-restore.asciidoc[]
 include::delete-snapshot.asciidoc[]
 include::../slm/index.asciidoc[]
+include::../searchable-snapshots/index.asciidoc[]
+