[[searchable-snapshots]]
== {search-snaps-cap}

{search-snaps-cap} let you use <<snapshot-restore,snapshots>> to search
infrequently accessed and read-only data in a very cost-effective fashion. The
<<cold-tier,cold>> and <<frozen-tier,frozen>> data tiers use {search-snaps} to
reduce your storage and operating costs.

{search-snaps-cap} eliminate the need for <<scalability,replica shards>> after
rolling over from the hot tier, potentially halving the local storage needed to
search your data. {search-snaps-cap} rely on the same snapshot mechanism you
already use for backups and have minimal impact on your snapshot repository
storage costs.

[discrete]
[[using-searchable-snapshots]]
=== Using {search-snaps}

Searching a {search-snap} index is the same as searching any other index.

By default, {search-snap} indices have no replicas. The underlying snapshot
provides resilience and the query volume is expected to be low enough that a
single shard copy will be sufficient. However, if you need to support a higher
query volume, you can add replicas by adjusting the `index.number_of_replicas`
index setting.
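
For example, the following request adds one replica to a mounted index using
the standard update index settings API. This is only a sketch; the index name
is a placeholder:

[source,console]
----
PUT /my-mounted-index/_settings
{
  "index": {
    "number_of_replicas": 1
  }
}
----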

If a node fails and {search-snap} shards need to be recovered elsewhere, there
is a brief window of time while {es} allocates the shards to other nodes where
the cluster health will not be `green`. Searches that hit these shards may fail
or return partial results until the shards are reallocated to healthy nodes.
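
If you want to observe this window, you can poll cluster health and shard
allocation while recovery is in progress. This is ordinary use of the health
and cat shards APIs, sketched here with a hypothetical index name:

[source,console]
----
GET /_cluster/health/my-mounted-index

GET /_cat/shards/my-mounted-index?v=true&h=index,shard,prirep,state,node
----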

You typically manage {search-snaps} through {ilm-init}. The
<<ilm-searchable-snapshot, searchable snapshots>> action automatically converts
a regular index into a {search-snap} index when it reaches the `cold` or
`frozen` phase. You can also make indices in existing snapshots searchable by
manually mounting them using the <<searchable-snapshots-api-mount-snapshot,
mount snapshot>> API.
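
For example, the following request mounts an index from an existing snapshot
as a fully mounted {search-snap} index. The repository, snapshot, and index
names are placeholders; see the <<searchable-snapshots-api-mount-snapshot,
mount snapshot>> API for the full set of parameters:

[source,console]
----
POST /_snapshot/my_repository/my_snapshot/_mount?wait_for_completion=true
{
  "index": "my-index",
  "renamed_index": "mounted-my-index"
}
----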

To mount an index from a snapshot that contains multiple indices, we recommend
creating a <<clone-snapshot-api, clone>> of the snapshot that contains only the
index you want to search, and mounting the clone. You should not delete a
snapshot if it has any mounted indices, so creating a clone enables you to
manage the lifecycle of the backup snapshot independently of any {search-snaps}.
If you use {ilm-init} to manage your {search-snaps} then it will automatically
look after cloning the snapshot as needed.
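
As a sketch, cloning a single index out of a larger snapshot before mounting
the clone might look like this (all names are placeholders):

[source,console]
----
PUT /_snapshot/my_repository/my_snapshot/_clone/my_snapshot_clone
{
  "indices": "my-index"
}
----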

You can control the allocation of the shards of {search-snap} indices using the
same mechanisms as for regular indices. For example, you could use
<<shard-allocation-filtering>> to restrict {search-snap} shards to a subset of
your nodes.
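
For instance, the following sketch pins the shards of a mounted index to nodes
whose `box_type` node attribute is `cold`, using the standard index-level
allocation filter settings. The index name and the `box_type` attribute are
hypothetical:

[source,console]
----
PUT /mounted-my-index/_settings
{
  "index.routing.allocation.require.box_type": "cold"
}
----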

The speed of recovery of a {search-snap} index is limited by the repository
setting `max_restore_bytes_per_sec` and the node setting
`indices.recovery.max_bytes_per_sec` just like a normal restore operation. By
default `max_restore_bytes_per_sec` is unlimited, but the default for
`indices.recovery.max_bytes_per_sec` depends on the configuration of the node.
See <<recovery-settings>>.
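
If you need to adjust these limits, the repository setting is applied when
registering (or re-registering) the repository and the recovery setting can be
updated dynamically. The following is a sketch only; the repository name,
repository type, bucket, and rate values are placeholders:

[source,console]
----
PUT /_snapshot/my_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my-bucket",
    "max_restore_bytes_per_sec": "100mb"
  }
}

PUT /_cluster/settings
{
  "persistent": {
    "indices.recovery.max_bytes_per_sec": "100mb"
  }
}
----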

We recommend that you <<indices-forcemerge, force-merge>> indices to a single
segment per shard before taking a snapshot that will be mounted as a
{search-snap} index. Each read from a snapshot repository takes time and costs
money, and the fewer segments there are the fewer reads are needed to restore
the snapshot or to respond to a search.
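
For example, force-merging a (hypothetical) index down to a single segment per
shard before snapshotting it:

[source,console]
----
POST /my-index/_forcemerge?max_num_segments=1
----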

[TIP]
====
{search-snaps-cap} are ideal for managing a large archive of historical data.
Historical information is typically searched less frequently than recent data
and therefore may not need the search performance benefits that replicas
provide.

For more complex or time-consuming searches, you can use <<async-search>> with
{search-snaps}.
====
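
As a sketch, submitting an asynchronous search against a mounted index and
letting it run in the background looks like the following. The index name and
the `@timestamp` field are placeholders:

[source,console]
----
POST /mounted-my-index/_async_search?wait_for_completion_timeout=2s
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2020-01-01",
        "lt": "2021-01-01"
      }
    }
  }
}
----

The response includes an `id` that you can pass to `GET /_async_search/<id>`
to retrieve the results once the search completes.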

[[searchable-snapshots-repository-types]]
// tag::searchable-snapshot-repo-types[]
Use any of the following repository types with searchable snapshots:

* <<repository-s3,AWS S3>>
* <<repository-gcs,Google Cloud Storage>>
* <<repository-azure,Azure Blob Storage>>
* {plugins}/repository-hdfs.html[Hadoop Distributed File System (HDFS)]
* <<snapshots-filesystem-repository,Shared filesystems>> such as NFS
* <<snapshots-read-only-repository,Read-only HTTP and HTTPS repositories>>

You can also use alternative implementations of these repository types, for
instance <<repository-s3-client,MinIO>>, as long as they are fully compatible.
Use the <<repo-analysis-api>> API to analyze your repository's suitability for
use with searchable snapshots.
// end::searchable-snapshot-repo-types[]
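
For example, a basic repository analysis of a (hypothetical) registered
repository could look like the following; see the <<repo-analysis-api>>
documentation for the parameters that control the size and duration of the
analysis:

[source,console]
----
POST /_snapshot/my_repository/_analyze?blob_count=10&max_blob_size=1mb
----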

[discrete]
[[how-searchable-snapshots-work]]
=== How {search-snaps} work

When an index is mounted from a snapshot, {es} allocates its shards to data
nodes within the cluster. The data nodes then automatically retrieve the
relevant shard data from the repository onto local storage, based on the
<<searchable-snapshot-mount-storage-options,mount options>> specified. If
possible, searches use data from local storage. If the data is not available
locally, {es} downloads the data that it needs from the snapshot repository.

If a node holding one of these shards fails, {es} automatically allocates the
affected shards on another node, and that node restores the relevant shard data
from the repository. No replicas are needed, and no complicated monitoring or
orchestration is necessary to restore lost shards. Although searchable snapshot
indices have no replicas by default, you may add replicas to these indices by
adjusting `index.number_of_replicas`. Replicas of {search-snap} shards are
recovered by copying data from the snapshot repository, just like primaries of
{search-snap} shards. In contrast, replicas of regular indices are restored by
copying data from the primary.

[discrete]
[[searchable-snapshot-mount-storage-options]]
==== Mount options

To search a snapshot, you must first mount it locally as an index. Usually
{ilm-init} will do this automatically, but you can also call the
<<searchable-snapshots-api-mount-snapshot,mount snapshot>> API yourself. There
are two options for mounting an index from a snapshot, each with different
performance characteristics and local storage footprints:

[[fully-mounted]]
Fully mounted index::
Fully caches the snapshotted index's shards in the {es} cluster. {ilm-init} uses
this option in the `hot` and `cold` phases.
+
Search performance for a fully mounted index is normally comparable to a regular
index, since there is minimal need to access the snapshot repository. While
recovery is ongoing, search performance may be slower than with a regular index
because a search may need some data that has not yet been retrieved into the
local cache. If that happens, {es} will eagerly retrieve the data needed to
complete the search in parallel with the ongoing recovery. On-disk data is
preserved across restarts, such that the node does not need to re-download data
that is already stored on the node after a restart.
+
Indices managed by {ilm-init} are prefixed with `restored-` when fully mounted.

[[partially-mounted]]
Partially mounted index::
Uses a local cache containing only recently searched parts of the snapshotted
index's data. This cache has a fixed size and is shared across shards of
partially mounted indices allocated on the same data node. {ilm-init} uses this
option in the `frozen` phase.
+
If a search requires data that is not in the cache, {es} fetches the missing
data from the snapshot repository. Searches that require these fetches are
slower, but the fetched data is stored in the cache so that similar searches can
be served more quickly in future. {es} will evict infrequently used data from
the cache to free up space. The cache is cleared when a node is restarted.
+
Although slower than a fully mounted index or a regular index, a partially
mounted index still returns search results quickly, even for large data sets,
because the layout of data in the repository is heavily optimized for search.
Many searches will need to retrieve only a small subset of the total shard data
before returning results.
+
Indices managed by {ilm-init} are prefixed with `partial-` when partially
mounted.

To partially mount an index, you must have one or more nodes with a shared cache
available. By default, dedicated frozen data tier nodes (nodes with the
`data_frozen` role and no other data roles) have a shared cache configured using
the greater of 90% of total disk space and total disk space minus a headroom of
100GB.

Using a dedicated frozen tier is highly recommended for production use. If you
do not have a dedicated frozen tier, you must configure the
`xpack.searchable.snapshot.shared_cache.size` setting to reserve space for the
cache on one or more nodes. Partially mounted indices are only allocated to
nodes that have a shared cache.
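
When mounting via the API rather than {ilm-init}, a partially mounted index is
requested by setting the `storage` query parameter to `shared_cache`. A sketch,
with placeholder repository, snapshot, and index names:

[source,console]
----
POST /_snapshot/my_repository/my_snapshot/_mount?wait_for_completion=true&storage=shared_cache
{
  "index": "my-index",
  "renamed_index": "partial-my-index"
}
----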

[[searchable-snapshots-shared-cache]]
`xpack.searchable.snapshot.shared_cache.size`::
(<<static-cluster-setting,Static>>)
Disk space reserved for the shared cache of partially mounted indices. Accepts a
percentage of total disk space or an absolute <<byte-units,byte value>>.
Defaults to `90%` of total disk space for dedicated frozen data tier nodes.
Otherwise defaults to `0b`.

`xpack.searchable.snapshot.shared_cache.size.max_headroom`::
(<<static-cluster-setting,Static>>, <<byte-units,byte value>>)
For dedicated frozen tier nodes, the max headroom to maintain. If
`xpack.searchable.snapshot.shared_cache.size` is not explicitly set, this
setting defaults to `100GB`. Otherwise it defaults to `-1` (not set). You can
only configure this setting if `xpack.searchable.snapshot.shared_cache.size` is
set as a percentage.

To illustrate how these settings work in concert, consider two examples using
the default values on a dedicated frozen node:

* A 4000 GB disk will result in a shared cache sized at 3900 GB. 90% of 4000 GB
is 3600 GB, leaving 400 GB headroom. The default `max_headroom` of 100 GB takes
effect, and the result is therefore 3900 GB.
* A 400 GB disk will result in a shared cache sized at 360 GB. 90% of 400 GB is
360 GB, which is greater than the 300 GB that would remain after subtracting the
100 GB headroom, so the 90% value applies.

You can configure the settings in `elasticsearch.yml`:

[source,yaml]
----
xpack.searchable.snapshot.shared_cache.size: 4TB
----

IMPORTANT: You can only configure these settings on nodes with the
<<data-frozen-node,`data_frozen`>> role. Additionally, nodes with a shared cache
can only have a single <<path-settings,data path>>.

{es} also uses a dedicated system index named `.snapshot-blob-cache` to speed up
the recoveries of {search-snap} shards. This index is used as an additional
caching layer on top of the partially or fully mounted data and contains the
minimal required data to start the {search-snap} shards. {es} automatically
deletes the documents that are no longer used in this index. This periodic
cleanup can be tuned using the following settings:

`searchable_snapshots.blob_cache.periodic_cleanup.interval`::
(<<dynamic-cluster-setting,Dynamic>>)
The interval at which the periodic cleanup of the `.snapshot-blob-cache` index
is scheduled. Defaults to every hour (`1h`).

`searchable_snapshots.blob_cache.periodic_cleanup.retention_period`::
(<<dynamic-cluster-setting,Dynamic>>)
The retention period to keep obsolete documents in the `.snapshot-blob-cache`
index. Defaults to one hour (`1h`).

`searchable_snapshots.blob_cache.periodic_cleanup.batch_size`::
(<<dynamic-cluster-setting,Dynamic>>)
The number of documents that are searched for and bulk-deleted at once during
the periodic cleanup of the `.snapshot-blob-cache` index. Defaults to `100`.

`searchable_snapshots.blob_cache.periodic_cleanup.pit_keep_alive`::
(<<dynamic-cluster-setting,Dynamic>>)
The value used for the <<point-in-time-keep-alive,point-in-time keep alive>>
requests executed during the periodic cleanup of the `.snapshot-blob-cache`
index. Defaults to `10m`.
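
Because these are dynamic cluster settings, you can adjust them with the
cluster update settings API. A sketch, using illustrative values:

[source,console]
----
PUT /_cluster/settings
{
  "persistent": {
    "searchable_snapshots.blob_cache.periodic_cleanup.interval": "2h",
    "searchable_snapshots.blob_cache.periodic_cleanup.retention_period": "2h"
  }
}
----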

[discrete]
[[searchable-snapshots-costs]]
=== Reduce costs with {search-snaps}

In most cases, {search-snaps} reduce the costs of running a cluster by removing
the need for replica shards and for shard data to be copied between nodes.
However, if it's particularly expensive to retrieve data from a snapshot
repository in your environment, {search-snaps} may be more costly than regular
indices. Ensure that the cost structure of your operating environment is
compatible with {search-snaps} before using them.

[discrete]
[[searchable-snapshots-costs-replicas]]
==== Replica costs

For resiliency, a regular index requires multiple redundant copies of each shard
across multiple nodes. If a node fails, {es} uses the redundancy to rebuild any
lost shard copies. A {search-snap} index doesn't require replicas. If a node
containing a {search-snap} index fails, {es} can rebuild the lost shard cache
from the snapshot repository.

Without replicas, rarely-accessed {search-snap} indices require far fewer
resources. A cold data tier that contains replica-free fully-mounted
{search-snap} indices requires half the nodes and disk space of a tier
containing the same data in regular indices. The frozen tier, which contains
only partially-mounted {search-snap} indices, requires even fewer resources.

[discrete]
[[snapshot-retrieval-costs]]
==== Data transfer costs

When a shard of a regular index is moved between nodes, its contents are copied
from another node in your cluster. In many environments, the costs of moving
data between nodes are significant, especially if running in a Cloud environment
with nodes in different zones. In contrast, when mounting a {search-snap} index
or moving one of its shards, the data is always copied from the snapshot
repository. This is typically much cheaper.

WARNING: Most cloud providers charge significant fees for data transferred
between regions and for data transferred out of their platforms. You should only
mount snapshots into a cluster that is in the same region as the snapshot
repository. If you wish to search data across multiple regions, configure
multiple clusters and use <<modules-cross-cluster-search,{ccs}>> or
<<xpack-ccr,{ccr}>> instead of {search-snaps}.

[discrete]
[[back-up-restore-searchable-snapshots]]
=== Back up and restore {search-snaps}

You can use <<snapshots-take-snapshot,regular snapshots>> to back up a cluster
containing {search-snap} indices. When you restore a snapshot containing
{search-snap} indices, these indices are restored as {search-snap} indices
again.

Before you restore a snapshot containing a {search-snap} index, you must first
<<snapshots-register-repository,register the repository>> containing the
original index snapshot. When restored, the {search-snap} index mounts the
original index snapshot from its original repository. If desired, you can use
separate repositories for regular snapshots and {search-snaps}.
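
As a sketch, the sequence is to register the repository that holds the original
index snapshot and then run an ordinary restore from your backup repository.
All repository, bucket, snapshot, and index names below are placeholders:

[source,console]
----
PUT /_snapshot/original_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my-original-bucket"
  }
}

POST /_snapshot/backup_repository/backup_snapshot/_restore
{
  "indices": "restored-my-index"
}
----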

A snapshot of a {search-snap} index contains only a small amount of metadata
which identifies its original index snapshot. It does not contain any data from
the original index. The restore of a backup will fail to restore any
{search-snap} indices whose original index snapshot is unavailable.

Because {search-snap} indices are not regular indices, it is not possible to use
a <<snapshots-source-only-repository,source-only repository>> to take snapshots
of {search-snap} indices.

[discrete]
[[searchable-snapshots-reliability]]
=== Reliability of {search-snaps}

The sole copy of the data in a {search-snap} index is the underlying snapshot,
stored in the repository. For example:

* You cannot unregister a repository while any of the searchable snapshots it
contains are mounted in {es}. You also cannot delete a snapshot if any of its
indices are mounted as a searchable snapshot in the same cluster.
* If you mount indices from snapshots held in a repository to which a different
cluster has write access then you must make sure that the other cluster does not
delete these snapshots.
* If you delete a snapshot while it is mounted as a searchable snapshot then the
data is lost. Similarly, if the repository fails or corrupts the contents of the
snapshot then the data is lost.
* Although {es} may have cached the data onto local storage, these caches may be
incomplete and cannot be used to recover any data after a repository failure.
You must make sure that your repository is reliable and protects against
corruption of your data while it is at rest in the repository.

The blob storage offered by all major public cloud providers typically offers
very good protection against data loss or corruption. If you manage your own
repository storage then you are responsible for its reliability.