[[searchable-snapshots]]
== {search-snaps-cap}

{search-snaps-cap} let you reduce your operating costs by using
<<snapshot-restore, snapshots>> for resiliency rather than maintaining
<<scalability,replica shards>> within a cluster. When you mount an index from a
snapshot as a {search-snap}, {es} copies the index shards to local storage
within the cluster. This ensures that search performance is comparable to
searching any other index, and minimizes the need to access the snapshot
repository. Should a node fail, shards of a {search-snap} index are
automatically recovered from the snapshot repository.

This can result in significant cost savings for less frequently searched data.
With {search-snaps}, you no longer need an extra index shard copy to avoid data
loss, potentially halving the node local storage capacity necessary for
searching that data. Because {search-snaps} rely on the same snapshot mechanism
you use for backups, they have a minimal impact on your snapshot repository
storage costs.
[discrete]
[[using-searchable-snapshots]]
=== Using {search-snaps}

Searching a {search-snap} index is the same as searching any other index.
Search performance is comparable to regular indices because the shard data is
copied onto nodes in the cluster when the {search-snap} is mounted.

By default, {search-snap} indices have no replicas. The underlying snapshot
provides resilience and the query volume is expected to be low enough that a
single shard copy will be sufficient. However, if you need to support a higher
query volume, you can add replicas by adjusting the `index.number_of_replicas`
index setting.
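
For example, the following request adds one replica to a mounted index. The
index name `my-mounted-index` is a placeholder; substitute the name of your own
{search-snap} index.

[source,console]
----
PUT /my-mounted-index/_settings
{
  "index": {
    "number_of_replicas": 1
  }
}
----
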
If a node fails and {search-snap} shards need to be restored from the snapshot,
there is a brief window of time while {es} allocates the shards to other nodes
where the cluster health will not be `green`. Searches that hit these shards
will fail or return partial results until the shards are reallocated to healthy
nodes.

You typically manage {search-snaps} through {ilm-init}. The
<<ilm-searchable-snapshot, searchable snapshots>> action automatically converts
a regular index into a {search-snap} index when it reaches the `cold` phase.
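
As a minimal sketch, the following policy moves indices into the `cold` phase
after 30 days and converts them into {search-snap} indices. The policy name,
repository name, and timing are illustrative placeholders; adjust them to suit
your deployment.

[source,console]
----
PUT _ilm/policy/my-policy
{
  "policy": {
    "phases": {
      "cold": {
        "min_age": "30d",
        "actions": {
          "searchable_snapshot": {
            "snapshot_repository": "my-repository"
          }
        }
      }
    }
  }
}
----
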
You can also make indices in existing snapshots searchable by manually mounting
them as {search-snap} indices with the
<<searchable-snapshots-api-mount-snapshot, mount snapshot>> API.
To mount an index from a snapshot that contains multiple indices, we recommend
creating a <<clone-snapshot-api, clone>> of the snapshot that contains only the
index you want to search, and mounting the clone. You should not delete a
snapshot if it has any mounted indices, so creating a clone enables you to
manage the lifecycle of the backup snapshot independently of any
{search-snaps}.
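
For example, this request clones only `my-index` from the snapshot
`my-snapshot` into a new snapshot named `my-snapshot-clone`, which you can then
mount. All of these names are placeholders.

[source,console]
----
PUT /_snapshot/my-repository/my-snapshot/_clone/my-snapshot-clone
{
  "indices": "my-index"
}
----
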
You can control the allocation of the shards of {search-snap} indices using the
same mechanisms as for regular indices. For example, you could use
<<shard-allocation-filtering>> to restrict {search-snap} shards to a subset of
your nodes.
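
The following sketch restricts a mounted index to nodes carrying a custom node
attribute, assuming some of your nodes are configured with `node.attr.data:
cold`. The index name, attribute name, and value are illustrative assumptions.

[source,console]
----
PUT /my-mounted-index/_settings
{
  "index.routing.allocation.require.data": "cold"
}
----
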
We recommend that you <<indices-forcemerge, force-merge>> indices to a single
segment per shard before taking a snapshot that will be mounted as a
{search-snap} index. Each read from a snapshot repository takes time and costs
money, and the fewer segments there are the fewer reads are needed to restore
the snapshot.
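
For example, the following request force-merges `my-index` (a placeholder name)
down to a single segment per shard before you snapshot it:

[source,console]
----
POST /my-index/_forcemerge?max_num_segments=1
----
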
[TIP]
====
{search-snaps-cap} are ideal for managing a large archive of historical data.
Historical information is typically searched less frequently than recent data
and therefore may not need the extra search performance that replicas provide.

For more complex or time-consuming searches, you can use <<async-search>> with
{search-snaps}, as in the example after this tip.
====
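
As a sketch, the following request submits an async search against a
hypothetical mounted index named `my-mounted-index`, returning after waiting up
to two seconds for initial results. The index name, field, and timeout are
illustrative.

[source,console]
----
POST /my-mounted-index/_async_search?wait_for_completion_timeout=2s
{
  "query": {
    "match": {
      "message": "error"
    }
  }
}
----
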
[[searchable-snapshots-repository-types]]
You can use any of the following repository types with searchable snapshots:

* {plugins}/repository-s3.html[AWS S3]
* {plugins}/repository-gcs.html[Google Cloud Storage]
* {plugins}/repository-azure.html[Azure Blob Storage]
* {plugins}/repository-hdfs.html[Hadoop Distributed File System (HDFS)]
* <<snapshots-filesystem-repository,Shared filesystems>> such as NFS

You can also use alternative implementations of these repository types, for
instance
{plugins}/repository-s3-client.html#repository-s3-compatible-services[Minio],
as long as they are fully compatible.
[discrete]
[[how-searchable-snapshots-work]]
=== How {search-snaps} work

When an index is mounted from a snapshot, {es} allocates its shards to data
nodes within the cluster. The data nodes then automatically restore the shard
data from the repository onto local storage. Once the restore process
completes, these shards respond to searches using the data held in local
storage and do not need to access the repository. This avoids incurring the
cost or performance penalty associated with reading data from the repository.

If a node holding one of these shards fails, {es} automatically allocates it to
another node, and that node restores the shard data from the repository. No
replicas are needed, and no complicated monitoring or orchestration is
necessary to restore lost shards.
{es} restores {search-snap} shards in the background and you can search them
even if they have not been fully restored. If a search hits a {search-snap}
shard before it has been fully restored, {es} eagerly retrieves the data needed
for the search. If a shard is freshly allocated to a node and still warming up,
some searches will be slower. However, searches typically access only a very
small fraction of the total shard data, so the performance penalty is usually
small.

Replicas of {search-snap} shards are restored by copying data from the
snapshot repository. In contrast, replicas of regular indices are restored by
copying data from the primary.
[discrete]
[[back-up-restore-searchable-snapshots]]
=== Back up and restore {search-snaps}

You can use <<snapshot-lifecycle-management,regular snapshots>> to back up a
cluster containing {search-snap} indices. When you restore a snapshot
containing {search-snap} indices, these indices are restored as {search-snap}
indices again.

Before you restore a snapshot containing a {search-snap} index, you must first
<<snapshots-register-repository,register the repository>> containing the
original index snapshot. When restored, the {search-snap} index mounts the
original index snapshot from its original repository. If you prefer, you can
use separate repositories for regular snapshots and {search-snaps}.
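
For example, the following request registers a hypothetical shared filesystem
repository named `my-repository`. The repository name, type, and location are
placeholders; use the type and settings that match where the original index
snapshot is stored.

[source,console]
----
PUT /_snapshot/my-repository
{
  "type": "fs",
  "settings": {
    "location": "my_backup_location"
  }
}
----
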
A snapshot of a {search-snap} index contains only a small amount of metadata
which identifies its original index snapshot. It does not contain any data from
the original index. Restoring a backup will fail for any {search-snap} indices
whose original index snapshot is unavailable.
[discrete]
[[searchable-snapshots-reliability]]
=== Reliability of {search-snaps}

The sole copy of the data in a {search-snap} index is the underlying snapshot,
stored in the repository. If the repository fails or corrupts the contents of
the snapshot then the data is lost. Although {es} may have made copies of the
data onto local storage, these copies may be incomplete and cannot be used to
recover any data after a repository failure. You must make sure that your
repository is reliable and protects against corruption of your data while it is
at rest in the repository.

The blob storage offered by all major public cloud providers typically provides
very good protection against data loss or corruption. If you manage your own
repository storage then you are responsible for its reliability.
[discrete]
[[searchable-snapshots-shared-cache]]
=== Shared snapshot cache

experimental::[]

By default a {search-snap} copies the whole snapshot into the local cluster as
described above. You can also configure a shared snapshot cache which is used
to hold a copy of just the frequently-accessed parts of shards of indices which
are mounted with `?storage=shared_cache`. If you configure a node to have a
shared cache then that node will reserve space for the cache when it starts up.

`xpack.searchable.snapshot.shared_cache.size`::
(<<static-cluster-setting,Static>>, <<byte-units,byte value>>)
The size of the space reserved for the shared cache. Defaults to `0b`, meaning
that the node has no shared cache.
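
As a minimal sketch, you could reserve space for the shared cache in
`elasticsearch.yml` on each node that should hold it, and then mount an index
with shared-cache storage. The cache size, repository, snapshot, and index
names below are assumptions for illustration.

[source,yaml]
----
# elasticsearch.yml on a node that should hold the shared cache
# (example size; choose a value that fits your local storage)
xpack.searchable.snapshot.shared_cache.size: 4gb
----

[source,console]
----
POST /_snapshot/my-repository/my-snapshot/_mount?storage=shared_cache
{
  "index": "my-index"
}
----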