  1. [role="xpack"]
  2. [testenv="basic"]
  3. [[getting-started-snapshot-lifecycle-management]]
  4. == Getting started with snapshot lifecycle management
  5. Let's get started with snapshot lifecycle management (SLM) by working through a
  6. hands-on scenario. The goal of this example is to automatically back up {es}
  7. indices using <<modules-snapshots,snapshots>> every day at a particular
  8. time.
  9. [float]
  10. [[slm-and-security]]
  11. === Security and SLM
  12. Before starting, it's important to understand the privileges that are needed
  13. when configuring SLM if you are using the security plugin. There are two
  14. built-in cluster privileges that can be used to assist: `manage_slm` and
  15. `read_slm`. Note that the `create_snapshot` cluster privilege
  16. allows taking snapshots even of indices the role may not have access to.
  17. An example of configuring an administrator role for SLM follows:
  18. [source,js]
  19. -----------------------------------
  20. POST /_security/role/slm-admin
  21. {
  22. "cluster": ["manage_slm", "create_snapshot"],
  23. "indices": [
  24. {
  25. "names": [".slm-history-*"],
  26. "privileges": ["all"]
  27. }
  28. ]
  29. }
  30. -----------------------------------
  31. // CONSOLE
  32. // TEST[skip:security is not enabled here]
  33. A read-only role that can retrieve policies (but not update, execute, or
  34. delete them) and can only read the history index looks like this:
  35. [source,js]
  36. -----------------------------------
  37. POST /_security/role/slm-read-only
  38. {
  39. "cluster": ["read_slm"],
  40. "indices": [
  41. {
  42. "names": [".slm-history-*"],
  43. "privileges": ["read"]
  44. }
  45. ]
  46. }
  47. -----------------------------------
  48. // CONSOLE
  49. // TEST[skip:security is not enabled here]
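The two role bodies differ only in their cluster privileges and in the privileges granted on the history index. As a minimal sketch of that relationship, the Python helper below builds either body. The helper name `slm_role_body` is hypothetical and not part of {es}; the privilege names and the index pattern are copied from the examples above.

```python
import json

# Index pattern used by both role examples above.
HISTORY_PATTERN = ".slm-history-*"


def slm_role_body(read_only: bool) -> dict:
    """Build the request body for an SLM role (hypothetical helper)."""
    if read_only:
        # Can retrieve policies and read the history index, nothing more.
        cluster = ["read_slm"]
        index_privileges = ["read"]
    else:
        # Can manage policies and take snapshots.
        cluster = ["manage_slm", "create_snapshot"]
        index_privileges = ["all"]
    return {
        "cluster": cluster,
        "indices": [
            {"names": [HISTORY_PATTERN], "privileges": index_privileges}
        ],
    }


if __name__ == "__main__":
    # Body for POST /_security/role/slm-read-only
    print(json.dumps(slm_role_body(read_only=True), indent=2))
```

The resulting dictionary can be serialized with `json.dumps` and sent with whichever HTTP client you already use.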
  50. [float]
  51. [[slm-gs-create-policy]]
  52. === Setting up a repository
  53. Before we can set up an SLM policy, we'll need to set up a
  54. <<snapshots-repositories,snapshot repository>> where the snapshots will be
  55. stored. Repositories can use {plugins}/repository.html[many different backends],
  56. including cloud storage providers. You'll probably want to use one of these in
  57. production, but for this example we'll use a shared file system repository:
  58. [source,js]
  59. -----------------------------------
  60. PUT /_snapshot/my_repository
  61. {
  62. "type": "fs",
  63. "settings": {
  64. "location": "my_backup_location"
  65. }
  66. }
  67. -----------------------------------
  68. // CONSOLE
  69. // TEST
  70. [float]
  71. === Setting up a policy
  72. Now that we have a repository in place, we can create a policy to automatically
  73. take snapshots. Policies are written in JSON and will define when to take
  74. snapshots, what the snapshots should be named, and which indices should be
  75. included, among other things. We'll use the <<slm-api-put,Put Policy>> API
  76. to create the policy.
  77. [source,js]
  78. --------------------------------------------------
  79. PUT /_slm/policy/nightly-snapshots
  80. {
  81. "schedule": "0 30 1 * * ?", <1>
  82. "name": "<nightly-snap-{now/d}>", <2>
  83. "repository": "my_repository", <3>
  84. "config": { <4>
  85. "indices": ["*"] <5>
  86. }
  87. }
  88. --------------------------------------------------
  89. // CONSOLE
  90. // TEST[continued]
  91. <1> when the snapshot should be taken, using
  92. {xpack-ref}/trigger-schedule.html#schedule-cron[Cron syntax], in this
  93. case at 1:30AM each day
  94. <2> the name each snapshot should be given, using
  95. <<date-math-index-names,date math>> to include the current date in the name
  96. of the snapshot
  97. <3> the repository the snapshot should be stored in
  98. <4> the configuration to be used for the snapshot requests (see below)
  99. <5> which indices should be included in the snapshot, in this case, every index
  100. This policy will take a snapshot of every index each day at 1:30AM UTC.
  101. Snapshots are incremental, allowing frequent snapshots to be stored efficiently,
  102. so don't be afraid to configure a policy to take frequent snapshots.
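To make the schedule and the name pattern concrete, here is a rough Python sketch of what this particular policy works out to on a given day. It is not how SLM is implemented. Both function names are hypothetical, the sketch only handles the fixed `{now/d}` pattern and the fixed daily 1:30AM schedule used in this example, and it assumes the date is rendered as `yyyy.MM.dd` in UTC, which matches the snapshot names shown in the example response later in this guide. The unique suffix visible in those names is not generated here.

```python
import re
from datetime import datetime, timedelta, timezone


def resolve_snapshot_name(pattern: str, now: datetime) -> str:
    """Expand `<prefix-{now/d}>` to `prefix-YYYY.MM.DD` (hypothetical helper).

    `now` must be timezone-aware; it is converted to UTC first.
    """
    match = re.fullmatch(r"<(.*)\{now/d\}(.*)>", pattern)
    if match is None:
        raise ValueError(f"unsupported pattern: {pattern!r}")
    day = now.astimezone(timezone.utc).strftime("%Y.%m.%d")
    return f"{match.group(1)}{day}{match.group(2)}"


def next_daily_run(now: datetime, hour: int = 1, minute: int = 30) -> datetime:
    """Return the next hour:minute UTC strictly after `now`.

    Covers only a once-a-day schedule such as "0 30 1 * * ?",
    not general cron expressions.
    """
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    now = datetime(2019, 4, 23, 16, 0, tzinfo=timezone.utc)
    print(resolve_snapshot_name("<nightly-snap-{now/d}>", now))
    print(next_daily_run(now).isoformat())
```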
  103. In addition to specifying the indices that should be included in the snapshot,
  104. the `config` field can be used to customize other aspects of the snapshot. You
  105. can use any option allowed in <<snapshots-take-snapshot,a regular snapshot
  106. request>>, so you can specify, for example, whether the snapshot should fail in
  107. special cases, such as if one of the specified indices cannot be found.
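As an illustration of customizing `config`, the following Python sketch assembles a policy body that also sets `ignore_unavailable` and `include_global_state`, two options accepted by regular snapshot requests. The function `build_policy` is a hypothetical helper, and the defaults chosen here mirror the snapshot API defaults as we understand them; check them against the snapshot documentation for your version.

```python
import json


def build_policy(schedule: str, name: str, repository: str, indices: list,
                 ignore_unavailable: bool = False,
                 include_global_state: bool = True) -> dict:
    """Assemble an SLM policy body (hypothetical helper).

    ignore_unavailable: when False, a missing index fails the snapshot.
    include_global_state: whether cluster state is stored in the snapshot.
    """
    return {
        "schedule": schedule,
        "name": name,
        "repository": repository,
        "config": {
            "indices": list(indices),
            "ignore_unavailable": ignore_unavailable,
            "include_global_state": include_global_state,
        },
    }


if __name__ == "__main__":
    body = build_policy(
        schedule="0 30 1 * * ?",
        name="<nightly-snap-{now/d}>",
        repository="my_repository",
        indices=["*"],
        ignore_unavailable=True,
    )
    # Body for PUT /_slm/policy/nightly-snapshots
    print(json.dumps(body, indent=2))
```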
  108. [float]
  109. === Making sure the policy works
  110. While snapshots taken by SLM policies can be viewed through the standard snapshot
  111. API, SLM also tracks policy successes and failures in a way that makes it
  112. easier to confirm that the policy is working. Once a policy has executed at
  113. least once, viewing the policy with the <<slm-api-get,Get Policy API>>
  114. returns metadata indicating whether the snapshot was successfully
  115. initiated.
  116. Rather than waiting until 1:30AM for the policy to run on schedule, let's
  117. tell SLM to take a snapshot right now, using the configuration from our
  118. policy.
  119. [source,js]
  120. --------------------------------------------------
  121. PUT /_slm/policy/nightly-snapshots/_execute
  122. --------------------------------------------------
  123. // CONSOLE
  124. // TEST[skip:we can't easily handle snapshots from docs tests]
  125. This request will kick off a snapshot for our policy right now, regardless of
  126. the schedule in the policy. This is useful for taking snapshots before making
  127. a configuration change, upgrading, or for our purposes, making sure our policy
  128. is going to work successfully. The policy will continue to run on its configured
  129. schedule after this execution of the policy.
  130. [source,js]
  131. --------------------------------------------------
  132. GET /_slm/policy/nightly-snapshots?human
  133. --------------------------------------------------
  134. // CONSOLE
  135. // TEST[continued]
  136. This request will return a response that includes the policy,
  137. information about the last time the policy succeeded and failed, and the
  138. next time the policy will be executed.
  139. [source,js]
  140. --------------------------------------------------
  141. {
  142. "nightly-snapshots" : {
  143. "version": 1,
  144. "modified_date": "2019-04-23T01:30:00.000Z",
  145. "modified_date_millis": 1556048137314,
  146. "policy" : {
  147. "schedule": "0 30 1 * * ?",
  148. "name": "<nightly-snap-{now/d}>",
  149. "repository": "my_repository",
  150. "config": {
  151. "indices": ["*"]
  152. }
  153. },
  154. "last_success": { <1>
  155. "snapshot_name": "nightly-snap-2019.04.24-tmtnyjtrsxkhbrrdcgg18a", <2>
  156. "time_string": "2019-04-24T16:43:49.316Z",
  157. "time": 1556124229316
  158. },
  159. "last_failure": { <3>
  160. "snapshot_name": "nightly-snap-2019.04.02-lohisb5ith2n8hxacaq3mw",
  161. "time_string": "2019-04-02T01:30:00.000Z",
  162. "time": 1556042030000,
  163. "details": "{\"type\":\"index_not_found_exception\",\"reason\":\"no such index [important]\",\"resource.type\":\"index_or_alias\",\"resource.id\":\"important\",\"index_uuid\":\"_na_\",\"index\":\"important\",\"stack_trace\":\"[important] IndexNotFoundException[no such index [important]]\\n\\tat org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$WildcardExpressionResolver.indexNotFoundException(IndexNameExpressionResolver.java:762)\\n\\tat org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$WildcardExpressionResolver.innerResolve(IndexNameExpressionResolver.java:714)\\n\\tat org.elasticsearch.cluster.metadata.IndexNameExpressionResolver$WildcardExpressionResolver.resolve(IndexNameExpressionResolver.java:670)\\n\\tat org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndices(IndexNameExpressionResolver.java:163)\\n\\tat org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndexNames(IndexNameExpressionResolver.java:142)\\n\\tat org.elasticsearch.cluster.metadata.IndexNameExpressionResolver.concreteIndexNames(IndexNameExpressionResolver.java:102)\\n\\tat org.elasticsearch.snapshots.SnapshotsService$1.execute(SnapshotsService.java:280)\\n\\tat org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:47)\\n\\tat org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:687)\\n\\tat org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:310)\\n\\tat org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:210)\\n\\tat org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:142)\\n\\tat org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150)\\n\\tat org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188)\\n\\tat 
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:688)\\n\\tat org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252)\\n\\tat org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215)\\n\\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\\n\\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\\n\\tat java.base/java.lang.Thread.run(Thread.java:834)\\n\"}"
  164. },
  165. "next_execution": "2019-04-24T01:30:00.000Z", <4>
  166. "next_execution_millis": 1556048160000
  167. }
  168. }
  169. --------------------------------------------------
  170. // TESTRESPONSE[skip:the presence of last_failure and last_success is asynchronous and will be present for users, but is untestable]
  171. <1> information about the last time the policy successfully initiated a snapshot
  172. <2> the name of the snapshot that was successfully initiated
  173. <3> information about the last time the policy failed to initiate a snapshot
  174. <4> the next time the policy will execute
  175. NOTE: This metadata only indicates whether the request to initiate the snapshot was
  176. made successfully. After the snapshot has been successfully started, it can
  177. still fail if, for example, the connection to a remote repository is lost
  178. while copying files.
  179. If you're following along, the returned SLM policy shouldn't have a `last_failure`
  180. field; it's included above only as an example. You should, however, see a
  181. `last_success` field and a snapshot name. If you do, you've successfully taken
  182. your first snapshot using SLM!
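If you want to check this from a script, one possible approach is to compare the `time` values of `last_success` and `last_failure` in the parsed response. The sketch below assumes the response shape shown above; the function name `last_run_status` and the status strings it returns are made up for this example. Keep in mind that, as the note above says, this only reflects whether the snapshot was initiated.

```python
def last_run_status(policy_entry: dict) -> str:
    """Classify a policy from its Get Policy entry (hypothetical helper).

    `policy_entry` is the object stored under the policy id in the
    response, for example response["nightly-snapshots"].
    """
    success = policy_entry.get("last_success")
    failure = policy_entry.get("last_failure")
    if success is None and failure is None:
        return "never run"
    if failure is None:
        return "ok"
    if success is None:
        return "failing"
    # Both present: the more recent event decides.
    return "ok" if success["time"] > failure["time"] else "failing"


if __name__ == "__main__":
    # Timestamps copied from the example response above.
    entry = {
        "last_success": {"time": 1556124229316},
        "last_failure": {"time": 1556042030000},
    }
    print(last_run_status(entry))
```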
  183. While only the most recent success and failure are available through the Get Policy
  184. API, all policy executions are recorded to a history index, which may be queried
  185. by searching the index pattern `.slm-history*`.
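As a starting point for querying the history, the sketch below builds a search body that could be sent to `GET /.slm-history*/_search`. The field names `policy`, `success`, and `@timestamp` are assumptions about the history documents and are not confirmed by this guide, so inspect the index mapping before relying on them. The helper `history_query` is likewise hypothetical.

```python
import json


def history_query(policy_id: str, only_failures: bool = False,
                  size: int = 10) -> dict:
    """Build a search body for the SLM history index (hypothetical helper).

    Assumed field names: `policy`, `success`, `@timestamp`.
    """
    filters = [{"term": {"policy": policy_id}}]
    if only_failures:
        filters.append({"term": {"success": False}})
    return {
        "size": size,
        # Most recent executions first.
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {"bool": {"filter": filters}},
    }


if __name__ == "__main__":
    print(json.dumps(history_query("nightly-snapshots", only_failures=True), indent=2))
```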
  186. That's it! We have our first SLM policy set up to periodically take snapshots
  187. so that our backups are always up to date. You can read more details in the
  188. <<snapshot-lifecycle-management-api,SLM API documentation>> and the
  189. <<modules-snapshots,general snapshot documentation>>.