[[snapshots-take-snapshot]]
== Create a snapshot

////
[source,console]
----
PUT _slm/policy/nightly-snapshots
{
  "schedule": "0 30 1 * * ?",
  "name": "<nightly-snap-{now/d}>",
  "repository": "my_repository",
  "config": {
    "indices": "*",
    "include_global_state": true
  },
  "retention": {
    "expire_after": "30d",
    "min_count": 5,
    "max_count": 50
  }
}
----
// TEST[setup:setup-repository]
// TESTSETUP
////

This guide shows you how to take snapshots of a running cluster. You can later
<<snapshots-restore-snapshot,restore a snapshot>> to recover or transfer its
data.

In this guide, you’ll learn how to:

* Automate snapshot creation and retention with {slm} ({slm-init})
* Manually take a snapshot
* Monitor a snapshot's progress
* Delete or cancel a snapshot
* Back up cluster configuration files

The guide also provides tips for creating dedicated cluster state snapshots and
taking snapshots at different time intervals.

[discrete]
[[create-snapshot-prereqs]]
=== Prerequisites

include::register-repository.asciidoc[tag=kib-snapshot-prereqs]

* You can only take a snapshot from a running cluster with an elected
<<master-node,master node>>.

* A snapshot repository must be <<snapshots-register-repository,registered>> and
available to the cluster.

* The cluster's global metadata must be readable. To include an index in a
snapshot, the index and its metadata must also be readable. Ensure there aren't
any <<cluster-read-only,cluster blocks>> or <<index-modules-blocks,index
blocks>> that prevent read access.

[discrete]
[[create-snapshot-considerations]]
=== Considerations

* Each snapshot must have a unique name within its repository. Attempts to
create a snapshot with the same name as an existing snapshot will fail.

* Snapshots are automatically deduplicated. You can take frequent snapshots with
little impact to your storage overhead.

* Each snapshot is logically independent. You can delete a snapshot without
affecting other snapshots.

* Taking a snapshot can temporarily pause shard allocations.
See <<snapshots-shard-allocation>>.

* Taking a snapshot doesn't block indexing or other requests. However, the
snapshot won't include changes made after the snapshot process starts.

* You can take multiple snapshots at the same time. The
<<snapshot-max-concurrent-ops,`snapshot.max_concurrent_operations`>> cluster
setting limits the maximum number of concurrent snapshot operations.

* If you include a data stream in a snapshot, the snapshot also includes the
stream’s backing indices and metadata.
+
You can also include only specific backing indices in a snapshot. However, the
snapshot won't include the data stream’s metadata or its other backing indices.

* A snapshot can include a data stream but exclude specific backing indices.
When you restore such a data stream, it will contain only backing indices in the
snapshot. If the stream’s original write index is not in the snapshot, the most
recent backing index from the snapshot becomes the stream’s write index.

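
As a minimal sketch of the data stream case described above, the following
request snapshots a single data stream by name. The data stream name
`my-data-stream` and the snapshot name are hypothetical, and the request assumes
the `my_repository` repository used elsewhere in this guide. Because the request
names the data stream rather than individual backing indices, the snapshot
should include the stream's backing indices and metadata.

[source,console]
----
PUT _snapshot/my_repository/my_data_stream_snapshot?wait_for_completion=true
{
  "indices": "my-data-stream",
  "include_global_state": false
}
----
// TEST[skip:illustrative sketch with a hypothetical data stream]
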
[discrete]
[[automate-snapshots-slm]]
=== Automate snapshots with {slm-init}

{slm-cap} ({slm-init}) is the easiest way to regularly back up a cluster. An
{slm-init} policy automatically takes snapshots on a preset schedule. The policy
can also delete snapshots based on retention rules you define.

TIP: {ess} deployments automatically include the `cloud-snapshot-policy`
{slm-init} policy. {ess} uses this policy to take periodic snapshots of your
cluster. For more information, see the {cloud}/ec-snapshot-restore.html[{ess}
snapshot documentation].

[discrete]
[[slm-security]]
==== {slm-init} security

The following <<privileges-list-cluster,cluster privileges>> control access to
the {slm-init} actions when {es} {security-features} are enabled:

`manage_slm`::
Allows a user to perform all {slm-init} actions, including
creating and updating policies and starting and stopping {slm-init}.

`read_slm`::
Allows a user to perform all read-only {slm-init} actions, such as getting
policies and checking the {slm-init} status.

`cluster:admin/snapshot/*`::
Allows a user to take and delete snapshots of any index, whether or not they
have access to that index.

You can create and manage roles to assign these privileges through {kib}
Management.

To grant the privileges necessary to create and manage {slm-init} policies and
snapshots, you can set up a role with the `manage_slm` and
`cluster:admin/snapshot/*` cluster privileges and full access to the {slm-init}
history indices.

For example, the following request creates an `slm-admin` role:

[source,console]
----
POST _security/role/slm-admin
{
  "cluster": [ "manage_slm", "cluster:admin/snapshot/*" ],
  "indices": [
    {
      "names": [ ".slm-history-*" ],
      "privileges": [ "all" ]
    }
  ]
}
----
// TEST[skip:security is not enabled here]

To grant read-only access to {slm-init} policies and the snapshot history,
you can set up a role with the `read_slm` cluster privilege and read access
to the {slm} history indices.

For example, the following request creates an `slm-read-only` role:

[source,console]
----
POST _security/role/slm-read-only
{
  "cluster": [ "read_slm" ],
  "indices": [
    {
      "names": [ ".slm-history-*" ],
      "privileges": [ "read" ]
    }
  ]
}
----
// TEST[skip:security is not enabled here]

[discrete]
[[create-slm-policy]]
==== Create an {slm-init} policy

To manage {slm-init} in {kib}, go to the main menu and click **Stack
Management** > **Snapshot and Restore** > **Policies**. To create a policy,
click **Create policy**.

You can also manage {slm-init} using the
<<snapshot-lifecycle-management-api,{slm-init} APIs>>. To create a policy, use
the <<slm-api-put-policy,create {slm-init} policy API>>.

The following request creates a policy that backs up the cluster state, all data
streams, and all indices daily at 1:30 a.m. UTC.

[source,console]
----
PUT _slm/policy/nightly-snapshots
{
  "schedule": "0 30 1 * * ?",       <1>
  "name": "<nightly-snap-{now/d}>", <2>
  "repository": "my_repository",    <3>
  "config": {
    "indices": "*",                 <4>
    "include_global_state": true    <5>
  },
  "retention": {                    <6>
    "expire_after": "30d",
    "min_count": 5,
    "max_count": 50
  }
}
----

<1> When to take snapshots, written in <<schedule-cron,Cron syntax>>.
<2> Snapshot name. Supports <<api-date-math-index-names,date math>>. To prevent
    naming conflicts, the policy also appends a UUID to each snapshot name.
<3> <<snapshots-register-repository,Registered snapshot repository>> used to
    store the policy's snapshots.
<4> Data streams and indices to include in the policy's snapshots. This
    configuration includes all data streams and indices, including system
    indices.
<5> If `true`, the policy's snapshots include the cluster state. This also
    includes all feature states by default. To only include specific feature
    states, see <<back-up-specific-feature-state>>.
<6> Optional retention rules. This configuration keeps snapshots for 30 days,
    retaining at least 5 and no more than 50 snapshots regardless of age. See
    <<slm-retention-task>> and <<snapshot-retention-limits>>.

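
A policy's schedule only fires while {slm-init} itself is running. As a quick
check after creating a policy, the following sketch uses the get {slm-init}
status API and, if needed, the start {slm-init} API. It assumes your role
includes the `manage_slm` cluster privilege when {es} {security-features} are
enabled.

[source,console]
----
# Check whether SLM is running, stopping, or stopped
GET _slm/status

# Start SLM if the status response doesn't report a running operation mode
POST _slm/start
----
// TEST[skip:illustrative sketch]
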
[discrete]
[[manually-run-slm-policy]]
==== Manually run an {slm-init} policy

You can manually run an {slm-init} policy to immediately create a snapshot. This
is useful for testing a new policy or taking a snapshot before an upgrade.
Manually running a policy doesn't affect its snapshot schedule.

To run a policy in {kib}, go to the **Policies** page and click the run icon
under the **Actions** column. You can also use the
<<slm-api-execute-lifecycle,execute {slm-init} policy API>>.

[source,console]
----
POST _slm/policy/nightly-snapshots/_execute
----
// TEST[skip:we can't easily handle snapshots from docs tests]

The snapshot process runs in the background. To monitor its progress, see
<<monitor-snapshot>>.

[discrete]
[[slm-retention-task]]
==== {slm-init} retention

{slm-init} snapshot retention is a cluster-level task that runs separately from
a policy's snapshot schedule. To control when the {slm-init} retention task
runs, configure the <<slm-retention-schedule,`slm.retention_schedule`>> cluster
setting.

[source,console]
----
PUT _cluster/settings
{
  "persistent" : {
    "slm.retention_schedule" : "0 30 1 * * ?"
  }
}
----

To immediately run the retention task, use the
<<slm-api-execute-retention,execute {slm-init} retention policy API>>.

[source,console]
----
POST _slm/_execute_retention
----

An {slm-init} policy's retention rules only apply to snapshots created using the
policy. Other snapshots don't count toward the policy's retention limits.

[discrete]
[[snapshot-retention-limits]]
==== Snapshot retention limits

While not a hard limit, a snapshot repository shouldn't contain more than
{max-snapshot-count} snapshots at a time. This ensures the repository's metadata
doesn't grow to a size that may destabilize the master node. We recommend you
set up your {slm-init} policy's retention rules to enforce this limit.

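
One way to keep an eye on how many snapshots a repository holds is the cat
snapshots API, which returns one row per snapshot. The following sketch assumes
the `my_repository` repository used elsewhere in this guide. Sorting by `id` is
only an illustrative choice.

[source,console]
----
GET _cat/snapshots/my_repository?v=true&s=id
----
// TEST[skip:illustrative sketch]
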
[discrete]
[[manually-create-snapshot]]
=== Manually create a snapshot

To take a snapshot without an {slm-init} policy, use the
<<create-snapshot-api,create snapshot API>>. The snapshot name supports
<<api-date-math-index-names,date math>>.

[source,console]
----
# PUT _snapshot/my_repository/<my_snapshot_{now/d}>
PUT _snapshot/my_repository/%3Cmy_snapshot_%7Bnow%2Fd%7D%3E
----
// TEST[s/3E/3E?wait_for_completion=true/]

Depending on its size, a snapshot can take a while to complete. By default,
the create snapshot API only initiates the snapshot process, which runs in the
background. To block the client until the snapshot finishes, set the
`wait_for_completion` query parameter to `true`.

[source,console]
----
PUT _snapshot/my_repository/my_snapshot?wait_for_completion=true
----

You can also clone an existing snapshot using the <<clone-snapshot-api,clone
snapshot API>>.

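
As a rough sketch of a clone request, the following copies two indices from an
existing snapshot into a new snapshot in the same repository. The snapshot names
and the index names `my-index` and `my-other-index` are hypothetical. Adjust
them to match a snapshot that exists in your repository.

[source,console]
----
PUT _snapshot/my_repository/my_snapshot/_clone/my_snapshot_clone
{
  "indices": "my-index,my-other-index"
}
----
// TEST[skip:illustrative sketch with hypothetical snapshot and index names]
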
[discrete]
[[monitor-snapshot]]
=== Monitor a snapshot

To monitor any currently running snapshots, use the <<get-snapshot-api,get
snapshot API>> with the `_current` request path parameter.

[source,console]
----
GET _snapshot/my_repository/_current
----

To get a complete breakdown of each shard participating in any currently running
snapshots, use the <<get-snapshot-status-api,get snapshot status API>>.

[source,console]
----
GET _snapshot/_status
----

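
To narrow the shard-level breakdown to one snapshot, you can name the repository
and snapshot in the request path. The following sketch assumes a snapshot called
`my_snapshot` in `my_repository`, as in the earlier create snapshot example.

[source,console]
----
GET _snapshot/my_repository/my_snapshot/_status
----
// TEST[skip:illustrative sketch]
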
[discrete]
[[check-slm-history]]
==== Check {slm-init} history

Use the <<slm-api-get-policy,get {slm-init} policy API>> to check when an
{slm-init} policy last successfully started the snapshot process. A successful
start doesn't guarantee the snapshot completed.

[source,console]
----
GET _slm/policy/nightly-snapshots
----

To get more information about a cluster's {slm-init} execution history,
including stats for each {slm-init} policy, use the <<slm-api-get-stats,get
{slm-init} stats API>>. The API also returns information about the cluster's
snapshot retention task history.

[source,console]
----
GET _slm/stats
----

[discrete]
[[delete-snapshot]]
=== Delete or cancel a snapshot

To delete a snapshot in {kib}, go to the **Snapshots** page and click the trash
icon under the **Actions** column. You can also use the
<<delete-snapshot-api,delete snapshot API>>.

[source,console]
----
DELETE _snapshot/my_repository/my_snapshot_2099.05.06
----
// TEST[setup:setup-snapshots]

If you delete a snapshot that's in progress, {es} cancels it. The snapshot
process halts and deletes any files created for the snapshot. Deleting a
snapshot doesn't delete files used by other snapshots.

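
The delete snapshot API also accepts a comma-separated list of snapshot names
and wildcard (`*`) expressions. As a sketch, the following request deletes every
snapshot taken in May 2099, assuming snapshot names that follow the
`my_snapshot_<date>` pattern shown above. Check which snapshots a wildcard
matches before you run a request like this.

[source,console]
----
DELETE _snapshot/my_repository/my_snapshot_2099.05.*
----
// TEST[skip:illustrative sketch]
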
[discrete]
[[back-up-config-files]]
=== Back up configuration files

If you run {es} on your own hardware, we recommend that, in addition to
snapshots, you take regular backups of the files in each node's `$ES_PATH_CONF`
directory using the file backup software of your choice. Snapshots don't back up
these files.

Depending on your setup, some of these configuration files may contain sensitive
data, such as passwords or keys. If so, consider encrypting your file backups.

[discrete]
[[back-up-specific-feature-state]]
=== Back up a specific feature state

By default, a snapshot that includes the cluster state also includes all
<<feature-state,feature states>>. Similarly, a snapshot that excludes the
cluster state excludes all feature states by default.

You can also configure a snapshot to only include specific feature states,
regardless of the cluster state.

To get a list of available features, use the <<get-features-api,get features
API>>.

[source,console]
----
GET _features
----

The API returns:

[source,console-result]
----
{
  "features": [
    {
      "name": "tasks",
      "description": "Manages task results"
    },
    {
      "name": "kibana",
      "description": "Manages Kibana configuration and reports"
    },
    {
      "name": "security",
      "description": "Manages configuration for Security features, such as users and roles"
    },
    ...
  ]
}
----
// TESTRESPONSE[skip:response may vary based on features in test cluster]

To include a specific feature state in a snapshot, specify the feature `name` in
the `feature_states` array.

For example, the following {slm-init} policy only includes feature states for
the {kib} and {es} security features in its snapshots.

[source,console]
----
PUT _slm/policy/nightly-snapshots
{
  "schedule": "0 30 2 * * ?",
  "name": "<nightly-snap-{now/d}>",
  "repository": "my_repository",
  "config": {
    "indices": "*",
    "include_global_state": true,
    "feature_states": [
      "kibana",
      "security"
    ]
  },
  "retention": {
    "expire_after": "30d",
    "min_count": 5,
    "max_count": 50
  }
}
----

Any index or data stream that's part of the feature state appears in a
snapshot's contents. For example, if you back up the `security` feature state,
the `.security-*` system indices appear in the <<get-snapshot-api,get snapshot
API>>'s response under both `indices` and `feature_states`.

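
The `feature_states` array isn't limited to {slm-init} policies. As a sketch,
the following manual snapshot request backs up only the `security` feature state
and leaves out the cluster state and regular indices. The snapshot name is
hypothetical, and the request assumes the `security` feature appears in your
cluster's get features API response.

[source,console]
----
PUT _snapshot/my_repository/my_security_snapshot?wait_for_completion=true
{
  "indices": "-*",
  "include_global_state": false,
  "feature_states": [ "security" ]
}
----
// TEST[skip:illustrative sketch]
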
[discrete]
[[cluster-state-snapshots]]
=== Dedicated cluster state snapshots

Some feature states contain sensitive data. For example, the `security` feature
state includes system indices that may contain user names and encrypted password
hashes.

To better protect this data, consider creating a dedicated repository and
{slm-init} policy for snapshots of the cluster state. This lets you strictly
limit and audit access to the repository.

For example, the following {slm-init} policy only backs up the cluster state.
The policy stores these snapshots in a dedicated repository.

[source,console]
----
PUT _slm/policy/nightly-cluster-state-snapshots
{
  "schedule": "0 30 2 * * ?",
  "name": "<nightly-cluster-state-snap-{now/d}>",
  "repository": "my_secure_repository",
  "config": {
    "include_global_state": true,                 <1>
    "indices": "-*"                               <2>
  },
  "retention": {
    "expire_after": "30d",
    "min_count": 5,
    "max_count": 50
  }
}
----
// TEST[s/my_secure_repository/my_repository/]

<1> Includes the cluster state. This also includes all feature states by
    default.
<2> Excludes regular data streams and indices.

If you take dedicated snapshots of the cluster state, you'll need to exclude the
cluster state and system indices from your other snapshots. For example:

[source,console]
----
PUT _slm/policy/nightly-snapshots
{
  "schedule": "0 30 2 * * ?",
  "name": "<nightly-snap-{now/d}>",
  "repository": "my_repository",
  "config": {
    "include_global_state": false,    <1>
    "indices": "*,-.*"                <2>
  },
  "retention": {
    "expire_after": "30d",
    "min_count": 5,
    "max_count": 50
  }
}
----

<1> Excludes the cluster state. This also excludes all feature states by
    default.
<2> Includes all data streams and indices except system indices and other
    indices that begin with a dot (`.`).

[discrete]
[[create-snapshots-different-time-intervals]]
=== Create snapshots at different time intervals

If you only use a single {slm-init} policy, it can be difficult to take frequent
snapshots and retain snapshots with longer time intervals.

For example, a policy that takes snapshots every 30 minutes with a maximum of
100 snapshots will only keep snapshots for approximately two days (100 snapshots
at 30-minute intervals cover 50 hours). While this setup is great for backing up
recent changes, it doesn't let you restore data from a previous week or month.

To fix this, you can create multiple {slm-init} policies with the same snapshot
repository that run on different schedules. Since a policy's retention rules
only apply to its snapshots, a policy won't delete a snapshot created by another
policy. However, you'll need to ensure the total number of snapshots in the
repository doesn't exceed the <<snapshot-retention-limits,{max-snapshot-count}
snapshot soft limit>>.

For example, the following {slm-init} policy takes hourly snapshots with a
maximum of 24 snapshots. The policy keeps its snapshots for one day.

[source,console]
----
PUT _slm/policy/hourly-snapshots
{
  "name": "<hourly-snapshot-{now/d}>",
  "schedule": "0 0 * * * ?",
  "repository": "my_repository",
  "config": {
    "indices": "*",
    "include_global_state": true
  },
  "retention": {
    "expire_after": "1d",
    "min_count": 1,
    "max_count": 24
  }
}
----

The following policy takes nightly snapshots in the same snapshot repository.
The policy keeps its snapshots for one month.

[source,console]
----
PUT _slm/policy/daily-snapshots
{
  "name": "<daily-snapshot-{now/d}>",
  "schedule": "0 45 23 * * ?",          <1>
  "repository": "my_repository",
  "config": {
    "indices": "*",
    "include_global_state": true
  },
  "retention": {
    "expire_after": "30d",
    "min_count": 1,
    "max_count": 31
  }
}
----

<1> Runs at 11:45 p.m. UTC every day.

The following policy creates monthly snapshots in the same repository. The
policy keeps its snapshots for one year.

[source,console]
----
PUT _slm/policy/monthly-snapshots
{
  "name": "<monthly-snapshot-{now/d}>",
  "schedule": "0 56 23 1 * ?",            <1>
  "repository": "my_repository",
  "config": {
    "indices": "*",
    "include_global_state": true
  },
  "retention": {
    "expire_after": "366d",
    "min_count": 1,
    "max_count": 12
  }
}
----

<1> Runs on the first of the month at 11:56 p.m. UTC.