
[role="xpack"]
[[ccr-disaster-recovery-bi-directional-tutorial]]
=== Tutorial: Disaster recovery based on bi-directional {ccr}
++++
<titleabbrev>Bi-directional disaster recovery</titleabbrev>
++++

////
[source,console]
----
PUT _data_stream/logs-generic-default
----
// TESTSETUP

[source,console]
----
DELETE /_data_stream/*
----
// TEARDOWN
////
Learn how to set up disaster recovery between two clusters based on
bi-directional {ccr}. The following tutorial is designed for data streams which support
<<update-docs-in-a-data-stream-by-query,update by query>> and <<delete-docs-in-a-data-stream-by-query,delete by query>>. You can only perform these actions on the leader index.

This tutorial works with {ls} as the source of ingestion. It takes advantage of a {ls}
feature where {logstash-ref}/plugins-outputs-elasticsearch.html[the {ls} output to {es}]
can be load balanced across an array of specified hosts. {beats} and {agents} currently do not
support multiple outputs. It should also be possible to set up a proxy
(load balancer) to redirect traffic without {ls} in this tutorial.

The tutorial covers the following steps:

* Setting up a remote cluster on `clusterA` and `clusterB`.
* Setting up bi-directional cross-cluster replication with exclusion patterns.
* Setting up {ls} with multiple hosts to allow automatic load balancing and switching during disasters.

image::images/ccr-bi-directional-disaster-recovery.png[Bi-directional cross cluster replication failover and failback]
[[ccr-tutorial-initial-setup]]
==== Initial setup

. Set up a remote cluster on both clusters.
+
[source,console]
----
### On cluster A ###
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "clusterB": {
          "mode": "proxy",
          "skip_unavailable": true,
          "server_name": "clusterb.es.region-b.gcp.elastic-cloud.com",
          "proxy_socket_connections": 18,
          "proxy_address": "clusterb.es.region-b.gcp.elastic-cloud.com:9400"
        }
      }
    }
  }
}
### On cluster B ###
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "clusterA": {
          "mode": "proxy",
          "skip_unavailable": true,
          "server_name": "clustera.es.region-a.gcp.elastic-cloud.com",
          "proxy_socket_connections": 18,
          "proxy_address": "clustera.es.region-a.gcp.elastic-cloud.com:9400"
        }
      }
    }
  }
}
----
// TEST[setup:host]
// TEST[s/"server_name": "clustera.es.region-a.gcp.elastic-cloud.com",//]
// TEST[s/"server_name": "clusterb.es.region-b.gcp.elastic-cloud.com",//]
// TEST[s/"proxy_socket_connections": 18,//]
// TEST[s/clustera.es.region-a.gcp.elastic-cloud.com:9400/\${transport_host}/]
// TEST[s/clusterb.es.region-b.gcp.elastic-cloud.com:9400/\${transport_host}/]
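+
To confirm that each cluster can reach its remote, you can check the connection status with the remote cluster info API. A healthy setup reports `"connected": true` for the configured remote:
+
[source,console]
----
### On either cluster ###
GET /_remote/info
----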
. Set up bi-directional cross-cluster replication.
+
[source,console]
----
### On cluster A ###
PUT /_ccr/auto_follow/logs-generic-default
{
  "remote_cluster": "clusterB",
  "leader_index_patterns": [
    ".ds-logs-generic-default-20*"
  ],
  "leader_index_exclusion_patterns": "*-replicated_from_clustera",
  "follow_index_pattern": "{{leader_index}}-replicated_from_clusterb"
}
### On cluster B ###
PUT /_ccr/auto_follow/logs-generic-default
{
  "remote_cluster": "clusterA",
  "leader_index_patterns": [
    ".ds-logs-generic-default-20*"
  ],
  "leader_index_exclusion_patterns": "*-replicated_from_clusterb",
  "follow_index_pattern": "{{leader_index}}-replicated_from_clustera"
}
----
// TEST[setup:remote_cluster]
// TEST[s/clusterA/remote_cluster/]
// TEST[s/clusterB/remote_cluster/]
+
IMPORTANT: Existing data on the cluster will not be replicated by
`_ccr/auto_follow` even though the patterns may match. This function will only
replicate newly created backing indices (as part of the data stream).
+
IMPORTANT: Use `leader_index_exclusion_patterns` to avoid recursion.
+
TIP: `follow_index_pattern` allows lowercase characters only.
+
TIP: This step cannot be executed via the {kib} UI due to the lack of an exclusion
pattern in the UI. Use the API in this step.
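+
To review the auto-follow pattern that is now in place on a cluster, you can retrieve it with the get auto-follow pattern API:
+
[source,console]
----
### On either cluster ###
GET /_ccr/auto_follow/logs-generic-default
----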
. Set up the {ls} configuration file.
+
This example uses the input generator to demonstrate the document
count in the clusters. Reconfigure this section
to suit your own use case.
+
[source,logstash]
----
### On Logstash server ###
### This is a logstash config file ###
input {
  generator {
    message => 'Hello World'
    count => 100
  }
}
output {
  elasticsearch {
    hosts => ["https://clustera.es.region-a.gcp.elastic-cloud.com:9243","https://clusterb.es.region-b.gcp.elastic-cloud.com:9243"]
    user => "logstash-user"
    password => "same_password_for_both_clusters"
  }
}
----
+
IMPORTANT: The key point is that when `cluster A` is down, all traffic will be
automatically redirected to `cluster B`. Once `cluster A` comes back, traffic
is automatically redirected back to `cluster A` again. This is achieved by the
`hosts` option, where multiple {es} cluster endpoints are specified in the
array `[clusterA, clusterB]`.
+
TIP: Set up the same password for the same user on both clusters to use this load-balancing feature.
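+
As a minimal sketch of that user setup (the `logstash_writer` role name is a placeholder; assign whatever roles grant the index privileges your pipeline needs), the same user with the same password could be created on both clusters:
+
[source,console]
----
### On both cluster A and cluster B ###
POST /_security/user/logstash-user
{
  "password": "same_password_for_both_clusters",
  "roles": [ "logstash_writer" ]
}
----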
. Start {ls} with the earlier configuration file.
+
[source,sh]
----
### On Logstash server ###
bin/logstash -f multiple_hosts.conf
----
. Observe document counts in data streams.
+
The setup creates a data stream named `logs-generic-default` on each of the clusters. {ls} will write 50% of the documents to `cluster A` and 50% of the documents to `cluster B` when both clusters are up.
+
Bi-directional {ccr} will create one more data stream on each of the clusters
with the `-replicated_from_cluster{a|b}` suffix. At the end of this step:
+
* data streams on cluster A contain:
** 50 documents in `logs-generic-default-replicated_from_clusterb`
** 50 documents in `logs-generic-default`
* data streams on cluster B contain:
** 50 documents in `logs-generic-default-replicated_from_clustera`
** 50 documents in `logs-generic-default`
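+
One way to observe these counts is with the count API against each data stream. Once {ls} has finished and replication has caught up, the totals should match the numbers above:
+
[source,console]
----
### On cluster A ###
GET logs-generic-default/_count
GET logs-generic-default-replicated_from_clusterb/_count
----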
. Queries should be set up to search across both data streams.
A query on `logs*`, on either of the clusters, returns 100
hits in total.
+
[source,console]
----
GET logs*/_search?size=0
----
==== Failover when `clusterA` is down

. You can simulate this by shutting down either of the clusters. Let's shut down
`cluster A` in this tutorial.
. Start {ls} with the same configuration file. (This step is not required in real
use cases where {ls} ingests continuously.)
+
[source,sh]
----
### On Logstash server ###
bin/logstash -f multiple_hosts.conf
----
. Observe that all {ls} traffic is redirected to `cluster B` automatically.
+
TIP: You should also redirect all search traffic to the `clusterB` cluster during this time.
. The two data streams on `cluster B` now contain a different number of documents.
+
* data streams on cluster A (down)
** 50 documents in `logs-generic-default-replicated_from_clusterb`
** 50 documents in `logs-generic-default`
* data streams on cluster B (up)
** 50 documents in `logs-generic-default-replicated_from_clustera`
** 150 documents in `logs-generic-default`
==== Failback when `clusterA` comes back

. You can simulate this by turning `cluster A` back on.
. Data ingested to `cluster B` during `cluster A`'s downtime will be
automatically replicated.
+
* data streams on cluster A
** 150 documents in `logs-generic-default-replicated_from_clusterb`
** 50 documents in `logs-generic-default`
* data streams on cluster B
** 50 documents in `logs-generic-default-replicated_from_clustera`
** 150 documents in `logs-generic-default`
. If you have {ls} running at this time, you will also observe traffic is
sent to both clusters.
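+
To confirm that replication has caught up after the failback, you can inspect follower progress with the {ccr} stats API:
+
[source,console]
----
### On either cluster ###
GET /_ccr/stats
----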
==== Perform update or delete by query

It is possible to update or delete the documents, but you can only perform these actions on the leader index.

. First identify which backing index contains the document you want to update.
+
[source,console]
----
### On either of the clusters ###
GET logs-generic-default*/_search?filter_path=hits.hits._index
{
  "query": {
    "match": {
      "event.sequence": "97"
    }
  }
}
----
+
* If the hits return `"_index": ".ds-logs-generic-default-replicated_from_clustera-<yyyy.MM.dd>-*"`, then you need to proceed to the next step on `cluster A`.
* If the hits return `"_index": ".ds-logs-generic-default-replicated_from_clusterb-<yyyy.MM.dd>-*"`, then you need to proceed to the next step on `cluster B`.
* If the hits return `"_index": ".ds-logs-generic-default-<yyyy.MM.dd>-*"`, then you need to proceed to the next step on the same cluster where you performed the search query.
. Perform the update (or delete) by query:
+
[source,console]
----
### On the cluster identified from the previous step ###
POST logs-generic-default/_update_by_query
{
  "query": {
    "match": {
      "event.sequence": "97"
    }
  },
  "script": {
    "source": "ctx._source.event.original = params.new_event",
    "lang": "painless",
    "params": {
      "new_event": "FOOBAR"
    }
  }
}
----
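+
For a delete instead of an update, the same match query can be used with the delete by query API, again on the cluster that owns the leader index:
+
[source,console]
----
### On the cluster identified from the previous step ###
POST logs-generic-default/_delete_by_query
{
  "query": {
    "match": {
      "event.sequence": "97"
    }
  }
}
----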
+
TIP: If a soft delete is merged away before it can be replicated to a follower, this process will fail due to incomplete history on the leader. See <<ccr-index-soft-deletes-retention-period,index.soft_deletes.retention_lease.period>> for more details.