123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316 |
- [role="xpack"]
- [[xpack-ccr]]
- == {ccr-cap}
- With {ccr}, you can replicate indices across clusters to:
- * Continue handling search requests in the event of a datacenter outage
- * Prevent search volume from impacting indexing throughput
- * Reduce search latency by processing search requests in geo-proximity to the
- user
- {ccr-cap} uses an active-passive model. You index to a _leader_ index, and the
- data is replicated to one or more read-only _follower_ indices. Before you can add a follower index to a cluster, you must configure the _remote cluster_ that contains the leader index.
- When the leader index receives writes, the follower indices pull changes from
- the leader index on the remote cluster. You can manually create follower
- indices, or configure auto-follow patterns to automatically create follower
- indices for new time series indices.
- You configure {ccr} clusters in a uni-directional or bi-directional setup:
- * In a uni-directional configuration, one cluster contains only
- leader indices, and the other cluster contains only follower indices.
- * In a bi-directional configuration, each cluster contains both leader and
- follower indices.
- In a uni-directional configuration, the cluster containing follower indices
- must be running **the same or newer** version of {es} as the remote cluster.
- If newer, the versions must also be compatible as outlined in the following matrix.
- [%collapsible]
- [[ccr-version-compatibility]]
- .Version compatibility matrix
- ====
- include::../modules/remote-clusters-shared.asciidoc[tag=remote-cluster-compatibility-matrix]
- ====
- [discrete]
- [[ccr-multi-cluster-architectures]]
- === Multi-cluster architectures
- Use {ccr} to construct several multi-cluster architectures within the Elastic
- Stack:
- * <<ccr-disaster-recovery,Disaster recovery>> in case a primary cluster fails,
- with a secondary cluster serving as a hot backup
- * <<ccr-data-locality,Data locality>> to maintain multiple copies of the
- dataset close to the application servers (and users), and reduce costly latency
- * <<ccr-centralized-reporting,Centralized reporting>> for minimizing network
- traffic and latency in querying multiple geo-distributed {es} clusters, or for
- preventing search load from interfering with indexing by offloading search to a
- secondary cluster
- Watch the
- https://www.elastic.co/webinars/replicate-elasticsearch-data-with-cross-cluster-replication-ccr[{ccr} webinar] to learn more about the following use cases.
- Then, <<ccr-getting-started-tutorial,set up {ccr}>> on your local machine and work
- through the demo from the webinar.
- [discrete]
- [[ccr-disaster-recovery]]
- ==== Disaster recovery and high availability
- Disaster recovery provides your mission-critical applications with the
- tolerance to withstand datacenter or region outages. This use case is the
- most common deployment of {ccr}. You can configure clusters in different
- architectures to support disaster recovery and high availability:
- * <<ccr-single-datacenter-recovery>>
- * <<ccr-multiple-datacenter-recovery>>
- * <<ccr-chained-replication>>
- * <<ccr-bi-directional-replication>>
- [discrete]
- [[ccr-single-datacenter-recovery]]
- ===== Single disaster recovery datacenter
- In this configuration, data is replicated from the production datacenter to the
- disaster recovery datacenter. Because the follower indices replicate the leader
- index, your application can use the disaster recovery datacenter if the
- production datacenter is unavailable.
- image::images/ccr-arch-disaster-recovery.png[Production datacenter that replicates data to a disaster recovery datacenter]
- [discrete]
- [[ccr-multiple-datacenter-recovery]]
- ===== Multiple disaster recovery datacenters
- You can replicate data from one datacenter to multiple datacenters. This
- configuration provides both disaster recovery and high availability, ensuring
- that data is replicated in two datacenters if the primary datacenter is down
- or unavailable.
- In the following diagram, data from Datacenter A is replicated to
- Datacenter B and Datacenter C, which both have a read-only copy of the leader
- index from Datacenter A.
- image::images/ccr-arch-multiple-dcs.png[Production datacenter that replicates data to two other datacenters]
- [discrete]
- [[ccr-chained-replication]]
- ===== Chained replication
- You can replicate data across multiple datacenters to form a replication
- chain. In the following diagram, Datacenter A contains the leader index.
- Datacenter B replicates data from Datacenter A, and Datacenter C replicates
- from the follower indices in Datacenter B. The connection between these
- datacenters forms a chained replication pattern.
- image::images/ccr-arch-chain-dcs.png[Three datacenters connected to form a replication chain]
- [discrete]
- [[ccr-bi-directional-replication]]
- ===== Bi-directional replication
- In a https://www.elastic.co/blog/bi-directional-replication-with-elasticsearch-cross-cluster-replication-ccr[bi-directional replication] setup, all clusters have access to view
- all data, and all clusters have an index to write to without manually
- implementing failover. Applications can write to the local index within each
- datacenter, and read across multiple indices for a global view of all
- information.
- This configuration requires no manual intervention when a cluster or datacenter
- is unavailable. In the following diagram, if Datacenter A is unavailable, you can continue using Datacenter B without manual failover. When Datacenter A
- comes online, replication resumes between the clusters.
- image::images/ccr-arch-bi-directional.png[Bi-directional configuration where each cluster contains both a leader index and follower indices]
- This configuration is particularly useful for index-only workloads, where no updates
- to document values occur. In this configuration, documents indexed by {es} are
- immutable. Clients are located in each datacenter alongside the {es}
- cluster, and do not communicate with clusters in different datacenters.
- [discrete]
- [[ccr-data-locality]]
- ==== Data locality
- Bringing data closer to your users or application server can reduce latency
- and response time. This methodology also applies when replicating data in {es}.
- For example, you can replicate a product catalog or reference dataset to 20 or
- more datacenters around the world to minimize the distance between the data and
- the application server.
- In the following diagram, data is replicated from one datacenter to three
- additional datacenters, each in their own region. The central datacenter
- contains the leader index, and the additional datacenters contain follower
- indices that replicate data in that particular region. This configuration
- puts data closer to the application accessing it.
- image::images/ccr-arch-data-locality.png[A centralized datacenter replicated across three other datacenters, each in their own region]
- [discrete]
- [[ccr-centralized-reporting]]
- ==== Centralized reporting
- Using a centralized reporting cluster is useful when querying across a large
- network is inefficient. In this configuration, you replicate data from many
- smaller clusters to the centralized reporting cluster.
- For example, a large global bank might have 100 {es} clusters around the world
- that are distributed across different regions for each bank branch. Using
- {ccr}, the bank can replicate events from all 100 banks to a central cluster to
- analyze and aggregate events locally for reporting. Rather than maintaining a
- mirrored cluster, the bank can use {ccr} to replicate specific indices.
- In the following diagram, data from three datacenters in different regions is
- replicated to a centralized reporting cluster. This configuration enables you
- to copy data from regional hubs to a central cluster, where you can run all
- reports locally.
- image::images/ccr-arch-central-reporting.png[Three clusters in different regions sending data to a centralized reporting cluster for analysis]
- [discrete]
- [[ccr-replication-mechanics]]
- === Replication mechanics
- Although you <<ccr-getting-started-tutorial,set up {ccr}>> at the index level, {es}
- achieves replication at the shard level. When a follower index is created,
- each shard in that index pulls changes from its corresponding shard in the
- leader index, which means that a follower index has the same number of
- shards as its leader index. All operations on the leader are replicated by the
- follower, such as operations to create, update, or delete a document.
- These requests can be served from any copy of the leader shard (primary or
- replica).
- When a follower shard sends a read request, the leader shard responds with
- any new operations, limited by the read parameters that you establish when
- configuring the follower index. If no new operations are available, the
- leader shard waits up to the configured timeout for new operations. If the
- timeout elapses, the leader shard responds to the follower shard that there
- are no new operations. The follower shard updates shard statistics and
- immediately sends another read request to the leader shard. This
- communication model ensures that network connections between the remote
- cluster and the local cluster are continually in use, avoiding forceful
- termination by an external source such as a firewall.
- If a read request fails, the cause of the failure is inspected. If the
- cause of the failure is deemed to be recoverable (such as a network
- failure), the follower shard enters into a retry loop. Otherwise, the
- follower shard pauses
- <<ccr-pause-replication,until you resume it>>.
- [discrete]
- [[ccr-update-leader-index]]
- ==== Processing updates
- You can't manually modify a follower index's mappings or aliases. To make
- changes, you must update the leader index. Because they are read-only, follower
- indices reject writes in all configurations.
- For example, you index a document named `doc_1` in Datacenter A, which
- replicates to Datacenter B. If a client connects to Datacenter B and attempts
- to update `doc_1`, the request fails. To update `doc_1`, the client must
- connect to Datacenter A and update the document in the leader index.
- When a follower shard receives operations from the leader shard, it places
- those operations in a write buffer. The follower shard submits bulk write
- requests using operations from the write buffer. If the write buffer exceeds
- its configured limits, no additional read requests are sent. This configuration
- provides a back-pressure against read requests, allowing the follower shard
- to resume sending read requests when the write buffer is no longer full.
- To manage how operations are replicated from the leader index, you can
- configure settings when
- <<ccr-getting-started-follower-index,creating the follower index>>.
- The follower index automatically retrieves some updates applied to the leader
- index, while other updates are retrieved as needed:
- [cols="3"]
- |===
- h| Update type h| Automatic h| As needed
- | Alias | {yes-icon} | {no-icon}
- | Mapping | {no-icon} | {yes-icon}
- | Settings | {no-icon} | {yes-icon}
- |===
- For example, changing the number of replicas on the leader index is not
- replicated by the follower index, so that setting might not be retrieved.
- If you apply a non-dynamic settings change to the leader index that is
- needed by the follower index, the follower index closes itself, applies the
- settings update, and then re-opens itself. The follower index is unavailable
- for reads and cannot replicate writes during this cycle.
- [discrete]
- [[ccr-remote-recovery]]
- === Initializing followers using remote recovery
- When you create a follower index, you cannot use it until it is fully
- initialized. The _remote recovery_ process builds a new copy of a shard on a
- follower node by copying data from the primary shard in the leader cluster.
- {es} uses this remote recovery process to bootstrap a follower index using the
- data from the leader index. This process provides the follower with a copy of
- the current state of the leader index, even if a complete history of changes
- is not available on the leader due to Lucene segment merging.
- Remote recovery is a network intensive process that transfers all of the Lucene
- segment files from the leader cluster to the follower cluster. The follower
- requests that a recovery session be initiated on the primary shard in the
- leader cluster. The follower then requests file chunks concurrently from the
- leader. By default, the process concurrently requests five 1MB file
- chunks. This default behavior is designed to support leader and follower
- clusters with high network latency between them.
- TIP: You can modify dynamic <<ccr-recovery-settings,remote recovery settings>>
- to rate-limit the transmitted data and manage the resources consumed by remote
- recoveries.
- Use the <<cat-recovery,recovery API>> on the cluster containing the follower
- index to obtain information about an in-progress remote recovery. Because {es}
- implements remote recoveries using the
- <<snapshot-restore,snapshot and restore>> infrastructure, running remote
- recoveries are labelled as type `snapshot` in the recovery API.
- [discrete]
- [[ccr-leader-requirements]]
- === Replicating a leader requires soft deletes
- {ccr-cap} works by replaying the history of individual write
- operations that were performed on the shards of the leader index. {es} needs to
- retain the
- <<index-modules-history-retention,history of these operations>> on the leader
- shards so that they can be pulled by the follower shard tasks. The underlying
- mechanism used to retain these operations is _soft deletes_.
- A soft delete occurs whenever an existing document is deleted or updated. By
- retaining these soft deletes up to configurable limits, the history of
- operations can be retained on the leader shards and made available to the
- follower shard tasks as it replays the history of operations.
- The <<ccr-index-soft-deletes-retention-period,`index.soft_deletes.retention_lease.period`>>
- setting defines the maximum time to retain a shard history retention lease
- before it is considered expired. This setting determines how long the cluster
- containing your follower index can be offline, which is 12 hours by default. If
- a shard copy recovers after its retention lease expires, but the missing
- operations are still available on the leader index, then {es} will establish a
- new lease and copy the missing operations. However {es} does not guarantee to
- retain unleased operations, so it is also possible that some of the missing
- operations have been discarded by the leader and are now completely
- unavailable. If this happens then the follower cannot recover automatically so
- you must <<ccr-recreate-follower-index,recreate it>>.
- Soft deletes must be enabled for indices that you want to use as leader
- indices. Soft deletes are enabled by default on new indices created on
- or after {es} 7.0.0.
- // tag::ccr-existing-indices-tag[]
- IMPORTANT: {ccr-cap} cannot be used on existing indices created using {es}
- 7.0.0 or earlier, where soft deletes are disabled. You must
- <<docs-reindex,reindex>> your data into a new index with soft deletes
- enabled.
- // end::ccr-existing-indices-tag[]
- [discrete]
- [[ccr-learn-more]]
- === Use {ccr}
- This following sections provide more information about how to configure
- and use {ccr}:
- * <<ccr-getting-started-tutorial>>
- * <<ccr-managing>>
- * <<ccr-auto-follow>>
- * <<ccr-upgrading>>
- include::getting-started.asciidoc[]
- include::managing.asciidoc[]
- include::auto-follow.asciidoc[]
- include::upgrading.asciidoc[]
|