[[high-availability-cluster-design]]
== Designing for resilience

Distributed systems like {es} are designed to keep working even if some of
their components have failed. As long as there are enough well-connected
nodes to take over their responsibilities, an {es} cluster can continue
operating normally if some of its nodes are unavailable or disconnected.

There is a limit to how small a resilient cluster can be. All {es} clusters
require:

- One <<modules-discovery-quorums,elected master node>>
- - At least one node for each <<modules-node,role>>.
- - At least one copy of every <<scalability,shard>>.

A resilient cluster requires redundancy for every required cluster component.
This means a resilient cluster must have:

- - At least three master-eligible nodes
- - At least two nodes of each role
- At least two copies of each shard (one primary and one or more replicas,
  unless the index is a <<searchable-snapshots,searchable snapshot index>>)

A resilient cluster needs three master-eligible nodes so that if one of
them fails then the remaining two still form a majority and can hold a
successful election.

Similarly, redundancy of nodes of each role means that if a node for a
particular role fails, another node can take on its responsibilities.

Finally, a resilient cluster should have at least two copies of each shard. If
one copy fails then there should be another good copy to take over. {es}
automatically rebuilds any failed shard copies on the remaining nodes in order
to restore the cluster to full health after a failure.

Failures temporarily reduce the total capacity of your cluster. In addition,
after a failure the cluster must perform additional background activities to
restore itself to health. You should make sure that your cluster has the
capacity to handle your workload even if some nodes fail.

Depending on your needs and budget, an {es} cluster can consist of a single
node, hundreds of nodes, or any number in between. When designing a smaller
cluster, you should typically focus on making it resilient to single-node
failures. Designers of larger clusters must also consider cases where multiple
nodes fail at the same time. The following pages give some recommendations for
building resilient clusters of various sizes:

- - <<high-availability-cluster-small-clusters>>
- - <<high-availability-cluster-design-large-clusters>>

[[high-availability-cluster-small-clusters]]
=== Resilience in small clusters

In smaller clusters, it is most important to be resilient to single-node
failures. This section gives some guidance on making your cluster as resilient
as possible to the failure of an individual node.

[[high-availability-cluster-design-one-node]]
==== One-node clusters

If your cluster consists of one node, that single node must do everything.
To accommodate this, {es} assigns nodes every role by default.

A single-node cluster is not resilient. If the node fails, the cluster will
stop working. Because there are no replicas in a one-node cluster, you cannot
store your data redundantly. However, by default at least one replica is
required for a <<cluster-health,`green` cluster health status>>. To ensure your
cluster can report a `green` status, override the default by setting
<<dynamic-index-settings,`index.number_of_replicas`>> to `0` on every index.
If the node fails, you may need to restore an older copy of any lost indices
from a <<modules-snapshots,snapshot>>.
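
For example, the following request sets `index.number_of_replicas` to `0` on an
existing index using the update index settings API. The index name `my-index`
is only a placeholder:

[source,console]
----
PUT /my-index/_settings
{
  "index": {
    "number_of_replicas": 0
  }
}
----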

Because they are not resilient to any failures, we do not recommend using
one-node clusters in production.

[[high-availability-cluster-design-two-nodes]]
==== Two-node clusters

If you have two nodes, we recommend they both be data nodes. You should also
ensure every shard is stored redundantly on both nodes by setting
<<dynamic-index-settings,`index.number_of_replicas`>> to `1` on every index
that is not a <<searchable-snapshots,searchable snapshot index>>. This is the
default behaviour but may be overridden by an <<index-templates,index
template>>. <<dynamic-index-settings,Auto-expand replicas>> can also achieve
the same thing, but it's not necessary to use this feature in such a small
cluster.
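
If an index template in your cluster overrides the default, you can set the
replica count explicitly in a template of your own. The following sketch uses
the composable index template API, which is available on recent versions; the
template name and index pattern are placeholders:

[source,console]
----
PUT /_index_template/one-replica
{
  "index_patterns": ["my-data-*"],
  "template": {
    "settings": {
      "index.number_of_replicas": 1
    }
  }
}
----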

We recommend you set `node.master: false` on one of your two nodes so that it is
not <<master-node,master-eligible>>. This means you can be certain which of your
nodes is the elected master of the cluster. The cluster can tolerate the loss of
the other, master-ineligible node. If you don't set `node.master: false` on one
node, both nodes are master-eligible. This means both nodes are required for a
master election. Since the election will fail if either node is unavailable,
your cluster cannot reliably tolerate the loss of either node.

By default, each node is assigned every role. We recommend you assign both nodes
all other roles except master eligibility. If one node fails, the other node can
handle its tasks.
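
As a sketch, the `elasticsearch.yml` for the master-ineligible node might
contain just the following line in addition to your usual settings. On versions
that configure roles through `node.roles` instead, the equivalent is to list
every role you need except `master`:

[source,yaml]
----
# Keep this node's default roles, but make it ineligible to be elected master
node.master: false
----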

You should avoid sending client requests to just one of your nodes. If you do
and this node fails, such requests will not receive responses even if the
remaining node is a healthy cluster on its own. Ideally, you should balance your
client requests across both nodes. A good way to do this is to specify the
addresses of both nodes when configuring the client to connect to your cluster.
Alternatively, you can use a resilient load balancer to balance client requests
across the nodes in your cluster.

Because it's not resilient to failures, we do not recommend deploying a two-node
cluster in production.

[[high-availability-cluster-design-two-nodes-plus]]
==== Two-node clusters with a tiebreaker

Because master elections are majority-based, the two-node cluster described
above is tolerant to the loss of one of its nodes but not the
other one. You cannot configure a two-node cluster so that it can tolerate
the loss of _either_ node because this is theoretically impossible. You might
expect that if either node fails then {es} can elect the remaining node as the
master, but it is impossible to tell the difference between the failure of a
remote node and a mere loss of connectivity between the nodes. If both nodes
were capable of running independent elections, a loss of connectivity would
lead to a {wikipedia}/Split-brain_(computing)[split-brain problem] and
therefore data loss. {es} avoids this and
protects your data by electing neither node as master until that node can be
sure that it has the latest cluster state and that there is no other master in
the cluster. This could result in the cluster having no master until
connectivity is restored.

You can solve this problem by adding a third node and making all three nodes
master-eligible. A <<modules-discovery-quorums,master election>> requires only
two of the three master-eligible nodes. This means the cluster can tolerate the
loss of any single node. This third node acts as a tiebreaker in cases where the
two original nodes are disconnected from each other. You can reduce the resource
requirements of this extra node by making it a <<voting-only-node,dedicated
voting-only master-eligible node>>, also known as a dedicated tiebreaker.
Because it has no other roles, a dedicated tiebreaker does not need to be as
powerful as the other two nodes. It will not perform any searches nor coordinate
any client requests and cannot be elected as the master of the cluster.
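
On versions that configure roles through `node.roles`, a dedicated tiebreaker
might be configured with a sketch like the following in its `elasticsearch.yml`:

[source,yaml]
----
# Master-eligible but voting-only, with no other roles: this node can act as a
# tiebreaker in elections but cannot itself become the master or hold data
node.roles: [ master, voting_only ]
----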

The two original nodes should not be voting-only master-eligible nodes since a
resilient cluster requires at least three master-eligible nodes, at least two
of which are not voting-only master-eligible nodes. If two of your three nodes
are voting-only master-eligible nodes then the elected master must be the third
node. This node then becomes a single point of failure.

We recommend assigning both non-tiebreaker nodes all other roles. This creates
redundancy by ensuring any task in the cluster can be handled by either node.

You should not send any client requests to the dedicated tiebreaker node.
You should also avoid sending client requests to just one of the other two
nodes. If you do, and this node fails, then any requests will not
receive responses, even if the remaining nodes form a healthy cluster. Ideally,
you should balance your client requests across both of the non-tiebreaker
nodes. You can do this by specifying the addresses of both nodes
when configuring your client to connect to your cluster. Alternatively, you can
use a resilient load balancer to balance client requests across the appropriate
nodes in your cluster. The {ess-trial}[Elastic Cloud] service
provides such a load balancer.

A two-node cluster with an additional tiebreaker node is the smallest possible
cluster that is suitable for production deployments.

[[high-availability-cluster-design-three-nodes]]
==== Three-node clusters

If you have three nodes, we recommend they all be <<data-node,data nodes>> and
every index that is not a <<searchable-snapshots,searchable snapshot index>>
should have at least one replica. Nodes are data nodes by default. You may
prefer for some indices to have two replicas so that each node has a copy of
each shard in those indices. You should also configure each node to be
<<master-node,master-eligible>> so that any two of them can hold a master
election without needing to communicate with the third node. Nodes are
master-eligible by default. This cluster will be resilient to the loss of any
single node.
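
For example, to give an index a copy of every shard on each of the three nodes,
you could create it with two replicas. The index name `my-index` is a
placeholder in this sketch:

[source,console]
----
PUT /my-index
{
  "settings": {
    "index.number_of_replicas": 2
  }
}
----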

You should avoid sending client requests to just one of your nodes. If you do,
and this node fails, then any requests will not receive responses even if the
remaining two nodes form a healthy cluster. Ideally, you should balance your
client requests across all three nodes. You can do this by specifying the
addresses of multiple nodes when configuring your client to connect to your
cluster. Alternatively, you can use a resilient load balancer to balance client
requests across your cluster. The {ess-trial}[Elastic Cloud]
service provides such a load balancer.

[[high-availability-cluster-design-three-plus-nodes]]
==== Clusters with more than three nodes

Once your cluster grows to more than three nodes, you can start to specialise
these nodes according to their responsibilities, allowing you to scale their
resources independently as needed. You can have as many <<data-node,data
nodes>>, <<ingest,ingest nodes>>, <<ml-node,{ml} nodes>>, etc. as needed to
support your workload. As your cluster grows larger, we recommend using
dedicated nodes for each role. This allows you to independently scale resources
for each task.

However, it is good practice to limit the number of master-eligible nodes in
the cluster to three. Master nodes do not scale like other node types since
the cluster always elects just one of them as the master of the cluster. If
there are too many master-eligible nodes then master elections may take a
longer time to complete. In larger clusters, we recommend you
configure some of your nodes as dedicated master-eligible nodes and avoid
sending any client requests to these dedicated nodes. Your cluster may become
unstable if the master-eligible nodes are overwhelmed with unnecessary extra
work that could be handled by one of the other nodes.

You may configure one of your master-eligible nodes to be a
<<voting-only-node,voting-only node>> so that it can never be elected as the
master node. For instance, you may have two dedicated master nodes and a third
node that is both a data node and a voting-only master-eligible node. This
third voting-only node will act as a tiebreaker in master elections but will
never become the master itself.
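
As a sketch, on versions that configure roles through `node.roles`, that layout
might look like this; the comments indicate which node each setting belongs to:

[source,yaml]
----
# elasticsearch.yml on each of the two dedicated master-eligible nodes
node.roles: [ master ]

# elasticsearch.yml on the third node, which holds data and acts as a
# voting-only tiebreaker in master elections
node.roles: [ data, master, voting_only ]
----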

[[high-availability-cluster-design-small-cluster-summary]]
==== Summary

The cluster will be resilient to the loss of any node as long as:

- - The <<cluster-health,cluster health status>> is `green`.
- - There are at least two data nodes.
- Every index that is not a <<searchable-snapshots,searchable snapshot index>>
  has at least one replica of each shard, in addition to the primary.
- The cluster has at least three master-eligible nodes, as long as at least two
  of these nodes are not voting-only master-eligible nodes.
- Clients are configured to send their requests to more than one node or are
  configured to use a load balancer that balances the requests across an
  appropriate set of nodes. The {ess-trial}[Elastic Cloud] service provides such
  a load balancer.
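
You can verify the first of these conditions with the cluster health API. For
example, a minimal check looks like this:

[source,console]
----
GET _cluster/health
----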

[[high-availability-cluster-design-large-clusters]]
=== Resilience in larger clusters

It's not unusual for nodes to share common infrastructure, such as network
interconnects or a power supply. If so, you should plan for the failure of this
infrastructure and ensure that such a failure would not affect too many of your
nodes. It is common practice to group all the nodes sharing some infrastructure
into _zones_ and to plan for the failure of any whole zone at once.

{es} expects node-to-node connections to be reliable, have low latency, and
have adequate bandwidth. Many {es} tasks require multiple round-trips between
nodes. A slow or unreliable interconnect may have a significant effect on the
performance and stability of your cluster.

For example, a few milliseconds of latency added to each round-trip can quickly
accumulate into a noticeable performance penalty. An unreliable network may
have frequent network partitions. {es} will automatically recover from a
network partition as quickly as it can but your cluster may be partly
unavailable during a partition and will need to spend time and resources to
resynchronize any missing data and rebalance itself once the partition heals.
Recovering from a failure may involve copying a large amount of data between
nodes so the recovery time is often determined by the available bandwidth.

If you've divided your cluster into zones, the network connections within each
zone are typically of higher quality than the connections between the zones.
Ensure the network connections between zones are of sufficiently high quality.
You will see the best results by locating all your zones within a single data
center with each zone having its own independent power supply and other
supporting infrastructure. You can also _stretch_ your cluster across nearby
data centers as long as the network interconnection between each pair of data
centers is good enough.

[[high-availability-cluster-design-min-network-perf]]
There is no specific minimum network performance required to run a healthy {es}
cluster. In theory, a cluster will work correctly even if the round-trip
latency between nodes is several hundred milliseconds. In practice, if your
network is that slow then the cluster performance will be very poor. In
addition, slow networks are often unreliable enough to cause network partitions
that lead to periods of unavailability.

If you want your data to be available in multiple data centers that are further
apart or not well connected, deploy a separate cluster in each data center and
use <<modules-cross-cluster-search,{ccs}>> or <<xpack-ccr,{ccr}>> to link the
clusters together. These features are designed to perform well even if the
cluster-to-cluster connections are less reliable or performant than the network
within each cluster.

After losing a whole zone's worth of nodes, a properly-designed cluster may be
functional but running with significantly reduced capacity. You may need
to provision extra nodes to restore acceptable performance in your
cluster when handling such a failure.

For resilience against whole-zone failures, it is important that there is a copy
of each shard in more than one zone, which can be achieved by placing data
nodes in multiple zones and configuring <<allocation-awareness,shard allocation
awareness>>. You should also ensure that client requests are sent to nodes in
more than one zone.
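
As a sketch, shard allocation awareness based on a custom `zone` attribute might
be configured as follows; the attribute name and the zone value are
placeholders:

[source,yaml]
----
# elasticsearch.yml on each data node: tag the node with the zone it runs in
node.attr.zone: zone-a

# elasticsearch.yml on every node (this can also be set dynamically as a
# cluster setting): make shard allocation aware of the zone attribute
cluster.routing.allocation.awareness.attributes: zone
----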

You should consider all node roles and ensure that each role is split
redundantly across two or more zones. For instance, if you are using
<<ingest,ingest pipelines>> or {ml}, you should have ingest or {ml} nodes in two
or more zones. However, the placement of master-eligible nodes requires a little
more care because a resilient cluster needs at least two of the three
master-eligible nodes in order to function. The following sections explore the
options for placing master-eligible nodes across multiple zones.

[[high-availability-cluster-design-two-zones]]
==== Two-zone clusters

If you have two zones, you should have a different number of
master-eligible nodes in each zone so that the zone with more nodes will
contain a majority of them and will be able to survive the loss of the other
zone. For instance, if you have three master-eligible nodes then you may put
all of them in one zone or you may put two in one zone and the third in the
other zone. You should not place an equal number of master-eligible nodes in
each zone. If you place the same number of master-eligible nodes in each zone,
neither zone has a majority of its own. Therefore, the cluster may not survive
the loss of either zone.

[[high-availability-cluster-design-two-zones-plus]]
==== Two-zone clusters with a tiebreaker

The two-zone deployment described above is tolerant to the loss of one of its
zones but not to the loss of the other one because master elections are
majority-based. You cannot configure a two-zone cluster so that it can tolerate
the loss of _either_ zone because this is theoretically impossible. You might
expect that if either zone fails then {es} can elect a node from the remaining
zone as the master but it is impossible to tell the difference between the
failure of a remote zone and a mere loss of connectivity between the zones. If
both zones were capable of running independent elections then a loss of
connectivity would lead to a
{wikipedia}/Split-brain_(computing)[split-brain problem] and
therefore data loss. {es} avoids this and protects your data by not electing
a node from either zone as master until that node can be sure that it has the
latest cluster state and that there is no other master in the cluster. This may
mean there is no master at all until connectivity is restored.

You can solve this by placing one master-eligible node in each of your two
zones and adding a single extra master-eligible node in an independent third
zone. The extra master-eligible node acts as a tiebreaker in cases
where the two original zones are disconnected from each other. The extra
tiebreaker node should be a <<voting-only-node,dedicated voting-only
master-eligible node>>, also known as a dedicated tiebreaker. A dedicated
tiebreaker need not be as powerful as the other two nodes since it has no other
roles and will not perform any searches nor coordinate any client requests nor
be elected as the master of the cluster.

You should use <<allocation-awareness,shard allocation awareness>> to ensure
that there is a copy of each shard in each zone. This means either zone remains
fully available if the other zone fails.
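
In addition to the awareness settings shown earlier, forced awareness can stop
{es} from moving every copy of a shard into the surviving zone while the other
zone is unavailable. A sketch, with placeholder zone names:

[source,yaml]
----
# With forced awareness, Elasticsearch leaves replica copies unassigned while
# one zone is down, rather than crowding every copy of a shard into the
# remaining zone
cluster.routing.allocation.awareness.attributes: zone
cluster.routing.allocation.awareness.force.zone.values: zone-a,zone-b
----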

All master-eligible nodes, including voting-only nodes, are on the critical path
for publishing cluster state updates. Because of this, these nodes require
reasonably fast persistent storage and a reliable, low-latency network
connection to the rest of the cluster. If you add a tiebreaker node in a third
independent zone then you must make sure it has adequate resources and good
connectivity to the rest of the cluster.

[[high-availability-cluster-design-three-zones]]
==== Clusters with three or more zones

If you have three zones then you should have one master-eligible node in each
zone. If you have more than three zones then you should choose three of the
zones and put a master-eligible node in each of these three zones. This will
mean that the cluster can still elect a master even if one of the zones fails.

As always, your indices should have at least one replica in case a node fails,
unless they are <<searchable-snapshots,searchable snapshot indices>>. You
should also use <<allocation-awareness,shard allocation awareness>> to limit
the number of copies of each shard in each zone. For instance, if you have an
index with one or two replicas configured then allocation awareness will ensure
that the replicas of the shard are in a different zone from the primary. This
means that a copy of every shard will still be available if one zone fails, so
the availability of your data is not affected by such a failure.

[[high-availability-cluster-design-large-cluster-summary]]
==== Summary

The cluster will be resilient to the loss of any zone as long as:

- - The <<cluster-health,cluster health status>> is `green`.
- - There are at least two zones containing data nodes.
- Every index that is not a <<searchable-snapshots,searchable snapshot index>>
  has at least one replica of each shard, in addition to the primary.
- Shard allocation awareness is configured to avoid concentrating all copies of
  a shard within a single zone.
- The cluster has at least three master-eligible nodes. At least two of these
  nodes are not voting-only master-eligible nodes, and they are spread evenly
  across at least three zones.
- Clients are configured to send their requests to nodes in more than one zone
  or are configured to use a load balancer that balances the requests across an
  appropriate set of nodes. The {ess-trial}[Elastic Cloud] service provides such
  a load balancer.