| 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382 | [[high-availability-cluster-design]]== Designing for resilienceDistributed systems like {es} are designed to keep working even if some oftheir components have failed. As long as there are enough well-connectednodes to take over their responsibilities, an {es} cluster can continueoperating normally if some of its nodes are unavailable or disconnected.There is a limit to how small a resilient cluster can be. All {es} clustersrequire:- One <<modules-discovery-quorums,elected master node>> node- At least one node for each <<modules-node,role>>.- At least one copy of every <<scalability,shard>>.A resilient cluster requires redundancy for every required cluster component.This means a resilient cluster must have:- At least three master-eligible nodes- At least two nodes of each role- At least two copies of each shard (one primary and one or more replicas,  unless the index is a <<searchable-snapshots,searchable snapshot index>>)A resilient cluster needs three master-eligible nodes so that if one ofthem fails then the remaining two still form a majority and can hold asuccessful election.Similarly, redundancy of nodes of each role means that if a node for aparticular role fails, another node can take on its responsibilities.Finally, a resilient cluster should have at least two copies of each shard. Ifone copy fails then there should be another good copy to take over. {es}automatically rebuilds any failed shard copies on the remaining nodes in orderto restore the cluster to full health after a failure.Failures temporarily reduce the total capacity of your cluster. In addition,after a failure the cluster must perform additional background activities torestore itself to health. You should make sure that your cluster has thecapacity to handle your workload even if some nodes fail.Depending on your needs and budget, an {es} cluster can consist of a singlenode, hundreds of nodes, or any number in between. When designing a smallercluster, you should typically focus on making it resilient to single-nodefailures. Designers of larger clusters must also consider cases where multiplenodes fail at the same time. The following pages give some recommendations forbuilding resilient clusters of various sizes:- <<high-availability-cluster-small-clusters>>- <<high-availability-cluster-design-large-clusters>>[[high-availability-cluster-small-clusters]]=== Resilience in small clustersIn smaller clusters, it is most important to be resilient to single-nodefailures. This section gives some guidance on making your cluster as resilientas possible to the failure of an individual node.[[high-availability-cluster-design-one-node]]==== One-node clustersIf your cluster consists of one node, that single node must do everything.To accommodate this, {es} assigns nodes every role by default.A single node cluster is not resilient. If the node fails, the cluster willstop working. Because there are no replicas in a one-node cluster, you cannotstore your data redundantly. However, by default at least one replica isrequired for a <<cluster-health,`green` cluster health status>>. To ensure yourcluster can report a `green` status, override the default by setting<<dynamic-index-settings,`index.number_of_replicas`>> to `0` on every index.If the node fails, you may need to restore an older copy of any lost indicesfrom a <<modules-snapshots,snapshot>>.Because they are not resilient to any failures, we do not recommend usingone-node clusters in production.[[high-availability-cluster-design-two-nodes]]==== Two-node clustersIf you have two nodes, we recommend they both be data nodes. You should alsoensure every shard is stored redundantly on both nodes by setting<<dynamic-index-settings,`index.number_of_replicas`>> to `1` on every indexthat is not a <<searchable-snapshots,searchable snapshot index>>. This is thedefault behaviour but may be overridden by an <<index-templates,indextemplate>>. <<dynamic-index-settings,Auto-expand replicas>> can also achievethe same thing, but it's not necessary to use this feature in such a smallcluster.We recommend you set `node.master: false` on one of your two nodes so that it isnot <<master-node,master-eligible>>. This means you can be certain which of yournodes is the elected master of the cluster. The cluster can tolerate the loss ofthe other master-ineligible node. If you don't set `node.master: false` on onenode, both nodes are master-eligible. This means both nodes are required for amaster election. Since the election will fail if either node is unavailable,your cluster cannot reliably tolerate the loss of either node.By default, each node is assigned every role. We recommend you assign both nodesall other roles except master eligibility. If one node fails, the other node canhandle its tasks.You should avoid sending client requests to just one of your nodes. If you doand this node fails, such requests will not receive responses even if theremaining node is a healthy cluster on its own. Ideally, you should balance yourclient requests across both nodes. A good way to do this is to specify theaddresses of both nodes when configuring the client to connect to your cluster.Alternatively, you can use a resilient load balancer to balance client requestsacross the nodes in your cluster.Because it's not resilient to failures, we do not recommend deploying a two-nodecluster in production.[[high-availability-cluster-design-two-nodes-plus]]==== Two-node clusters with a tiebreakerBecause master elections are majority-based, the two-node cluster describedabove is tolerant to the loss of one of its nodes but not theother one. You cannot configure a two-node cluster so that it can toleratethe loss of _either_ node because this is theoretically impossible. You mightexpect that if either node fails then {es} can elect the remaining node as themaster, but it is impossible to tell the difference between the failure of aremote node and a mere loss of connectivity between the nodes. If both nodeswere capable of running independent elections, a loss of connectivity wouldlead to a {wikipedia}/Split-brain_(computing)[split-brainproblem] and therefore data loss. {es} avoids this andprotects your data by electing neither node as master until that node can besure that it has the latest cluster state and that there is no other master inthe cluster. This could result in the cluster having no master untilconnectivity is restored.You can solve this problem by adding a third node and making all three nodesmaster-eligible. A <<modules-discovery-quorums,master election>> requires onlytwo of the three master-eligible nodes. This means the cluster can tolerate theloss of any single node. This third node acts as a tiebreaker in cases where thetwo original nodes are disconnected from each other. You can reduce the resourcerequirements of this extra node by making it a <<voting-only-node,dedicatedvoting-only master-eligible node>>, also known as a dedicated tiebreaker.Because it has no other roles, a dedicated tiebreaker does not need to be aspowerful as the other two nodes. It will not perform any searches nor coordinateany client requests and cannot be elected as the master of the cluster.The two original nodes should not be voting-only master-eligible nodes since aresilient cluster requires at least three master-eligible nodes, at least twoof which are not voting-only master-eligible nodes. If two of your three nodesare voting-only master-eligible nodes then the elected master must be the thirdnode. This node then becomes a single point of failure.We recommend assigning both non-tiebreaker nodes all other roles. This createsredundancy by ensuring any task in the cluster can be handled by either node.You should not send any client requests to the dedicated tiebreaker node.You should also avoid sending client requests to just one of the other twonodes. If you do, and this node fails, then any requests will notreceive responses, even if the remaining nodes form a healthy cluster. Ideally,you should balance your client requests across both of the non-tiebreakernodes. You can do this by specifying the address of both nodeswhen configuring your client to connect to your cluster. Alternatively, you canuse a resilient load balancer to balance client requests across the appropriatenodes in your cluster. The {ess-trial}[Elastic Cloud] serviceprovides such a load balancer.A two-node cluster with an additional tiebreaker node is the smallest possiblecluster that is suitable for production deployments.[[high-availability-cluster-design-three-nodes]]==== Three-node clustersIf you have three nodes, we recommend they all be <<data-node,data nodes>> andevery index that is not a <<searchable-snapshots,searchable snapshot index>>should have at least one replica. Nodes are data nodes by default. You mayprefer for some indices to have two replicas so that each node has a copy ofeach shard in those indices. You should also configure each node to be<<master-node,master-eligible>> so that any two of them can hold a masterelection without needing to communicate with the third node. Nodes aremaster-eligible by default. This cluster will be resilient to the loss of anysingle node.You should avoid sending client requests to just one of your nodes. If you do,and this node fails, then any requests will not receive responses even if theremaining two nodes form a healthy cluster. Ideally, you should balance yourclient requests across all three nodes. You can do this by specifying theaddress of multiple nodes when configuring your client to connect to yourcluster. Alternatively you can use a resilient load balancer to balance clientrequests across your cluster. The {ess-trial}[Elastic Cloud]service provides such a load balancer.[[high-availability-cluster-design-three-plus-nodes]]==== Clusters with more than three nodesOnce your cluster grows to more than three nodes, you can start to specialisethese nodes according to their responsibilities, allowing you to scale theirresources independently as needed. You can have as many <<data-node,datanodes>>, <<ingest,ingest nodes>>, <<ml-node,{ml} nodes>>, etc. as needed tosupport your workload. As your cluster grows larger, we recommend usingdedicated nodes for each role. This lets you to independently scale resourcesfor each task.However, it is good practice to limit the number of master-eligible nodes inthe cluster to three. Master nodes do not scale like other node types sincethe cluster always elects just one of them as the master of the cluster. Ifthere are too many master-eligible nodes then master elections may take alonger time to complete. In larger clusters, we recommend youconfigure some of your nodes as dedicated master-eligible nodes and avoidsending any client requests to these dedicated nodes. Your cluster may becomeunstable if the master-eligible nodes are overwhelmed with unnecessary extrawork that could be handled by one of the other nodes.You may configure one of your master-eligible nodes to be a<<voting-only-node,voting-only node>> so that it can never be elected as themaster node. For instance, you may have two dedicated master nodes and a thirdnode that is both a data node and a voting-only master-eligible node. Thisthird voting-only node will act as a tiebreaker in master elections but willnever become the master itself.[[high-availability-cluster-design-small-cluster-summary]]==== SummaryThe cluster will be resilient to the loss of any node as long as:- The <<cluster-health,cluster health status>> is `green`.- There are at least two data nodes. - Every index that is not a <<searchable-snapshots,searchable snapshot index>>  has at least one replica of each shard, in addition to the primary.- The cluster has at least three master-eligible nodes, as long as at least two  of these nodes are not voting-only master-eligible nodes.- Clients are configured to send their requests to more than one node or are  configured to use a load balancer that balances the requests across an  appropriate set of nodes. The {ess-trial}[Elastic Cloud] service provides such  a load balancer.[[high-availability-cluster-design-large-clusters]]=== Resilience in larger clustersIt's not unusual for nodes to share common infrastructure, such as networkinterconnects or a power supply. If so, you should plan for the failure of thisinfrastructure and ensure that such a failure would not affect too many of yournodes. It is common practice to group all the nodes sharing some infrastructureinto _zones_ and to plan for the failure of any whole zone at once.{es} expects node-to-node connections to be reliable, have low latency, andhave adequate bandwidth. Many {es} tasks require multiple round-trips betweennodes. A slow or unreliable interconnect may have a significant effect on theperformance and stability of your cluster.For example, a few milliseconds of latency added to each round-trip can quicklyaccumulate into a noticeable performance penalty. An unreliable network mayhave frequent network partitions. {es} will automatically recover from anetwork partition as quickly as it can but your cluster may be partlyunavailable during a partition and will need to spend time and resources toresynchronize any missing data and rebalance itself once the partition heals.Recovering from a failure may involve copying a large amount of data betweennodes so the recovery time is often determined by the available bandwidth.If you've divided your cluster into zones, the network connections within eachzone are typically of higher quality than the connections between the zones.Ensure the network connections between zones are of sufficiently high quality.You will see the best results by locating all your zones within a single datacenter with each zone having its own independent power supply and othersupporting infrastructure. You can also _stretch_ your cluster across nearbydata centers as long as the network interconnection between each pair of datacenters is good enough.[[high-availability-cluster-design-min-network-perf]]There is no specific minimum network performance required to run a healthy {es}cluster. In theory, a cluster will work correctly even if the round-triplatency between nodes is several hundred milliseconds. In practice, if yournetwork is that slow then the cluster performance will be very poor. Inaddition, slow networks are often unreliable enough to cause network partitionsthat lead to periods of unavailability.If you want your data to be available in multiple data centers that are furtherapart or not well connected, deploy a separate cluster in each data center anduse <<modules-cross-cluster-search,{ccs}>> or <<xpack-ccr,{ccr}>> to link theclusters together. These features are designed to perform well even if thecluster-to-cluster connections are less reliable or performant than the networkwithin each cluster.After losing a whole zone's worth of nodes, a properly-designed cluster may befunctional but running with significantly reduced capacity. You may needto provision extra nodes to restore acceptable performance in yourcluster when handling such a failure.For resilience against whole-zone failures, it is important that there is a copyof each shard in more than one zone, which can be achieved by placing datanodes in multiple zones and configuring <<allocation-awareness,shard allocationawareness>>. You should also ensure that client requests are sent to nodes inmore than one zone.You should consider all node roles and ensure that each role is splitredundantly across two or more zones. For instance, if you are using<<ingest,ingest pipelines>> or {ml}, you should have ingest or {ml} nodes in twoor more zones. However, the placement of master-eligible nodes requires a littlemore care because a resilient cluster needs at least two of the threemaster-eligible nodes in order to function. The following sections explore theoptions for placing master-eligible nodes across multiple zones.[[high-availability-cluster-design-two-zones]]==== Two-zone clustersIf you have two zones, you should have a different number ofmaster-eligible nodes in each zone so that the zone with more nodes willcontain a majority of them and will be able to survive the loss of the otherzone. For instance, if you have three master-eligible nodes then you may putall of them in one zone or you may put two in one zone and the third in theother zone. You should not place an equal number of master-eligible nodes ineach zone. If you place the same number of master-eligible nodes in each zone,neither zone has a majority of its own. Therefore, the cluster may not survivethe loss of either zone.[[high-availability-cluster-design-two-zones-plus]]==== Two-zone clusters with a tiebreakerThe two-zone deployment described above is tolerant to the loss of one of itszones but not to the loss of the other one because master elections aremajority-based. You cannot configure a two-zone cluster so that it can toleratethe loss of _either_ zone because this is theoretically impossible. You mightexpect that if either zone fails then {es} can elect a node from the remainingzone as the master but it is impossible to tell the difference between thefailure of a remote zone and a mere loss of connectivity between the zones. Ifboth zones were capable of running independent elections then a loss ofconnectivity would lead to a{wikipedia}/Split-brain_(computing)[split-brain problem] andtherefore data loss. {es} avoids this and protects your data by not electinga node from either zone as master until that node can be sure that it has thelatest cluster state and that there is no other master in the cluster. This maymean there is no master at all until connectivity is restored.You can solve this by placing one master-eligible node in each of your twozones and adding a single extra master-eligible node in an independent thirdzone. The extra master-eligible node acts as a tiebreaker in caseswhere the two original zones are disconnected from each other. The extratiebreaker node should be a <<voting-only-node,dedicated voting-onlymaster-eligible node>>, also known as a dedicated tiebreaker. A dedicatedtiebreaker need not be as powerful as the other two nodes since it has no otherroles and will not perform any searches nor coordinate any client requests norbe elected as the master of the cluster.You should use <<allocation-awareness,shard allocation awareness>> to ensurethat there is a copy of each shard in each zone. This means either zone remainsfully available if the other zone fails.All master-eligible nodes, including voting-only nodes, are on the critical pathfor publishing cluster state updates. Because of this, these nodes requirereasonably fast persistent storage and a reliable, low-latency networkconnection to the rest of the cluster. If you add a tiebreaker node in a thirdindependent zone then you must make sure it has adequate resources and goodconnectivity to the rest of the cluster.[[high-availability-cluster-design-three-zones]]==== Clusters with three or more zonesIf you have three zones then you should have one master-eligible node in eachzone. If you have more than three zones then you should choose three of thezones and put a master-eligible node in each of these three zones. This willmean that the cluster can still elect a master even if one of the zones fails.As always, your indices should have at least one replica in case a node fails,unless they are <<searchable-snapshots,searchable snapshot indices>>. Youshould also use <<allocation-awareness,shard allocation awareness>> to limitthe number of copies of each shard in each zone. For instance, if you have anindex with one or two replicas configured then allocation awareness will ensurethat the replicas of the shard are in a different zone from the primary. Thismeans that a copy of every shard will still be available if one zone fails. Theavailability of this shard will not be affected by such a failure.[[high-availability-cluster-design-large-cluster-summary]]==== SummaryThe cluster will be resilient to the loss of any zone as long as:- The <<cluster-health,cluster health status>> is `green`.- There are at least two zones containing data nodes.- Every index that is not a <<searchable-snapshots,searchable snapshot index>>  has at least one replica of each shard, in addition to the primary.- Shard allocation awareness is configured to avoid concentrating all copies of  a shard within a single zone.- The cluster has at least three master-eligible nodes. At least two of these   nodes are not voting-only master-eligible nodes, and they are spread evenly  across at least three zones.- Clients are configured to send their requests to nodes in more than one zone  or are configured to use a load balancer that balances the requests across an  appropriate set of nodes. The {ess-trial}[Elastic Cloud] service provides such  a load balancer.
 |