5 years ago · d3cf8d3ab9
--- a/docs/reference/setup/sysconfig.asciidoc
+++ b/docs/reference/setup/sysconfig.asciidoc
@@ -14,6 +14,7 @@ The following settings *must* be considered before going to production:
 
				 * <<max-number-of-threads,Ensure sufficient threads>>
			
 
				 * <<networkaddress-cache-ttl,JVM DNS cache settings>>
			
 
				 * <<executable-jna-tmpdir,Temporary directory not mounted with `noexec`>>
			
 
				+* <<system-config-tcpretries,TCP retransmission timeout>>
			
 
				 
			
 
				 [[dev-vs-prod]]
			
 
				 [discrete]
			
@@ -43,3 +44,5 @@ include::sysconfig/threads.asciidoc[]
 
				 include::sysconfig/dns-cache.asciidoc[]
			
 
				 
			
 
				 include::sysconfig/executable-jna-tmpdir.asciidoc[]
			
 
				+
			
 
				+include::sysconfig/tcpretries.asciidoc[]
			
--- a/docs/reference/setup/sysconfig/tcpretries.asciidoc
+++ b/docs/reference/setup/sysconfig/tcpretries.asciidoc
@@ -0,0 +1,49 @@
 
				+[[system-config-tcpretries]]
			
 
				+=== TCP retransmission timeout
			
 
				+
			
 
				+Each pair of nodes in a cluster communicates via a number of TCP connections
			
 
				+which remain open until one of the nodes shuts down or communication between
			
 
				+the nodes is disrupted by a failure in the underlying infrastructure.
			
 
				+
			
 
				+TCP provides reliable communication over occasionally-unreliable networks by
			
 
				+hiding temporary network disruptions from the communicating applications. Your
			
 
				+operating system will retransmit any lost messages a number of times before
			
 
				+informing the sender of any problem. Most Linux distributions default to
			
 
				+retransmitting any lost packets 15 times. Retransmissions back off
			
 
				+exponentially, so these 15 retransmissions take over 900 seconds to complete.
			
 
				+This means it takes Linux many minutes to detect a network partition or a
			
 
				+failed node with this method. Windows defaults to just 5 retransmissions, which
			
 
				+corresponds to a timeout of around 6 seconds.
			
 
				+
			
 
				+The Linux default allows for communication over networks that may experience
			
 
				+very long periods of packet loss, but this default is excessive for production
			
 
				+networks within a single data centre, as is the case for most {es} clusters.
			
 
				+Highly-available clusters must be able to detect node failures quickly so that
			
 
				+they can react promptly by reallocating lost shards, rerouting searches and
			
 
				+perhaps electing a new master node. Linux users should therefore reduce the
			
 
				+maximum number of TCP retransmissions.
			
 
				+
			
 
				+You can decrease the maximum number of TCP retransmissions to `5` by running
			
 
				+the following command as `root`. Five retransmissions correspond to a
			
 
				+timeout of around 6 seconds.
			
 
				+
			
 
				+[source,sh]
			
 
				+-------------------------------------
			
 
				+sysctl -w net.ipv4.tcp_retries2=5
			
 
				+-------------------------------------
			
 
				+
			
 
				+To set this value permanently, update the `net.ipv4.tcp_retries2` setting in
			
 
				+`/etc/sysctl.conf`. To verify after rebooting, run
			
 
				+`sysctl net.ipv4.tcp_retries2`.
			
 
				+
			
 
				+{es} also implements its own health checks with timeouts that are much shorter
			
 
				+than the default retransmission timeout on Linux. However, these health checks
			
 
				+must allow for application-level effects such as garbage collection pauses. We
			
 
				+do not recommend reducing any timeouts related to these application-level
			
 
				+health checks.
			
 
				+
			
 
				+IMPORTANT: This setting applies to all TCP connections and will affect the
			
 
				+reliability of communication with systems outside your cluster too. If your
			
 
				+cluster communicates with external systems over an unreliable network then you
			
 
				+may need to select a higher value for `net.ipv4.tcp_retries2`. For this reason,
			
 
				+{es} does not adjust this setting automatically.
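
The new section describes the permanent change only in prose ("update the `net.ipv4.tcp_retries2` setting in `/etc/sysctl.conf`"). A minimal sketch of that step might look like the following. The helper name `set_tcp_retries2` and its optional second argument are assumptions made for illustration; they are not part of this change or of any existing tool.

```shell
#!/bin/sh
# Sketch only: persist net.ipv4.tcp_retries2 in a sysctl configuration file.
# set_tcp_retries2 is a hypothetical helper, not something {es} or sysctl ships.
set_tcp_retries2() {
  value="$1"                     # desired retransmission count, e.g. 5
  conf="${2:-/etc/sysctl.conf}"  # defaults to the file named in the docs
  tmp="$(mktemp)" || return 1

  # Copy every existing line except a previous tcp_retries2 assignment,
  # so that the setting ends up in the file exactly once.
  if [ -f "$conf" ]; then
    grep -v '^[[:space:]]*net\.ipv4\.tcp_retries2[[:space:]]*=' "$conf" > "$tmp"
  fi
  printf 'net.ipv4.tcp_retries2=%s\n' "$value" >> "$tmp"

  # Overwrite the contents in place so the file keeps its owner and mode.
  cat "$tmp" > "$conf" && rm -f "$tmp"
}
```

Run as `root`, `set_tcp_retries2 5` followed by `sysctl -p` would write the setting and load it without a reboot, and `sysctl net.ipv4.tcp_retries2` then confirms the active value. Rewriting the file rather than appending avoids leaving an older, conflicting assignment behind.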