@@ -326,3 +326,45 @@ queries, they should be mapped as a `keyword`.
<<index-modules-index-sorting,Index sorting>> can be useful in order to make
conjunctions faster at the cost of slightly slower indexing. Read more about it
in the <<index-modules-index-sorting-conjunctions,index sorting documentation>>.
+
+[float]
+=== Use `preference` to optimize cache utilization
+
+There are multiple caches that can help with search performance, such as the
+https://en.wikipedia.org/wiki/Page_cache[filesystem cache], the
+<<shard-request-cache,request cache>> or the <<query-cache,query cache>>. Yet
+all these caches are maintained at the node level, meaning that if you run the
+same request twice in a row, have one or more <<glossary-replica-shard,replicas>>,
+and use https://en.wikipedia.org/wiki/Round-robin_DNS[round-robin], the default
+routing algorithm, then those two requests will go to different shard copies,
+preventing node-level caches from helping.
+
+Since it is common for users of a search application to run similar requests
+one after another, for instance in order to analyze a narrower subset of the
+index, using a preference value that identifies the current user or session
+could help optimize usage of the caches.
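+
+As a rough sketch of what this could look like (the index name, preference
+value and query below are made-up examples, not part of the original text),
+the `preference` query string parameter of the search API can be set to an
+opaque string such as a user or session identifier:
+
+[source,console]
+--------------------------------------------------
+GET /my-index/_search?preference=user_123_session_abc
+{
+  "query": {
+    "match": {
+      "title": "search tuning"
+    }
+  }
+}
+--------------------------------------------------
+
+Requests that reuse the same `preference` string should be routed to the same
+shard copies, so repeated, similar searches can keep hitting warm caches.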
+
+[float]
+=== Replicas might help with throughput, but not always
+
+In addition to improving resiliency, replicas can help improve throughput. For
+instance, if you have a single-shard index and three nodes, you will need to
+set the number of replicas to 2 in order to have 3 copies of your shard in
+total so that all nodes are utilized.
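+
+As an illustrative sketch (the index name is made up, not from the original
+text), that single-shard, two-replica layout could be expressed at index
+creation time:
+
+[source,console]
+--------------------------------------------------
+PUT /my-index
+{
+  "settings": {
+    "index": {
+      "number_of_shards": 1,
+      "number_of_replicas": 2
+    }
+  }
+}
+--------------------------------------------------
+
+Unlike the number of shards, `number_of_replicas` can also be changed later on
+a live index through the update index settings API.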
+
+Now imagine that you have a two-shard index and two nodes. In one case, the
+number of replicas is 0, meaning that each node holds a single shard. In the
+second case, the number of replicas is 1, meaning that each node has two shards.
+Which setup is going to perform best in terms of search performance? Usually,
+the setup that has fewer shards per node in total will perform better. The
+reason for that is that it gives a greater share of the available filesystem
+cache to each shard, and the filesystem cache is probably Elasticsearch's
+number 1 performance factor. At the same time, beware that a setup that does
+not have replicas cannot tolerate the failure of even a single node, so there
+is a trade-off between throughput and availability.
+
+So what is the right number of replicas? If you have a cluster that has
+`num_nodes` nodes, `num_primaries` primary shards _in total_, and if you want
+to be able to cope with at most `max_failures` node failures at once, then the
+right number of replicas for you is
+`max(max_failures, ceil(num_nodes / num_primaries) - 1)`.
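+
+As a worked example of that formula (the numbers and index name below are
+illustrative, not from the original text): with `num_nodes = 3`,
+`num_primaries = 2` and `max_failures = 1`, the result is
+`max(1, ceil(3 / 2) - 1) = max(1, 1) = 1`, so one replica per primary, which
+could be applied with the update index settings API:
+
+[source,console]
+--------------------------------------------------
+PUT /my-index/_settings
+{
+  "index": {
+    "number_of_replicas": 1
+  }
+}
+--------------------------------------------------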