@@ -326,3 +326,45 @@ queries, they should be mapped as a `keyword`.
<<index-modules-index-sorting,Index sorting>> can be useful in order to make
conjunctions faster at the cost of slightly slower indexing. Read more about it
in the <<index-modules-index-sorting-conjunctions,index sorting documentation>>.
+
+[float]
+=== Use `preference` to optimize cache utilization
+
+There are multiple caches that can help with search performance, such as the
+https://en.wikipedia.org/wiki/Page_cache[filesystem cache], the
+<<shard-request-cache,request cache>> or the <<query-cache,query cache>>. Yet
+all these caches are maintained at the node level, meaning that if you run the
+same request twice in a row, have one or more <<glossary-replica-shard,replicas>>,
+and use https://en.wikipedia.org/wiki/Round-robin_DNS[round-robin], the default
+routing algorithm, then those two requests will go to different shard copies,
+preventing node-level caches from helping.
+
+Since it is common for users of a search application to run similar requests
+one after another, for instance in order to analyze a narrower subset of the
+index, using a preference value that identifies the current user or session
+could help optimize usage of the caches.
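+
+As a rough sketch of what this could look like (the index name, preference
+value and query below are made-up examples, not part of the original text),
+the `preference` query string parameter of the search API can be set to an
+opaque string such as a user or session identifier:
+
+[source,console]
+--------------------------------------------------
+GET /my-index/_search?preference=user_123_session_abc
+{
+  "query": {
+    "match": {
+      "title": "search tuning"
+    }
+  }
+}
+--------------------------------------------------
+
+Requests that reuse the same `preference` string should be routed to the same
+shard copies, so repeated, similar searches can keep hitting warm caches.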
+
+[float]
+=== Replicas might help with throughput, but not always
+
+In addition to improving resiliency, replicas can help improve throughput. For
+instance, if you have a single-shard index and three nodes, you will need to
+set the number of replicas to 2 in order to have 3 copies of your shard in
+total so that all nodes are utilized.
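+
+As an illustrative sketch (the index name is made up, not from the original
+text), that single-shard, two-replica layout could be expressed at index
+creation time:
+
+[source,console]
+--------------------------------------------------
+PUT /my-index
+{
+  "settings": {
+    "index": {
+      "number_of_shards": 1,
+      "number_of_replicas": 2
+    }
+  }
+}
+--------------------------------------------------
+
+Unlike the number of shards, `number_of_replicas` can also be changed later on
+a live index through the update index settings API.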
+
+Now imagine that you have a two-shard index and two nodes. In one case, the
+number of replicas is 0, meaning that each node holds a single shard. In the
+second case, the number of replicas is 1, meaning that each node has two shards.
+Which setup is going to perform best in terms of search performance? Usually,
+the setup that has fewer shards per node in total will perform better. The
+reason for that is that it gives a greater share of the available filesystem
+cache to each shard, and the filesystem cache is probably Elasticsearch's
+number 1 performance factor. At the same time, beware that a setup that does
+not have replicas cannot tolerate the failure of even a single node, so there
+is a trade-off between throughput and availability.
+
+So what is the right number of replicas? If you have a cluster that has
+`num_nodes` nodes, `num_primaries` primary shards _in total_, and if you want
+to be able to cope with at most `max_failures` node failures at once, then the
+right number of replicas for you is
+`max(max_failures, ceil(num_nodes / num_primaries) - 1)`.
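+
+As a worked example of that formula (the numbers and index name below are
+illustrative, not from the original text): with `num_nodes = 3`,
+`num_primaries = 2` and `max_failures = 1`, the result is
+`max(1, ceil(3 / 2) - 1) = max(1, 1) = 1`, so one replica per primary, which
+could be applied with the update index settings API:
+
+[source,console]
+--------------------------------------------------
+PUT /my-index/_settings
+{
+  "index": {
+    "number_of_replicas": 1
+  }
+}
+--------------------------------------------------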