
[DOCS] Address local vs. remote storage + shard limits feedback (#109360)

shainaraskas 1 year ago
parent
commit
900eb82c99

+ 3 - 0
docs/reference/datatiers.asciidoc

@@ -22,6 +22,9 @@ mounted indices>> of <<ilm-searchable-snapshot,{search-snaps}>> exclusively.
 This extends the storage capacity even further — by up to 20 times compared to
 the warm tier. 
 
+TIP: The performance of an {es} node is often limited by the performance of the underlying storage. 
+Review our recommendations for optimizing your storage for <<indexing-use-faster-hardware,indexing>> and <<search-use-faster-hardware,search>>.
+
 IMPORTANT: {es} generally expects nodes within a data tier to share the same 
 hardware profile. Variations not following this recommendation should be 
 carefully architected to avoid <<hotspotting,hot spotting>>.

+ 5 - 7
docs/reference/how-to/indexing-speed.asciidoc

@@ -94,6 +94,7 @@ auto-generated ids, Elasticsearch can skip this check, which makes indexing
 faster.
 
 [discrete]
+[[indexing-use-faster-hardware]]
 === Use faster hardware
 
 If indexing is I/O-bound, consider increasing the size of the filesystem cache
@@ -110,13 +111,10 @@ different nodes so there's redundancy for any node failures. You can also use
 <<snapshot-restore,snapshot and restore>> to backup the index for further
 insurance.
 
-Directly-attached (local) storage generally performs better than remote storage
-because it is simpler to configure well and avoids communications overheads.
-With careful tuning it is sometimes possible to achieve acceptable performance
-using remote storage too. Benchmark your system with a realistic workload to
-determine the effects of any tuning parameters. If you cannot achieve the
-performance you expect, work with the vendor of your storage system to identify
-the problem.
+[discrete]
+==== Local vs. remote storage
+
+include::./remote-storage.asciidoc[]
 
 [discrete]
 === Indexing buffer size

+ 11 - 0
docs/reference/how-to/remote-storage.asciidoc

@@ -0,0 +1,11 @@
+Directly-attached (local) storage generally performs 
+better than remote storage because it is simpler to configure well and avoids 
+communications overheads.
+
+Some remote storage performs very poorly, especially 
+under the kind of load that {es} imposes. However, with careful tuning, it is 
+sometimes possible to achieve acceptable performance using remote storage too. 
+Before committing to a particular storage architecture, benchmark your system 
+with a realistic workload to determine the effects of any tuning parameters. If 
+you cannot achieve the performance you expect, work with the vendor of your 
+storage system to identify the problem.

+ 6 - 8
docs/reference/how-to/search-speed.asciidoc

@@ -38,6 +38,7 @@ for `/dev/nvme0n1`, specify `blockdev --setra 256 /dev/nvme0n1`.
 // end::readahead[]
 
 [discrete]
+[[search-use-faster-hardware]]
 === Use faster hardware
 
 If your searches are I/O-bound, consider increasing the size of the filesystem
@@ -46,16 +47,13 @@ sequential and random reads across multiple files, and there may be many
 searches running concurrently on each shard, so SSD drives tend to perform
 better than spinning disks.
 
-Directly-attached (local) storage generally performs better than remote storage
-because it is simpler to configure well and avoids communications overheads.
-With careful tuning it is sometimes possible to achieve acceptable performance
-using remote storage too. Benchmark your system with a realistic workload to
-determine the effects of any tuning parameters. If you cannot achieve the
-performance you expect, work with the vendor of your storage system to identify
-the problem.
-
 If your searches are CPU-bound, consider using a larger number of faster CPUs.
 
+[discrete]
+==== Local vs. remote storage
+
+include::./remote-storage.asciidoc[]
+
 [discrete]
 === Document modeling
 

+ 4 - 0
docs/reference/how-to/shard-limits.asciidoc

@@ -0,0 +1,4 @@
+<<cluster-shard-limit,Cluster shard limits>> prevent creation of more than
+1000 non-frozen shards per node, and 3000 frozen shards per dedicated frozen
+node. Make sure you have enough nodes of each type in your cluster to handle
+the number of shards you need.

+ 4 - 4
docs/reference/how-to/size-your-shards.asciidoc

@@ -34,6 +34,9 @@ cluster sizing video]. As you test different shard configurations, use {kib}'s
 {kibana-ref}/elasticsearch-metrics.html[{es} monitoring tools] to track your
 cluster's stability and performance.
 
+The performance of an {es} node is often limited by the performance of the underlying storage. 
+Review our recommendations for optimizing your storage for <<indexing-use-faster-hardware,indexing>> and <<search-use-faster-hardware,search>>.
+
 The following sections provide some reminders and guidelines you should
 consider when designing your sharding strategy. If your cluster is already
 oversharded, see <<reduce-cluster-shard-count>>.
@@ -225,10 +228,7 @@ GET _cat/shards?v=true
 [[shard-count-per-node-recommendation]]
 ==== Add enough nodes to stay within the cluster shard limits
 
-The <<cluster-shard-limit,cluster shard limits>> prevent creation of more than
-1000 non-frozen shards per node, and 3000 frozen shards per dedicated frozen
-node. Make sure you have enough nodes of each type in your cluster to handle
-the number of shards you need.
+include::./shard-limits.asciidoc[]
 
 [discrete]
 [[field-count-recommendation]]

+ 7 - 7
docs/reference/modules/node.asciidoc

@@ -1,5 +1,5 @@
 [[modules-node]]
-=== Node
+=== Nodes
 
 Any time that you start an instance of {es}, you are starting a _node_. A
 collection of connected nodes is called a <<modules-cluster,cluster>>. If you
@@ -14,6 +14,10 @@ All nodes know about all the other nodes in the cluster and can forward client
 requests to the appropriate node.
 // end::modules-node-description-tag[]
 
+TIP: The performance of an {es} node is often limited by the performance of the underlying storage. 
+Review our recommendations for optimizing your storage for <<indexing-use-faster-hardware,indexing>> and 
+<<search-use-faster-hardware,search>>.
+
 [[node-roles]]
 ==== Node roles
 
@@ -236,6 +240,8 @@ assign data nodes to specific tiers: `data_content`,`data_hot`, `data_warm`,
 
 If you want to include a node in all tiers, or if your cluster does not use multiple tiers, then you can use the generic `data` role.
 
+include::../how-to/shard-limits.asciidoc[]
+
 WARNING: If you assign a node to a specific tier using a specialized data role, then you shouldn't also assign it the generic `data` role. The generic `data` role takes precedence over specialized data roles.
 
 [[generic-data-node]]
@@ -471,12 +477,6 @@ properly-configured remote block devices (e.g. a SAN) and remote filesystems
 storage. You can run multiple {es} nodes on the same filesystem, but each {es}
 node must have its own data path.
 
-The performance of an {es} cluster is often limited by the performance of the
-underlying storage, so you must ensure that your storage supports acceptable
-performance. Some remote storage performs very poorly, especially under the
-kind of load that {es} imposes, so make sure to benchmark your system carefully
-before committing to a particular storage architecture.
-
 TIP: When using the `.zip` or `.tar.gz` distributions, the `path.data` setting
 should be configured to locate the data directory outside the {es} home
 directory, so that the home directory can be deleted without deleting your data!