[[size-your-shards]]
== Size your shards

Each index in {es} is divided into one or more shards, each of which may be
replicated across multiple nodes to protect against hardware failures. If you
are using <<data-streams>> then each data stream is backed by a sequence of
indices. There is a limit to the amount of data you can store on a single
node, so you can increase the capacity of your cluster by adding nodes and
increasing the number of indices and shards to match. However, each index and
shard has some overhead, and if you divide your data across too many shards
then the overhead can become overwhelming. A cluster with too many indices or
shards is said to suffer from _oversharding_. An oversharded cluster will be
less efficient at responding to searches and in extreme cases it may even
become unstable.

[discrete]
[[create-a-sharding-strategy]]
=== Create a sharding strategy

The best way to prevent oversharding and other shard-related issues is to
create a sharding strategy. A sharding strategy helps you determine and
maintain the optimal number of shards for your cluster while limiting the size
of those shards.

Unfortunately, there is no one-size-fits-all sharding strategy. A strategy that
works in one environment may not scale in another. A good sharding strategy
must account for your infrastructure, use case, and performance expectations.

The best way to create a sharding strategy is to benchmark your production data
on production hardware using the same queries and indexing loads you'd see in
production. For our recommended methodology, watch the
https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing[quantitative
cluster sizing video]. As you test different shard configurations, use {kib}'s
{kibana-ref}/elasticsearch-metrics.html[{es} monitoring tools] to track your
cluster's stability and performance.

The following sections provide some reminders and guidelines you should
consider when designing your sharding strategy. If your cluster is already
oversharded, see <<reduce-cluster-shard-count>>.

[discrete]
[[shard-sizing-considerations]]
=== Sizing considerations

Keep the following things in mind when building your sharding strategy.

[discrete]
[[single-thread-per-shard]]
==== Searches run on a single thread per shard

Most searches hit multiple shards. Each shard runs the search on a single
CPU thread. While a shard can run multiple concurrent searches, searches across
a large number of shards can deplete a node's <<modules-threadpool,search
thread pool>>. This can result in low throughput and slow search speeds.
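
If you suspect that searches are queueing or being rejected because they fan
out across too many shards, you can check the `search` thread pool on each
node. The following request is a minimal sketch using the
<<cat-thread-pool,cat thread pool API>>; the selected columns are illustrative.

[source,console]
----
GET _cat/thread_pool/search?v=true&h=node_name,active,queue,rejected,completed
----

A growing `queue` or a non-zero `rejected` count can indicate that searches
are hitting more shards than the node can service concurrently.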

[discrete]
[[each-shard-has-overhead]]
==== Each index, shard, segment and field has overhead

Every index and every shard requires some memory and CPU resources. In most
cases, a small set of large shards uses fewer resources than many small shards.

Segments play a big role in a shard's resource usage. Most shards contain
several segments, which store the shard's index data. {es} keeps some segment
metadata in heap memory so it can be quickly retrieved for searches. As a
shard grows, its segments are <<index-modules-merge,merged>> into fewer, larger
segments. This decreases the number of segments, which means less metadata is
kept in heap memory.
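
To see how many segments each of your shards currently holds, you can use the
<<cat-segments,cat segments API>>. This is a quick check; `my-index-000001` is
an example index name.

[source,console]
----
GET _cat/segments/my-index-000001?v=true
----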

Every mapped field also carries some overhead in terms of memory usage and disk
space. By default {es} will automatically create a mapping for every field in
every document it indexes, but you can switch off this behaviour to
<<explicit-mapping,take control of your mappings>>.

Moreover, every segment requires a small amount of heap memory for each mapped
field. This per-segment-per-field heap overhead includes a copy of the field
name, encoded using ISO-8859-1 if applicable or UTF-16 otherwise. Usually this
is not noticeable, but you may need to account for this overhead if your shards
have high segment counts and the corresponding mappings contain high field
counts and/or very long field names.

[discrete]
[[shard-auto-balance]]
==== {es} automatically balances shards within a data tier

A cluster's nodes are grouped into <<data-tiers,data tiers>>. Within each tier,
{es} attempts to spread an index's shards across as many nodes as possible.
When you add a new node or a node fails, {es} automatically rebalances the
index's shards across the tier's remaining nodes.
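
To see how shards and disk usage are currently distributed across your nodes,
you can use the <<cat-allocation,cat allocation API>>. This is a quick check
rather than a full capacity analysis.

[source,console]
----
GET _cat/allocation?v=true
----

Markedly uneven shard counts across nodes of the same tier can point to
allocation settings or hotspot issues, described later on this page.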

[discrete]
[[shard-size-best-practices]]
=== Best practices

Where applicable, use the following best practices as starting points for your
sharding strategy.

[discrete]
[[delete-indices-not-documents]]
==== Delete indices, not documents

Deleted documents aren't immediately removed from {es}'s file system.
Instead, {es} marks the document as deleted on each related shard. The marked
document will continue to use resources until it's removed during a periodic
<<index-modules-merge,segment merge>>.

When possible, delete entire indices instead. {es} can immediately remove
deleted indices directly from the file system and free up resources.

[discrete]
[[use-ds-ilm-for-time-series]]
==== Use data streams and {ilm-init} for time series data

<<data-streams,Data streams>> let you store time series data across multiple,
time-based backing indices. You can use <<index-lifecycle-management,{ilm}
({ilm-init})>> to automatically manage these backing indices.

One advantage of this setup is
<<getting-started-index-lifecycle-management,automatic rollover>>, which creates
a new write index when the current one meets a defined `max_primary_shard_size`,
`max_age`, `max_docs`, or `max_size` threshold. When an index is no longer
needed, you can use {ilm-init} to automatically delete it and free up resources.
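
For example, the following {ilm-init} policy is a minimal sketch of this
approach: it rolls the write index over once it reaches a primary shard size of
50GB or an age of 30 days, and deletes each backing index 90 days after
rollover. The policy name and thresholds are illustrative; choose values that
match your own retention requirements.

[source,console]
----
PUT _ilm/policy/my-timeseries-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "30d"
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
----

With a `max_primary_shard_size` condition in place, rollover keeps shards near
the recommended size range regardless of how quickly data arrives.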

{ilm-init} also makes it easy to change your sharding strategy over time:

* *Want to decrease the shard count for new indices?* +
Change the <<index-number-of-shards,`index.number_of_shards`>> setting in the
data stream's <<data-streams-change-mappings-and-settings,matching index
template>>.

* *Want larger shards or fewer backing indices?* +
Increase your {ilm-init} policy's <<ilm-rollover,rollover threshold>>.

* *Need indices that span shorter intervals?* +
Offset the increased shard count by deleting older indices sooner. You can do
this by lowering the `min_age` threshold for your policy's
<<ilm-index-lifecycle,delete phase>>.

Every new backing index is an opportunity to further tune your strategy.

[discrete]
[[shard-size-recommendation]]
==== Aim for shard sizes between 10GB and 50GB

Larger shards take longer to recover after a failure. When a node fails, {es}
rebalances the node's shards across the data tier's remaining nodes. This
recovery process typically involves copying the shard contents across the
network, so a 100GB shard will take twice as long to recover as a 50GB shard.

In contrast, small shards carry proportionally more overhead and are less
efficient to search. Searching fifty 1GB shards will take substantially more
resources than searching a single 50GB shard containing the same data.

There are no hard limits on shard size, but experience shows that shards
between 10GB and 50GB typically work well for logs and time series data. You
may be able to use larger shards depending on your network and use case.
Smaller shards may be appropriate for
{enterprise-search-ref}/index.html[Enterprise Search] and similar use cases.

If you use {ilm-init}, set the <<ilm-rollover,rollover action>>'s
`max_primary_shard_size` threshold to `50gb` to avoid shards larger than 50GB.

To see the current size of your shards, use the <<cat-shards,cat shards API>>.

[source,console]
----
GET _cat/shards?v=true&h=index,prirep,shard,store&s=prirep,store&bytes=gb
----
// TEST[setup:my_index]

The `store` value shows the size of each shard on disk.

[source,txt]
----
index                                prirep shard store
.ds-my-data-stream-2099.05.06-000001 p      0     50gb
...
----
// TESTRESPONSE[non_json]
// TESTRESPONSE[s/\.ds-my-data-stream-2099\.05\.06-000001/my-index-000001/]
// TESTRESPONSE[s/50gb/.*/]

[discrete]
[[shard-count-recommendation]]
==== Master-eligible nodes should have at least 1GB of heap per 3000 indices

The number of indices a master node can manage is proportional to its heap
size. The exact amount of heap memory needed for each index depends on various
factors such as the size of the mapping and the number of shards per index.

As a general rule of thumb, you should have fewer than 3000 indices per GB of
heap on master nodes. For example, if your cluster has dedicated master nodes
with 4GB of heap each then you should have fewer than 12000 indices. If your
master nodes are not dedicated master nodes then the same sizing guidance
applies: you should reserve at least 1GB of heap on each master-eligible node
for every 3000 indices in your cluster.

Note that this rule defines the absolute maximum number of indices that a
master node can manage, but does not guarantee the performance of searches or
indexing involving this many indices. You must also ensure that your data nodes
have adequate resources for your workload and that your overall sharding
strategy meets all your performance requirements. See also
<<single-thread-per-shard>> and <<each-shard-has-overhead>>.

To check the configured size of each node's heap, use the <<cat-nodes,cat nodes
API>>.

[source,console]
----
GET _cat/nodes?v=true&h=heap.max
----
// TEST[setup:my_index]

You can use the <<cat-shards,cat shards API>> to check the number of shards per
node.

[source,console]
----
GET _cat/shards?v=true
----
// TEST[setup:my_index]
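
You can also check the total number of indices in the cluster with the
<<cluster-stats,cluster stats API>> and compare it against the heap-based rule
of thumb above.

[source,console]
----
GET _cluster/stats?filter_path=indices.count
----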

[discrete]
[[field-count-recommendation]]
==== Allow enough heap for field mappers and overheads

Mapped fields consume some heap memory on every node, and data nodes require
additional heap for each mapped field they hold. Ensure each node has enough
heap for mappings, and also allow extra space for overheads associated with
its workload. The following sections show how to determine these heap
requirements.

[discrete]
===== Mapping metadata in the cluster state

Each node in the cluster has a copy of the
<<cluster-state-api-desc,cluster state>>. The cluster state includes
information about the field mappings for each index, and this information
carries heap overhead on every node. You can use the
<<cluster-stats,Cluster stats API>> to get the total size of all mappings after
deduplication and compression.

[source,console]
----
GET _cluster/stats?human&filter_path=indices.mappings.total_deduplicated_mapping_size*
----
// TEST[setup:node]

The response will include information like this example output:

[source,console-result]
----
{
  "indices": {
    "mappings": {
      "total_deduplicated_mapping_size": "1gb",
      "total_deduplicated_mapping_size_in_bytes": 1073741824
    }
  }
}
----
// TESTRESPONSE[s/"total_deduplicated_mapping_size": "1gb"/"total_deduplicated_mapping_size": $body.$_path/]
// TESTRESPONSE[s/"total_deduplicated_mapping_size_in_bytes": 1073741824/"total_deduplicated_mapping_size_in_bytes": $body.$_path/]

[discrete]
===== Retrieving heap size and field mapper overheads

You can use the <<cluster-nodes-stats,Nodes stats API>> to get two relevant
metrics for each node:

* The size of the heap on each node.

* Any additional estimated heap overhead for the fields of each node. This is
specific to data nodes, where, in addition to the cluster state field
information mentioned above, there is extra heap overhead for each mapped field
of an index held by the data node. For nodes which are not data nodes, this
field may be zero.

[source,console]
----
GET _nodes/stats?human&filter_path=nodes.*.name,nodes.*.indices.mappings.total_estimated_overhead*,nodes.*.jvm.mem.heap_max*
----
// TEST[setup:node]

For each node, the response will include information like this example output:

[source,console-result]
----
{
  "nodes": {
    "USpTGYaBSIKbgSUJR2Z9lg": {
      "name": "node-0",
      "indices": {
        "mappings": {
          "total_estimated_overhead": "1gb",
          "total_estimated_overhead_in_bytes": 1073741824
        }
      },
      "jvm": {
        "mem": {
          "heap_max": "4gb",
          "heap_max_in_bytes": 4294967296
        }
      }
    }
  }
}
----
// TESTRESPONSE[s/"USpTGYaBSIKbgSUJR2Z9lg"/\$node_name/]
// TESTRESPONSE[s/"name": "node-0"/"name": $body.$_path/]
// TESTRESPONSE[s/"total_estimated_overhead": "1gb"/"total_estimated_overhead": $body.$_path/]
// TESTRESPONSE[s/"total_estimated_overhead_in_bytes": 1073741824/"total_estimated_overhead_in_bytes": $body.$_path/]
// TESTRESPONSE[s/"heap_max": "4gb"/"heap_max": $body.$_path/]
// TESTRESPONSE[s/"heap_max_in_bytes": 4294967296/"heap_max_in_bytes": $body.$_path/]

[discrete]
===== Consider additional heap overheads

Apart from the two field overhead metrics above, you must also allow enough
heap for {es}'s baseline usage as well as your workload, such as indexing,
searches, and aggregations. 0.5GB of extra heap will suffice for many
reasonable workloads. You may need even less if your workload is very light,
while heavy workloads may require more.

[discrete]
===== Example

As an example, consider the outputs above for a data node. The heap of the node
will need at least:

* 1GB for the cluster state field information.

* 1GB for the additional estimated heap overhead for the fields of the data
node.

* 0.5GB of extra heap for other overheads.

Since the node has a 4GB maximum heap size in the example, it has enough heap
for the total requirement of 2.5GB.

If the maximum heap size for a node is not sufficient, consider
<<avoid-unnecessary-fields,avoiding unnecessary fields>>, scaling up the
cluster, or redistributing index shards.

Note that the above rules do not necessarily guarantee the performance of
searches or indexing involving a very high number of indices. You must also
ensure that your data nodes have adequate resources for your workload and
that your overall sharding strategy meets all your performance requirements.
See also <<single-thread-per-shard>> and <<each-shard-has-overhead>>.

[discrete]
[[avoid-node-hotspots]]
==== Avoid node hotspots

If too many shards are allocated to a specific node, the node can become a
hotspot. For example, if a single node contains too many shards for an index
with a high indexing volume, the node is likely to have issues.

To prevent hotspots, use the
<<total-shards-per-node,`index.routing.allocation.total_shards_per_node`>> index
setting to explicitly limit the number of shards on a single node. You can
configure `index.routing.allocation.total_shards_per_node` using the
<<indices-update-settings,update index settings API>>.

[source,console]
--------------------------------------------------
PUT my-index-000001/_settings
{
  "index" : {
    "routing.allocation.total_shards_per_node" : 5
  }
}
--------------------------------------------------
// TEST[setup:my_index]

[discrete]
[[avoid-unnecessary-fields]]
==== Avoid unnecessary mapped fields

By default {es} <<dynamic-mapping,automatically creates a mapping>> for every
field in every document it indexes. Every mapped field corresponds to some data
structures on disk which are needed for efficient search, retrieval, and
aggregations on this field. Details about each mapped field are also held in
memory. In many cases this overhead is unnecessary because a field is not used
in any searches or aggregations. Use <<explicit-mapping>> instead of dynamic
mapping to avoid creating fields that are never used. If a collection of fields
is typically used together, consider using <<copy-to>> to consolidate them at
index time. If a field is only rarely used, it may be better to make it a
<<runtime,Runtime field>> instead.
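
As a minimal sketch of taking control of your mappings, the following request
creates an index that maps only the fields you declare. The index name and
fields are illustrative, and `"dynamic": false` is just one option; see
<<dynamic-mapping>> for the alternatives.

[source,console]
----
PUT my-explicit-index
{
  "mappings": {
    "dynamic": false,
    "properties": {
      "@timestamp": { "type": "date" },
      "message":    { "type": "text" }
    }
  }
}
----

Documents may still contain other fields; they are kept in `_source` but are
not indexed or added to the mapping.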

You can get information about which fields are being used with the
<<field-usage-stats>> API, and you can analyze the disk usage of mapped fields
using the <<indices-disk-usage>> API. Note, however, that unnecessary mapped
fields carry some memory overhead in addition to their disk usage.
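
For example, the following requests report field usage and on-disk field sizes
for a single index. `my-index-000001` is a placeholder, and the disk usage
analysis is resource-intensive, so run it sparingly.

[source,console]
----
GET my-index-000001/_field_usage_stats

POST my-index-000001/_disk_usage?run_expensive_tasks=true
----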

[discrete]
[[reduce-cluster-shard-count]]
=== Reduce a cluster's shard count

If your cluster is already oversharded, you can use one or more of the
following methods to reduce its shard count.

[discrete]
[[create-indices-that-cover-longer-time-periods]]
==== Create indices that cover longer time periods

If you use {ilm-init} and your retention policy allows it, avoid using a
`max_age` threshold for the rollover action. Instead, use
`max_primary_shard_size` to avoid creating empty indices or many small shards.

If your retention policy requires a `max_age` threshold, increase it to create
indices that cover longer time intervals. For example, instead of creating
daily indices, you can create indices on a weekly or monthly basis.

[discrete]
[[delete-empty-indices]]
==== Delete empty or unneeded indices

If you're using {ilm-init} and roll over indices based on a `max_age` threshold,
you can inadvertently create indices with no documents. These empty indices
provide no benefit but still consume resources.

You can find these empty indices using the <<cat-count,cat count API>>.

[source,console]
----
GET _cat/count/my-index-000001?v=true
----
// TEST[setup:my_index]

Once you have a list of empty indices, you can delete them using the
<<indices-delete-index,delete index API>>. You can also delete any other
unneeded indices.

[source,console]
----
DELETE my-index-000001
----
// TEST[setup:my_index]

[discrete]
[[force-merge-during-off-peak-hours]]
==== Force merge during off-peak hours

If you no longer write to an index, you can use the <<indices-forcemerge,force
merge API>> to <<index-modules-merge,merge>> smaller segments into larger ones.
This can reduce shard overhead and improve search speeds. However, force merges
are resource-intensive. If possible, run the force merge during off-peak hours.

[source,console]
----
POST my-index-000001/_forcemerge
----
// TEST[setup:my_index]

[discrete]
[[shrink-existing-index-to-fewer-shards]]
==== Shrink an existing index to fewer shards

If you no longer write to an index, you can use the
<<indices-shrink-index,shrink index API>> to reduce its shard count.
{ilm-init} also has a <<ilm-shrink,shrink action>> for indices in the
warm phase.
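
As a rough sketch, you would first block writes to the index and then shrink it
into a new index with fewer primary shards. The index names and shard count
below are illustrative, and the shrink API also requires a copy of every shard
to be present on a single node; the {ilm-init} shrink action handles these
preparation steps for you.

[source,console]
----
PUT my-index-000001/_settings
{
  "settings": {
    "index.blocks.write": true
  }
}

POST my-index-000001/_shrink/my-shrunken-index-000001
{
  "settings": {
    "index.number_of_shards": 1,
    "index.blocks.write": null
  }
}
----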

[discrete]
[[combine-smaller-indices]]
==== Combine smaller indices

You can also use the <<docs-reindex,reindex API>> to combine indices
with similar mappings into a single large index. For time series data, you
could reindex indices for short time periods into a new index covering a
longer period. For example, you could reindex daily indices from October with a
shared index pattern, such as `my-index-2099.10.11`, into a monthly
`my-index-2099.10` index. After the reindex, delete the smaller indices.

[source,console]
----
POST _reindex
{
  "source": {
    "index": "my-index-2099.10.*"
  },
  "dest": {
    "index": "my-index-2099.10"
  }
}
----

[discrete]
[[troubleshoot-shard-related-errors]]
=== Troubleshoot shard-related errors

Here's how to resolve common shard-related errors.

[discrete]
==== this action would add [x] total shards, but this cluster currently has [y]/[z] maximum shards open;

The <<cluster-max-shards-per-node,`cluster.max_shards_per_node`>> cluster
setting limits the maximum number of open shards for a cluster. This error
indicates an action would exceed this limit.

If you're confident your changes won't destabilize the cluster, you can
temporarily increase the limit using the <<cluster-update-settings,cluster
update settings API>> and retry the action.

[source,console]
----
PUT _cluster/settings
{
  "persistent" : {
    "cluster.max_shards_per_node": 1200
  }
}
----

This increase should only be temporary. As a long-term solution, we recommend
you add nodes to the oversharded data tier or
<<reduce-cluster-shard-count,reduce your cluster's shard count>>. To get a
cluster's current shard count after making changes, use the
<<cluster-stats,cluster stats API>>.

[source,console]
----
GET _cluster/stats?filter_path=indices.shards.total
----

When a long-term solution is in place, we recommend you reset the
`cluster.max_shards_per_node` limit.

[source,console]
----
PUT _cluster/settings
{
  "persistent" : {
    "cluster.max_shards_per_node": null
  }
}
----
|