[[avoid-oversharding]]
== Avoid oversharding

In some cases, reducing the number of shards in a cluster while maintaining the
same amount of data leads to a more effective use of system resources
(CPU, RAM, IO). In these situations, we consider the cluster _oversharded_.
The number of shards where this inflection point occurs depends on a variety
of factors, including:

* available hardware
* indexing load
* data volume
* the types of queries executed against the cluster
* the rate at which these queries are issued
* the volume of data being queried

Testing against production data with production queries on production hardware
is the only way to calibrate optimal shard sizes. Shard sizes of tens of GB
are commonly used, and this may be a useful starting point from which to
experiment. {kib}'s {kibana-ref}/elasticsearch-metrics.html[{es} monitoring]
provides a useful view of historical cluster performance when evaluating the
impact of different shard sizes.
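
As a starting point for evaluating your current shard sizes, a request along
these lines uses the <<cat-shards,cat shards API>> to list each shard and its
size on disk, largest first:

[source,console]
--------------------------------------------------
GET _cat/shards?v=true&h=index,shard,prirep,store&s=store:desc
--------------------------------------------------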

[discrete]
[[oversharding-inefficient]]
=== Why oversharding is inefficient

Each segment has metadata that must be kept in heap memory, including lists of
fields, the number of documents, and terms dictionaries. As a shard grows in
size, its segments generally grow larger because smaller segments are
<<index-modules-merge,merged>> into fewer, larger segments. This typically
reduces the amount of heap required by a shard's segment metadata for a given
data volume. At a bare minimum, shards should be larger than 1GB to make the
most efficient use of memory.

However, even though shards start to become more memory efficient at around
1GB, a cluster full of 1GB shards will likely still perform poorly. This is
because having many small shards can also have a negative impact on search and
indexing operations. Each query or indexing operation is executed in a single
thread per shard of the indices being queried or indexed. The node receiving a
request from a client becomes responsible for distributing that request to the
appropriate shards as well as reducing the results from those individual
shards into a single response. Even assuming that a cluster has sufficient
<<modules-threadpool,search threadpool threads>> available to immediately
process the requested action against all shards required by the request, the
overhead of making network requests to the nodes holding those shards and of
merging the results from many small shards can lead to increased latency.
This in turn can lead to exhaustion of the threadpool and, as a result,
decreased throughput.

[discrete]
[[reduce-shard-counts-increase-shard-size]]
=== How to reduce shard counts and increase shard size

Try these methods to reduce oversharding.

[discrete]
[[reduce-shards-for-new-indices]]
==== Reduce the number of shards for new indices

You can specify the `index.number_of_shards` setting for new indices created
with the <<indices-create-index,create index API>> or as part of
<<indices-templates,index templates>> for indices automatically created by
<<index-lifecycle-management,{ilm} ({ilm-init})>>.

You can override the `index.number_of_shards` setting when rolling over an
index using the <<rollover-index-api-example,rollover index API>>.
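
For example, a minimal sketch like the following creates an index with a
single primary shard (`my-index-000001` is a hypothetical index name):

[source,console]
--------------------------------------------------
PUT /my-index-000001
{
  "settings": {
    "index.number_of_shards": 1
  }
}
--------------------------------------------------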

[discrete]
[[create-larger-shards-by-increasing-rollover-thresholds]]
==== Create larger shards by increasing rollover thresholds

You can roll over indices using the
<<indices-rollover-index,rollover index API>> or by specifying the
<<ilm-rollover-action,rollover action>> in an {ilm-init} policy. If using an
{ilm-init} policy, increase the rollover condition thresholds (`max_age`,
`max_docs`, `max_size`) to allow the indices to grow to a larger size before
being rolled over, which creates larger shards.
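
For example, a sketch along these lines (with a hypothetical policy name
`my-policy`) rolls indices over once they reach 100GB or 30 days of age,
whichever comes first:

[source,console]
--------------------------------------------------
PUT _ilm/policy/my-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "100gb",
            "max_age": "30d"
          }
        }
      }
    }
  }
}
--------------------------------------------------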

Take special note of any empty indices. These may be managed by an {ilm-init}
policy that is rolling over the indices because the `max_age` threshold is
met. In this case, you may need to adjust the policy to make use of the
`max_docs` or `max_size` properties to prevent the creation of these empty
indices. One example where this may happen is if one or more {beats} stop
sending data. If the {ilm-init}-managed indices for those {beats} are
configured to roll over daily, then new, empty indices will be generated each
day. Empty indices can be identified using the <<cat-count,cat count API>>.
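
For example, a request like this (against a hypothetical daily index) returns
its document count, where a count of `0` indicates an empty index:

[source,console]
--------------------------------------------------
GET /_cat/count/my-index-2099.05.06?v=true
--------------------------------------------------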

[discrete]
[[create-larger-shards-with-index-patterns]]
==== Create larger shards by using index patterns spanning longer time periods

Creating indices covering longer time periods reduces index and shard counts
while increasing index sizes. For example, instead of daily indices, you can
create monthly, or even yearly, indices.

If you create indices using {ls}, the
{logstash-ref}/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-index[index]
property of the {es} output can be modified to a
<<date-math-index-names,date math expression>> covering a longer time period.
For example, use `logstash-%{+YYYY.MM}` instead of `logstash-%{+YYYY.MM.dd}`
to create monthly, rather than daily, indices. {beats} also lets you change
the date math expression defined in the `index` property of the {es} output,
such as for {filebeat-ref}/elasticsearch-output.html#index-option-es[Filebeat].

[discrete]
[[shrink-existing-index-to-fewer-shards]]
==== Shrink an existing index to fewer shards

You can use the <<indices-shrink-index,shrink index API>> to shrink an
existing index down to fewer shards.

<<index-lifecycle-management,{ilm}>> also has a
<<ilm-shrink-action,shrink action>> available for indices in the warm phase.
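
For example, a sketch like the following first blocks writes on a hypothetical
`my-index` (a shrink source must be read-only) and then shrinks it to a single
primary shard. Note that the shrink index API has additional prerequisites,
such as requiring a copy of every shard on the same node; see its
documentation for the full preparation steps:

[source,console]
--------------------------------------------------
PUT /my-index/_settings
{
  "settings": {
    "index.blocks.write": true
  }
}

POST /my-index/_shrink/my-shrunken-index
{
  "settings": {
    "index.number_of_shards": 1,
    "index.blocks.write": null
  }
}
--------------------------------------------------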

[discrete]
[[reindex-an-existing-index-to-fewer-shards]]
==== Reindex an existing index to fewer shards

You can use the <<docs-reindex,reindex API>> to reindex from an existing index
to a new index with fewer shards. After the data has been reindexed, the
oversharded index can be deleted.
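
For example, a sketch along these lines creates a hypothetical single-shard
destination index, then copies the data from the oversharded source into it:

[source,console]
--------------------------------------------------
PUT /my-index-reindexed
{
  "settings": {
    "index.number_of_shards": 1
  }
}

POST /_reindex
{
  "source": {
    "index": "my-index"
  },
  "dest": {
    "index": "my-index-reindexed"
  }
}
--------------------------------------------------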

[discrete]
[[reindex-indices-from-shorter-periods-into-longer-periods]]
==== Reindex indices from shorter periods into longer periods

You can use the <<docs-reindex,reindex API>> to reindex multiple small indices
covering shorter time periods into a larger index covering a longer time
period. For example, daily indices from October with naming patterns such as
`foo-2019.10.11` could be combined into a monthly `foo-2019.10` index, like
this:

[source,console]
--------------------------------------------------
POST /_reindex
{
  "source": {
    "index": "foo-2019.10.*"
  },
  "dest": {
    "index": "foo-2019.10"
  }
}
--------------------------------------------------