[[index-modules-merge]]
== Merge

A shard in Elasticsearch is a Lucene index, and a Lucene index is broken
down into segments. Segments are the internal storage elements in which the
index data is stored, and they are immutable apart from delete markers.
Segments are periodically merged into larger segments to keep the
index size at bay and to expunge deletes.

The more segments a Lucene index has, the slower searches become and the
more memory is used. Segment merging reduces the number of segments;
however, merges can be expensive to perform, especially in environments
with low I/O throughput. Merges can be throttled using
<<store-throttling,store level throttling>>.

[float]
[[policy]]
=== Policy

The index merge policy module controls which segments of a shard index
are merged. Several policy types are available; the default is `tiered`.

[float]
[[tiered]]
==== tiered

Merges segments of approximately equal size, subject to an allowed
number of segments per tier. This is similar to the `log_byte_size` merge
policy, except that this policy can merge non-adjacent segments and
separates how many segments are merged at once from how many
segments are allowed per tier. This merge policy also does not
over-merge (i.e., cascade merges).

This policy has the following settings:

`index.merge.policy.expunge_deletes_allowed`::
When expungeDeletes is called, a segment is merged away only if its
percentage of deleted documents is over this threshold. Default is `10`.

`index.merge.policy.floor_segment`::
Segments smaller than this are "rounded up" to this size, i.e. treated as
equal (floor) size for merge selection. This prevents frequent flushing
of tiny segments from creating a long tail in the index. Default is `2mb`.

`index.merge.policy.max_merge_at_once`::
Maximum number of segments to be merged at a time during "normal" merging.
Default is `10`.

`index.merge.policy.max_merge_at_once_explicit`::
Maximum number of segments to be merged at a time during optimize or
expungeDeletes. Default is `30`.

`index.merge.policy.max_merged_segment`::
Maximum size of a segment produced during normal merging (not an explicit
optimize). This setting is approximate: the size of the merged segment is
estimated by summing the sizes of the to-be-merged segments (compensating
for the percentage of deleted docs). Default is `5gb`.

`index.merge.policy.segments_per_tier`::
Sets the allowed number of segments per tier. Smaller values mean more
merging but fewer segments. Default is `10`. Note that this value should be
greater than or equal to `max_merge_at_once`; otherwise too many merges
will be forced to occur.

`index.reclaim_deletes_weight`::
Controls how aggressively merges that reclaim more deletions are favored.
Higher values favor selecting merges that reclaim deletions. A value of
`0.0` means deletions don't impact merge selection. Defaults to `2.0`.

`index.compound_format`::
Whether the index should be stored in compound format. Defaults to `false`.
See <<index-compound-format,`index.compound_format`>> in
<<index-modules-settings>>.
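
As a rough illustration of how the `index.merge.policy.*` settings above relate to each other, the following Python sketch builds a settings dictionary from the documented defaults and checks the `segments_per_tier` constraint. The helper name `tiered_settings` and the error handling are hypothetical and are not part of Elasticsearch; only the setting names and default values come from the list above.

```python
# Documented defaults for the tiered merge policy (values from the list above).
TIERED_DEFAULTS = {
    "index.merge.policy.expunge_deletes_allowed": 10,
    "index.merge.policy.floor_segment": "2mb",
    "index.merge.policy.max_merge_at_once": 10,
    "index.merge.policy.max_merge_at_once_explicit": 30,
    "index.merge.policy.max_merged_segment": "5gb",
    "index.merge.policy.segments_per_tier": 10,
}


def tiered_settings(**overrides):
    """Hypothetical helper: return tiered settings with overrides applied.

    Keyword names are the short setting names, e.g. segments_per_tier=20.
    """
    settings = dict(TIERED_DEFAULTS)
    for short_name, value in overrides.items():
        key = "index.merge.policy." + short_name
        if key not in settings:
            raise KeyError("unknown tiered merge policy setting: " + key)
        settings[key] = value

    # The text asks for segments_per_tier >= max_merge_at_once.
    per_tier = settings["index.merge.policy.segments_per_tier"]
    at_once = settings["index.merge.policy.max_merge_at_once"]
    if per_tier < at_once:
        raise ValueError("segments_per_tier must be >= max_merge_at_once")
    return settings
```

For example, `tiered_settings(segments_per_tier=20)` returns the defaults with a single override, while `tiered_settings(segments_per_tier=5)` raises a `ValueError` because the value falls below the default `max_merge_at_once` of `10`.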

For normal merging, this policy first computes a "budget" of how many
segments are allowed to be in the index. If the index is over budget,
the policy sorts segments by decreasing size (proportionally accounting
for the percentage of deletes) and then finds the least-cost merge. Merge
cost is measured by a combination of the "skew" of the merge (size of the
largest segment divided by the size of the smallest segment), the total
merge size, and the percentage of deletes reclaimed, so that merges with
lower skew, smaller size, and more reclaimed deletes are favored.
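
The cost described above can be sketched as a small scoring function. This is an illustrative approximation only, not Lucene's actual formula: the function name `merge_score`, the way the three factors are multiplied together, and the small exponent applied to the merged size are all assumptions chosen to reproduce the stated preferences (lower skew, smaller size, more reclaimed deletes).

```python
def merge_score(sizes, deleted_fractions, reclaim_deletes_weight=2.0):
    """Illustrative merge cost; lower scores are better.

    sizes: segment sizes in bytes.
    deleted_fractions: fraction of deleted docs per segment (0.0 to 1.0).
    """
    # Size of each segment after its deleted documents are dropped.
    live = [s * (1.0 - d) for s, d in zip(sizes, deleted_fractions)]

    # Skew: largest segment divided by smallest (guard against zero).
    skew = max(live) / max(min(live), 1.0)

    total_before = sum(sizes)
    total_after = sum(live)

    # Fraction of bytes that survives the merge; smaller means more
    # deletes are reclaimed.
    non_deleted_ratio = total_after / total_before

    # Assumed weighting: a gentle penalty on total size (exponent 0.05)
    # and a stronger reward for reclaimed deletes.
    size_penalty = total_after ** 0.05
    return skew * size_penalty * non_deleted_ratio ** reclaim_deletes_weight
```

Under these assumptions, merging two equally sized segments scores better than merging one large and one tiny segment, and a merge whose segments carry many deletes scores better than the same merge without deletes. Setting `reclaim_deletes_weight` to `0.0` removes the delete ratio factor from the score, although deletes still shrink the live sizes used for skew and total size.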

If a merge would produce a segment larger than `max_merged_segment`, the
policy merges fewer segments (down to 1 at once, if that one has
deletions) to keep the segment size under budget.
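
A minimal sketch of that size check follows, assuming each candidate segment is given as a `(size_in_bytes, deleted_fraction)` pair. The function names and the strategy of dropping candidates from the end of the list are hypothetical simplifications; the real policy chooses which segments to leave out differently.

```python
def estimated_merged_size(segments):
    """Sum segment sizes, compensating for the fraction of deleted docs."""
    return sum(size * (1.0 - deleted) for size, deleted in segments)


def trim_to_budget(segments, max_merged_segment):
    """Drop trailing candidates until the estimate fits under the limit.

    Returns the segments to merge, or an empty list when no worthwhile
    merge remains.
    """
    chosen = list(segments)
    while chosen and estimated_merged_size(chosen) > max_merged_segment:
        chosen.pop()

    # A single segment is only worth rewriting if it has deletions to reclaim.
    if len(chosen) == 1 and chosen[0][1] == 0.0:
        return []
    return chosen
```

With a 5 GB limit, three candidates of 3 GB, 2 GB (half deleted) and 2 GB are estimated at 6 GB, so the last one is dropped and the remaining two (estimated at 4 GB) are merged.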

Note that for large shards holding many gigabytes of data, the default
`max_merged_segment` (`5gb`) can leave many segments in an index and
make searches slower. Use the indices segments API to see the segments
that an index has, and consider either increasing `max_merged_segment`
or issuing an optimize call for the index (try to issue it at a
low-traffic time).
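
To judge whether an index has accumulated too many segments, the output of the indices segments API can be summarized per index. The sketch below works on an already parsed JSON response and assumes a shape of `indices -> <index> -> shards -> <shard id> -> [shard copies]`, where each copy has a `segments` map whose entries carry `size_in_bytes`. That shape is an assumption and should be checked against the response of the Elasticsearch version in use; `summarize_segments` is a hypothetical helper name.

```python
def summarize_segments(response):
    """Return {index_name: (segment_count, total_size_in_bytes)}.

    Assumes the response shape described in the lead-in; missing keys are
    treated as empty. Every shard copy in the response is counted.
    """
    summary = {}
    for index_name, index_data in response.get("indices", {}).items():
        count = 0
        total_bytes = 0
        for copies in index_data.get("shards", {}).values():
            for shard_copy in copies:
                segments = shard_copy.get("segments", {})
                count += len(segments)
                total_bytes += sum(
                    seg.get("size_in_bytes", 0) for seg in segments.values()
                )
        summary[index_name] = (count, total_bytes)
    return summary
```

A high segment count together with many segments close to `max_merged_segment` in size suggests that the limit, rather than a merge backlog, is what keeps the count up.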

[float]
[[log-byte-size]]
==== log_byte_size

A merge policy that merges segments into levels of exponentially
increasing *byte size*, where each level has fewer segments than the
value of the merge factor. Whenever extra segments (beyond the merge
factor upper bound) are encountered, all segments within the level are
merged.
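
The level idea can be sketched as follows. This is a simplified bucketing for illustration, not Lucene's exact algorithm: level 0 holds every segment smaller than `min_merge_size * merge_factor`, and each further level covers sizes up to `merge_factor` times larger than the one below. The function names are hypothetical, and the default `min_merge_size` is the documented `1.6mb` expressed in bytes.

```python
from collections import Counter

DEFAULT_MIN_MERGE_SIZE = int(1.6 * 1024 * 1024)  # documented default, in bytes


def segment_level(size_bytes, merge_factor=10,
                  min_merge_size=DEFAULT_MIN_MERGE_SIZE):
    """Return the level of a segment under the simplified bucketing."""
    level = 0
    threshold = min_merge_size * merge_factor
    # Integer comparisons avoid floating point errors from logarithms.
    while size_bytes >= threshold:
        level += 1
        threshold *= merge_factor
    return level


def levels_to_merge(sizes, merge_factor=10,
                    min_merge_size=DEFAULT_MIN_MERGE_SIZE):
    """Return the levels that have reached merge_factor segments."""
    counts = Counter(
        segment_level(size, merge_factor, min_merge_size) for size in sizes
    )
    return sorted(level for level, n in counts.items() if n >= merge_factor)
```

With `merge_factor=10` and `min_merge_size` set to 2 MB, ten 1 MB segments all land on level 0 and trigger a merge there, while three 50 MB segments sit on level 1 and are left alone.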

This policy has the following settings:

[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|index.merge.policy.merge_factor |Determines how often segments are
merged during index operations. With smaller values, less RAM is used
while indexing, and searches on unoptimized indices are faster, but
indexing speed is slower. With larger values, more RAM is used during
indexing, and while searches on unoptimized indices are slower, indexing
is faster. Thus larger values (greater than 10) are best for batch index
creation, and smaller values (lower than 10) for indices that are
interactively maintained. Defaults to `10`.

|index.merge.policy.min_merge_size |A size setting that sets the
minimum size for the lowest-level segments. Any segments below this size
are considered to be on the same level (even if they vary drastically in
size) and will be merged whenever there are `merge_factor` of them. This
effectively truncates the "long tail" of small segments that would
otherwise be created into a single level. If you set this too large, it
could greatly increase the merging cost during indexing (if you flush
many small segments). Defaults to `1.6mb`.

|index.merge.policy.max_merge_size |A size setting that sets the
largest segment (measured by total byte size of the segment's files)
that may be merged with other segments. Defaults to unbounded.

|index.merge.policy.max_merge_docs |Determines the largest segment
(measured by document count) that may be merged with other segments.
Defaults to unbounded.
|=======================================================================

[float]
[[log-doc]]
==== log_doc

A merge policy that tries to merge segments into levels of exponentially
increasing *document count*, where each level has fewer segments than
the value of the merge factor. Whenever extra segments (beyond the merge
factor upper bound) are encountered, all segments within the level are
merged.

[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|index.merge.policy.merge_factor |Determines how often segments are
merged during index operations. With smaller values, less RAM is used
while indexing, and searches on unoptimized indices are faster, but
indexing speed is slower. With larger values, more RAM is used during
indexing, and while searches on unoptimized indices are slower, indexing
is faster. Thus larger values (greater than 10) are best for batch index
creation, and smaller values (lower than 10) for indices that are
interactively maintained. Defaults to `10`.

|index.merge.policy.min_merge_docs |Sets the minimum size (measured by
document count) for the lowest-level segments. Any segments below this
size are considered to be on the same level (even if they vary
drastically in size) and will be merged whenever there are `merge_factor`
of them. This effectively truncates the "long tail" of small segments
that would otherwise be created into a single level. If you set this too
large, it could greatly increase the merging cost during indexing (if you
flush many small segments). Defaults to `1000`.

|index.merge.policy.max_merge_docs |Determines the largest segment
(measured by document count) that may be merged with other segments.
Defaults to unbounded.
|=======================================================================

[float]
[[scheduling]]
=== Scheduling

The merge scheduler controls the execution of merge operations once they
are needed (according to the merge policy). The following types are
supported, with the default being the `ConcurrentMergeScheduler`.

[float]
==== ConcurrentMergeScheduler

A merge scheduler that runs each merge in a separate thread. When the
maximum number of threads is reached, further merges wait until a merge
thread becomes available.

The scheduler supports the following settings:

`index.merge.scheduler.max_thread_count`::
The maximum number of threads that may perform merge operations. Defaults to
`Math.max(1, Math.min(3, Runtime.getRuntime().availableProcessors() / 2))`.
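
To make the default concrete, the Java expression above can be mirrored in Python. The function name `default_max_thread_count` is hypothetical; `//` is used because the Java expression performs integer division, and `os.cpu_count()` stands in for `availableProcessors()` as an approximation.

```python
import os


def default_max_thread_count(available_processors=None):
    """Mirror Math.max(1, Math.min(3, availableProcessors / 2))."""
    if available_processors is None:
        # os.cpu_count() may return None on some platforms; fall back to 1.
        available_processors = os.cpu_count() or 1
    return max(1, min(3, available_processors // 2))
```

Under this formula a machine with 1 or 2 processors gets 1 merge thread, 4 processors give 2, and 6 or more processors are capped at 3.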

[float]
==== SerialMergeScheduler

A merge scheduler that simply runs each merge sequentially on the
calling thread (blocking the operation that triggered the merge,
typically an index operation).