[[index-modules-merge]]
== Merge

experimental[All of the settings exposed in the `merge` module are expert only and may be removed in the future]

A shard in Elasticsearch is a Lucene index, and a Lucene index is broken
down into segments. Segments are the internal storage elements where the
index data is stored, and they are immutable except for delete markers.
Segments are periodically merged into larger segments to keep the index
size in check and to expunge deleted documents.

[float]
[[policy]]
=== Policy

The merge policy module controls which segments of a shard's index are
merged. Several policy types are available; the default is `tiered`.

[float]
[[tiered]]
==== tiered

Merges segments of approximately equal size, subject to an allowed
number of segments per tier. This is similar to the `log_byte_size` merge
policy, except that this policy can merge non-adjacent segments, and it
separates how many segments are merged at once from how many segments
are allowed per tier. This merge policy also does not over-merge (i.e.,
cascade merges).

This policy has the following settings:

`index.merge.policy.expunge_deletes_allowed`::
When expungeDeletes is called, a segment is merged away only if its
percentage of deleted documents is over this threshold. Default is `10`.

`index.merge.policy.floor_segment`::
Segments smaller than this are "rounded up" to this size, i.e. treated as
equal (floor) size for merge selection. This prevents frequent flushing of
tiny segments from creating a long tail in the index. Default is `2mb`.

`index.merge.policy.max_merge_at_once`::
Maximum number of segments to be merged at a time during "normal" merging.
Default is `10`.

`index.merge.policy.max_merge_at_once_explicit`::
Maximum number of segments to be merged at a time during optimize or
expungeDeletes. Default is `30`.

`index.merge.policy.max_merged_segment`::
Maximum size of a segment produced during normal merging (not explicit
optimize). This setting is approximate: the size of the merged segment is
estimated by summing the sizes of the to-be-merged segments (compensating
for the percentage of deleted docs). Default is `5gb`.

`index.merge.policy.segments_per_tier`::
Sets the allowed number of segments per tier. Smaller values mean more
merging but fewer segments. Default is `10`. Note that this value needs to
be >= `max_merge_at_once`, otherwise you'll force too many merges to
occur.

`index.merge.policy.reclaim_deletes_weight`::
Controls how aggressively merges that reclaim more deletions are favored.
Higher values favor selecting merges that reclaim deletions. A value of
`0.0` means deletions don't impact merge selection. Defaults to `2.0`.
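To show how the size-related settings interact, the following Python sketch models deletion-compensated sizes, the `floor_segment` rounding, and the `max_merged_segment` estimate. It is a simplified model written for this page and is neither Elasticsearch nor Lucene code. `Segment`, `live_size`, `floored_size`, `estimated_merged_size` and `fits_normal_merge` are hypothetical names, and the byte constants assume that `mb` and `gb` mean powers of 1024.

```python
from __future__ import annotations

from dataclasses import dataclass

MB = 1024 * 1024  # assumption: 1mb = 1024 * 1024 bytes
GB = 1024 * MB

# Documented defaults for the tiered policy.
FLOOR_SEGMENT = 2 * MB       # index.merge.policy.floor_segment
MAX_MERGED_SEGMENT = 5 * GB  # index.merge.policy.max_merged_segment


@dataclass
class Segment:
    """Hypothetical segment model: on-disk size plus fraction of deleted docs."""
    size_bytes: int
    deleted_fraction: float = 0.0


def live_size(seg: Segment) -> float:
    """Size compensated for deletions, since deleted docs are dropped on merge."""
    return seg.size_bytes * (1.0 - seg.deleted_fraction)


def floored_size(seg: Segment, floor: int = FLOOR_SEGMENT) -> float:
    """Segments below the floor are 'rounded up' so they compare as equal."""
    return max(live_size(seg), floor)


def estimated_merged_size(segs: list[Segment]) -> float:
    """Approximate size of the merged result: the sum of live sizes."""
    return sum(live_size(s) for s in segs)


def fits_normal_merge(segs: list[Segment], limit: int = MAX_MERGED_SEGMENT) -> bool:
    """True if a normal (non-optimize) merge may produce this segment."""
    return estimated_merged_size(segs) <= limit
```

For example, a 512kb segment is compared as if it were `2mb`, and a `4gb` segment with half of its documents deleted contributes about `2gb` to the merged-size estimate.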

For normal merging, this policy first computes a "budget" of how many
segments are allowed to be in the index. If the index is over budget,
the policy sorts segments by decreasing size (adjusted proportionally for
the percentage of deleted documents) and then finds the least-cost merge.
Merge cost is measured by a combination of the "skew" of the merge (size
of the largest segment divided by the smallest segment), the total merge
size, and the percentage of deletes reclaimed, so that merges with lower
skew, smaller size, and more reclaimed deletes are favored.
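The budget-and-cost procedure above can be sketched in Python as follows. This is a deliberately simplified model based loosely on the description. The tier arithmetic in `allowed_segment_count`, the `0.05` size exponent, and the way `reclaim_deletes_weight` is applied in `merge_score` are all assumptions made for illustration, and Lucene's actual `TieredMergePolicy` differs in its details. Segments are modeled as hypothetical `(size_bytes, deleted_fraction)` tuples.

```python
from __future__ import annotations

import math

MB = 1024 * 1024
FLOOR_SEGMENT = 2 * MB  # floor_segment default (assumes 1mb = 1024 * 1024 bytes)

# Hypothetical model: a segment is a (size_bytes, deleted_fraction) tuple.


def live(seg: tuple[float, float]) -> float:
    """Size after compensating for deleted documents."""
    size, deleted = seg
    return size * (1.0 - deleted)


def allowed_segment_count(live_sizes: list[float], segs_per_tier: int = 10,
                          max_merge_at_once: int = 10) -> int:
    """Budget: each tier may hold segs_per_tier segments, and each tier's
    segment size is merge_factor times the previous tier's."""
    merge_factor = min(max_merge_at_once, segs_per_tier)
    level_size = max(min(live_sizes), FLOOR_SEGMENT)
    bytes_left = sum(live_sizes)
    allowed = 0
    while True:
        segs_at_level = bytes_left / level_size
        if segs_at_level < segs_per_tier:
            return allowed + math.ceil(segs_at_level)
        allowed += segs_per_tier
        bytes_left -= segs_per_tier * level_size
        level_size *= merge_factor


def merge_score(candidate: list[tuple[float, float]],
                reclaim_deletes_weight: float = 2.0) -> float:
    """Lower is better: penalizes skew and total size, rewards reclaimed
    deletes. The exact weighting is an illustrative assumption."""
    before = sum(size for size, _ in candidate)
    after = sum(live(s) for s in candidate)
    floored = [max(live(s), FLOOR_SEGMENT) for s in candidate]
    skew = max(floored) / min(floored)  # largest segment / smallest segment
    non_deleted_ratio = after / before
    return skew * after ** 0.05 * non_deleted_ratio ** reclaim_deletes_weight


def select_merge(segments: list[tuple[float, float]], segs_per_tier: int = 10,
                 max_merge_at_once: int = 10) -> list[tuple[float, float]]:
    """Return the least-cost merge, or [] if the index is within budget."""
    if not segments:
        return []
    budget = allowed_segment_count([live(s) for s in segments],
                                   segs_per_tier, max_merge_at_once)
    if len(segments) <= budget:
        return []
    # Sorting by size lets similar-sized segments merge even when they are
    # not adjacent in the index.
    ordered = sorted(segments, key=live, reverse=True)
    best, best_score = [], math.inf
    for start in range(len(ordered) - max_merge_at_once + 1):
        window = ordered[start:start + max_merge_at_once]
        score = merge_score(window)
        if score < best_score:
            best, best_score = window, score
    return best
```

Candidates are drawn from the size-sorted list, so similar-sized segments can be merged even when they are not adjacent in the index. With the default `reclaim_deletes_weight` of `2.0`, a window of segments with many deletions scores lower (better) than a window of the same on-disk size without deletions.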

If a merge would produce a segment larger than `max_merged_segment`, the
policy merges fewer segments (down to a single segment, if that segment
has deletions) to keep the resulting segment size under budget.
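Here is a minimal sketch of that trimming step. It assumes that segments are considered largest first and skipped when they would push the estimate over the limit. `trim_to_budget` is a hypothetical helper and is not part of any Elasticsearch or Lucene API.

```python
MB = 1024 * 1024  # assumption: 1mb = 1024 * 1024 bytes
GB = 1024 * MB


def trim_to_budget(candidate, max_merged_segment=5 * GB):
    """Keep segments (largest first) while the estimated merged size stays
    under max_merged_segment. candidate: list of (size_bytes, deleted_fraction)."""
    chosen, total = [], 0.0
    for size, deleted in sorted(candidate, reverse=True):
        live = size * (1.0 - deleted)  # deleted docs vanish when merged
        if total + live <= max_merged_segment:
            chosen.append((size, deleted))
            total += live
    if len(chosen) == 1 and chosen[0][1] == 0.0:
        return []  # rewriting one segment without deletions gains nothing
    return chosen
```

With the default `5gb` limit, a candidate of `3gb`, `3gb` and `1gb` segments is trimmed to `3gb` plus `1gb`. A lone `4gb` segment without deletions is not worth rewriting, so it yields no merge.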

Note that for large shards holding many gigabytes of data, the default
`max_merged_segment` of `5gb` can leave many segments in an index, which
slows down searches. Use the indices segments API to inspect the segments
of an index, and then either increase `max_merged_segment` or issue an
optimize call for the index (ideally during a low-traffic period).
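For a rough sense of scale: normal merging does not build segments much larger than `max_merged_segment`, so a shard ends up with at least about its total size divided by that limit. The hypothetical helper below computes this lower bound. Because the size estimate is approximate, treat the result as a ballpark figure.

```python
import math

GB = 1024 ** 3  # assumption: 1gb = 1024 ** 3 bytes


def min_segment_count(shard_bytes, max_merged_segment=5 * GB):
    """Rough lower bound on the number of segments left by normal merging."""
    return math.ceil(shard_bytes / max_merged_segment)
```

For example, a `100gb` shard with the default `5gb` limit is left with at least 20 segments. Raising the limit to `25gb` lowers that floor to 4.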

[float]
[[log-byte-size]]
==== log_byte_size

A merge policy that merges segments into levels of exponentially
increasing *byte size*, where each level has fewer segments than the
value of the merge factor. Whenever extra segments (beyond the merge
factor upper bound) are encountered, all segments within the level are
merged.

This policy has the following settings:

[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|index.merge.policy.merge_factor |Determines how often segment indices
are merged by index operation. With smaller values, less RAM is used
while indexing, and searches on unoptimized indices are faster, but
indexing speed is slower. With larger values, more RAM is used during
indexing, and although searches on unoptimized indices are slower,
indexing is faster. Thus larger values (greater than 10) are best for
batch index creation, and smaller values (lower than 10) for indices that
are interactively maintained. Defaults to `10`.
|index.merge.policy.min_merge_size |A size setting type which sets the
minimum size for the lowest level segments. Any segments below this size
are considered to be on the same level (even if they vary drastically in
size) and will be merged whenever there are `merge_factor` of them. This
effectively truncates the "long tail" of small segments that would
otherwise be created into a single level. If you set this too large, it
could greatly increase the merging cost during indexing (if you flush
many small segments). Defaults to `1.6mb`.
|index.merge.policy.max_merge_size |A size setting type which sets the
largest segment (measured by total byte size of the segment's files)
that may be merged with other segments. Defaults to unbounded.
|index.merge.policy.max_merge_docs |Determines the largest segment
(measured by document count) that may be merged with other segments.
Defaults to unbounded.
|=======================================================================
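The level structure can be illustrated with a simplified Python sketch. It assigns each segment an integer level based on powers of `merge_factor` above a minimum size, and it merges runs of `merge_factor` adjacent segments that share a level. This is an assumption-laden model, not Lucene's `LogMergePolicy`, which computes levels differently and applies further rules. The names `level_of`, `find_merges`, `min_level_size` and `max_size` are hypothetical. `min_level_size` and `max_size` stand in for `min_merge_size` and `max_merge_size`.

```python
MB = 1024 * 1024  # assumption: 1mb = 1024 * 1024 bytes


def level_of(size, min_level_size, merge_factor):
    """Integer level: 0 for anything below min_level_size * merge_factor
    (so all segments under the minimum share level 0), then one more level
    each time the size grows by another factor of merge_factor."""
    level, bound = 0, min_level_size * merge_factor
    while size >= bound:
        level += 1
        bound *= merge_factor
    return level


def find_merges(sizes, merge_factor=10, min_level_size=1.6 * MB, max_size=None):
    """Return lists of positions: runs of merge_factor adjacent segments that
    share a level. Segments above max_size never take part in merges."""
    if merge_factor < 2 or min_level_size <= 0:
        raise ValueError("need merge_factor >= 2 and min_level_size > 0")
    merges, run, run_level = [], [], None
    for i, size in enumerate(sizes):
        eligible = max_size is None or size <= max_size
        level = level_of(size, min_level_size, merge_factor) if eligible else None
        if level != run_level:
            run, run_level = [], level  # level changed, so start a new run
        if level is not None:
            run.append(i)
            if len(run) == merge_factor:
                merges.append(run)
                run = []
    return merges
```

Only adjacent segments are merged here, which is the restriction that the `tiered` policy lifts. Passing document counts instead of byte sizes, with `min_level_size` playing the role of `min_merge_docs`, gives the same picture for the `log_doc` policy.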

[float]
[[log-doc]]
==== log_doc

A merge policy that tries to merge segments into levels of exponentially
increasing *document count*, where each level has fewer segments than
the value of the merge factor. Whenever extra segments (beyond the merge
factor upper bound) are encountered, all segments within the level are
merged.

This policy has the following settings:

[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|index.merge.policy.merge_factor |Determines how often segment indices
are merged by index operation. With smaller values, less RAM is used
while indexing, and searches on unoptimized indices are faster, but
indexing speed is slower. With larger values, more RAM is used during
indexing, and although searches on unoptimized indices are slower,
indexing is faster. Thus larger values (greater than 10) are best for
batch index creation, and smaller values (lower than 10) for indices that
are interactively maintained. Defaults to `10`.
|index.merge.policy.min_merge_docs |Sets the minimum size, in documents,
for the lowest level segments. Any segments below this size are
considered to be on the same level (even if they vary drastically in
size) and will be merged whenever there are `merge_factor` of them. This
effectively truncates the "long tail" of small segments that would
otherwise be created into a single level. If you set this too large, it
could greatly increase the merging cost during indexing (if you flush
many small segments). Defaults to `1000`.
|index.merge.policy.max_merge_docs |Determines the largest segment
(measured by document count) that may be merged with other segments.
Defaults to unbounded.
|=======================================================================

[float]
[[scheduling]]
=== Scheduling

The merge scheduler (ConcurrentMergeScheduler) controls the execution of
merge operations once they are needed (according to the merge policy).
Merges run in separate threads, and when the maximum number of threads is
reached, further merges wait until a merge thread becomes available. The
merge scheduler supports the following settings:

`index.merge.scheduler.max_thread_count`::
The maximum number of threads that may be merging at once. Defaults to
`Math.max(1, Math.min(4, Runtime.getRuntime().availableProcessors() / 2))`,
which works well for a good solid-state drive (SSD). If your index is on
spinning platter drives instead, decrease this to 1.

`index.merge.scheduler.auto_throttle`::
If this is true (the default), the merge scheduler rate-limits IO
(writes) for merges to an adaptive value that depends on how many merges
are requested over time. An application with a low indexing rate that
suddenly requires a large merge will see that merge aggressively
throttled, while an application doing heavy indexing will see the
throttle move higher so that merges can keep up with ongoing indexing.
This is a dynamic setting (you can <<../indices/update-settings,change it
at any time on a running index>>).
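As a rough illustration of both settings, the following Python sketch does three things. It ports the documented default for `max_thread_count`. It runs merge jobs on a bounded thread pool, so extra merges queue until a thread frees up. It models `auto_throttle` as a rate limit that rises when merges back up and falls when they keep up. The function and class names are hypothetical. The throttle's starting rate, its bounds, and its adjustment factors (`1.2` and `1.1`) are illustrative placeholders, not documented values.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def default_max_thread_count(cpus=None):
    """Port of the documented default
    Math.max(1, Math.min(4, availableProcessors() / 2)); Java's integer
    division becomes // here."""
    if cpus is None:
        cpus = os.cpu_count() or 1
    return max(1, min(4, cpus // 2))


def run_merges(merge_jobs, max_thread_count):
    """Run merge callables with at most max_thread_count in flight; the rest
    wait in the pool's queue. Returns (results, peak_concurrency)."""
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def tracked(job):
        with lock:
            state["running"] += 1
            state["peak"] = max(state["peak"], state["running"])
        try:
            return job()
        finally:
            with lock:
                state["running"] -= 1

    with ThreadPoolExecutor(max_workers=max_thread_count) as pool:
        results = list(pool.map(tracked, merge_jobs))
    return results, state["peak"]


class AdaptiveThrottle:
    """Toy model of auto_throttle: raise the merge write-rate limit when
    merges back up, lower it when they keep up. Constants are placeholders."""

    def __init__(self, start_mb_per_sec=20.0, min_mb_per_sec=5.0,
                 max_mb_per_sec=10240.0):
        self.rate = start_mb_per_sec
        self.min_rate = min_mb_per_sec
        self.max_rate = max_mb_per_sec

    def on_merge_requested(self, pending_merges, max_thread_count):
        if pending_merges > max_thread_count:
            # More merges queued than threads: let merges write faster.
            self.rate = min(self.max_rate, self.rate * 1.2)
        else:
            # Merges are keeping up: tighten the limit again.
            self.rate = max(self.min_rate, self.rate / 1.1)
        return self.rate
```

With `max_thread_count` set to 1, `run_merges` executes the jobs strictly one at a time. This mirrors the SerialMergeScheduler behavior described below.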

[float]
==== SerialMergeScheduler

This is accepted for backwards compatibility, but it simply uses
ConcurrentMergeScheduler with `index.merge.scheduler.max_thread_count`
set to 1, so that only one merge may run at a time.