[[index-modules-merge]]
== Merge

A shard in Elasticsearch is a Lucene index, and a Lucene index is broken
down into segments. Segments are the internal storage elements in which
the index data is stored, and they are immutable apart from delete
markers. Segments are periodically merged into larger segments to keep
the index size in check and to expunge deletes.

The more segments a Lucene index has, the slower searches are and the
more memory is used. Keeping the number of segments low, however, means
that more merging has to take place.

Since merges can be expensive to perform, especially in low IO
environments, they can be throttled using store level throttling. Read
the store module documentation on how to set it.

[float]
[[policy]]
=== Policy

The index merge policy module allows one to control which segments of a
shard index are to be merged. There are several types of policies, with
the default set to `tiered`.

[float]
[[tiered]]
==== tiered

Merges segments of approximately equal size, subject to an allowed
number of segments per tier. This is similar to the `log_byte_size`
merge policy, except that this merge policy is able to merge
non-adjacent segments, and it separates how many segments are merged at
once from how many segments are allowed per tier. This merge policy also
does not over-merge (i.e., cascade merges).

This policy has the following settings:

[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`index.merge.policy.expunge_deletes_allowed` |When expungeDeletes is
called, a segment is only merged away if its delete percentage is over
this threshold. Default is `10`.

|`index.merge.policy.floor_segment` |Segments smaller than this are
"rounded up" to this size, i.e. treated as equal (floor) size for merge
selection. This prevents frequent flushing of tiny segments from
leaving a long tail in the index. Default is `2mb`.

|`index.merge.policy.max_merge_at_once` |Maximum number of segments to
be merged at a time during "normal" merging. Default is `10`.

|`index.merge.policy.max_merge_at_once_explicit` |Maximum number of
segments to be merged at a time during optimize or expungeDeletes.
Default is `30`.

|`index.merge.policy.max_merged_segment` |Maximum sized segment to
produce during normal merging (not explicit optimize). This setting is
approximate: the estimate of the merged segment size is made by summing
the sizes of the to-be-merged segments (compensating for the percentage
of deleted docs). Default is `5gb`.

|`index.merge.policy.segments_per_tier` |Sets the allowed number of
segments per tier. Smaller values mean more merging but fewer segments.
Default is `10`. Note that this value needs to be >= the
`max_merge_at_once` value, otherwise too many merges will be forced to
occur.

|`index.reclaim_deletes_weight` |Controls how aggressively merges that
reclaim more deletions are favored. Higher values favor selecting merges
that reclaim deletions. A value of `0.0` means deletions don't impact
merge selection. Defaults to `2.0`.

|`index.compound_format` |Whether the index should be stored in compound
format or not. Defaults to `false`.
|=======================================================================

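As an illustration, the settings above can be collected into a flat
settings map. The following is a minimal Python sketch; the helper name
`tiered_merge_settings` is hypothetical, and the check only encodes the
note that `segments_per_tier` should not be smaller than
`max_merge_at_once`. How the map is sent to the cluster is left out.

```python
import json


def tiered_merge_settings(segments_per_tier=10, max_merge_at_once=10,
                          max_merged_segment="5gb", floor_segment="2mb"):
    """Build a flat settings map for the tiered policy (hypothetical helper)."""
    # Encode the note from the table: segments_per_tier >= max_merge_at_once.
    if segments_per_tier < max_merge_at_once:
        raise ValueError("segments_per_tier must be >= max_merge_at_once")
    return {
        "index.merge.policy.segments_per_tier": segments_per_tier,
        "index.merge.policy.max_merge_at_once": max_merge_at_once,
        "index.merge.policy.max_merged_segment": max_merged_segment,
        "index.merge.policy.floor_segment": floor_segment,
    }


# Example: allow more segments per tier, keep everything else at the defaults.
print(json.dumps(tiered_merge_settings(segments_per_tier=20), indent=2))
```
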
For normal merging, this policy first computes a "budget" of how many
segments are allowed to be in the index. If the index is over budget,
the policy sorts segments by decreasing size (pro-rating by percent
deletes), and then finds the least-cost merge. Merge cost is measured by
a combination of the "skew" of the merge (size of the largest segment
divided by the size of the smallest segment), the total merge size and
the percentage of deletes reclaimed, so that merges with lower skew,
smaller size and more reclaimed deletes are favored.

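The cost described above can be pictured with a toy scoring function.
This is only a sketch under stated assumptions: the function name
`merge_score`, the `0.05` exponent and the way the three factors are
multiplied are illustrative choices, not necessarily the formula Lucene
uses. A lower score means a more attractive merge.

```python
def merge_score(sizes, deleted_fractions, reclaim_deletes_weight=2.0):
    """Toy merge cost (lower is better); an illustration, not Lucene's code.

    sizes: segment sizes in bytes (all > 0).
    deleted_fractions: fraction of deleted docs per segment, in [0, 1].
    """
    total_before = sum(sizes)
    # Pro-rate each segment by its deletes to estimate the merged size.
    total_after = sum(s * (1.0 - d) for s, d in zip(sizes, deleted_fractions))
    # Skew: size of the largest segment divided by the smallest.
    skew = max(sizes) / min(sizes)
    # Share of the data that survives the merge; smaller means more reclaimed.
    non_deleted_ratio = total_after / total_before
    # Assumed combination: mild penalty for size, stronger reward for reclaiming.
    return skew * total_after ** 0.05 * non_deleted_ratio ** reclaim_deletes_weight
```

With this sketch, a merge of equally sized segments scores lower than a
merge of one large and several tiny segments, and a merge that reclaims
many deletes scores lower than the same merge without deletes.
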
If a merge would produce a segment that is larger than
`max_merged_segment`, the policy will merge fewer segments (down to
1 at once, if that one has deletions) to keep the segment size under
budget.

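A rough sketch of this budgeting step follows. The size estimate mirrors
the description of `max_merged_segment` (sum of segment sizes, discounted
by deleted docs). The choice to drop the largest candidate first and the
function names are assumptions made for the example.

```python
def estimate_merged_size(segments):
    """Sum segment sizes, discounting each by its fraction of deleted docs.

    segments: list of (size_in_bytes, deleted_fraction) tuples.
    """
    return sum(size * (1.0 - deleted) for size, deleted in segments)


def trim_to_budget(segments, max_merged_segment):
    """Drop candidates until the estimated merged size fits the budget.

    Assumption for this sketch: the largest candidate is dropped first,
    and at least one candidate is always kept.
    """
    candidates = sorted(segments, key=lambda seg: seg[0])
    while len(candidates) > 1 and estimate_merged_size(candidates) > max_merged_segment:
        candidates.pop()  # remove the largest remaining candidate
    return candidates


GB = 1024 ** 3
# 3gb + 2gb + 1gb would exceed a 5gb budget, so the 3gb segment is left out.
print(trim_to_budget([(3 * GB, 0.0), (2 * GB, 0.0), (1 * GB, 0.0)], 5 * GB))
```
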
Note that for large shards that hold many gigabytes of data, the default
of `max_merged_segment` (`5gb`) can leave many segments in an index,
which makes searches slower. Use the indices segments API to see the
segments that an index has, and possibly either increase
`max_merged_segment` or issue an optimize call for the index (try to
issue it at a low traffic time).

[float]
[[log-byte-size]]
==== log_byte_size

A merge policy that merges segments into levels of exponentially
increasing *byte size*, where each level has fewer segments than the
value of the merge factor. Whenever extra segments (beyond the merge
factor upper bound) are encountered, all segments within the level are
merged.

This policy has the following settings:

[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`index.merge.policy.merge_factor` |Determines how often segment indices
are merged by index operations. With smaller values, less RAM is used
while indexing, and searches on unoptimized indices are faster, but
indexing speed is slower. With larger values, more RAM is used during
indexing, and while searches on unoptimized indices are slower, indexing
is faster. Thus larger values (greater than 10) are best for batch index
creation, and smaller values (lower than 10) for indices that are
interactively maintained. Defaults to `10`.

|`index.merge.policy.min_merge_size` |A size setting type which sets the
minimum size for the lowest level segments. Any segments below this size
are considered to be on the same level (even if they vary drastically in
size) and will be merged whenever there are `merge_factor` of them. This
effectively truncates the "long tail" of small segments that would
otherwise be created into a single level. If you set this too large, it
could greatly increase the merging cost during indexing (if you flush
many small segments). Defaults to `1.6mb`.

|`index.merge.policy.max_merge_size` |A size setting type which sets the
largest segment (measured by total byte size of the segment's files)
that may be merged with other segments. Defaults to unbounded.

|`index.merge.policy.max_merge_docs` |Determines the largest segment
(measured by document count) that may be merged with other segments.
Defaults to unbounded.
|=======================================================================

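The level idea can be sketched in a few lines of Python. This is a
simplified model written for illustration, not the actual Lucene
implementation: it assumes a level is the integer part of the logarithm
of the segment size in base `merge_factor`, and it ignores segment
adjacency and the `max_merge_size` and `max_merge_docs` limits. The
function names are hypothetical.

```python
import math
from collections import defaultdict

# 1.6mb, the documented default of min_merge_size, expressed in bytes.
DEFAULT_MIN_MERGE_SIZE = int(1.6 * 1024 * 1024)


def assign_levels(sizes, merge_factor=10, min_merge_size=DEFAULT_MIN_MERGE_SIZE):
    """Group segment sizes (in bytes) into levels of exponentially growing size."""
    levels = defaultdict(list)
    for size in sizes:
        # Segments below min_merge_size all land on the same, lowest level.
        floored = max(size, min_merge_size)
        level = int(math.log(floored) / math.log(merge_factor))
        levels[level].append(size)
    return dict(levels)


def levels_to_merge(sizes, merge_factor=10, min_merge_size=DEFAULT_MIN_MERGE_SIZE):
    """Return the levels that hold at least merge_factor segments."""
    levels = assign_levels(sizes, merge_factor, min_merge_size)
    return sorted(lvl for lvl, members in levels.items()
                  if len(members) >= merge_factor)


# Ten small flushed segments share one level and become a merge candidate,
# while the single large segment on a higher level is left alone.
print(levels_to_merge([500_000] * 10 + [50_000_000]))
```
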
[float]
[[log-doc]]
==== log_doc

A merge policy that tries to merge segments into levels of exponentially
increasing *document count*, where each level has fewer segments than
the value of the merge factor. Whenever extra segments (beyond the merge
factor upper bound) are encountered, all segments within the level are
merged.

This policy has the following settings:

[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`index.merge.policy.merge_factor` |Determines how often segment indices
are merged by index operations. With smaller values, less RAM is used
while indexing, and searches on unoptimized indices are faster, but
indexing speed is slower. With larger values, more RAM is used during
indexing, and while searches on unoptimized indices are slower, indexing
is faster. Thus larger values (greater than 10) are best for batch index
creation, and smaller values (lower than 10) for indices that are
interactively maintained. Defaults to `10`.

|`index.merge.policy.min_merge_docs` |Sets the minimum size for the lowest
level segments. Any segments below this size are considered to be on the
same level (even if they vary drastically in size) and will be merged
whenever there are `merge_factor` of them. This effectively truncates the
"long tail" of small segments that would otherwise be created into a
single level. If you set this too large, it could greatly increase the
merging cost during indexing (if you flush many small segments).
Defaults to `1000`.

|`index.merge.policy.max_merge_docs` |Determines the largest segment
(measured by document count) that may be merged with other segments.
Defaults to unbounded.
|=======================================================================

[float]
[[scheduling]]
=== Scheduling

The merge scheduler controls the execution of merge operations once they
are needed (according to the merge policy). The following types are
supported, with the default being the `ConcurrentMergeScheduler`.

[float]
==== ConcurrentMergeScheduler

A merge scheduler that runs each merge using a separate thread. Once the
maximum number of threads is reached and another merge is needed, the
thread(s) that are updating the index will pause until one or more
merges complete.

The scheduler supports the following settings:

[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`index.merge.scheduler.max_thread_count` |The maximum number of threads
used to perform the merge operation. Defaults to
`Math.max(1, Math.min(3, Runtime.getRuntime().availableProcessors() / 2))`.
|=======================================================================

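To see what the default works out to on a given machine, the Java
expression above can be transcribed into Python. This is a sketch for
quick calculation only; the function name is hypothetical, and
`os.cpu_count()` stands in for `Runtime.getRuntime().availableProcessors()`.

```python
import os


def default_max_thread_count(available_processors=None):
    """Python transcription of max(1, min(3, availableProcessors / 2))."""
    if available_processors is None:
        # Stand-in for Runtime.getRuntime().availableProcessors().
        available_processors = os.cpu_count() or 1
    # Java integer division on positive ints matches Python's floor division.
    return max(1, min(3, available_processors // 2))


for cpus in (1, 2, 4, 8, 16):
    print(cpus, default_max_thread_count(cpus))
```

The result is clamped to the range 1 to 3, so machines with six or more
processors all end up with three merge threads by default.
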
[float]
==== SerialMergeScheduler

A merge scheduler that simply does each merge sequentially using the
calling thread, which blocks the operation that triggered the merge
(the index operation).