[[index-modules-merge]]
== Merge

A shard in Elasticsearch is a Lucene index, and a Lucene index is broken
down into segments. Segments are internal storage elements in the index
where the index data is stored, and are immutable up to delete markers.
Segments are periodically merged into larger segments to keep the
index size in check and to expunge deletes.

The more segments a Lucene index has, the slower searches become and the
more memory is used. Segment merging reduces the number of segments, but
merges can be expensive to perform, especially in low IO environments.
Merges can be throttled using <<store-throttling,store level throttling>>.

[float]
[[policy]]
=== Policy

The index merge policy module controls which segments of a shard index
are merged. There are several types of policies, with the default set to
`tiered`.

[float]
[[tiered]]
==== tiered

Merges segments of approximately equal size, subject to an allowed
number of segments per tier. This is similar to the `log_byte_size` merge
policy, except that this merge policy is able to merge non-adjacent
segments, and separates how many segments are merged at once from how many
segments are allowed per tier. This merge policy also does not
over-merge (i.e., cascade merges).

This policy has the following settings:

`index.merge.policy.expunge_deletes_allowed`::

	When expungeDeletes is called, a segment is only merged away if its
	delete percentage is over this threshold. Default is `10`.

`index.merge.policy.floor_segment`::

	Segments smaller than this are "rounded up" to this size, i.e. treated as
	equal (floor) size for merge selection. This prevents frequent
	flushing of tiny segments from allowing a long tail in the index. Default
	is `2mb`.

`index.merge.policy.max_merge_at_once`::

	Maximum number of segments to be merged at a time during "normal" merging.
	Default is `10`.

`index.merge.policy.max_merge_at_once_explicit`::

	Maximum number of segments to be merged at a time during optimize or
	expungeDeletes. Default is `30`.

`index.merge.policy.max_merged_segment`::

	Maximum sized segment to produce during normal merging (not explicit
	optimize). This setting is approximate: the estimate of the merged segment
	size is made by summing the sizes of the to-be-merged segments
	(compensating for percent deleted docs). Default is `5gb`.

`index.merge.policy.segments_per_tier`::

	Sets the allowed number of segments per tier. Smaller values mean more
	merging but fewer segments. Default is `10`. Note that this value needs to
	be greater than or equal to `max_merge_at_once`, otherwise too many merges
	will be forced to occur.

`index.reclaim_deletes_weight`::

	Controls how aggressively merges that reclaim more deletions are favored.
	Higher values favor selecting merges that reclaim deletions. A value of
	`0.0` means deletions don't impact merge selection. Defaults to `2.0`.

`index.compound_format`::

	Whether the index should be stored in compound format. Defaults to `false`.
	See <<index-compound-format,`index.compound_format`>> in
	<<index-modules-settings>>.

For normal merging, this policy first computes a "budget" of how many
segments are allowed to be in the index. If the index is over-budget,
then the policy sorts segments by decreasing size (pro-rating by percent
deletes), and then finds the least-cost merge. Merge cost is measured by
a combination of the "skew" of the merge (size of the largest segment
divided by the size of the smallest segment), total merge size and percent
deletes reclaimed, so that merges with lower skew, smaller size and those
reclaiming more deletes are favored.

If a merge would produce a segment that is larger than
`max_merged_segment`, then the policy will merge fewer segments (down to
1 at once, if that one has deletions) to keep the segment size under
budget.

Note that for large shards holding many gigabytes of data, the default
`max_merged_segment` (`5gb`) can leave many segments in an index and
make searches slower. Use the indices segments API to see the segments
that an index has, and consider either increasing `max_merged_segment`
or issuing an optimize call for the index (preferably during a
low-traffic period).

[float]
[[log-byte-size]]
==== log_byte_size

A merge policy that merges segments into levels of exponentially
increasing *byte size*, where each level has fewer segments than the
value of the merge factor. Whenever extra segments (beyond the merge
factor upper bound) are encountered, all segments within the level are
merged.

This policy has the following settings:

[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|index.merge.policy.merge_factor |Determines how often segment indices
are merged by index operation. With smaller values, less RAM is used
while indexing, and searches on unoptimized indices are faster, but
indexing speed is slower. With larger values, more RAM is used during
indexing, and while searches on unoptimized indices are slower, indexing
is faster. Thus larger values (greater than 10) are best for batch index
creation, and smaller values (lower than 10) for indices that are
interactively maintained. Defaults to `10`.

|index.merge.policy.min_merge_size |A size setting type which sets the
minimum size for the lowest level segments. Any segments below this size
are considered to be on the same level (even if they vary drastically in
size) and will be merged whenever there are `merge_factor` of them. This
effectively truncates the "long tail" of small segments that would
otherwise be created into a single level. If you set this too large, it
could greatly increase the merging cost during indexing (if you flush
many small segments). Defaults to `1.6mb`.

|index.merge.policy.max_merge_size |A size setting type which sets the
largest segment (measured by total byte size of the segment's files)
that may be merged with other segments. Defaults to unbounded.

|index.merge.policy.max_merge_docs |Determines the largest segment
(measured by document count) that may be merged with other segments.
Defaults to unbounded.
|=======================================================================

[float]
[[log-doc]]
==== log_doc

A merge policy that tries to merge segments into levels of exponentially
increasing *document count*, where each level has fewer segments than
the value of the merge factor. Whenever extra segments (beyond the merge
factor upper bound) are encountered, all segments within the level are
merged.

[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|index.merge.policy.merge_factor |Determines how often segment indices
are merged by index operation. With smaller values, less RAM is used
while indexing, and searches on unoptimized indices are faster, but
indexing speed is slower. With larger values, more RAM is used during
indexing, and while searches on unoptimized indices are slower, indexing
is faster. Thus larger values (greater than 10) are best for batch index
creation, and smaller values (lower than 10) for indices that are
interactively maintained. Defaults to `10`.

|index.merge.policy.min_merge_docs |Sets the minimum size for the lowest
level segments. Any segments below this size are considered to be on the
same level (even if they vary drastically in size) and will be merged
whenever there are `merge_factor` of them. This effectively truncates the
"long tail" of small segments that would otherwise be created into a
single level. If you set this too large, it could greatly increase the
merging cost during indexing (if you flush many small segments).
Defaults to `1000`.

|index.merge.policy.max_merge_docs |Determines the largest segment
(measured by document count) that may be merged with other segments.
Defaults to unbounded.
|=======================================================================

[float]
[[scheduling]]
=== Scheduling

The merge scheduler controls the execution of merge operations once they
are needed (according to the merge policy). The following types are
supported, with the default being the `ConcurrentMergeScheduler`.

[float]
==== ConcurrentMergeScheduler

A merge scheduler that runs each merge in a separate thread. Once the
maximum number of threads is reached and another merge is needed, the
thread(s) that are updating the index pause until one or more merges
complete.

The scheduler supports the following settings:

[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|index.merge.scheduler.max_thread_count |The maximum number of threads
to perform the merge operation. Defaults to
`Math.max(1, Math.min(3, Runtime.getRuntime().availableProcessors() / 2))`.
|=======================================================================

[float]
==== SerialMergeScheduler

A merge scheduler that simply does each merge sequentially using the
calling thread (blocking the operation that triggered the merge, i.e. the
index operation).
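
The arithmetic described on this page can be made concrete with a small Java
sketch. It covers the default for `index.merge.scheduler.max_thread_count`
and, for the `tiered` policy, the merged-size estimate and the "skew" of a
candidate merge. The class `MergeSketch` and its method names are hypothetical
and exist only for illustration. The size estimate and skew follow the prose
description given for the `tiered` policy, not the actual Lucene or
Elasticsearch source, so treat the results as approximations.

```java
/**
 * Illustrative arithmetic only. Names are hypothetical and the formulas follow
 * the prose description in this page, not Lucene's actual implementation.
 */
final class MergeSketch {
    private MergeSketch() {}

    /** Default for index.merge.scheduler.max_thread_count for a given processor count. */
    static int defaultMaxThreadCount(int availableProcessors) {
        return Math.max(1, Math.min(3, availableProcessors / 2));
    }

    /**
     * Estimated size of a merged segment: the sum of the input sizes, each
     * reduced by its fraction of deleted documents (0.0 to 1.0).
     */
    static long estimatedMergedSize(long[] sizesBytes, double[] deletedFractions) {
        if (sizesBytes.length != deletedFractions.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        long total = 0;
        for (int i = 0; i < sizesBytes.length; i++) {
            total += Math.round(sizesBytes[i] * (1.0 - deletedFractions[i]));
        }
        return total;
    }

    /**
     * Skew of a candidate merge: largest segment size divided by smallest,
     * after rounding small segments up to the floor size (floor_segment).
     */
    static double skew(long[] sizesBytes, long floorBytes) {
        if (sizesBytes.length == 0 || floorBytes <= 0) {
            throw new IllegalArgumentException("need at least one segment and a positive floor");
        }
        long largest = 0;
        long smallest = Long.MAX_VALUE;
        for (long size : sizesBytes) {
            long floored = Math.max(size, floorBytes);
            largest = Math.max(largest, floored);
            smallest = Math.min(smallest, floored);
        }
        return (double) largest / smallest;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Thread count default on the current machine.
        System.out.println(defaultMaxThreadCount(Runtime.getRuntime().availableProcessors()));
        // 100mb with half its docs deleted plus 200mb with none: prints 250 (mb).
        System.out.println(
            estimatedMergedSize(new long[] {100 * mb, 200 * mb}, new double[] {0.5, 0.0}) / mb);
        // A 1mb segment is floored to 2mb, so 8mb / 2mb: prints 4.0.
        System.out.println(skew(new long[] {1 * mb, 8 * mb}, 2 * mb));
    }
}
```

With this formula the thread count default is 1 on machines with up to three
processors and never exceeds 3, however many processors are available.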