[[index-modules-merge]]
== Merge

experimental[All of the settings exposed in the `merge` module are expert only and may be removed in the future]

A shard in Elasticsearch is a Lucene index, and a Lucene index is broken
down into segments. Segments are internal storage elements in the index
where the index data is stored, and are immutable up to delete markers.
Segments are periodically merged into larger segments to keep the
index size at bay and to expunge deletes.

The merge policy merges segments of approximately equal size, subject to an
allowed number of segments per tier. It is able to merge non-adjacent
segments, and it separates how many segments are merged at once from how many
segments are allowed per tier. It also does not over-merge (i.e., cascade
merges).

[float]
[[merge-settings]]
=== Merge policy settings

All merge policy settings are _dynamic_ and can be updated on a live index.
The merge policy has the following settings:

`index.merge.policy.expunge_deletes_allowed`::
When `expungeDeletes` is called, a segment is only merged away if its
percentage of deleted documents is over this threshold. Default is `10`.

`index.merge.policy.floor_segment`::
Segments smaller than this are "rounded up" to this size, i.e. treated as
equal (floor) size for merge selection. This prevents frequent flushing of
tiny segments from creating a long tail in the index. Default is `2mb`.

`index.merge.policy.max_merge_at_once`::
Maximum number of segments to be merged at a time during "normal" merging.
Default is `10`.

`index.merge.policy.max_merge_at_once_explicit`::
Maximum number of segments to be merged at a time during optimize or
expungeDeletes. Default is `30`.

`index.merge.policy.max_merged_segment`::
Maximum size of a segment produced during normal merging (not explicit
optimize). This setting is approximate: the estimate of the merged segment
size is made by summing the sizes of the to-be-merged segments (compensating
for the percentage of deleted docs). Default is `5gb`.

`index.merge.policy.segments_per_tier`::
Sets the allowed number of segments per tier. Smaller values mean more
merging but fewer segments. Default is `10`. Note that this value needs to be
greater than or equal to `max_merge_at_once`, otherwise you'll force too many
merges to occur.

`index.merge.policy.reclaim_deletes_weight`::
Controls how aggressively merges that reclaim more deletions are favored.
Higher values favor selecting merges that reclaim deletions. A value of
`0.0` means deletions don't impact merge selection. Defaults to `2.0`.

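
Because these settings are dynamic, they can be changed through the update
index settings API. The following is a minimal Python sketch that only builds
the JSON request body and checks the constraint between `segments_per_tier`
and `max_merge_at_once`. The helper name `merge_policy_settings_body` and the
chosen values are hypothetical, and sending the body (for example as
`PUT /<index>/_settings`) is deliberately left out.

```python
import json


def merge_policy_settings_body(segments_per_tier, max_merge_at_once):
    """Build a JSON body for the update index settings API (hypothetical helper)."""
    # segments_per_tier should be >= max_merge_at_once, otherwise too many
    # merges are forced.
    if segments_per_tier < max_merge_at_once:
        raise ValueError("segments_per_tier must be >= max_merge_at_once")
    settings = {
        "index.merge.policy.segments_per_tier": segments_per_tier,
        "index.merge.policy.max_merge_at_once": max_merge_at_once,
    }
    return json.dumps(settings, sort_keys=True)


body = merge_policy_settings_body(segments_per_tier=10, max_merge_at_once=10)
print(body)
```

Validating the relationship on the client side is a design choice of this
sketch; it simply catches the misconfiguration before a request is made.
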
For normal merging, the policy first computes a "budget" of how many
segments are allowed to be in the index. If the index is over budget,
the policy sorts segments by decreasing size (proportionally considering the
percentage of deletes), and then finds the least-cost merge. Merge cost is
measured by a combination of the "skew" of the merge (size of the largest
segment divided by the size of the smallest segment), the total merge size,
and the percentage of deletes reclaimed, so that merges with lower skew,
smaller size, and more reclaimed deletes are favored.

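
The selection described above can be illustrated with a small Python sketch.
It is not the actual implementation: the function names are hypothetical, the
exponent `0.05` and the way the three factors are multiplied are assumptions
made for illustration, and candidates are enumerated by brute force for
clarity. Each segment is modelled as a `(size_bytes, deleted_fraction)` pair.

```python
from itertools import combinations


def effective_size(size_bytes, floor_bytes):
    # Segments below the floor are treated as if they had the floor size.
    return max(size_bytes, floor_bytes)


def merge_cost(segments, floor_bytes, reclaim_deletes_weight=2.0):
    """Score a candidate merge; lower is better (illustrative formula)."""
    live_sizes = [size * (1.0 - deleted) for size, deleted in segments]
    floored = [effective_size(s, floor_bytes) for s in live_sizes]
    # Skew: largest segment divided by smallest segment.
    skew = max(floored) / min(floored)
    total_before = sum(size for size, _ in segments)
    total_after = sum(live_sizes)
    non_deleted_ratio = total_after / total_before
    # Assumed combination: skew, mildly scaled by the merge size, and reduced
    # when the merge reclaims a large share of deletes.
    return skew * (total_after ** 0.05) * (non_deleted_ratio ** reclaim_deletes_weight)


def cheapest_merge(segments, max_merge_at_once, floor_bytes):
    # Sort by decreasing live size, then compare every candidate subset.
    ordered = sorted(segments, key=lambda s: s[0] * (1.0 - s[1]), reverse=True)
    width = min(max_merge_at_once, len(ordered))
    return min(combinations(ordered, width), key=lambda c: merge_cost(c, floor_bytes))


MB = 1024 * 1024
segments = [(100 * MB, 0.0), (95 * MB, 0.0), (90 * MB, 0.0), (1 * MB, 0.0)]
print(cheapest_merge(segments, max_merge_at_once=3, floor_bytes=2 * MB))
```

In this example the three similarly sized segments are chosen, because any
candidate that includes the tiny segment has a much higher skew.
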
If a merge would produce a segment that's larger than
`max_merged_segment`, then the policy will merge fewer segments (down to
1 at once, if that one has deletions) to keep the segment size under
budget.

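
As a rough illustration of this size check, the Python sketch below estimates
the merged size by summing segment sizes discounted by their deleted fraction,
and shrinks a candidate until the estimate fits. The helper names and the
"drop the largest segment first" strategy are assumptions of the sketch, not a
description of the real policy. Segments are `(size_bytes, deleted_fraction)`
pairs.

```python
def estimated_merged_size(segments):
    # Sum segment sizes, discounting the fraction of deleted docs in each.
    return sum(size * (1.0 - deleted) for size, deleted in segments)


def fit_under_max(segments, max_merged_segment):
    """Shrink a candidate merge until its estimate fits (hypothetical helper)."""
    chosen = sorted(segments, key=lambda s: s[0])  # smallest first
    while len(chosen) > 1 and estimated_merged_size(chosen) > max_merged_segment:
        chosen.pop()  # drop the current largest segment
    if len(chosen) == 1 and chosen[0][1] == 0.0:
        return []  # a lone segment without deletes gains nothing from a merge
    return chosen


GB = 1024 ** 3
candidate = [(4 * GB, 0.0), (2 * GB, 0.0), (1 * GB, 0.0)]
print(fit_under_max(candidate, max_merged_segment=5 * GB))
```
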
Note that for large shards that hold many gigabytes of data, the default
of `max_merged_segment` (`5gb`) can cause many segments to remain in an
index, which makes searches slower. Use the indices segments API to see the
segments that an index has, and consider either increasing
`max_merged_segment` or issuing an optimize call for the index (aim to issue
it during a low-traffic period).

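
To get a quick overview from an indices segments API response, the segments of
each shard copy can be counted once the JSON has been parsed. The Python
sketch below assumes a response shaped as `indices` -> index name -> `shards`
-> shard id -> list of shard copies, each with a `segments` object. That shape
and the sample data are assumptions written by hand for illustration, and
`segments_per_shard` is a hypothetical helper.

```python
def segments_per_shard(response):
    """Map (index, shard id) to the segment count of each shard copy."""
    counts = {}
    for index_name, index_info in response["indices"].items():
        for shard_id, copies in index_info["shards"].items():
            counts[(index_name, shard_id)] = [
                len(copy["segments"]) for copy in copies
            ]
    return counts


# Hand-written sample in the assumed response shape, not real output.
sample = {
    "indices": {
        "logs": {
            "shards": {
                "0": [
                    {
                        "segments": {
                            "_0": {"size_in_bytes": 5000000000},
                            "_1": {"size_in_bytes": 4800000000},
                        }
                    }
                ]
            }
        }
    }
}
print(segments_per_shard(sample))
```
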
[float]
[[merge-scheduling]]
=== Merge scheduling

The merge scheduler (ConcurrentMergeScheduler) controls the execution of
merge operations once they are needed (according to the merge policy). Merges
run in separate threads, and when the maximum number of threads is reached,
further merges will wait until a merge thread becomes available.

The merge scheduler supports the following _dynamic_ settings:

`index.merge.scheduler.max_thread_count`::
The maximum number of threads that may be merging at once. Defaults to
`Math.max(1, Math.min(4, Runtime.getRuntime().availableProcessors() / 2))`,
which works well for a good solid-state disk (SSD). If your index is on
spinning platter drives instead, decrease this to `1`.

`index.merge.scheduler.auto_throttle`::
If this is `true` (the default), the merge scheduler will rate-limit IO
(writes) for merges to an adaptive value depending on how many merges are
requested over time. An application with a low indexing rate that
suddenly requires a large merge will see that merge aggressively
throttled, while an application doing heavy indexing will see the throttle
move higher to allow merges to keep up with ongoing indexing.
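
To see what the `max_thread_count` default works out to on a given machine,
the Java expression can be mirrored in a few lines of Python. This is only a
sketch of the arithmetic; `default_max_thread_count` is a hypothetical name,
and the processor count reported by Python may differ from what the JVM sees.

```python
import os


def default_max_thread_count(available_processors):
    # Mirrors Math.max(1, Math.min(4, availableProcessors / 2)); Java's
    # integer division corresponds to // for non-negative values.
    return max(1, min(4, available_processors // 2))


# os.cpu_count() can return None, so fall back to 1 in that case.
print(default_max_thread_count(os.cpu_count() or 1))
```

With this formula a machine with 2 processors gets 1 merge thread, one with
4 processors gets 2, and anything with 8 or more processors is capped at 4.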