[[index-modules-translog]]
== Translog

Changes to a shard are only persisted to disk when the shard is ``flushed'',
which is a relatively heavy operation and so cannot be performed after every
index or delete operation. Instead, changes are accumulated in an in-memory
indexing buffer and only written to disk periodically. Without further
protection, the contents of the in-memory buffer would be lost in the event
of a power failure or other hardware crash.
To prevent this data loss, each shard has a _transaction log_, or write-ahead
log, associated with it. Any index or delete operation is first written to the
translog before being processed by the internal Lucene index. The translog is
only cleared once the shard has been flushed and the data in the in-memory
buffer has been persisted to disk as a Lucene segment.

In the event of a crash, recent transactions can be replayed from the
transaction log when the shard recovers.
[float]
=== Flush settings

The following <<indices-update-settings,dynamically updatable>> settings
control how often the in-memory buffer is flushed to disk:

`index.translog.flush_threshold_size`::

Once the translog reaches this size, a flush is triggered. Defaults to `512mb`.

`index.translog.flush_threshold_ops`::

The number of operations to accumulate before triggering a flush. Defaults to `unlimited`.

`index.translog.flush_threshold_period`::

How long to wait before triggering a flush, regardless of translog size. Defaults to `30m`.

`index.translog.interval`::

How often to check whether a flush is needed, randomized between the interval
value and twice the interval value. Defaults to `5s`.
[float]
=== Translog settings

The translog itself is only persisted to disk when it is ++fsync++ed. Until
then, data recently written to the translog may exist only in the file system
cache and could be lost in the event of hardware failure.

The following <<indices-update-settings,dynamically updatable>> settings
control the behaviour of the transaction log:

`index.translog.sync_interval`::

How often the translog is ++fsync++ed to disk. Defaults to `5s`.

`index.translog.fs.type`::

Either a `buffered` translog (default), which buffers up to 64kB in memory
before writing to disk, or a `simple` translog, which writes every entry to
disk immediately. Whichever is used, these writes are only ++fsync++ed
according to the `sync_interval`. The `buffered` translog is written to disk
when it reaches 64kB in size, or whenever a write is triggered by the
`sync_interval`.
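As a sketch, an index that favours durability over write throughput could
fsync the translog more often and skip the in-memory write buffer. Again,
`my_index` and the chosen values are placeholders:

[source,js]
--------------------------------------------------
PUT /my_index/_settings
{
  "index.translog.sync_interval": "2s",
  "index.translog.fs.type": "simple"
}
--------------------------------------------------

Shortening `sync_interval` narrows the window of translog data that exists
only in the file system cache, at the cost of more frequent disk syncs.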
.Why don't we `fsync` the translog after every write?
******************************************************

The disk is the slowest part of any server. An `fsync` ensures that data in
the file system buffer has been physically written to disk, but this
persistence comes with a performance cost.

However, the translog is not the only persistence mechanism in Elasticsearch.
Any index or update request is first written to the primary shard, then
forwarded in parallel to any replica shards. The primary waits for the action
to be completed on the replicas before reporting success to the client.

If the node holding the primary shard dies for some reason, its transaction
log could be missing the last 5 seconds of data. However, that data should
already be available on a replica shard on a different node. Of course, if
the whole data centre loses power at the same time, then it is possible that
you could lose the last 5 seconds (or `sync_interval`) of data.

******************************************************