@@ -32,16 +32,17 @@ The files that make up a Lucene index are written sequentially from start to
 end and then never modified or overwritten. This access pattern means the
 checksum computation is very simple and can happen on-the-fly as the file is
 initially written, and also makes it very unlikely that an incorrect checksum
-is due to a userspace bug at the time the file was written. The routine that
-computes the checksum is straightforward, widely used, and very well-tested, so
-you can be very confident that a checksum mismatch really does indicate that
-the data read from disk is different from the data that {es} previously wrote.
+is due to a userspace bug at the time the file was written. The part of {es}
+that computes the checksum is straightforward, widely used, and very
+well-tested, so you can be very confident that a checksum mismatch really does
+indicate that the data read from disk is different from the data that {es}
+previously wrote.

 The files that make up a Lucene index are written in full before they are used.
 If a file is needed to recover an index after a restart then your storage
 system previously confirmed to {es} that this file was durably synced to disk.
 On Linux this means that the `fsync()` system call returned successfully. {es}
-sometimes detects that an index is corrupt because a file needed for recovery
+sometimes reports that an index is corrupt because a file needed for recovery
 has been truncated or is missing its footer. This indicates that your storage
 system acknowledges durable writes incorrectly.

@@ -64,6 +65,8 @@ work correctly.
 - Faulty hardware, which may include the drive itself, the RAID controller,
 your RAM or CPU.

+- Third-party software which modifies the files that {es} writes.
+
 Data corruption typically doesn't result in other evidence of problems apart
 from the checksum mismatch. Do not interpret this as an indication that your
 storage subsystem is working correctly and therefore that {es} itself caused
@@ -76,12 +79,15 @@ using something other than {es} and look for data integrity errors. On Linux
 the `fio` and `stress-ng` tools can both generate challenging I/O workloads and
 verify the integrity of the data they write. Use version 0.12.01 or newer of
 `stress-ng` since earlier versions do not have strong enough integrity checks.
-You can check that durable writes persist across power outages using a script
-such as https://gist.github.com/bradfitz/3172656[`diskchecker.pl`].
+Verify that durable writes persist across power outages using a script such as
+https://gist.github.com/bradfitz/3172656[`diskchecker.pl`]. Alternatively, use
+a tool such as `strace` to observe the sequence of syscalls that {es} makes
+when writing data and confirm that this sequence does not explain the reported
+corruption.

 To narrow down the source of the corruptions, systematically change components
 in your cluster's environment until the corruptions stop. The details will
-depend on the exact configuration of your hardware, but may include the
+depend on the exact configuration of your environment, but may include the
 following:

 - Try a different filesystem or a different kernel.
@@ -90,3 +96,6 @@ following:
 model or manufacturer.

 - Try different firmware versions for each hardware component.
+
+- Remove any third-party software which may modify the contents of the {es}
+  data path.