
HybridDirectory should mmap postings. (#52641)

Since Lucene 8.4, `MMapDirectory` has an optimization to read long[]
arrays directly in little-endian order, which postings leverage. So it is
more efficient to open postings with `MMapDirectory`.

I refactored the existing logic a bit to better explain why each listed
file extension is opened with `mmap`.
Adrien Grand 5 years ago
parent
commit
4943bc0cd3

+ 15 - 3
server/src/main/java/org/elasticsearch/index/store/FsDirectoryFactory.java

@@ -152,15 +152,27 @@ public class FsDirectoryFactory implements IndexStorePlugin.DirectoryFactory {
         boolean useDelegate(String name) {
             String extension = FileSwitchDirectory.getExtension(name);
             switch(extension) {
-                // We are mmapping norms, docvalues as well as term dictionaries, all other files are served through NIOFS
-                // this provides good random access performance and does not lead to page cache thrashing.
+                // Norms, doc values and term dictionaries are typically performance-sensitive and hot in the page
+                // cache, so we use mmap, which provides better performance.
                 case "nvd":
                 case "dvd":
                 case "tim":
+                // We want to open the terms index and KD-tree index off-heap to save memory, but this only performs
+                // well if using mmap.
                 case "tip":
-                case "cfs":
                 case "dim":
+                // Compound files are tricky because they store all the information for the segment. Benchmarks
+                // suggested that not mapping them hurts performance.
+                case "cfs":
+                // MMapDirectory has special logic to read long[] arrays in little-endian order that helps speed
+                // up the decoding of postings. The same logic applies to positions (.pos) and offsets (.pay), but we
+                // are not mmapping them, as queries that leverage positions are more costly and the decoding of
+                // postings tends to be less of a bottleneck.
+                case "doc":
                     return true;
+                // Other files are either less performance-sensitive (e.g. stored field index, norms metadata)
+                // or are large and have a random access pattern, so mmap leads to page cache thrashing
+                // (e.g. stored fields and term vectors).
                 default:
                     return false;
             }
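
The routing rule in the hunk above can be tried in isolation with a minimal sketch like the one below. It is not the Elasticsearch implementation: `MmapExtensionRouter`, `extensionOf` and `useMmap` are hypothetical names, and `extensionOf` is a simplified stand-in for `FileSwitchDirectory.getExtension` that only takes the text after the last dot. The extension list is copied from the diff.

```java
import java.util.Set;

// Hypothetical stand-alone sketch of the extension-based routing shown in the diff.
final class MmapExtensionRouter {

    // Extensions that the diff routes to mmap: norms data, doc values data,
    // term dictionary, terms index, points (KD-tree) index, compound files, postings.
    private static final Set<String> MMAP_EXTENSIONS =
        Set.of("nvd", "dvd", "tim", "tip", "dim", "cfs", "doc");

    // Simplified stand-in (an assumption): returns the text after the last '.',
    // or the empty string when the file name contains no dot.
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot == -1 ? "" : fileName.substring(dot + 1);
    }

    // True when the file would be opened with mmap under the rule in the diff;
    // false means it falls through to the default branch.
    static boolean useMmap(String fileName) {
        return MMAP_EXTENSIONS.contains(extensionOf(fileName));
    }
}
```

With this sketch, `MmapExtensionRouter.useMmap("_0.doc")` is true after this change, while positions (`_0.pos`), payloads and offsets (`_0.pay`) and stored fields (`_0.fdt`) still fall through to the default branch. A set lookup is used here only for brevity; the `switch` in the actual code allows each group of extensions to carry its own explanatory comment.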