Understanding Linux PageCache: How the OS Accelerates File Reads and Writes
PageCache is a kernel-managed memory cache that keeps disk data in RAM, dramatically speeding up file operations by turning repeated reads and writes into pure memory accesses. Its dynamic sizing and the resulting speedup are demonstrated below through Linux experiments with a large file.
Storage Media Performance Gap
Different storage devices in a computer have vastly different speeds; a mechanical hard drive can be hundreds of times slower than RAM for read/write operations. Directly reading data from disk therefore hurts responsiveness, so operating systems introduce the PageCache mechanism to bridge this gap.
PageCache works like CPU caches (L1/L2/L3) but is implemented in software at the OS level rather than hardware.
What Is PageCache?
PageCache consists of memory pages whose contents mirror physical blocks on disk. Memory that processes do not need for their own code and data can be lent to PageCache, so its size grows and shrinks dynamically: it can expand to occupy nearly all free RAM, and it shrinks again whenever the system needs memory back.
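This elasticity can be observed directly from the kernel's own counters; a minimal sketch, assuming a Linux system with /proc mounted:

```shell
# Free memory and page-cache usage, straight from the kernel
# (values are in kB and change constantly):
grep -E '^(MemFree|Buffers|Cached):' /proc/meminfo
```

The same numbers appear, summed, in the buff/cache column of procps' `free -h`.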
File Read
When a process issues a read system call, the kernel first checks whether the requested data resides in PageCache. If it does, the kernel returns the data directly from memory – a cache hit. If not, a cache miss occurs, the kernel performs a disk I/O operation, reads the data, and fills the corresponding page in PageCache for future accesses.
Only the pages that are actually accessed are cached; for example, a four‑page file may have only the first page stored in PageCache if that page is the one frequently read.
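Per-file residency can be inspected with util-linux's fincore. A sketch, assuming fincore is installed and root access for drop_caches; /tmp/fourpages is a throwaway file created for the test:

```shell
# Create a 4-page (16 KiB) file, evict it from the cache,
# read only its first page, then check what is resident.
dd if=/dev/zero of=/tmp/fourpages bs=4096 count=4 2>/dev/null
sync
echo 1 > /proc/sys/vm/drop_caches || echo "(need root to drop caches)"

# Read just the first 4 KiB page:
dd if=/tmp/fourpages of=/dev/null bs=4096 count=1 2>/dev/null

# fincore reports how many of the file's pages are resident:
if command -v fincore >/dev/null; then
    fincore /tmp/fourpages
else
    echo "fincore (util-linux) not installed"
fi
# Note: kernel read-ahead may cache neighboring pages too, so
# more than the single requested page can show up as resident.
```

Read-ahead is worth keeping in mind when interpreting the output: for small sequential files the kernel often prefetches the whole file on the first access.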
File Write
When a process calls write , the kernel typically follows a write‑back policy: the data is first written to PageCache and the page is marked dirty, while the actual disk write is deferred. This makes the write appear as a fast, pure‑memory operation, and a file‑copy progress bar often reflects the amount of data placed into PageCache rather than the real disk‑write progress.
Dirty pages are periodically flushed to disk by the kernel, synchronizing the in‑memory cache with the persistent storage.
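The deferred flush can be watched in the kernel's Dirty counter; a small sketch, assuming a writable /tmp (the exact numbers are system-dependent, and /tmp/dirtytest is a throwaway file):

```shell
# Watch the Dirty counter around a buffered write:
grep '^Dirty:' /proc/meminfo            # baseline
dd if=/dev/zero of=/tmp/dirtytest bs=1M count=64 2>/dev/null
grep '^Dirty:' /proc/meminfo            # typically much larger now
sync                                    # force write-back to disk
grep '^Dirty:' /proc/meminfo            # drops back toward baseline
rm -f /tmp/dirtytest
```

The flush cadence and thresholds are tunable via sysctls such as vm.dirty_writeback_centisecs and vm.dirty_ratio.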
Experimental Verification
On a Linux machine, we created a 1 GB file filled with random data:
dd if=/dev/urandom of=testfile bs=1M count=1024

Then we cleared all PageCache:

sync && echo 1 > /proc/sys/vm/drop_caches

Reading the file for the first time forces a disk I/O:
$ time cat testfile > /dev/null
real 0m6.176s
user 0m0.028s
sys 0m0.731s

The output shows that the read took about 6 seconds, dominated by disk latency. After this read, the file resides in PageCache. Reading it a second time yields:
$ time cat testfile > /dev/null
real 0m0.309s
user 0m0.011s
sys 0m0.298s

The second read completes in roughly 0.3 seconds, nearly 20× faster, because it is served entirely from memory.
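Timing is not the only evidence: the kernel's Cached counter also grows by roughly the file's size after the first read. A sketch, assuming the testfile created above still exists:

```shell
# Cached counter before and after reading the file; on a cold
# read it grows by roughly the file size (values in kB):
grep '^Cached:' /proc/meminfo
if [ -f testfile ]; then cat testfile > /dev/null; fi
grep '^Cached:' /proc/meminfo
```

Exact deltas vary, since other processes are filling and evicting cache pages at the same time.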
Conclusion
PageCache provides a substantial performance boost for repeated file accesses and sequential reads of large files. It is essential in scenarios such as compiling code, reading configuration files, video editing, and processing database logs, where the same data is accessed multiple times.
IT Services Circle