Understanding the Linux File I/O Stack: VFS, Filesystem, Block Layer, and SCSI
This article explains the Linux file I/O stack by tracing the path from user space through system calls into the kernel layers (VFS, filesystem, block layer, and SCSI). It covers each layer's role, the page cache, writeback, and direct I/O, with code examples.
Introduction: The article addresses the question "Describe the file I/O stack?" and outlines a three‑step approach: define a clear I/O stack route, explain the purpose of each node, and explore the kernel call chain.
Kernel I/O route
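A request enters the kernel from user space through a system call and then travels down the stack: VFS → Filesystem → Block layer → SCSI layer. Because Linux exposes sockets through file descriptors as well, network I/O also enters through VFS, although it does not continue into the block layer.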
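To make the entry point concrete, here is a minimal user-space sketch of a write that is then forced down the stack. The helper names write_and_sync and read_back are illustrative, not part of any library; only open, write, fsync, read, and close are real system calls.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write len bytes to path and ask the kernel to push them to the device.
 * Returns 0 on success, -1 on any failure. (Illustrative helper.) */
static int write_and_sync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        /* write() enters the kernel and is dispatched through vfs_write */
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* A buffered write() normally returns once the data is in the page
     * cache; fsync() requests that the lower layers flush it. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Read up to cap bytes back; returns the byte count or -1. */
static ssize_t read_back(const char *path, void *buf, size_t cap)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, cap);
    close(fd);
    return n;
}
```

Calling write_and_sync on a scratch file and then read_back on the same path should return the same bytes; everything between those two calls happens inside the layers described in the following sections.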
1. VFS layer
The Virtual File System provides a generic interface that abstracts common structures (file, inode, dentry) and APIs, allowing different filesystems to implement these interfaces. It enables switching between filesystems without changing upper‑layer code.
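The dispatch idea can be sketched in user space with a table of function pointers. This is a toy model, not kernel code: toy_file, toy_file_ops, plainfs, and upperfs are hypothetical names that only loosely mirror struct file and struct file_operations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_file;

/* Hypothetical, simplified stand-in for struct file_operations. */
struct toy_file_ops {
    long (*read)(struct toy_file *f, char *buf, size_t len);
};

struct toy_file {
    const struct toy_file_ops *ops;  /* chosen by the "filesystem" at open time */
    const char *data;                /* backing bytes for this toy */
    size_t pos;
};

/* "Filesystem A": returns the stored bytes unchanged. */
static long plainfs_read(struct toy_file *f, char *buf, size_t len)
{
    size_t avail = strlen(f->data) - f->pos;
    size_t n = len < avail ? len : avail;
    memcpy(buf, f->data + f->pos, n);
    f->pos += n;
    return (long)n;
}

/* "Filesystem B": same storage, but upper-cases ASCII letters on read. */
static long upperfs_read(struct toy_file *f, char *buf, size_t len)
{
    long n = plainfs_read(f, buf, len);
    for (long i = 0; i < n; i++)
        if (buf[i] >= 'a' && buf[i] <= 'z')
            buf[i] = (char)(buf[i] - 'a' + 'A');
    return n;
}

static const struct toy_file_ops plainfs_ops = { .read = plainfs_read };
static const struct toy_file_ops upperfs_ops = { .read = upperfs_read };

/* The generic layer: it never needs to know which filesystem it calls. */
static long toy_vfs_read(struct toy_file *f, char *buf, size_t len)
{
    if (!f->ops || !f->ops->read)
        return -1;                   /* operation not supported */
    return f->ops->read(f, buf, len);
}
```

The caller of toy_vfs_read is identical for both toy filesystems; swapping the ops pointer is all that changes, which is the property the VFS gives upper-layer code.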
2. Filesystem layer
This layer maps the abstract file concept to physical storage on a block device, deciding how data is laid out (e.g., 4 KB or 1 MB blocks) and using address_space interfaces for the mapping.
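As a rough illustration of that mapping, the sketch below translates a byte offset within a file into a byte offset on the device using a per-file block table. The 4 KB block size, the toy_inode layout, and the convention that 0 marks an unmapped block are all assumptions made for the example; real filesystems use far richer structures (extents, indirect blocks).

```c
#include <assert.h>
#include <stdint.h>

#define TOY_BLOCK_SIZE 4096u   /* assumed block size; real filesystems vary */
#define TOY_MAX_BLOCKS 8u

/* Hypothetical per-file block map: index = logical block within the file,
 * value = physical block number on the device (0 means "not mapped"). */
struct toy_inode {
    uint64_t size;
    uint64_t block_map[TOY_MAX_BLOCKS];
};

/* Translate a byte offset in the file to a byte offset on the device.
 * Returns 0 on success, -1 if the offset is past EOF or unmapped. */
static int toy_map_offset(const struct toy_inode *ino, uint64_t file_off,
                          uint64_t *dev_off)
{
    if (file_off >= ino->size)
        return -1;

    uint64_t lblk = file_off / TOY_BLOCK_SIZE;    /* logical block index */
    uint64_t in_blk = file_off % TOY_BLOCK_SIZE;  /* offset inside the block */

    if (lblk >= TOY_MAX_BLOCKS || ino->block_map[lblk] == 0)
        return -1;

    *dev_off = ino->block_map[lblk] * TOY_BLOCK_SIZE + in_blk;
    return 0;
}
```

With a file whose first two logical blocks live in physical blocks 100 and 57, offset 4106 (block 1, byte 10) maps to device offset 57 * 4096 + 10.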
3. Block layer
The block layer abstracts hardware drivers, presenting each block device as a linear address space. It also implements I/O scheduling: schedulers (historically called elevators, such as CFQ, Deadline, and NOOP) merge adjacent requests and reorder them before they reach the driver.
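The merge-and-order idea can be sketched as a sort by start sector followed by coalescing of back-to-back requests. This is a deliberately simplified model, not the algorithm of any particular kernel scheduler; toy_req and toy_sort_and_merge are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct toy_req {
    uint64_t sector;   /* first sector of the request */
    uint32_t nr;       /* number of sectors */
};

static int cmp_sector(const void *a, const void *b)
{
    const struct toy_req *x = a, *y = b;
    return (x->sector > y->sector) - (x->sector < y->sector);
}

/* Sort requests by start sector, then merge requests that are exactly
 * contiguous. Works in place and returns the new request count. */
static size_t toy_sort_and_merge(struct toy_req *reqs, size_t n)
{
    if (n == 0)
        return 0;
    qsort(reqs, n, sizeof reqs[0], cmp_sector);

    size_t out = 0;
    for (size_t i = 1; i < n; i++) {
        if (reqs[out].sector + reqs[out].nr == reqs[i].sector)
            reqs[out].nr += reqs[i].nr;   /* contiguous: extend the current request */
        else
            reqs[++out] = reqs[i];        /* gap: start a new request */
    }
    return out + 1;
}
```

Given requests starting at sectors 16, 0, 8, and 100 (the first three of length 8), the sketch produces two requests: one covering sectors 0 to 23 and one at sector 100.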
4. SCSI layer
The SCSI layer acts as the final translator to the disk hardware, converting kernel I/O into device‑specific commands.
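One example of such a device-specific command is the SCSI READ(10) command descriptor block (CDB), which carries a 32-bit logical block address and a 16-bit block count in big-endian byte order. The sketch below only fills in the 10 bytes; the function name build_read10 is illustrative, and actually sending the CDB to a device is out of scope here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SCSI_READ_10 0x28   /* READ(10) operation code */

/* Fill a 10-byte READ(10) CDB. Flag, group, and control bytes are left
 * at zero in this sketch. */
static void build_read10(uint8_t cdb[10], uint32_t lba, uint16_t nblocks)
{
    memset(cdb, 0, 10);
    cdb[0] = SCSI_READ_10;
    cdb[2] = (uint8_t)(lba >> 24);      /* LBA, most significant byte first */
    cdb[3] = (uint8_t)(lba >> 16);
    cdb[4] = (uint8_t)(lba >> 8);
    cdb[5] = (uint8_t)lba;
    cdb[7] = (uint8_t)(nblocks >> 8);   /* transfer length in blocks */
    cdb[8] = (uint8_t)nblocks;
}
```

A block-layer request for 8 blocks starting at LBA 0x01020304 would thus become the bytes 28 00 01 02 03 04 00 00 08 00.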
Page Cache and writeback
The page cache sits at the filesystem layer. Data reaches the disk in one of two ways: buffered I/O, where write places the data in the page cache and writeback (or an explicit sync) flushes it later, or direct I/O, which bypasses the cache. The address_space_operations structure defines the callbacks for both paths, such as write_begin, write_end, writepage, readpage, and direct_IO.
struct address_space_operations {
    // writeback (page cache)
    int (*write_begin)(struct file *, struct address_space *mapping, loff_t pos,
                       unsigned len, unsigned flags, struct page **pagep, void **fsdata);
    int (*write_end)(struct file *, struct address_space *mapping, loff_t pos,
                     unsigned len, unsigned copied, struct page *page, void *fsdata);
    int (*writepage)(struct page *page, struct writeback_control *wbc);
    int (*readpage)(struct file *, struct page *);
    int (*writepages)(struct address_space *, struct writeback_control *);
    int (*readpages)(struct file *filp, struct address_space *mapping,
                     struct list_head *pages, unsigned nr_pages);
    void (*readahead)(struct readahead_control *);
    // direct I/O
    ssize_t (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
                         loff_t offset, unsigned long nr_segs);
};

Example: The Minix filesystem implements a minimal set of these callbacks (readpage, writepage, write_begin, write_end) to support page-cache writeback.
static const struct address_space_operations minix_aops = {
    .readpage    = minix_readpage,
    .writepage   = minix_writepage,
    .write_begin = minix_write_begin,
    .write_end   = generic_write_end,
    .bmap        = minix_bmap,
};

The write system call flow for a buffered write is:
SYSCALL_DEFINE3(write)
    vfs_write
        .write = do_sync_write                  // common path
            generic_file_aio_write
                generic_file_buffered_write
                    generic_perform_write       // core function

During generic_perform_write the kernel calls write_begin to find or allocate a page-cache page, copies the user data into it, and calls write_end, which marks the page dirty. Later, a kworker thread performs writeback through the filesystem's writepage or writepages callback.
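The per-page loop at the heart of a buffered write can be modeled in user space as follows. This is a toy sketch under stated assumptions: toy_mapping stands in for a file's page cache, the cache has a fixed four pages of 4 KB, and toy_writeback merely clears dirty flags where a real filesystem would issue I/O through writepage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_NR_PAGES  4u

/* Hypothetical stand-in for a file's page cache. */
struct toy_mapping {
    unsigned char pages[TOY_NR_PAGES][TOY_PAGE_SIZE];
    bool dirty[TOY_NR_PAGES];
};

/* Copy user data into cached pages one page-sized chunk at a time and
 * mark each touched page dirty. Returns the bytes written, which is
 * short if the toy cache runs out of pages. */
static size_t toy_perform_write(struct toy_mapping *m, size_t pos,
                                const void *buf, size_t len)
{
    const unsigned char *src = buf;
    size_t written = 0;

    while (written < len) {
        size_t index = pos / TOY_PAGE_SIZE;    /* which page */
        size_t offset = pos % TOY_PAGE_SIZE;   /* where inside it */
        size_t chunk = TOY_PAGE_SIZE - offset;

        if (index >= TOY_NR_PAGES)
            break;
        if (chunk > len - written)
            chunk = len - written;

        /* write_begin would locate or allocate the page here */
        memcpy(&m->pages[index][offset], src + written, chunk);
        /* write_end would mark the page dirty for later writeback */
        m->dirty[index] = true;

        pos += chunk;
        written += chunk;
    }
    return written;
}

/* Stand-in for writeback: count and clean every dirty page. */
static unsigned toy_writeback(struct toy_mapping *m)
{
    unsigned flushed = 0;
    for (size_t i = 0; i < TOY_NR_PAGES; i++) {
        if (m->dirty[i]) {
            m->dirty[i] = false;   /* a real filesystem would write the page out here */
            flushed++;
        }
    }
    return flushed;
}
```

A 5000-byte write starting at offset 4000 touches three pages (96 bytes, then 4096, then 808), so a subsequent toy_writeback call finds three dirty pages. The write itself returns before any of them reach the "disk", which is the defining property of the buffered path.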
Direct I/O follows a similar syscall path but ends at generic_file_direct_write and invokes the filesystem’s direct_IO implementation, bypassing the page cache.
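The difference between the two paths can be sketched with a toy file that has both a cache and a "disk". Everything here is hypothetical: TOY_O_DIRECT is not the real O_DIRECT value, fs_has_direct_io models whether the filesystem supplied a direct_IO callback, and a real kernel does additional work on the direct path that the sketch omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_O_DIRECT 0x1u       /* hypothetical flag, not the real O_DIRECT value */
#define TOY_DEV_SIZE 64u

struct toy_file {
    unsigned flags;
    bool fs_has_direct_io;              /* models a non-NULL .direct_IO callback */
    unsigned char cache[TOY_DEV_SIZE];  /* stand-in for the page cache */
    bool cache_dirty;
    unsigned char disk[TOY_DEV_SIZE];   /* stand-in for the block device */
};

/* Returns bytes written, or -1 if the request cannot be served. */
static long toy_file_write(struct toy_file *f, size_t pos,
                           const void *buf, size_t len)
{
    if (pos > TOY_DEV_SIZE || len > TOY_DEV_SIZE - pos)
        return -1;

    if (f->flags & TOY_O_DIRECT) {
        if (!f->fs_has_direct_io)
            return -1;                     /* filesystem did not implement direct I/O */
        memcpy(f->disk + pos, buf, len);   /* straight to the device, cache untouched */
        return (long)len;
    }

    memcpy(f->cache + pos, buf, len);      /* buffered: data lands in the cache only */
    f->cache_dirty = true;                 /* writeback would copy it to disk later */
    return (long)len;
}
```

After a buffered write the toy disk is still unchanged and the cache is dirty; after a direct write the disk holds the data and the cache was never involved.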
Summary
I/O stack: VFS → Filesystem → Block layer → SCSI layer.
VFS abstracts file operations and enables filesystem switching.
Filesystem maps logical files to physical blocks using address_space interfaces.
Block layer provides unified hardware abstraction and I/O scheduling.
SCSI layer translates kernel requests to disk hardware.
Buffered writes require the filesystem to implement write_begin, write_end, and writepage; direct_IO is optional.
Writeback is triggered by time, dirty‑page volume, or explicit sync, and is performed by kworker threads.
IT Services Circle