Understanding Linux File I/O: From User Read Calls to Disk Operations

This article explains how a simple read of a single byte in user space triggers a complex Linux I/O stack involving the read system call, VFS, page cache, generic block layer, and I/O scheduler, and clarifies when actual disk I/O occurs and how many bytes are transferred.

Refining Core Development Skills

May 8, 2020

At a job interview, a candidate argued that reading a configuration file for each request would cause an extra disk I/O and degrade performance; this sparked a deeper look into how Linux actually handles file reads.

We start with a minimal C program that opens a file and reads one byte:

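#include <fcntl.h>   /* open, O_RDONLY */
#include <unistd.h>  /* read, close */

int main(void)
{
    char c;
    int in;

    in = open("in.txt", O_RDONLY);
    if (in < 0)
        return 1;                  /* could not open the file */
    ssize_t n = read(in, &c, 1);   /* request a single byte */
    close(in);
    return n == 1 ? 0 : 1;
}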

To answer two questions—whether a disk I/O occurs and how many bytes Linux really reads—we need to examine the Linux I/O stack.

1. Linux I/O Stack Overview

A simplified diagram of the Linux I/O stack (source: http://www.ilinuxkernel.com/files/Linux.IO.stack_v1.0.pdf) shows the layers involved from the user request down to the hardware.

The stack includes the I/O engine, VFS, page cache, generic block layer, and I/O scheduler.

2. I/O Engine

The read and write functions belong to the synchronous (sync) I/O engine; other engines include mmap, libaio, and posixaio. The sync engine ultimately issues the read system call, which the kernel hands to VFS.
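To make the difference between two of these engines concrete, here is a minimal sketch that fetches the first byte of a file once through read and once through mmap. The helper names first_byte_read and first_byte_mmap are made up for this illustration; both paths end up at the same page cache page.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sync engine: fetch the first byte with the read() system call.
 * Returns 0 on success, -1 on error or if the file is empty. */
int first_byte_read(const char *path, char *out)
{
    assert(path != NULL && out != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, out, 1);
    close(fd);
    return n == 1 ? 0 : -1;
}

/* mmap engine: map the first byte and load it from memory.
 * Returns 0 on success, -1 on error or if the file is empty. */
int first_byte_mmap(const char *path, char *out)
{
    assert(path != NULL && out != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size < 1) {
        close(fd);
        return -1;          /* touching a mapping past EOF would fault */
    }
    void *p = mmap(NULL, 1, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);              /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;
    *out = *(const char *)p;
    munmap(p, 1);
    return 0;
}
```

Calling both helpers on the same file should yield the same byte; only the route into the kernel differs.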

3. VFS (Virtual File System)

VFS abstracts different file systems and provides a uniform API. Its core structures are superblock, inode, file, and dentry. Operations such as mkdir and rename are defined in inode_operations, while read and write are defined in file_operations:

struct inode_operations {
    ...
    int (*link) (struct dentry *,struct inode *,struct dentry *);
    int (*unlink) (struct inode *,struct dentry *);
    int (*mkdir) (struct inode *,struct dentry *,umode_t);
    int (*rmdir) (struct inode *,struct dentry *);
    int (*rename) (struct inode *, struct dentry *,
                  struct inode *, struct dentry *, unsigned int);
    ...
};

struct file_operations {
    ...
    ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
    ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
    ...
    int (*mmap) (struct file *, struct vm_area_struct *);
    int (*open) (struct inode *, struct file *);
    int (*flush) (struct file *, fl_owner_t id);
};
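The function-pointer tables above are what let VFS call any file system through one interface. The following user-space sketch imitates that dispatch; every name in it (toy_file, toy_file_operations, memfs_read, toy_vfs_read) is hypothetical and far simpler than the kernel's real structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_file;

/* Hypothetical, simplified counterpart of the kernel's file_operations. */
struct toy_file_operations {
    long (*read)(struct toy_file *f, char *buf, size_t len);
};

struct toy_file {
    const struct toy_file_operations *f_op; /* methods of the owning "file system" */
    const char *data;                       /* backing bytes of this toy file */
    size_t size;
    size_t pos;                             /* current file offset */
};

/* One "file system" supplies its own read implementation. */
long memfs_read(struct toy_file *f, char *buf, size_t len)
{
    size_t left = f->size - f->pos;
    if (len > left)
        len = left;
    memcpy(buf, f->data + f->pos, len);
    f->pos += len;
    return (long)len;
}

const struct toy_file_operations memfs_fops = { .read = memfs_read };

/* Generic entry point: it knows nothing about memfs and simply
 * dispatches through the function pointer stored in the file. */
long toy_vfs_read(struct toy_file *f, char *buf, size_t len)
{
    assert(f != NULL && f->f_op != NULL);
    if (f->f_op->read == NULL)
        return -1;          /* this file system does not support read */
    return f->f_op->read(f, buf, len);
}
```

A second toy file system would only need its own read function and table; toy_vfs_read would stay unchanged, which is the point of the abstraction.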

4. Page Cache

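The page cache is an in-memory cache of file data that speeds up disk access. If the requested data is already cached, no disk I/O occurs; otherwise the kernel allocates a new page, adds it to the cache, and reads the corresponding block from disk into it. A page fault is involved only when a file is accessed through mmap; a read call that misses the cache simply waits until the page has been filled.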
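Whether a given file page currently sits in the page cache can be probed from user space. The sketch below maps the first page of a file and asks mincore about it; the helper name first_page_cached is an assumption made for this example. On recent kernels mincore is believed to report meaningful residency only for files the caller owns or may write, so treat the answer as a hint.

```c
#define _DEFAULT_SOURCE   /* expose mincore() in glibc headers */
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if the first page of the file is reported resident in memory,
 * 0 if it is not, and -1 if the file cannot be opened, mapped, or probed. */
int first_page_cached(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    /* Mapping does not read the file; it only sets up address space. */
    void *p = mmap(NULL, page, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;
    unsigned char vec = 0;  /* one status byte per page */
    int rc = mincore(p, page, &vec);
    munmap(p, page);
    if (rc < 0)
        return -1;
    return vec & 1;         /* lowest bit set means the page is resident */
}
```

Running it on in.txt before and after the one-byte read program is one way to observe the cache being populated.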

5. File System Layer

File systems manage inode and block structures; a typical block size is 4 KB. Example structures for ext4 are shown below:

const struct file_operations ext4_file_operations = {
    .read_iter  = ext4_file_read_iter,
    .write_iter = ext4_file_write_iter,
    .mmap       = ext4_file_mmap,
    .open       = ext4_file_open,
    ...
};

const struct inode_operations ext4_file_inode_operations = {
    .setattr = ext4_setattr,
    .getattr = ext4_file_getattr,
    ...
};

6. Generic Block Layer

The generic block layer handles all block‑device I/O requests using the bio structure. A bio represents an I/O operation composed of one or more segments, each segment being a full page or a part of a page.
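The idea of a bio as a list of segments can be sketched in user space as follows. The types toy_segment and toy_bio and the two helper functions are hypothetical simplifications and do not match the kernel's struct bio field for field; the 512-byte sector size is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SECTOR_SIZE 512u   /* assumed sector size */

/* Hypothetical, simplified counterpart of one bio segment. */
struct toy_segment {
    unsigned int page;    /* which page-sized buffer receives the data */
    unsigned int offset;  /* byte offset inside that page */
    unsigned int len;     /* number of bytes in this segment */
};

struct toy_bio {
    unsigned long long start_sector;     /* first device sector of the transfer */
    size_t nr_segments;
    const struct toy_segment *segments;
};

/* Total bytes described by the bio: the sum of its segment lengths. */
unsigned int toy_bio_bytes(const struct toy_bio *bio)
{
    assert(bio != NULL);
    unsigned int total = 0;
    for (size_t i = 0; i < bio->nr_segments; i++)
        total += bio->segments[i].len;
    return total;
}

/* Number of device sectors the bio covers. */
unsigned int toy_bio_sectors(const struct toy_bio *bio)
{
    unsigned int bytes = toy_bio_bytes(bio);
    assert(bytes % TOY_SECTOR_SIZE == 0);  /* block I/O is sector-granular */
    return bytes / TOY_SECTOR_SIZE;
}
```

A bio with one full 4096-byte segment and one 2048-byte segment would describe 6144 bytes, or 12 sectors of 512 bytes.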

7. I/O Scheduler

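After the block layer creates a request, an I/O scheduler (for example noop, deadline, or cfq) merges and reorders pending requests to improve throughput, often with an elevator-like algorithm. Kernels from 5.0 onward use the multi-queue schedulers none, mq-deadline, bfq, and kyber instead.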

You can view supported schedulers with dmesg | grep -i scheduler.
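The kernel also exposes the schedulers of each block device in sysfs, in /sys/block/<dev>/queue/scheduler, with the active one in square brackets. The sketch below reads that file and extracts the bracketed name; the helper names active_scheduler and device_scheduler are made up for this example, and a device that prints a single name without brackets is reported as an error here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy the bracketed (active) scheduler name from a sysfs-style line such
 * as "noop deadline [cfq]" into out. Returns 0 on success, -1 if no
 * bracketed name is found or it does not fit into out. */
int active_scheduler(const char *line, char *out, size_t outsz)
{
    assert(line != NULL && out != NULL && outsz > 0);
    const char *lb = strchr(line, '[');
    const char *rb = lb ? strchr(lb, ']') : NULL;
    if (rb == NULL)
        return -1;
    size_t n = (size_t)(rb - lb - 1);
    if (n >= outsz)
        return -1;
    memcpy(out, lb + 1, n);
    out[n] = '\0';
    return 0;
}

/* Look up the active scheduler of a device name such as "sda". */
int device_scheduler(const char *dev, char *out, size_t outsz)
{
    char path[256], line[256];
    int len = snprintf(path, sizeof path,
                       "/sys/block/%s/queue/scheduler", dev);
    if (len < 0 || (size_t)len >= sizeof path)
        return -1;
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;          /* no such device, or sysfs not mounted */
    char *ok = fgets(line, sizeof line, fp);
    fclose(fp);
    return ok ? active_scheduler(line, out, outsz) : -1;
}
```

The device name passed to device_scheduler is whatever appears under /sys/block on the machine at hand.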

8. Full Read Flow

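The C library's read function traps into the kernel's sys_read system call. sys_read calls VFS functions such as vfs_read, which in turn reach the generic read path (generic_file_read in older kernels, generic_file_read_iter in kernels that use read_iter).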

If the page cache hits, data is returned immediately.

If not, the kernel allocates a new page, adds it to the page cache, and submits a block I/O request for it to the generic block layer. No page fault is involved on this path; page faults occur only for mmap access.

The block layer queues the request as a bio.

The I/O scheduler orders the request.

The driver issues a DMA read to the disk, filling the new page in the cache.

An interrupt signals completion, and the waiting process is awakened.

The requested byte is copied from the page cache to the user buffer, and read returns.
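The hit and miss paths of this flow can be imitated with a toy model. Everything in the sketch below (toy_disk, toy_cache, toy_read_byte, the 4096-byte page size, the eight direct-mapped slots) is a hypothetical simplification, not kernel code; it only counts how often the simulated disk is touched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE   4096u  /* assumed page size */
#define TOY_CACHE_SLOTS 8u

struct toy_disk {
    const unsigned char *bytes;  /* contents of the simulated disk */
    size_t size;
    unsigned long reads;         /* number of simulated disk I/Os */
    unsigned long bytes_read;    /* bytes moved by those I/Os */
};

struct toy_page {
    int valid;
    size_t index;                /* page number within the file */
    unsigned char data[TOY_PAGE_SIZE];
};

struct toy_cache {
    struct toy_page slots[TOY_CACHE_SLOTS];
};

/* Simulated disk I/O: always fills one whole page. */
static void toy_fill_page(struct toy_disk *d, struct toy_page *p, size_t index)
{
    size_t start = index * TOY_PAGE_SIZE;
    size_t n = d->size - start;
    if (n > TOY_PAGE_SIZE)
        n = TOY_PAGE_SIZE;
    memset(p->data, 0, TOY_PAGE_SIZE);
    memcpy(p->data, d->bytes + start, n);
    p->index = index;
    p->valid = 1;
    d->reads++;
    d->bytes_read += TOY_PAGE_SIZE;
}

/* Read one byte at offset. Returns 0 on success, -1 past end of file. */
int toy_read_byte(struct toy_cache *c, struct toy_disk *d,
                  size_t offset, unsigned char *out)
{
    assert(c != NULL && d != NULL && out != NULL);
    if (offset >= d->size)
        return -1;
    size_t index = offset / TOY_PAGE_SIZE;
    /* Direct-mapped lookup, chosen for brevity only. */
    struct toy_page *p = &c->slots[index % TOY_CACHE_SLOTS];
    if (!p->valid || p->index != index)
        toy_fill_page(d, p, index);          /* miss: go to the "disk" */
    *out = p->data[offset % TOY_PAGE_SIZE];  /* hit path: copy from cache */
    return 0;
}
```

In this model the first one-byte read of a page costs one simulated disk I/O of 4096 bytes, and every later read of the same page costs none, which mirrors the argument about re-reading a configuration file on each request.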

When the page cache hits, no disk I/O occurs. When it misses, the smallest unit transferred is a sector (typically 512 bytes). Higher layers work with larger units: the block layer with segments (often a full 4 KB page), the page cache with pages (4 KB), and the file system with blocks (commonly 4 KB). Consequently, reading a single byte can cause the kernel to read several kilobytes from disk.
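The unit arithmetic can be written out explicitly. The sketch below assumes 512-byte sectors and 4 KB pages and uses hypothetical names (read_span, span_for_offset); the sector numbers are relative to the start of the file, since the real on-disk location depends on the file system's block mapping.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE_ASSUMED 512u
#define PAGE_SIZE_ASSUMED   4096u

struct read_span {
    uint64_t page_index;      /* page of the file that holds the byte */
    uint64_t first_sector;    /* first file-relative sector of that page */
    uint32_t sector_count;    /* sectors needed to fill the page */
    uint32_t bytes_from_disk; /* bytes transferred on a cache miss */
};

/* Work out what a cache miss on a single byte at offset pulls in. */
struct read_span span_for_offset(uint64_t offset)
{
    assert(PAGE_SIZE_ASSUMED % SECTOR_SIZE_ASSUMED == 0);
    uint32_t sectors_per_page = PAGE_SIZE_ASSUMED / SECTOR_SIZE_ASSUMED;
    struct read_span s;
    s.page_index = offset / PAGE_SIZE_ASSUMED;
    s.first_sector = s.page_index * sectors_per_page;
    s.sector_count = sectors_per_page;
    s.bytes_from_disk = PAGE_SIZE_ASSUMED;
    return s;
}
```

Under these assumptions a miss on the byte at offset 0 transfers 8 sectors, 4096 bytes in total, and a miss on the byte at offset 5000 fills page 1, which starts at file-relative sector 8.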

Additional caches (disk internal cache, RAID controller cache) may further hide physical disk activity, so a miss in the page cache does not always mean the spindle spins.

Understanding these mechanisms helps developers reason about performance and diagnose latency issues in production systems.

Written by

Refining Core Development Skills

Fei has over 10 years of development experience at Tencent and Sogou. Through this account, he shares his deep insights on performance.
