
Demystifying Linux I/O: From VFS and Inodes to ZFS and Block Layer

This article explains how Linux handles I/O operations, covering the virtual file system, inode and dentry structures, superblock layout, ZFS features, disk types, the generic block layer, I/O scheduling strategies, and key performance metrics for storage.

Efficient Ops

File System

What is a file system

File systems are mechanisms that organize and manage files on storage devices; different organization methods produce different file systems such as Ext4, XFS, ZFS, and NFS.

Application developers usually interact only with system calls like open, read, write, and close, without worrying about the underlying file system type, disk interface, or storage medium.
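As a minimal sketch of that interface, the snippet below uses Python's os module, whose os.open, os.write, os.read, and os.close functions are thin wrappers around the system calls of the same name. The helper name roundtrip and the file name demo.txt are illustrative assumptions; nothing in the code depends on which file system backs the temporary directory.

```python
import os
import tempfile


def roundtrip(path: str, payload: bytes) -> bytes:
    """Write payload via open/write/close, then read it back via open/read/close."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # write(2) may write fewer bytes than requested in general;
        # for a tiny payload on a regular file a single call is enough here.
        os.write(fd, payload)
    finally:
        os.close(fd)

    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, len(payload))
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    data = roundtrip(os.path.join(tmp, "demo.txt"), b"hello, vfs\n")
    print(data)
```

The same four calls would work unchanged whether the directory lives on Ext4, XFS, ZFS, or NFS, which is the point of the VFS abstraction.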

How the file system works (VFS)

Linux files

In Linux, everything is a file, including regular files, directories, block devices, sockets, and pipes.

<code>brw-r--r-- 1 root    root    1, 2 Apr 25 11:03 bnod // block device file
crw-r--r-- 1 root    root    1, 2 Apr 25 11:04 cnod // character device file
drwxr-xr-x 2 user    user    6 Apr 25 11:01 dir // directory
-rw-r--r-- 1 user    user    0 Apr 25 11:01 file // regular file
prw-r--r-- 1 root    root    0 Apr 25 11:04 pipeline // named pipe
srwxr-xr-x 1 root    root    0 Apr 25 11:06 socket.sock // socket file
lrwxrwxrwx 1 root    root    4 Apr 25 11:04 softlink -> file // symbolic link
-rw-r--r-- 2 user    user    0 Apr 25 11:07 hardlink // hard link (also a regular file)</code>
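The first character of each line in that listing comes from the file-type bits in the inode's mode field. As a small sketch (the helper name type_char is an assumption, and the file names simply mirror the listing), the same character can be recovered with os.lstat and stat.filemode:

```python
import os
import stat
import tempfile


def type_char(path: str) -> str:
    """Return the ls-style type character ('-', 'd', 'p', 'l', ...) for path."""
    # lstat does not follow symbolic links, so a symlink reports as 'l'.
    return stat.filemode(os.lstat(path).st_mode)[0]


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "file"), "w").close()            # regular file
    os.mkdir(os.path.join(tmp, "dir"))                      # directory
    os.mkfifo(os.path.join(tmp, "pipeline"))                # named pipe
    os.symlink("file", os.path.join(tmp, "softlink"))       # symbolic link

    kinds = {name: type_char(os.path.join(tmp, name))
             for name in ("file", "dir", "pipeline", "softlink")}
    print(kinds)
```

Block and character device nodes are left out because creating them normally requires root privileges.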

inode (index node): stores metadata such as inode number, size, permissions, timestamps, and data location.

dentry (directory entry): stores the file name, inode pointer, and directory hierarchy.

inode and dentry

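An inode records a file's metadata. It is persisted on disk, so inodes themselves consume storage space. The file name is not part of the inode; it lives in the directory entry that points to it.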

<code>stat file
  File: file
  Size: 0   Blocks: 0   IO Block: 4096 regular empty file
  Device: fe21h/65057d   Inode: 32828   Links: 2
  Access: (0644/-rw-r--r--)  Uid: ( 3041/ user)   Gid: ( 3041/ user)
  Access: 2021-04-25 11:07:59.603745534 +0800
  Modify: 2021-04-25 11:07:59.603745534 +0800
  Change: 2021-04-25 11:08:04.739848692 +0800
  Birth: -</code>
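The fields printed by stat are available programmatically through os.stat. The sketch below (the file names are illustrative assumptions) reads a few of them and shows that renaming a file within a directory leaves the inode number unchanged, because the name is not part of the inode:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    old_path = os.path.join(tmp, "file")
    new_path = os.path.join(tmp, "renamed")
    with open(old_path, "w") as f:
        f.write("data")

    before = os.stat(old_path)
    os.rename(old_path, new_path)      # changes the name, not the inode
    after = os.stat(new_path)

    same_inode = before.st_ino == after.st_ino
    size = after.st_size               # "Size" in stat output
    links = after.st_nlink             # "Links" in stat output
    print(f"inode={after.st_ino} size={size} links={links} "
          f"mode={oct(after.st_mode & 0o777)} same_inode={same_inode}")
```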

A dentry keeps the file name, a pointer to the inode, and links to its parent and child dentries, which together form the directory tree. Dentries are in-memory kernel objects, built from the directory data on disk and kept in the dentry cache.

<code>tree
.
├── dir
│   └── file_in_dir
├── file
└── hardlink</code>
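From user space, the name-to-inode mapping that dentries represent can be observed with os.scandir. This sketch rebuilds the small tree above in a temporary directory (an assumption made so the example is self-contained) and shows that file and hardlink are two names for one inode:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    file_path = os.path.join(tmp, "file")
    open(file_path, "w").close()
    os.link(file_path, os.path.join(tmp, "hardlink"))   # second name, same inode
    os.mkdir(os.path.join(tmp, "dir"))
    open(os.path.join(tmp, "dir", "file_in_dir"), "w").close()

    # Each directory entry pairs a name with an inode number.
    entries = {entry.name: entry.inode() for entry in os.scandir(tmp)}
    link_count = os.stat(file_path).st_nlink

    for name, ino in sorted(entries.items()):
        print(f"{name:10s} -> inode {ino}")
    print("link count of 'file':", link_count)
```

The link count of 2 matches the Links field in the stat output shown earlier.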

ZFS

ZFS is a widely used file system; many database applications rely on it.

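Typical ZFS hierarchy: physical disks are grouped into virtual devices (vdevs, for example a RAID-Z group), vdevs are combined into a storage pool (zpool), and ZFS filesystems are created on top of the pool.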

ZFS operations

Create zpool

<code>root@:~ # zpool create tank raidz /dev/ada1 /dev/ada2 /dev/ada3 raidz /dev/ada4 /dev/ada5 /dev/ada6
root@:~ # zpool list tank
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
tank     11G   824K  11.0G        -         -     0%     0%  1.00x  ONLINE  -
root@:~ # zpool status tank
  pool: tank
 state: ONLINE
  scan: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
          raidz1-1  ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada6    ONLINE       0     0     0</code>

This creates a zpool named tank from two RAID-Z (RAID 5-like) vdevs of three disks each.

Create ZFS filesystem

<code>root@:~ # zfs create -o mountpoint=/mnt/srev tank/srev
root@:~ # df -h tank/srev
Filesystem    Size    Used   Avail Capacity  Mounted on
tank/srev     7.1G    117K    7.1G     0%    /mnt/srev</code>

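The ZFS filesystem is mounted at /mnt/srev. With no quota set, its size is the usable capacity of the pool: 7.1G after RAID-Z parity, compared with the 11G raw size that zpool list reports.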
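Note that zpool list reports 11G while df reports 7.1G. A rough back-of-the-envelope estimate, assuming single-parity RAID-Z loses about one disk's worth of space per vdev, accounts for most of that gap. The function name and the simple (n - parity) / n model are assumptions for illustration; real ZFS accounting also subtracts metadata and reserved space, so the figure is only approximate.

```python
def raidz_usable_estimate(raw_size: float, disks_per_vdev: int, parity: int = 1) -> float:
    """Estimate usable space when `parity` disks per vdev hold parity data."""
    if not 0 < parity < disks_per_vdev:
        raise ValueError("parity must be between 1 and disks_per_vdev - 1")
    return raw_size * (disks_per_vdev - parity) / disks_per_vdev


# 11G raw pool, built from 3-disk RAID-Z1 vdevs as in the zpool status output.
estimate = raidz_usable_estimate(11.0, disks_per_vdev=3)
print(f"estimated usable space: {estimate:.1f}G")
```

The estimate comes out slightly above the 7.1G that df shows, which is consistent with some additional space being held back by the file system.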

Set ZFS quota

<code>root@:~ # zfs set quota=1G tank/srev
root@:~ # df -h tank/srev
Filesystem    Size    Used   Avail Capacity  Mounted on
tank/srev     1.0G    118K    1.0G     0%    /mnt/srev</code>

ZFS features

Pool storage: a zpool can be expanded dynamically, and multiple filesystems share the same pool without pre-allocation.

Transactional filesystem: writes are copy-on-write and committed atomically, so a power loss cannot leave a partially applied write on disk.

ARC cache: the Adaptive Replacement Cache balances recency (LRU-style) and frequency (LFU-style) according to the workload. It uses four lists: recently used entries, frequently used entries, and a ghost list for each that remembers recently evicted keys.
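To make the four-list idea concrete, here is a compact sketch of the textbook ARC algorithm (as published by Megiddo and Modha), not of ZFS's own modified implementation. The class name, the list names t1/t2/b1/b2, and the access method are assumptions chosen for illustration; the cache stores keys only, with no values.

```python
from collections import OrderedDict
from typing import Hashable


class ARC:
    """Toy Adaptive Replacement Cache that tracks keys only."""

    def __init__(self, capacity: int) -> None:
        self.c = capacity
        self.p = 0.0                 # target size of t1, adapted on ghost hits
        self.t1 = OrderedDict()      # seen once recently (recency list)
        self.t2 = OrderedDict()      # seen at least twice (frequency list)
        self.b1 = OrderedDict()      # ghost list: keys evicted from t1
        self.b2 = OrderedDict()      # ghost list: keys evicted from t2

    def _replace(self, key: Hashable) -> None:
        """Evict one cached key into the matching ghost list."""
        if self.t1 and ((key in self.b2 and len(self.t1) == self.p)
                        or len(self.t1) > self.p):
            old, _ = self.t1.popitem(last=False)
            self.b1[old] = None
        else:
            old, _ = self.t2.popitem(last=False)
            self.b2[old] = None

    def access(self, key: Hashable) -> bool:
        """Touch key; return True on a cache hit, False on a miss."""
        if key in self.t1:                       # second touch: promote to t2
            del self.t1[key]
            self.t2[key] = None
            return True
        if key in self.t2:
            self.t2.move_to_end(key)
            return True
        if key in self.b1:                       # recency ghost hit: grow t1
            self.p = min(self.c, self.p + max(len(self.b2) / len(self.b1), 1))
            self._replace(key)
            del self.b1[key]
            self.t2[key] = None
            return False
        if key in self.b2:                       # frequency ghost hit: shrink t1
            self.p = max(0, self.p - max(len(self.b1) / len(self.b2), 1))
            self._replace(key)
            del self.b2[key]
            self.t2[key] = None
            return False
        l1 = len(self.t1) + len(self.b1)
        total = l1 + len(self.t2) + len(self.b2)
        if l1 == self.c:
            if len(self.t1) < self.c:
                self.b1.popitem(last=False)
                self._replace(key)
            else:
                self.t1.popitem(last=False)      # dropped without a ghost entry
        elif total >= self.c:
            if total == 2 * self.c:
                self.b2.popitem(last=False)
            self._replace(key)
        self.t1[key] = None
        return False


cache = ARC(2)
hits = [cache.access(k) for k in "abbcb"]
print(hits, list(cache.t1), list(cache.t2), list(cache.b1))
```

In the short trace, b is promoted to the frequency list after its second access, so the later arrival of c evicts a (remembered in the ghost list b1) rather than b.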

Disk Types

Storage media

HDD (mechanical hard drive)

SSD (solid‑state drive)

Interfaces

IDE

SCSI

SAS

SATA

Linux disk management

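Disks appear as block devices identified by major and minor numbers. For example, /dev/sda has major number 8, which identifies an sd-type (SCSI disk) block device, while the minor number distinguishes the whole disk (0) from its partitions (1, 2, and so on).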

<code>ls -l /dev/sda*
brw-rw---- 1 root disk 8, 0 Apr 25 15:53 /dev/sda
brw-rw---- 1 root disk 8, 1 Apr 25 15:53 /dev/sda1
...</code>
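The two numbers in the listing are packed into a single device ID, which os.stat exposes as st_rdev for device nodes. The sketch below splits such an ID with os.major and os.minor; the helper names are assumptions, and os.makedev is used to build the ID for /dev/sda1 (8, 1) so the example does not depend on that device existing.

```python
import os
import stat


def split_device_id(rdev: int) -> tuple[int, int]:
    """Split a packed device ID into (major, minor)."""
    return os.major(rdev), os.minor(rdev)


def device_numbers(path: str) -> tuple[int, int]:
    """Return (major, minor) for a block or character device node."""
    st = os.stat(path)
    if not (stat.S_ISBLK(st.st_mode) or stat.S_ISCHR(st.st_mode)):
        raise ValueError(f"{path} is not a device node")
    return split_device_id(st.st_rdev)


# Rebuild the ID that the listing shows for /dev/sda1 and split it again.
sda1_numbers = split_device_id(os.makedev(8, 1))
print(sda1_numbers)
```

On a machine that has the device, device_numbers("/dev/sda1") would be expected to return the same pair.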

Generic Block Layer

The Generic Block Layer abstracts heterogeneous block devices for the VFS and provides a unified framework for drivers and I/O scheduling.

I/O Scheduling

Classic single-queue schedulers (removed along with the legacy block layer in Linux 5.0):

NOOP – simple FIFO with basic request merging.

CFQ – Completely Fair Queueing, gives each process a fair share of disk time.

Deadline – assigns each request an expiry time and serves expired requests first, which prevents starvation.

Multi-queue (blk-mq) schedulers:

BFQ – Budget Fair Queueing, gives each process a budget measured in sectors, so that bandwidth rather than disk time is shared fairly.

Kyber – maintains separate sync/async queues and limits outstanding requests to meet target latencies.

mq-deadline – multi-queue version of Deadline.
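On a running system, the schedulers available for a device and the one currently active can be read from /sys/block/<device>/queue/scheduler, where the active name is shown in square brackets. A small sketch of parsing that file follows; the function names are assumptions, and the sample string stands in for real sysfs content.

```python
from pathlib import Path
from typing import Optional


def parse_scheduler(text: str) -> tuple[Optional[str], list[str]]:
    """Return (active scheduler, all listed schedulers) from sysfs text."""
    names = text.split()
    active = next(
        (n[1:-1] for n in names if n.startswith("[") and n.endswith("]")),
        None,
    )
    available = [n.strip("[]") for n in names]
    return active, available


def read_scheduler(device: str) -> tuple[Optional[str], list[str]]:
    """Read the scheduler file for a block device such as 'sda'."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return parse_scheduler(path.read_text())


# Sample content in the format the kernel uses for this file.
active, available = parse_scheduler("mq-deadline kyber [bfq] none\n")
print(active, available)
```

Writing one of the listed names back to the same file (as root) switches the scheduler for that device.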

Performance Metrics

Common I/O performance indicators:

Utilisation (%util in iostat) – percentage of time the disk spends handling I/O.

IOPS – number of I/O operations per second.

Throughput/Bandwidth – amount of data transferred per second (MB/s or GB/s).

Latency – time from issuing an I/O request to receiving a response.

Saturation – overall busy level of the disk, often inferred from queue length or latency.
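Most of these indicators can be derived from two snapshots of /proc/diskstats taken a known interval apart. The sketch below shows the arithmetic under stated assumptions: the two sample lines are made-up counter values for a device named sda, the interval is one second, and the helper names are illustrative. The field positions follow the kernel's documented diskstats layout, in which sectors are always 512 bytes.

```python
from dataclasses import dataclass

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


@dataclass
class Sample:
    reads: int            # reads completed
    sectors_read: int
    read_ms: int          # time spent reading
    writes: int           # writes completed
    sectors_written: int
    write_ms: int         # time spent writing
    busy_ms: int          # time the device had I/O in flight


def parse_diskstats_line(line: str) -> tuple[str, Sample]:
    """Parse one /proc/diskstats line into (device name, Sample)."""
    fields = line.split()
    n = [int(x) for x in fields[3:14]]
    return fields[2], Sample(n[0], n[2], n[3], n[4], n[6], n[7], n[9])


def metrics(a: Sample, b: Sample, interval_s: float) -> dict[str, float]:
    """Derive rate metrics from two samples taken interval_s apart."""
    ios = (b.reads - a.reads) + (b.writes - a.writes)
    sectors = (b.sectors_read - a.sectors_read) + (b.sectors_written - a.sectors_written)
    wait_ms = (b.read_ms - a.read_ms) + (b.write_ms - a.write_ms)
    return {
        "iops": ios / interval_s,
        "throughput_mb_s": sectors * SECTOR_BYTES / interval_s / 1e6,
        "util_pct": 100 * (b.busy_ms - a.busy_ms) / (interval_s * 1000),
        "await_ms": wait_ms / ios if ios else 0.0,
    }


# Hypothetical counters, one second apart.
_, first = parse_diskstats_line("8 0 sda 1000 0 80000 500 2000 0 160000 1500 0 900 2000")
_, second = parse_diskstats_line("8 0 sda 1500 0 120000 700 3000 0 240000 2300 0 1400 3000")
result = metrics(first, second, interval_s=1.0)
print(result)
```

Here await_ms is the average time per completed request, which serves as the latency figure, and a sustained rise in it alongside high utilisation is one common sign of saturation.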

Typical monitoring commands:

iostat -d -x – shows per-device I/O statistics.

pidstat -d – shows I/O of individual processes.

iotop – interactive view of processes sorted by I/O usage.
