Mastering Linux I/O Schedulers: When to Use CFQ, Deadline, or Noop
This article introduces the Linux I/O scheduler layer, explains the internal mechanisms and tunable parameters of the three classic scheduling algorithms (CFQ, deadline, and noop), and offers guidance on selecting the appropriate scheduler for different storage workloads.
Linux I/O Scheduler Overview
The I/O scheduler resides in the Linux kernel’s I/O scheduling layer, which is one of seven layers in the overall I/O stack:
VFS layer – virtual file system abstraction.
File system layer – specific file‑system implementations.
Page cache layer – caching of pages.
Block layer – generic block‑device abstraction.
I/O scheduler layer – decides the order of block‑device requests.
Block device driver layer – hardware‑specific driver interface.
Block device layer – the physical device itself.
The scheduler’s goal is to improve overall block‑device performance, especially for mechanical disks where seek time dominates.
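On a running system, the scheduler currently active for a device is shown in brackets in the sysfs file /sys/block/&lt;dev&gt;/queue/scheduler (for example, "noop deadline [cfq]"). As a minimal sketch, assuming that bracketed format, the active entry can be parsed like this (the helper name is illustrative):

```python
import re

def active_scheduler(sysfs_text: str) -> str:
    """Extract the bracketed (active) scheduler name from the sysfs
    scheduler file, e.g. 'noop deadline [cfq]' -> 'cfq'."""
    match = re.search(r"\[([\w-]+)\]", sysfs_text)
    if match is None:
        raise ValueError(f"no active scheduler marked in: {sysfs_text!r}")
    return match.group(1)

# On a real Linux system (device name varies):
# with open("/sys/block/sda/queue/scheduler") as f:
#     print(active_scheduler(f.read()))

print(active_scheduler("noop deadline [cfq]"))  # -> cfq
```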
1. CFQ – Completely Fair Queueing
CFQ is the default scheduler for most general‑purpose workloads. It creates a separate queue for each process and allocates I/O time slices to ensure fairness.
CFQ also supports priority classes (RT, BE, IDLE) and per‑process I/O priority settings.
Key data structures:
cfq_data – global scheduler state.
cfq_group – represents a cgroup, stored in a red-black tree keyed by vdisktime (total I/O time consumed).
service_tree – seven trees per group handling RT, BE, and IDLE requests across sync and async operations.
CFQ parameters (found under /sys/class/block/<dev>/queue/iosched/) include:
back_seek_max and back_seek_penalty – control backward seeks.
fifo_expire_async / fifo_expire_sync – timeouts for async/sync requests.
slice_idle and group_idle – idle wait times to improve sequential I/O.
low_latency and target_latency – enable low-latency mode and set its latency target.
quantum, slice_sync, slice_async, slice_async_rq – control request batch sizes and time slices.
CFQ also integrates with cgroup blkio control to enforce I/O quotas per group.
CFQ Design Highlights
Each process’s queue is placed in a red-black tree ordered by its accumulated service time; the scheduler always selects the queue with the smallest vdisktime, ensuring fairness across processes and cgroups.
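The selection rule can be sketched in a few lines. This is a toy model, not CFQ's actual implementation: real CFQ keeps cfq_groups in a red-black tree and charges scaled disk time, while the sketch below uses a heap and hypothetical entity names. It shows the core idea that the entity with the smallest vdisktime always runs next, and that higher weight makes vdisktime grow more slowly:

```python
import heapq

class FairScheduler:
    """Toy model of CFQ's fair selection: dispatch the entity with the
    smallest accumulated vdisktime, then charge it service time inversely
    proportional to its weight."""

    def __init__(self):
        self._heap = []  # entries are (vdisktime, name)
        self._weight = {}

    def add(self, name, weight=100):
        self._weight[name] = weight
        heapq.heappush(self._heap, (0.0, name))

    def dispatch(self, slice_ms=10):
        vdisktime, name = heapq.heappop(self._heap)
        # Higher weight -> vdisktime grows more slowly -> runs more often.
        vdisktime += slice_ms * 100.0 / self._weight[name]
        heapq.heappush(self._heap, (vdisktime, name))
        return name

sched = FairScheduler()
sched.add("fast", weight=200)   # hypothetical cgroup names
sched.add("slow", weight=100)
print([sched.dispatch() for _ in range(6)])
```

With a 2:1 weight ratio, "fast" is dispatched twice as often as "slow" over the six slices, which is exactly the proportional fairness the vdisktime ordering is meant to deliver.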
2. Deadline Scheduler
Deadline is simpler and focuses on maximizing throughput while preventing request starvation.
It maintains four queues: read‑sort, write‑sort (ordered by sector), and read‑fifo, write‑fifo (ordered by request age).
When dequeuing, the scheduler prefers reads and first checks whether any request has exceeded its deadline (read_expire or write_expire); if so, it processes that request to avoid starvation, otherwise it serves the next request in sector order.
Adjustable parameters include:
read_expire / write_expire – request timeouts in milliseconds.
fifo_batch – number of requests processed per batch.
writes_starved – how many read batches may be dispatched before waiting writes must be served.
front_merges – enable/disable front merges.
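These tunables live under the same iosched directory mentioned earlier for CFQ. As a hedged sketch (the helper name is illustrative; writing requires root and requires that the scheduler be active on the device), a parameter could be adjusted like this:

```python
def iosched_param_path(dev: str, param: str) -> str:
    """Build the sysfs path for a scheduler tunable, following the
    /sys/class/block/<dev>/queue/iosched/ layout described above."""
    return f"/sys/class/block/{dev}/queue/iosched/{param}"

# On a real Linux system (root required, deadline active on sda):
# with open(iosched_param_path("sda", "read_expire"), "w") as f:
#     f.write("250")   # tighten the read deadline to 250 ms

print(iosched_param_path("sda", "read_expire"))
```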
3. Noop Scheduler
Noop implements a simple FIFO queue with minimal merging; it is ideal for SSDs where complex scheduling provides no benefit.
Choosing the Right Scheduler
• CFQ : General‑purpose, desktop, and server workloads where fairness among processes is important.
• Deadline : Workloads with heavy I/O pressure on a few processes (e.g., databases) where throughput and bounded latency are critical.
• Noop : SSD or other devices with negligible seek time; the simplest scheduler yields the best performance.
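Once you have chosen a scheduler, it can be switched at runtime by writing its name into the device's scheduler file. A minimal sketch, assuming the bracketed sysfs format shown earlier (the helper name is illustrative), validates the choice against what the device actually offers before writing:

```python
def choose_scheduler(available: str, wanted: str) -> str:
    """Validate `wanted` against the contents of
    /sys/block/<dev>/queue/scheduler, e.g. 'noop deadline [cfq]'."""
    names = available.replace("[", "").replace("]", "").split()
    if wanted not in names:
        raise ValueError(f"{wanted!r} not offered; choices: {names}")
    return wanted

# On a real Linux system (root required), the switch is a sysfs write:
# with open("/sys/block/sda/queue/scheduler", "w") as f:
#     f.write(choose_scheduler("noop deadline [cfq]", "noop"))

print(choose_scheduler("noop deadline [cfq]", "noop"))  # -> noop
```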
[Figures omitted: diagrams of the I/O stack and CFQ data structures.]
Efficient Ops – a public account maintained by Xiaotianguo and friends.