Fundamentals 23 min read

Understanding the Linux CFS Scheduler: Architecture, Implementation, and Common Questions

This article explains the Linux Completely Fair Scheduler (CFS), covering its design goals, core concepts such as virtual runtime, weight, red‑black tree management, load‑balancing mechanisms, scheduling policies, and answers common questions about its operation and kernel code.

Deepin Linux
Deepin Linux
Deepin Linux
Understanding the Linux CFS Scheduler: Architecture, Implementation, and Common Questions

In modern operating systems the scheduler acts like a conductor, coordinating CPU resources among many concurrent tasks; the Linux kernel uses the Completely Fair Scheduler (CFS) as its default scheduler for normal and batch tasks since version 2.6.23.

CFS aims to provide each runnable task a fair share of CPU time by tracking a vruntime value (virtual runtime) that grows proportionally to the task’s actual execution time divided by its weight. The task with the smallest vruntime is chosen to run next, ensuring that heavily weighted (high‑priority) tasks advance more slowly in virtual time and thus receive more real CPU time.

The scheduler’s weight is derived from the task’s nice value (‑20 to 19); lower nice values give higher weight. Weight influences how quickly a task’s vruntime increases, balancing interactive, batch, and idle workloads.

To manage the runnable queue efficiently, CFS stores tasks in a red‑black tree ordered by vruntime . Insertion, deletion, and lookup are O(log N), and the leftmost node always holds the task with the minimal vruntime . Each CPU has its own runqueue and red‑black tree, and the kernel maintains a min_vruntime to track the smallest virtual time in the tree.

Load balancing is performed across CPUs using scheduling domains and groups. The scheduler periodically migrates tasks from overloaded CPUs to underloaded ones while respecting CPU affinity, keeping the system’s overall load evenly distributed.

CFS provides several scheduling policies:

SCHED_NORMAL (SCHED_OTHER) : default for interactive user‑space tasks; priority is determined by nice.

SCHED_BATCH : for non‑interactive, CPU‑intensive batch jobs; runs when the system is less busy.

SCHED_IDLE : lowest‑priority background work that only runs when the system is idle.

The article also includes a Q&A section that reviews the evolution from O(N) and O(1) schedulers to CFS, explains the relationship between nice, priority, and weight, describes how vruntime is calculated (\(vruntime = (delta\_exec * nice\_0\_weight) / weight\)), when it is updated, and the role of min_vruntime . It details how newly created or waking tasks have their vruntime adjusted to avoid starvation, and explains kernel tables such as prio_to_weight , prio_to_wmult , runnable_avg_yN_inv , and runnable_avg_yN_sum used for weight lookup and load‑average calculations.

Key kernel structures involved are:

struct sched_entity {
    ...
    u64 vruntime;   // virtual runtime
    ...
};

struct task_struct {
    ...
    const struct sched_class *sched_class; // points to &fair_sched_class for CFS
    struct sched_entity se;                // CFS scheduling parameters
    ...
};

struct cfs_rq {
    u64 min_vruntime;          // smallest vruntime in the runqueue
    struct rb_root_cached tasks_timeline; // red‑black tree of tasks
    ...
};

struct rq {
    struct cfs_rq cfs;        // per‑CPU CFS runqueue
    ...
};

Overall, the CFS scheduler replaces fixed time‑slice algorithms with a fair, weight‑based system that dynamically balances CPU time among diverse workloads, improving responsiveness and throughput on both single‑ and multi‑core Linux systems.

kernelschedulerLinuxOperating SystemCFSmultitasking
Deepin Linux
Written by

Deepin Linux

Research areas: Windows & Linux platforms, C/C++ backend development, embedded systems and Linux kernel, etc.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.