
Linux CPU Load Balancing: Framework and Implementation Analysis

This overview of Linux kernel load balancing, based on the 5.4.28 framework, explains how the kernel measures CPU load precisely with PELT, balances tasks against each core's capacity using hierarchical scheduling domains and groups, applies class-specific strategies, walks through migration on an 8-core big.LITTLE system, and outlines the three balancing scenarios and their trigger mechanisms.

OPPO Kernel Craftsman

This article is the first in a three-part series on Linux kernel load balancing, focusing on the framework, principles, and architecture. The content is based on Linux kernel version 5.4.28.

1. What is CPU Load

CPU load is often confused with CPU utilization. CPU utilization measures the fraction of a measurement window the CPU spends busy, while CPU load represents the actual demand placed on the CPU. Early implementations used runqueue depth to describe CPU load, but this was imprecise. Linux 3.8 introduced the PELT (Per-Entity Load Tracking) algorithm, which tracks load per scheduling entity rather than per CPU, enabling more accurate load balancing decisions.

2. What is Load Balancing

Load balancing does not simply distribute load equally across CPUs; it must consider each CPU's computing capacity. CPU capacity depends on microarchitecture and maximum frequency. The system uses normalized values where the most powerful CPU running at max frequency has a capacity of 1024. CFS task balancing uses available CPU capacity after accounting for RT and IRQ usage.

3. Scheduling Domains and Groups

Load balancing complexity arises from CPU topology. Migrating a task between CPUs in different clusters incurs cache-refill overhead, while hyperthreaded siblings share caches, making migration between them cheap. Migration across NUMA nodes should be avoided unless the imbalance is severe. Linux uses struct sched_domain to represent scheduling domains and struct sched_group to represent the groups of CPUs within a domain.

4. Software Architecture

The load balancing module consists of core load balancing and class-specific modules. CFS, RT, and Deadline tasks have different balancing strategies. The system tracks both task load and CPU load, building the sched domain hierarchy from the device tree (DTS) and the CPU topology subsystem. Load balancing is triggered by scheduling events such as task wakeup, task creation, and tick interrupts.

5. How Load Balancing Works

Using a 4 small-core + 4 big-core processor as an example: the system has two levels of sched domains - an MC (Multi-Core) domain at the base level and a DIE domain at the top. Load balancing starts from the base (MC) domain, checking balance among the sched groups within that domain; if an imbalance exists, tasks are migrated within the cluster. The algorithm only lets CPUs pull tasks, never push them. After MC-domain balancing, it proceeds to the DIE domain for inter-cluster migration, considering group-level load and capacity.

6. Load Balancing Scenarios

The Linux kernel handles three main scenarios: (1) Load Balance - migrating runnable tasks to match CPU capacity; (2) Task Placement - selecting an appropriate CPU when a blocked task wakes up; (3) Active Upmigration - migrating a running misfit task to a higher-capacity CPU via the stop-machine mechanism.

Load balancing triggers include: (1) Tick load balance - run periodically from the scheduler tick; (2) New idle load balance - run when a CPU's runqueue empties and it is about to enter idle; (3) Idle (nohz) load balance - a busy CPU kicks an idle CPU via IPI so it balances on behalf of the idle CPUs.

Tags: load-balancing, Task Migration, Linux kernel, CFS Scheduler, CPU Scheduler, CPU Topology, PELT Algorithm, Scheduling Domain
Written by

OPPO Kernel Craftsman

Sharing Linux kernel-related cutting-edge technology, technical articles, technical news, and curated tutorials
