
Linux Scheduler: Structures, Scheduling Classes, Runqueue, and Context Switch Process

This article explains Linux scheduling fundamentals, describing the task_struct fields, scheduling classes, runqueue organization, the scheduling workflow including flag setting and execution, and the details of the context_switch function that performs address‑space and register state switches.

Refining Core Development Skills

What is scheduling? Scheduling is the act of selecting, according to a scheduling algorithm, one process from the ready queue to run on the CPU.

Why schedule? To maximize CPU utilization.

Scheduling Related Structures

task_struct

We extract the scheduling-related fields from task_struct:

struct task_struct {
    ......
    /*
     * Scheduling class abstraction (struct sched_class):
     *   Stop scheduler:      stop_sched_class
     *   Deadline scheduler:  dl_sched_class
     *   RT scheduler:        rt_sched_class
     *   CFS scheduler:       fair_sched_class
     *   IDLE-Task scheduler: idle_sched_class
     */
    const struct sched_class *sched_class;
    // CFS scheduling entity
    struct sched_entity se;
    // RT scheduling entity
    struct sched_rt_entity rt;
    ......
#ifdef CONFIG_CGROUP_SCHED
    // task group (each CPU maintains its own CFS and RT entities/queues)
    struct task_group *sched_task_group;
#endif
    // DL scheduling entity
    struct sched_dl_entity dl;
    ......
    /*
     * Process scheduling policy, six kinds:
     *   SCHED_DEADLINE: deadline scheduler
     *   SCHED_FIFO, SCHED_RR: real-time scheduler
     *   SCHED_NORMAL, SCHED_BATCH, SCHED_IDLE: CFS scheduler
     */
    unsigned int policy;
    ......
};

struct sched_class abstracts the scheduler into five classes:

Stop scheduler: highest priority, can preempt all other processes and cannot be preempted.

Deadline scheduler: uses a red‑black tree to order processes by absolute deadline and selects the earliest.

RT scheduler: maintains a separate queue for each priority level.

CFS scheduler: employs the Completely Fair Scheduler algorithm with virtual runtime.

IDLE‑Task scheduler: each CPU has an idle thread that runs when no other process is runnable.
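The priority order among the five classes can be illustrated with a small standalone sketch. It is not kernel code: the `demo_` names and the per-class counter array are assumptions made for illustration, and the real kernel walks its `sched_class` instances rather than an enum.

```c
#include <assert.h>

/* Hypothetical stand-ins for the five scheduling classes, listed from
 * highest to lowest priority. */
enum demo_class {
    DEMO_STOP,
    DEMO_DL,
    DEMO_RT,
    DEMO_CFS,
    DEMO_IDLE,
    DEMO_NR_CLASSES
};

static const char *const demo_class_names[DEMO_NR_CLASSES] = {
    "stop", "deadline", "rt", "cfs", "idle"
};

/* nr_runnable[c] holds the number of runnable tasks owned by class c.
 * Returns the highest-priority class that has work. The idle class is the
 * fallback because every CPU always has its idle thread. */
static enum demo_class demo_pick_class(const unsigned int nr_runnable[DEMO_NR_CLASSES])
{
    for (int c = DEMO_STOP; c < DEMO_IDLE; c++) {
        if (nr_runnable[c] > 0)
            return (enum demo_class)c;
    }
    return DEMO_IDLE;
}

/* Human-readable name of a class, for logging in the sketch. */
static const char *demo_class_name(enum demo_class c)
{
    assert((int)c >= 0 && c < DEMO_NR_CLASSES);
    return demo_class_names[c];
}
```

With two runnable RT tasks and five CFS tasks, `demo_pick_class` returns `DEMO_RT`; as soon as a deadline task becomes runnable it returns `DEMO_DL` instead.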

unsigned int policy – the scheduling policy of a process (six options):

SCHED_DEADLINE – selects the Deadline scheduler.

SCHED_RR – round‑robin time‑slice scheduling; after its slice expires the process is placed at the tail of its priority queue.

SCHED_FIFO – first-in-first-out with no time slice; the process runs until it blocks, voluntarily yields, or is preempted by a higher-priority task.

SCHED_NORMAL – selects the CFS scheduler for normal processes.

SCHED_BATCH – batch processing, also uses the CFS scheduler.

SCHED_IDLE – runs with the lowest priority under the CFS scheduler.

struct sched_entity se – scheduling entity for ordinary non‑real‑time processes using CFS.

struct sched_rt_entity rt – scheduling entity for real‑time processes using Round‑Robin or FIFO.

struct sched_dl_entity dl – scheduling entity for real-time processes using Earliest Deadline First (EDF).

Tasks assigned to a CPU become scheduling entities and are inserted into the appropriate runqueue.
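The relationship between the policy field and the three scheduling entities can be sketched as follows. The policy numbers match the values in the Linux UAPI headers, but `struct demo_policy_task`, `enum demo_entity`, and `demo_entity_for_task` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Policy numbers as defined by Linux; the DEMO_ prefix avoids clashing
 * with <sched.h>. */
enum {
    DEMO_SCHED_NORMAL   = 0,
    DEMO_SCHED_FIFO     = 1,
    DEMO_SCHED_RR       = 2,
    DEMO_SCHED_BATCH    = 3,
    DEMO_SCHED_IDLE     = 5,
    DEMO_SCHED_DEADLINE = 6
};

/* Which of the embedded entities (se, rt, dl) a task is scheduled through. */
enum demo_entity {
    DEMO_ENTITY_SE,
    DEMO_ENTITY_RT,
    DEMO_ENTITY_DL,
    DEMO_ENTITY_INVALID
};

/* Hypothetical, heavily reduced task descriptor. */
struct demo_policy_task {
    unsigned int policy;
};

static enum demo_entity demo_entity_for_task(const struct demo_policy_task *t)
{
    assert(t != NULL);
    switch (t->policy) {
    case DEMO_SCHED_DEADLINE:
        return DEMO_ENTITY_DL;   /* Deadline scheduler */
    case DEMO_SCHED_FIFO:
    case DEMO_SCHED_RR:
        return DEMO_ENTITY_RT;   /* RT scheduler */
    case DEMO_SCHED_NORMAL:
    case DEMO_SCHED_BATCH:
    case DEMO_SCHED_IDLE:
        return DEMO_ENTITY_SE;   /* CFS scheduler */
    default:
        return DEMO_ENTITY_INVALID;
    }
}
```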

runqueue (Running Queue)

struct rq {
    ......
    // Three scheduling queues: CFS, RT, DL
    struct cfs_rq cfs;
    struct rt_rq rt;
    struct dl_rq dl;
    ......
    // idle points to the idle kernel thread, stop points to the migration thread
    struct task_struct *curr, *idle, *stop;
    ......
};

The three scheduling queues are:

struct cfs_rq cfs – CFS scheduling queue.

struct rt_rq rt – RT scheduling queue.

struct dl_rq dl – DL scheduling queue.

Each CPU has its own runqueue, and each runqueue contains the three scheduling queues; tasks are added to the corresponding queue as scheduling entities.
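A minimal sketch of the per-CPU layout, assuming a fixed CPU count and sub-queues that only count their entities. All `demo_` names are hypothetical; the kernel's real queues hold red-black trees and priority arrays rather than bare counters.

```c
#include <assert.h>

#define DEMO_NR_CPUS 4  /* assumption: a small, fixed number of CPUs */

enum demo_queue { DEMO_Q_CFS, DEMO_Q_RT, DEMO_Q_DL };

/* Simplified stand-ins: each sub-queue only counts its runnable entities. */
struct demo_cfs_rq { unsigned int nr_running; };
struct demo_rt_rq  { unsigned int nr_running; };
struct demo_dl_rq  { unsigned int nr_running; };

struct demo_rq {
    struct demo_cfs_rq cfs;
    struct demo_rt_rq  rt;
    struct demo_dl_rq  dl;
    unsigned int nr_running;  /* total across the three sub-queues */
};

/* One runqueue per CPU, zero-initialized as a static array. */
static struct demo_rq demo_runqueues[DEMO_NR_CPUS];

static struct demo_rq *demo_cpu_rq(int cpu)
{
    assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
    return &demo_runqueues[cpu];
}

/* Account one scheduling entity on the matching sub-queue of the given CPU. */
static void demo_enqueue(int cpu, enum demo_queue q)
{
    struct demo_rq *rq = demo_cpu_rq(cpu);

    switch (q) {
    case DEMO_Q_CFS: rq->cfs.nr_running++; break;
    case DEMO_Q_RT:  rq->rt.nr_running++;  break;
    case DEMO_Q_DL:  rq->dl.nr_running++;  break;
    default:         return;  /* unknown queue: nothing is accounted */
    }
    rq->nr_running++;
}
```

Enqueuing on CPU 0 leaves the runqueue of CPU 1 untouched, which is the point of keeping one runqueue per CPU.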

Scheduling Flow

The essence of scheduling is to choose the next process to run, which consists of two steps:

1. Set scheduling flag

The kernel sets the TIF_NEED_RESCHED flag in the flags member of the thread_info structure of the currently running process.

When is TIF_NEED_RESCHED set?

During the scheduler tick (clock interrupt).

When a process is woken up (wake_up_process).

When a new process is created (do_fork).

During load balancing (smp_send_reschedule).

When the nice value of a process is changed (set_user_nice).

In all these cases the TIF_NEED_RESCHED flag is ultimately set through resched_curr, for example on the scheduler_tick and wake_up_process paths.

The exact criteria for setting the flag depend on the specific scheduling algorithm and will be discussed when each scheduler is covered.
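The flag handling itself is simple and can be sketched in isolation. This is a user-space illustration under stated assumptions: the bit position and all `demo_` types are hypothetical, and the real resched_curr uses atomic bit operations and may send an inter-processor interrupt when the target runqueue belongs to another CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative bit position; the real value is architecture specific. */
#define DEMO_TIF_NEED_RESCHED 1

struct demo_thread_info { unsigned long flags; };
struct demo_flag_task   { struct demo_thread_info thread_info; };
struct demo_flag_rq     { struct demo_flag_task *curr; };

/* Mark the task currently running on this runqueue as needing a reschedule. */
static void demo_resched_curr(struct demo_flag_rq *rq)
{
    assert(rq != NULL && rq->curr != NULL);
    rq->curr->thread_info.flags |= 1UL << DEMO_TIF_NEED_RESCHED;
}

/* Test the flag; this is the check done later at the scheduling points. */
static bool demo_need_resched(const struct demo_flag_task *t)
{
    return (t->thread_info.flags & (1UL << DEMO_TIF_NEED_RESCHED)) != 0;
}

/* Clear the flag once the scheduler has acted on the request. */
static void demo_clear_need_resched(struct demo_flag_task *t)
{
    t->thread_info.flags &= ~(1UL << DEMO_TIF_NEED_RESCHED);
}
```

Setting the flag does not switch tasks by itself. It only records the request, which is why scheduling is split into a flag-setting step and an execution step.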

2. Execute scheduling

The kernel checks whether the current process has the TIF_NEED_RESCHED flag; if so, it calls the schedule function to perform a context switch. Preemption can occur in kernel mode or user mode.

User‑mode preemption

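On ARM64, ret_to_user is invoked after system calls, exceptions, or interrupt handling to return to user space. On this path the kernel checks the thread flags and calls schedule if TIF_NEED_RESCHED is set.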

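Kernel-mode preemption

With kernel preemption enabled, a task running in kernel mode can be preempted when an interrupt returns to kernel mode, or when preemption is re-enabled (preempt_enable), provided the preemption count has dropped to zero and TIF_NEED_RESCHED is set.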

Process Context Switch (context_switch)

The actual scheduling work happens inside the __schedule function.

The two key functions are pick_next_task, which selects the next task, and context_switch, which performs the actual switch.

Choosing a task depends on the scheduling class; that will be explained later. Here we focus on context_switch, which involves two main steps: switching the process address space and switching the processor state.

Process address‑space switch

The physical address of the next process's page global directory (PGD) is written to ttbr0_el1, the user-space page-table base register on ARM64. The MMU then walks the page tables from this base to translate user-space virtual addresses to physical addresses, which completes the address-space switch.

Processor register state switch

On ARM64, the callee-saved registers x19-x28 together with fp, sp, and pc must be saved. During the switch, the registers of the previous task (prev) are stored in the cpu_context structure inside its task_struct (task_struct.thread.cpu_context), and the registers of the next task (next) are restored from its own cpu_context. The address of the next task's task_struct is then written to sp_el0, which is where current looks up the running task, completing the processor state switch.
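Both steps of the switch can be modeled in user space with a simulated CPU. This is only a sketch: `struct demo_cpu`, `struct demo_switch_task`, and the `pgd_base` field are hypothetical, and the real switch is done in assembly (cpu_switch_to on ARM64) on the live registers rather than on a struct copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The register state named in the text: x19-x28, fp, sp, pc. */
struct demo_cpu_context {
    uint64_t x[10];  /* x19..x28 */
    uint64_t fp;
    uint64_t sp;
    uint64_t pc;
};

/* Hypothetical, reduced task descriptor. */
struct demo_switch_task {
    struct demo_cpu_context cpu_context;
    uint64_t pgd_base;  /* page-table base loaded into ttbr0_el1 */
};

/* Simulated CPU: live registers plus the two system registers involved. */
struct demo_cpu {
    struct demo_cpu_context regs;
    uint64_t ttbr0_el1;
    struct demo_switch_task *sp_el0;  /* what "current" resolves to */
};

static void demo_context_switch(struct demo_cpu *cpu,
                                struct demo_switch_task *prev,
                                struct demo_switch_task *next)
{
    assert(cpu != NULL && prev != NULL && next != NULL);

    /* Step 1: address-space switch. */
    cpu->ttbr0_el1 = next->pgd_base;

    /* Step 2: register state switch. Save prev, then restore next. */
    prev->cpu_context = cpu->regs;
    cpu->regs = next->cpu_context;
    cpu->sp_el0 = next;
}
```

After the call, `prev` holds a snapshot from which it can later be resumed, and the simulated CPU continues at the pc saved in `next`.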

This article is reposted from the WeChat official account 「人人都是极客」.

Written by

Refining Core Development Skills

Fei has over 10 years of development experience at Tencent and Sogou. Through this account, he shares his deep insights on performance.
