
Understanding Linux Swap, Swappiness, and Memory Reclamation Mechanisms

This article explains how the Linux kernel manages memory reclamation, the role of kswapd, the distinction between file and anonymous pages, the impact of swap and the swappiness parameter, and provides practical guidance for tuning swap usage in production systems.

NetEase Game Operations Platform

Introduction

When the system boots, services start and their processes begin consuming memory. Linux deliberately keeps as much memory as possible in use (for example, as page cache), so when free memory runs short the kernel has to reclaim pages, and swap is a central part of that process.

When Reclamation Happens

kswapd, the kernel's background reclaim thread, tries to keep enough memory free to satisfy allocation requests. When free memory falls below WMARK_LOW, kswapd is woken and reclaims pages until free memory is back at WMARK_HIGH, after which it sleeps again. If an allocation still fails, or free memory falls below WMARK_MIN, the allocating process itself performs direct reclamation, synchronously, before its request can be satisfied.
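The watermark checks above can be sketched as a small decision function. This is an illustrative model only: the type and function names (zone_marks, pick_reclaim, kswapd_may_sleep) are made up for this example and are not kernel identifiers.

```c
#include <stdbool.h>

/* Hypothetical names for this sketch; the kernel does not define them. */
enum reclaim_action {
    RECLAIM_NONE,    /* enough free memory, nothing to do */
    RECLAIM_KSWAPD,  /* below WMARK_LOW: background reclaim by kswapd */
    RECLAIM_DIRECT   /* below WMARK_MIN: the allocating task reclaims itself */
};

struct zone_marks {
    unsigned long min;   /* WMARK_MIN, in pages */
    unsigned long low;   /* WMARK_LOW, in pages */
    unsigned long high;  /* WMARK_HIGH, in pages */
};

/* Decide which kind of reclaim the current free-page count calls for. */
enum reclaim_action pick_reclaim(unsigned long free_pages,
                                 const struct zone_marks *w)
{
    if (free_pages < w->min)
        return RECLAIM_DIRECT;
    if (free_pages < w->low)
        return RECLAIM_KSWAPD;
    return RECLAIM_NONE;
}

/* Once woken, kswapd keeps reclaiming until free memory reaches WMARK_HIGH. */
bool kswapd_may_sleep(unsigned long free_pages, const struct zone_marks *w)
{
    return free_pages >= w->high;
}
```

Note the gap between the wake-up threshold (low) and the sleep threshold (high): it keeps kswapd from being woken again immediately after it stops.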

Reclamation Tasks

Reclaimable memory falls into two groups, labeled here FILE_BASE (file‑backed pages) and ANON_BASE (anonymous pages). Reclamation targets inactive pages: clean file pages can be freed immediately, dirty file pages must be written back to their file first, and anonymous pages, which have no backing file, must be written out to swap before their memory can be reused.
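A minimal sketch of that per-page decision follows. The struct and enum names are hypothetical and exist only for illustration; the kernel tracks this state through page flags and LRU lists rather than a struct like this.

```c
#include <stdbool.h>

/* Hypothetical types for this sketch; not kernel definitions. */
enum page_kind { PAGE_FILE, PAGE_ANON };

enum reclaim_step {
    STEP_KEEP,       /* page is active, leave it alone */
    STEP_FREE,       /* clean file page: drop it, re-read from disk later */
    STEP_WRITEBACK,  /* dirty file page: write to its file, then free */
    STEP_SWAP_OUT    /* anonymous page: write to swap, then free */
};

struct page_state {
    enum page_kind kind;
    bool active;
    bool dirty;
};

/* Map a page's state to the work needed before its memory can be reused. */
enum reclaim_step reclaim_step_for(const struct page_state *p)
{
    if (p->active)
        return STEP_KEEP;
    if (p->kind == PAGE_ANON)
        return STEP_SWAP_OUT;
    return p->dirty ? STEP_WRITEBACK : STEP_FREE;
}
```

The ordering of cost is visible in the return values: a clean file page costs nothing to drop, while dirty file pages and anonymous pages both require disk I/O first.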

Traditional Views on Swap and Swappiness

Commonly repeated claims, some accurate and some not, include: "Swap means the system is broken", "Swap size should be twice RAM", "Swappiness defaults to 60 and higher values increase swapping", and "Setting swappiness to 0 disables swap entirely". Community discussions often go further and claim that swap usage should always be zero and that any swap activity indicates a problem.

Kernel Exploration

The reclaim logic lives in mm/vmscan.c in the kernel source tree. The key pieces are the vm_swappiness variable (default 60), the four scan‑balance values (SCAN_EQUAL, SCAN_FRACT, SCAN_ANON, SCAN_FILE), and helpers such as global_reclaim(), which reports whether reclaim was triggered system‑wide or on behalf of a memory cgroup.

int vm_swappiness = 60;

The function get_scan_count() calculates how many pages to scan based on swappiness, recent scan statistics, and whether the system is under pressure.
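As a rough illustration of how swappiness and recent scan statistics combine, the sketch below is modeled loosely on the proportional branch of get_scan_count() in older 4.x kernels. The struct and function names are invented for this example, swappiness is assumed to be in the range 0 to 100, and newer kernels compute the split differently.

```c
#include <stdint.h>

/* Hypothetical container for one LRU type's recent reclaim statistics. */
struct lru_stats {
    uint64_t recent_scanned;  /* pages scanned recently */
    uint64_t recent_rotated;  /* pages found in use and put back */
};

struct scan_fraction {
    uint64_t anon;
    uint64_t file;
    uint64_t denominator;
};

/*
 * Weight each list by its priority (swappiness for anon, 200 - swappiness
 * for file) and by how cheaply it has been giving up pages: many rotations
 * relative to scans means the pages are in use, so pressure goes down.
 */
struct scan_fraction scan_fractions(unsigned swappiness,
                                    struct lru_stats anon,
                                    struct lru_stats file)
{
    uint64_t anon_prio = swappiness;        /* assumed 0..100 */
    uint64_t file_prio = 200 - anon_prio;
    struct scan_fraction f;

    f.anon = anon_prio * (anon.recent_scanned + 1) / (anon.recent_rotated + 1);
    f.file = file_prio * (file.recent_scanned + 1) / (file.recent_rotated + 1);
    f.denominator = f.anon + f.file + 1;
    return f;
}

/* Pages to scan on one list: the priority-sized window, scaled by its share. */
uint64_t pages_to_scan(uint64_t lru_size, int priority,
                       uint64_t fraction, uint64_t denominator)
{
    return (lru_size >> priority) * fraction / denominator;
}
```

With the default swappiness of 60 and no history, the weights are 60 for anonymous pages and 140 for file pages, so file pages take roughly 70% of the scanning effort.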

Four Scan Marks

SCAN_FILE – only file pages are scanned.

SCAN_ANON – only anonymous pages are scanned.

SCAN_EQUAL – both types are scanned with equal pressure, with no balancing; chosen when priority is 0 and swappiness is non‑zero.

SCAN_FRACT – both types are scanned in proportion to swappiness; the fallback when no other condition matches.
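How one of the four marks gets chosen can be sketched as a chain of checks. This is a simplified model patterned on the decision order in get_scan_count() from 4.x-era kernels; struct reclaim_ctx and its fields are hypothetical stand-ins for state the kernel reads from scan_control, the LRU lists, and the zone watermarks.

```c
#include <stdbool.h>

enum scan_balance { SCAN_EQUAL, SCAN_FRACT, SCAN_ANON, SCAN_FILE };

/* Hypothetical snapshot of the inputs to the decision. */
struct reclaim_ctx {
    bool swap_available;           /* is there any usable swap space? */
    bool global_reclaim;           /* system-wide (true) or per-cgroup */
    int priority;                  /* 12 = relaxed ... 0 = most urgent */
    int swappiness;
    unsigned long file_pages;      /* file pages on the LRU lists */
    unsigned long free_pages;
    unsigned long high_wmark;      /* sum of the high watermarks */
    bool inactive_file_plentiful;  /* inactive file list is not low */
};

enum scan_balance choose_scan_balance(const struct reclaim_ctx *c)
{
    /* Without swap there is nowhere to put anonymous pages. */
    if (!c->swap_available)
        return SCAN_FILE;
    /* For cgroup reclaim, swappiness 0 really does mean "file pages only". */
    if (!c->global_reclaim && c->swappiness == 0)
        return SCAN_FILE;
    /* Close to OOM: stop balancing and scan both types equally. */
    if (c->priority == 0 && c->swappiness != 0)
        return SCAN_EQUAL;
    /* File cache is nearly exhausted: anonymous pages must go to swap. */
    if (c->global_reclaim &&
        c->file_pages + c->free_pages <= c->high_wmark)
        return SCAN_ANON;
    /* Plenty of inactive file cache: reclaim that first. */
    if (c->inactive_file_plentiful)
        return SCAN_FILE;
    return SCAN_FRACT;
}
```

In this model the SCAN_ANON check does not look at swappiness at all, which is one way to see why a system-wide swappiness of 0 can still result in swapping.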

Global Reclaim and Priority

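The scan priority starts at 12 (DEF_PRIORITY) and drops by 1 each time a reclamation pass fails to free enough memory. Lower values scan a larger share of each list, and priority 0 is the most urgent state. global_reclaim() returns true unless reclamation was triggered on behalf of a specific memory cgroup.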

#ifdef CONFIG_MEMCG
static bool global_reclaim(struct scan_control *sc) {
    return !sc->target_mem_cgroup;
}
#else
static bool global_reclaim(struct scan_control *sc) { return true; }
#endif
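To make the priority countdown concrete: in the kernels discussed here, each pass considers roughly the list size shifted right by the current priority. The sketch below assumes that relationship; scan_window and priority_needed are hypothetical helpers written for this example, while DEF_PRIORITY mirrors the kernel constant of the same name.

```c
#define DEF_PRIORITY 12

/* Pages considered on one list in a single pass at the given priority. */
unsigned long scan_window(unsigned long lru_size, int priority)
{
    return lru_size >> priority;
}

/*
 * Walk priorities from DEF_PRIORITY down to 0 and return the first one
 * whose scan window covers nr_wanted pages, or -1 if even a full scan
 * (priority 0) is not enough.
 */
int priority_needed(unsigned long lru_size, unsigned long nr_wanted)
{
    for (int p = DEF_PRIORITY; p >= 0; p--) {
        if (scan_window(lru_size, p) >= nr_wanted)
            return p;
    }
    return -1;
}
```

For a list of 1,048,576 pages (4 GiB with 4 KiB pages), the first pass at priority 12 looks at only 256 pages; each step down doubles the window until priority 0 covers the whole list.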

Watermark Levels

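Three per‑zone watermarks guide reclamation. WMARK_LOW is the threshold checked on the fast allocation path and the point at which kswapd is woken. WMARK_MIN is the threshold used once the fast path has failed; below it, allocations fall back to direct reclamation. WMARK_HIGH is the free‑page level at which kswapd stops. They are derived from min_free_kbytes, the size of the zone, and watermark_scale_factor.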

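/* tmp: this zone's share of min_free_kbytes, in pages */
zone->watermark[WMARK_MIN] = tmp;
/* the kernel recomputes tmp here as the gap between successive watermarks */
zone->watermark[WMARK_LOW]  = min_wmark_pages(zone) + tmp;
zone->watermark[WMARK_HIGH] = min_wmark_pages(zone) + tmp * 2;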
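The arithmetic behind those assignments can be sketched as a standalone function. This follows the shape of __setup_per_zone_wmarks() in kernels from 4.6 onward, with highmem zones and later additions such as watermark boosting left out; struct watermarks and zone_watermarks are names invented for this example.

```c
#include <stdint.h>

struct watermarks {
    uint64_t min;
    uint64_t low;
    uint64_t high;
};

/*
 * zone_min_pages:          this zone's share of min_free_kbytes, in pages
 * managed_pages:           pages in the zone managed by the allocator
 * watermark_scale_factor:  vm.watermark_scale_factor, in units of 1/10000
 */
struct watermarks zone_watermarks(uint64_t zone_min_pages,
                                  uint64_t managed_pages,
                                  uint64_t watermark_scale_factor)
{
    /* Gap between watermarks: the larger of min/4 and the scaled zone size. */
    uint64_t gap = zone_min_pages >> 2;
    uint64_t scaled = managed_pages * watermark_scale_factor / 10000;
    struct watermarks w;

    if (scaled > gap)
        gap = scaled;

    w.min = zone_min_pages;
    w.low = zone_min_pages + gap;
    w.high = zone_min_pages + gap * 2;
    return w;
}
```

On large zones the scaled term dominates, so raising watermark_scale_factor widens the distance between the low and high watermarks and gives kswapd more room to work before allocations hit direct reclamation.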

CommitLimit and Overcommit

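Memory overcommit lets the kernel promise processes more virtual memory than is physically available. In strict accounting mode (vm.overcommit_memory = 2), the total that may be committed is capped at CommitLimit, which is computed as: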

CommitLimit = (total RAM pages - total huge TLB pages) * overcommit_ratio / 100 + total swap pages;

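Thus, swap size directly determines how much memory can be committed when strict accounting (vm.overcommit_memory = 2) is in effect, and therefore how much work the system can accept before allocations start to fail.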
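The formula translates directly into code. The function below is a plain restatement of it for experimentation, not kernel code; the function and parameter names are chosen for this example.

```c
#include <stdint.h>

/*
 * CommitLimit in pages. overcommit_ratio is a percentage, as in
 * vm.overcommit_ratio; huge TLB pages are excluded from the RAM term.
 */
uint64_t commit_limit_pages(uint64_t total_ram_pages,
                            uint64_t hugetlb_pages,
                            unsigned overcommit_ratio,
                            uint64_t total_swap_pages)
{
    return (total_ram_pages - hugetlb_pages) * overcommit_ratio / 100
           + total_swap_pages;
}
```

Only the RAM term is scaled by overcommit_ratio; swap is added in full, so every page of swap raises the limit by exactly one page.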

Practical Recommendations

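Use multiple swap partitions on separate disks, with equal priority so that the kernel spreads pages across them round‑robin, to improve I/O throughput.

Consider zswap, which keeps a compressed cache of swapped‑out pages in RAM and so reduces writes to the swap device.

Use cgroups to give each workload its own swappiness setting (memory.swappiness in cgroup v1).

Setting swappiness to 0 does not disable swap; to turn swap off entirely, disable the swap devices themselves (swapoff -a, and remove their entries from /etc/fstab).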
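When auditing these settings, it helps to read the effective swappiness value programmatically. The sketch below reads a single integer from a file; read_swappiness is a hypothetical helper, /proc/sys/vm/swappiness is the system-wide setting, and on cgroup v1 a group's value is in the memory.swappiness file inside that group's directory (the exact path depends on how cgroups are mounted on your system).

```c
#include <stdio.h>

/*
 * Read an integer swappiness value from the given file.
 * Returns the value, or -1 if the file cannot be opened or parsed.
 */
int read_swappiness(const char *path)
{
    FILE *f = fopen(path, "r");
    int value;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

Calling read_swappiness("/proc/sys/vm/swappiness") returns the global value; passing a cgroup's memory.swappiness path returns the per-workload override.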

