Understanding Linux Memory Management: Nodes, Zones, Buddy System, and SLAB Allocator
This article explains the Linux kernel memory management hierarchy—nodes, zones, the buddy system, and the SLAB allocator—based on Linux 3.10.0, and shows how to inspect and interpret the relevant data using standard commands and source code.
1. NODE Partition
Modern servers use a NUMA (Non-Uniform Memory Access) architecture in which each CPU socket and its directly attached memory form a node. The dmidecode command lists CPU and memory-module details, revealing the layout of each node.
Processor Information // First CPU
Socket Designation: CPU1
Version: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
Core Count: 8
Thread Count: 16
Processor Information // Second CPU
Socket Designation: CPU2
Version: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
Core Count: 8
Thread Count: 16
Memory modules can be listed similarly, showing which CPU they are attached to.
// Four DIMMs in total are installed on CPU1
Memory Device
Size: 16384 MB
Locator: CPU1 DIMM A1
Memory Device
Size: 16384 MB
Locator: CPU1 DIMM A2
......
// CPU2 also has four DIMMs installed
Memory Device
Size: 16384 MB
Locator: CPU2 DIMM E1
Memory Device
Size: 16384 MB
Locator: CPU2 DIMM F1
......
Each CPU together with its directly connected memory constitutes a node. The numactl --hardware command displays the nodes, their CPUs, and memory sizes.
numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 0 size: 65419 MB
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
node 1 size: 65536 MB
2. ZONE Partition
Each node is further divided into zones, each a contiguous range of physical memory. Common zones are:
ZONE_DMA – the lowest region (the first 16 MB on x86), used by legacy ISA devices that can only perform DMA to that range.
ZONE_DMA32 – memory below 4 GB, for devices limited to 32-bit DMA addressing on 64-bit systems.
ZONE_NORMAL – all remaining memory on x86-64 systems.
ZONE_HIGHMEM is omitted here because it only exists on 32-bit kernels, where the kernel cannot directly map all physical memory.
Pages (typically 4 KB each) are allocated from zones. The /proc/zoneinfo file shows the number of free and managed pages per zone.
# cat /proc/zoneinfo
Node 0, zone DMA
pages free 3973
managed 3973
Node 0, zone DMA32
pages free 390390
managed 427659
Node 0, zone Normal
pages free 15021616
managed 15990165
Node 1, zone Normal
pages free 16012823
managed 16514393
Multiplying the managed page count by 4 KB gives the zone size: Node 1's Normal zone is 16514393 × 4 KB ≈ 64509 MB (about 63 GiB, close to the 64 GB numactl reports for the node; "managed" excludes kernel-reserved pages).
3. Buddy System for Free Pages
The kernel represents each zone with struct zone, whose free_area array implements the buddy system. MAX_ORDER is 11, so each zone keeps free lists for block sizes from 2^0 pages (4 KB) up to 2^10 pages (4 MB).
//file: include/linux/mmzone.h
#define MAX_ORDER 11

struct zone {
    struct free_area free_area[MAX_ORDER];
    ......
}
The cat /proc/pagetypeinfo command shows, for each zone, the number of free blocks at every order.
# cat /proc/pagetypeinfo
Page block order: 9
Pages per block:  512

Free pages count per migrate type at order    0    1    2  ...  10
Node    0, zone      DMA, type    Unmovable   ...
Node    0, zone   Normal, type      Movable   ...
......
Allocation is performed via alloc_pages(gfp_mask, order), which searches the free list of the requested order and, if it is empty, splits a block from a higher order.
struct page *alloc_pages(gfp_t gfp_mask, unsigned int order);
4. SLAB Allocator
While the buddy system works with whole pages, many kernel objects are much smaller. The SLAB (or SLUB) allocator builds on the buddy system to manage caches of objects of a specific size, reducing fragmentation.
//file: include/linux/slab_def.h
struct kmem_cache {
    struct kmem_cache_node **node;
    ......
}
//file: mm/slab.h
struct kmem_cache_node {
    struct list_head slabs_partial;
    struct list_head slabs_full;
    struct list_head slabs_free;
    ......
}
Each cache maintains three lists of slabs per node (partially used, full, and free), where a slab is one or more contiguous pages carved into objects of identical size. When a cache runs out of free objects, it pulls fresh pages from the buddy system via kmem_getpages():
//file: mm/slab.c
static void *kmem_getpages(struct kmem_cache *cachep,
        gfp_t flags, int nodeid)
{
    ...
    flags |= cachep->allocflags;
    if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
        flags |= __GFP_RECLAIMABLE;
    page = alloc_pages_exact_node(nodeid, ...);
    ...
}
//file: include/linux/gfp.h
static inline struct page *alloc_pages_exact_node(int nid,
        gfp_t gfp_mask, unsigned int order)
{
    return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
}
The kernel exposes cache statistics via /proc/slabinfo and the slabtop command. Important fields are objsize (object size in bytes), objperslab (objects per slab), and pagesperslab (pages per slab).
Typical SLAB API functions include:
kmem_cache_create: create a new cache for objects of a given size.
kmem_cache_alloc: allocate an object from a cache.
kmem_cache_free: return an object to its cache.
Summary
The Linux kernel first partitions physical memory into NUMA nodes, divides each node into zones, manages each zone's free pages with the buddy system, and layers the SLAB allocator on top so that small kernel objects can be allocated with high performance and low fragmentation.
Refining Core Development Skills
Fei has over 10 years of development experience at Tencent and Sogou. Through this account, he shares his deep insights on performance.