
Why Linux Kernel Memory Layout Is the Hidden Key to Preventing OOM Crashes

This article reveals how Linux kernel memory layout—its partitions, address allocation, and resource scheduling—directly impacts system stability, explains the roles of each memory region, demonstrates common pitfalls like fragmentation and dentry leaks, and provides practical debugging and optimization techniques for developers and operators.

Deepin Linux

When a Linux system suffers OOM crashes or becomes unresponsive, many engineers first inspect application code, but the underlying kernel memory layout often holds the key to stability because it defines how memory is partitioned, addressed, and scheduled.

1. What Is Linux Kernel Memory Layout

The kernel divides the entire address space into user space and kernel space. User space hosts independent processes like separate rooms, while kernel space contains core code, data, and shared resources like communal facilities.

Key kernel regions include:

Vector table (vector) : stores entry addresses for exception handlers, acting as an emergency command center.

Fixed mapping area (fixmap) : holds permanent mappings for special physical addresses, similar to fixed parking spots.

Dynamic mapping area (vmalloc) : provides virtually contiguous but physically non‑contiguous memory, like a flexible storage room.

Linear mapping area (lowmem) : directly maps a portion of physical memory; it contains the .text, .init, .data, and .bss sections, the kernel’s “command center”.

Persistent kernel mapping area (pkmap) : bridges high memory pages to the kernel.

Module area (modules) : where dynamically loaded kernel modules reside.

1.2 Address‑Space Division and Architecture Design

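On a 64‑bit system the virtual address space is split into two halves. With ARM64 and 48‑bit virtual addresses, user space spans 0x0000_0000_0000_0000‑0x0000_FFFF_FFFF_FFFF, while kernel space occupies 0xFFFF_0000_0000_0000‑0xFFFF_FFFF_FFFF_FFFF; other architectures and address widths place the boundary differently. The split provides both security isolation and performance benefits.

Security: user processes cannot directly access kernel addresses. A user‑mode access to a kernel address raises a fault and the process is killed with SIGSEGV, whereas a stray pointer inside kernel code can corrupt shared state and cause an oops or panic.

Performance: kernel code runs with privileged access. On ARM64, for example, addresses in the linear mapping differ from physical addresses by a fixed offset derived from PAGE_OFFSET, so translation is simple arithmetic, while user processes must go through system calls to reach kernel services.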
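The split can be expressed as a small address classifier. This is a minimal user‑space sketch that assumes the 48‑bit boundaries quoted above; the names classify_addr and user_range_ok are illustrative and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Boundaries quoted in the text for 48-bit virtual addresses. */
#define VA48_USER_END     UINT64_C(0x0000FFFFFFFFFFFF)
#define VA48_KERNEL_START UINT64_C(0xFFFF000000000000)

typedef enum { ADDR_USER, ADDR_KERNEL, ADDR_UNMAPPED_HOLE } addr_kind;

/* Report which side of the user/kernel split an address belongs to. */
addr_kind classify_addr(uint64_t va)
{
    if (va <= VA48_USER_END)
        return ADDR_USER;
    if (va >= VA48_KERNEL_START)
        return ADDR_KERNEL;
    return ADDR_UNMAPPED_HOLE;   /* neither half: no valid translation */
}

/* A user-supplied buffer is acceptable only if every byte stays in the
 * user half. Hypothetical helper that mirrors the idea behind the
 * kernel's checks on user pointers. */
int user_range_ok(uint64_t va, uint64_t len)
{
    assert(len > 0);
    uint64_t last = va + (len - 1);
    if (last < va)               /* wrapped around the address space */
        return 0;
    return classify_addr(va) == ADDR_USER &&
           classify_addr(last) == ADDR_USER;
}
```

Checking both ends of the range, plus the wrap‑around case, is what stops a buffer that starts in user space from silently extending past the boundary.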

ARM64 vs. ARM32:

ARM64 provides a 128 TB linear mapping region and a large vmalloc area (~126 TB), simplifying memory access for large buffers such as DMA.

ARM32’s 1 GB kernel space reserves only 768 MB for lowmem; the remaining high memory requires pkmap, and heavy vmalloc usage can quickly exhaust free blocks, leading to kmalloc failures.

2. User‑Space Memory Layout

User‑space memory is organized into several segments that mirror a well‑planned house:

.text (code segment) : read‑only compiled instructions; analogous to a protected manuscript.

.data (initialized data) : stores global/static variables with initial values.

.bss (uninitialized data) : holds uninitialized and zero‑initialized globals and statics; it takes no space in the executable file because the loader zero‑fills it at startup.

Heap : dynamic allocation via malloc, growing upward; used for variable‑size data.

Stack : stores local variables and call frames, growing downward.

Memory‑mapping segment : created with mmap for file or library mapping.

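A simple C program makes the placement concrete: its functions are compiled into .text, initialized globals go to .data, uninitialized globals go to .bss, and printing their addresses with printf (from #include <stdio.h>) shows where each one lands.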
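A minimal sketch of such a program follows. The variable and function names are illustrative, and the exact addresses vary from run to run because of address‑space layout randomization, so the code records one representative address per segment rather than assuming any particular values.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;   /* has an initial value: placed in .data */
int uninitialized_global;      /* no initializer: placed in .bss, zeroed at load */

struct layout_sample {
    uintptr_t text, data, bss, heap, stack;
};

/* Record one representative address from each segment. */
struct layout_sample sample_layout(void)
{
    int local = 0;                              /* lives in this stack frame */
    int *heap_obj = malloc(sizeof *heap_obj);   /* lives on the heap */
    assert(heap_obj != NULL);

    struct layout_sample s;
    s.text  = (uintptr_t)&sample_layout;        /* function code is in .text */
    s.data  = (uintptr_t)&initialized_global;
    s.bss   = (uintptr_t)&uninitialized_global;
    s.heap  = (uintptr_t)heap_obj;
    s.stack = (uintptr_t)&local;

    free(heap_obj);
    return s;
}

/* Print the sampled addresses, one segment per line. */
void print_layout(void)
{
    struct layout_sample s = sample_layout();
    printf(".text  0x%" PRIxPTR "\n", s.text);
    printf(".data  0x%" PRIxPTR "\n", s.data);
    printf(".bss   0x%" PRIxPTR "\n", s.bss);
    printf("heap   0x%" PRIxPTR "\n", s.heap);
    printf("stack  0x%" PRIxPTR "\n", s.stack);
}
```

Calling print_layout() from main and comparing the output with /proc/self/maps shows which mapping each address falls into.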

Dynamic library workflow:

Write library source (e.g., math_lib.cpp defining add and multiply, declared extern "C" so that dlsym can find them by their unmangled names).

Compile with g++ -fPIC -shared -o libmath.so math_lib.cpp.

Load at runtime using dlopen, resolve symbols with dlsym, call functions, then dlclose.

Link the main program with -ldl and run.
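The load, resolve, call, and unload steps can be sketched in C as follows. The helper name call_binop is an assumption made for illustration; libmath.so and add come from the workflow above, and the sketch assumes add has the signature int add(int, int). On older glibc versions the program must be linked with -ldl, as the last step says.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

typedef int (*binop_fn)(int, int);

/* Load lib_path, look up symbol, call it as int(int, int), then unload.
 * Returns 0 and stores the result in *out on success, -1 on any failure. */
int call_binop(const char *lib_path, const char *symbol,
               int a, int b, int *out)
{
    assert(out != NULL);

    void *handle = dlopen(lib_path, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    dlerror();                                 /* clear any stale error */
    binop_fn fn = NULL;
    *(void **)(&fn) = dlsym(handle, symbol);   /* POSIX idiom for function pointers */
    const char *err = dlerror();
    if (err != NULL || fn == NULL) {
        fprintf(stderr, "dlsym failed: %s\n", err ? err : "symbol is NULL");
        dlclose(handle);
        return -1;
    }

    *out = fn(a, b);
    dlclose(handle);
    return 0;
}
```

With the library from the steps above in the current directory, call_binop("./libmath.so", "add", 2, 3, &result) would be the typical call. Checking dlerror() after dlsym, rather than only testing for NULL, is the pattern the dlsym manual page recommends.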

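When using mmap, the requested protection must be compatible with the file's open mode: a MAP_SHARED mapping with PROT_WRITE requires the file to be opened read/write, whereas MAP_PRIVATE only needs read access. Writes to a MAP_SHARED mapping are carried through to the file, while MAP_PRIVATE uses copy‑on‑write, so changes stay private to the process and never reach the file.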
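The difference between the two flags is easy to observe. The sketch below writes through a mapping of a scratch file and then reads the file back; the helper name first_byte_after_write and the /tmp path template are illustrative assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map a scratch file with flags (MAP_SHARED or MAP_PRIVATE), overwrite the
 * first byte through the mapping, then read that byte back from the file.
 * Returns the byte now stored in the file, or -1 on error. */
int first_byte_after_write(int flags)
{
    assert(flags == MAP_SHARED || flags == MAP_PRIVATE);

    char path[] = "/tmp/mmap_demo_XXXXXX";
    int fd = mkstemp(path);            /* mkstemp opens the file read/write */
    if (fd < 0)
        return -1;
    unlink(path);                      /* file disappears once fd is closed */

    int result = -1;
    if (write(fd, "aaaa", 4) == 4) {
        char *p = mmap(NULL, 4, PROT_READ | PROT_WRITE, flags, fd, 0);
        if (p != MAP_FAILED) {
            p[0] = 'Z';                /* modify the page through the mapping */
            msync(p, 4, MS_SYNC);      /* flush; has no effect on a private mapping */
            char c;
            if (pread(fd, &c, 1, 0) == 1)
                result = (unsigned char)c;
            munmap(p, 4);
        }
    }
    close(fd);
    return result;
}
```

With MAP_SHARED the file's first byte becomes 'Z'; with MAP_PRIVATE the write lands on a private copy of the page and the file still holds 'a'.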

3. Kernel‑Space Memory Layout (32‑bit Systems)

3.1 Linear Mapping Area (lowmem)

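Starts at 0xC0000000 and covers low physical memory (default 768 MB). Critical structures such as task_struct and mm_struct reside here. Because the region is mapped at a fixed offset from physical memory, the kernel converts between virtual and physical addresses with simple arithmetic and needs no per‑allocation mapping setup.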
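The arithmetic behind the linear mapping can be sketched as below. The constants follow the figures in this section; the physical base of 0 is an assumption (real boards often start RAM elsewhere), and the function names are illustrative rather than the kernel's own macros.

```c
#include <assert.h>
#include <stdint.h>

#define LM_PAGE_OFFSET  UINT32_C(0xC0000000)   /* start of lowmem (3G/1G split) */
#define LM_PHYS_OFFSET  UINT32_C(0x00000000)   /* assumption: RAM starts at physical 0 */
#define LM_LOWMEM_SIZE  (UINT32_C(768) << 20)  /* 768 MB, the default quoted above */

/* True when va lies inside the linear mapping. */
int is_lowmem_virt(uint32_t va)
{
    return va >= LM_PAGE_OFFSET && (va - LM_PAGE_OFFSET) < LM_LOWMEM_SIZE;
}

/* Virtual to physical: subtract the fixed offset. */
uint32_t lowmem_virt_to_phys(uint32_t va)
{
    assert(is_lowmem_virt(va));
    return va - LM_PAGE_OFFSET + LM_PHYS_OFFSET;
}

/* Physical to virtual: add the fixed offset back. */
uint32_t lowmem_phys_to_virt(uint32_t pa)
{
    assert((pa - LM_PHYS_OFFSET) < LM_LOWMEM_SIZE);
    return pa - LM_PHYS_OFFSET + LM_PAGE_OFFSET;
}
```

Physical pages beyond the 768 MB window have no such fixed address, which is why they need the high‑memory mechanisms described in the following subsections.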

3.2 vmalloc Dynamic Mapping Area

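Extends to ~0xEF800000, about 240 MB, offering virtually contiguous addresses for non‑contiguous physical pages. Used for large buffers, dynamically loaded modules, and other situations where physical contiguity is unnecessary; a DMA buffer fits here only if the device does not need physically contiguous memory (for example, behind an IOMMU or with scatter‑gather).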

3.3 High Memory Region

When physical memory exceeds the lowmem limit, high memory is allocated with alloc_page() and the __GFP_HIGHMEM flag (for example via GFP_HIGHUSER) and mapped into the kernel on demand with kmap(). Example: a large matrix in a scientific computation may be placed in high memory.

3.4 Fixed Mapping (fixmap)

Reserved from 0xFFF00000 to 0xFFFE0000 (896 KB). Used early in boot before the MMU is fully set up, for temporary page‑table operations, I/O register mapping, and inter‑CPU communication buffers.

3.5 Persistent Kernel Mapping (pkmap)

Maps high‑memory pages for long‑term kernel access through kmap(), maintaining a mapping table of the pages currently mapped. Useful for kernel code that needs stable access to large data structures stored in high memory.

4. Important Mechanisms

4.1 Page Tables and Address Translation

Linux uses a multi‑level page‑table hierarchy, classically four levels (PGD → PUD → PMD → PTE), to translate virtual to physical addresses; recent kernels add an optional fifth level (P4D) between PGD and PUD. Each process has its own tables, ensuring isolation and allowing per‑page permissions (read‑only, executable, etc.).
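To see how one address drives the walk, the sketch below splits a virtual address into its per‑level indices. It assumes the common four‑level configuration with 4 KB pages and 9 index bits per level (48‑bit addresses); the struct and function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define PT_PAGE_SHIFT 12                        /* 4 KB pages */
#define PT_LEVEL_BITS 9                         /* 512 entries per table */
#define PT_LEVEL_MASK ((UINT64_C(1) << PT_LEVEL_BITS) - 1)

struct pt_indices {
    unsigned pgd, pud, pmd, pte, offset;
};

/* Split a 48-bit virtual address into four table indices and a page offset. */
struct pt_indices split_vaddr(uint64_t va)
{
    /* Simplified canonical check: bits 47..63 must be all 0 or all 1. */
    uint64_t top = va >> 47;
    assert(top == 0 || top == UINT64_C(0x1FFFF));

    struct pt_indices ix;
    ix.offset = (unsigned)(va & ((UINT64_C(1) << PT_PAGE_SHIFT) - 1));
    ix.pte = (unsigned)((va >> PT_PAGE_SHIFT) & PT_LEVEL_MASK);
    ix.pmd = (unsigned)((va >> (PT_PAGE_SHIFT + 1 * PT_LEVEL_BITS)) & PT_LEVEL_MASK);
    ix.pud = (unsigned)((va >> (PT_PAGE_SHIFT + 2 * PT_LEVEL_BITS)) & PT_LEVEL_MASK);
    ix.pgd = (unsigned)((va >> (PT_PAGE_SHIFT + 3 * PT_LEVEL_BITS)) & PT_LEVEL_MASK);
    return ix;
}
```

For the address 0x00007f1234567abc this yields PGD index 0xFE, PUD index 0x48, PMD index 0x1A2, PTE index 0x167, and page offset 0xABC. Each index selects one of 512 entries in the table that the previous level points to.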

4.2 Memory Allocation and Reclamation

Two primary allocators:

kmalloc : allocates small, physically contiguous blocks via the slab allocator. Example: kmalloc(1024, GFP_KERNEL) for a network driver buffer.

vmalloc : allocates larger, virtually contiguous memory that may be physically fragmented. Example: vmalloc(1024*1024) for a large kernel module.

When memory is tight, the kernel reclaims pages: anonymous pages are swapped out, clean file pages are dropped, and dirty pages are written back before being freed.

5. Layout Imbalance: How Memory Management Can Trigger System Shock

5.1 Fragmentation – The Invisible Killer

External fragmentation leaves free memory scattered in non‑contiguous blocks that cannot satisfy large allocation requests, even though total free memory is sufficient. The buddy system mitigates this by splitting and merging power‑of‑two blocks, but heavy small‑allocation workloads can still exhaust contiguous space, causing high‑order allocation failures and, in the worst case, OOM kills.
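The effect can be reproduced with a toy buddy allocator. The sketch below is a deliberately small user‑space model (a 16‑page arena tracked with boolean maps), not the kernel's implementation, and all names in it are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BUDDY_MAX_ORDER 4                        /* arena = 2^4 = 16 pages */
#define BUDDY_PAGES     (1u << BUDDY_MAX_ORDER)

/* is_free[o][i] is true when block i of order o (2^o pages) is free. */
struct buddy {
    bool is_free[BUDDY_MAX_ORDER + 1][BUDDY_PAGES];
};

void buddy_init(struct buddy *b)
{
    memset(b, 0, sizeof *b);
    b->is_free[BUDDY_MAX_ORDER][0] = true;       /* one maximal free block */
}

/* Allocate 2^order pages; returns the first page number or -1. */
int buddy_alloc(struct buddy *b, unsigned order)
{
    assert(order <= BUDDY_MAX_ORDER);
    for (unsigned o = order; o <= BUDDY_MAX_ORDER; o++) {
        for (unsigned i = 0; i < (BUDDY_PAGES >> o); i++) {
            if (!b->is_free[o][i])
                continue;
            b->is_free[o][i] = false;
            while (o > order) {                  /* split: keep left, free right */
                o--;
                i <<= 1;
                b->is_free[o][i + 1] = true;
            }
            return (int)(i << order);
        }
    }
    return -1;                                   /* no block large enough */
}

/* Free a block and merge upward while its buddy is also free. */
void buddy_free(struct buddy *b, int page, unsigned order)
{
    unsigned i = (unsigned)page >> order;
    while (order < BUDDY_MAX_ORDER && b->is_free[order][i ^ 1u]) {
        b->is_free[order][i ^ 1u] = false;       /* absorb the buddy */
        i >>= 1;
        order++;
    }
    b->is_free[order][i] = true;
}

/* Total number of free pages across all orders. */
unsigned buddy_free_pages(const struct buddy *b)
{
    unsigned total = 0;
    for (unsigned o = 0; o <= BUDDY_MAX_ORDER; o++)
        for (unsigned i = 0; i < (BUDDY_PAGES >> o); i++)
            if (b->is_free[o][i])
                total += 1u << o;
    return total;
}
```

If all 16 pages are allocated one at a time and every second page is then freed, buddy_free_pages reports 8 free pages, yet buddy_alloc(&b, 1) fails because no two free pages are buddies. Freeing the remaining pages lets the blocks merge back into a single order‑4 block.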

5.2 Address‑Space Isolation Failures

If isolation breaks down, for example when a kernel bug lets a user‑supplied buffer overflow into kernel data, critical structures like the scheduler queue or file‑system metadata can be corrupted, leading to crashes or security breaches such as rootkits.

5.3 Kernel Module Loading Conflicts

Modules loaded into overlapping address ranges cause page faults and panics. Properly reserving module load zones prevents such conflicts.

6. Real‑World Cases

6.1 dentry Leak in a Load‑Balancer Service

A high‑traffic LB cluster experienced runaway memory growth; slabtop showed the dentry cache consuming 60% of RAM. The root cause was traced to curl 7.19.7 built against NSS: each request left dentry objects behind that were never released, as if every dget() were missing its matching dput().

The original article models this in user space with a C++ sample: a Dentry class for the dentry lifecycle, a faulty NSS library, a curl wrapper, and an LB service loop. Its main program demonstrates the leak, prints the cache size, manually releases one entry, and shows the reduction.
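The reference‑counting idea behind such a leak can be sketched in a few lines of C. This is a toy model, not the original sample and not kernel code: the toy_ names are hypothetical, and unlike the kernel, which keeps unreferenced dentries cached until memory pressure reclaims them, the model frees an object as soon as its count reaches zero.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for a cached dentry. */
struct toy_dentry {
    int refcount;
};

static long toy_live_objects;   /* objects currently alive, like a slab count */

/* Create an object holding one reference for the caller. */
struct toy_dentry *toy_lookup(void)
{
    struct toy_dentry *d = malloc(sizeof *d);
    if (d != NULL) {
        d->refcount = 1;
        toy_live_objects++;
    }
    return d;
}

/* Take an additional reference. */
void toy_dget(struct toy_dentry *d)
{
    assert(d->refcount > 0);
    d->refcount++;
}

/* Drop a reference; the object is released when the count hits zero. */
void toy_dput(struct toy_dentry *d)
{
    assert(d->refcount > 0);
    if (--d->refcount == 0) {
        free(d);
        toy_live_objects--;
    }
}

/* Serve n requests; a leaky client forgets its final toy_dput().
 * Returns how many objects the run left behind. */
long toy_serve_requests(int n, int leaky)
{
    long before = toy_live_objects;
    for (int i = 0; i < n; i++) {
        struct toy_dentry *d = toy_lookup();   /* refcount 1 */
        if (d == NULL)
            break;
        toy_dget(d);                           /* refcount 2 */
        toy_dput(d);                           /* refcount 1 */
        if (!leaky)
            toy_dput(d);                       /* refcount 0: released */
    }
    return toy_live_objects - before;
}
```

A balanced run leaves nothing behind, while a leaky run of 1000 requests leaves 1000 objects alive. At production request rates, that kind of linear growth is what shows up in slabtop.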

6.2 Fragmentation‑Induced Crash on an Embedded ARM32 Device

An industrial controller repeatedly called vmalloc for network buffers. Over time the buddy allocator’s fragmentation rate exceeded 80%, preventing a required 128 KB contiguous allocation, which triggered a kernel panic similar to “out of memory”.

The original example includes a BuddyAllocator class, a NetworkStack that allocates non‑contiguous buffers, and an IndustrialController that runs tasks and finally attempts a contiguous allocation, illustrating the failure.
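The article does not say how its 80% fragmentation rate was computed. One plausible definition, used here purely as an assumption, is the share of free pages that sit in blocks too small for the request, calculated from per‑order free‑block counts such as those listed in /proc/buddyinfo. With 4 KB pages, a 128 KB request is 32 pages, that is order 5.

```c
#include <assert.h>
#include <stddef.h>

/* free_blocks[o] holds the number of free blocks of 2^o pages.
 * Returns the percentage (0..100) of free pages that sit in blocks
 * smaller than 2^want pages and therefore cannot serve the request. */
unsigned unusable_free_percent(const unsigned long *free_blocks,
                               size_t norders, unsigned want)
{
    assert(free_blocks != NULL);

    unsigned long total = 0, usable = 0;
    for (size_t o = 0; o < norders; o++) {
        unsigned long pages = free_blocks[o] << o;
        total += pages;
        if (o >= want)
            usable += pages;
    }
    if (total == 0)
        return 100;                 /* nothing free at all */
    return (unsigned)(((total - usable) * 100) / total);
}
```

For example, free‑block counts of 400, 100, 50, 20, 5, and 1 for orders 0 through 5 add up to 1072 free pages, of which only 32 are in an order‑5 block, so 97% of the free memory is unusable for a 128 KB contiguous request.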

7. Defense Strategies – From Design to Dynamic Tuning

7.1 Preventive Design

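Prefer static or preallocated buffers, and small stack variables, over frequent kmalloc / vmalloc calls; kernel stacks are only a few pages, so large objects do not belong there.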

Match allocation and free scopes to avoid long‑lived small blocks.

Allocate sizes as powers of two to cooperate with the buddy system.
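The cost of ignoring size classes is easy to quantify. The sketch below rounds a request up to the next general‑purpose kmalloc cache size; the bucket list reflects sizes commonly seen in /proc/slabinfo (kmalloc-8 through kmalloc-8192, including the 96 and 192 byte caches) but is an assumption, since the actual set depends on kernel version and configuration.

```c
#include <assert.h>
#include <stddef.h>

/* Return the assumed kmalloc cache size that would serve the request,
 * or 0 when it exceeds the largest bucket modeled here. */
size_t typical_kmalloc_bucket(size_t size)
{
    static const size_t buckets[] = {
        8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096, 8192
    };
    assert(size > 0);
    for (size_t i = 0; i < sizeof buckets / sizeof buckets[0]; i++)
        if (size <= buckets[i])
            return buckets[i];
    return 0;
}

/* Bytes lost to rounding for one allocation of the given size. */
size_t typical_kmalloc_waste(size_t size)
{
    size_t bucket = typical_kmalloc_bucket(size);
    return bucket ? bucket - size : 0;
}
```

Under these assumptions a 1024‑byte request wastes nothing, while a 1025‑byte request is served from the 2048‑byte cache and wastes 1023 bytes per object.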

Kernel parameters also help: /proc/sys/vm/max_map_count limits the number of VMAs per process, and vm.overcommit_memory=2 switches to strict accounting so that user‑space allocations beyond the commit limit fail up front instead of triggering the OOM killer later.
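Under vm.overcommit_memory=2 the kernel refuses allocations once committed memory would pass the commit limit, which the kernel documentation defines as (total RAM minus huge pages) times vm.overcommit_ratio / 100, plus swap. The sketch below reproduces that arithmetic; the function names are illustrative, and the ratio of 50 used in the example is the kernel default.

```c
#include <assert.h>

/* Commit limit in kB for strict overcommit mode (vm.overcommit_memory=2). */
unsigned long long commit_limit_kb(unsigned long long ram_kb,
                                   unsigned long long hugetlb_kb,
                                   unsigned long long swap_kb,
                                   unsigned ratio_percent)
{
    assert(hugetlb_kb <= ram_kb);
    return (ram_kb - hugetlb_kb) * ratio_percent / 100 + swap_kb;
}

/* True when a new request would push committed memory past the limit. */
int commit_would_fail(unsigned long long committed_kb,
                      unsigned long long request_kb,
                      unsigned long long limit_kb)
{
    return committed_kb + request_kb > limit_kb;
}
```

A machine with 16 GiB of RAM, 2 GiB of swap, no huge pages, and the default ratio of 50 gets a limit of 10485760 kB (10 GiB). The live values appear as CommitLimit and Committed_AS in /proc/meminfo.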

7.2 Dynamic Monitoring

slabtop : real‑time view of slab caches (e.g., dentry, buffer_head).

cat /proc/iomem : shows kernel region allocations to spot address‑space conflicts.

perf mem : samples memory loads and stores to locate costly accesses and TLB misses.

ftrace on __alloc_pages : finds hot allocation call‑stacks.

crash : post‑mortem analysis of mm_struct and page flags.

KASAN : detects out‑of‑bounds and use‑after‑free bugs in the kernel.
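slabtop takes its numbers from /proc/slabinfo, so the same check can be scripted. The sketch below parses the leading fields of one data line (cache name, active objects, total objects, object size) and estimates the memory held by the cache; the struct and function names are illustrative, and reading the real file normally requires root.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct slab_stat {
    char name[64];
    unsigned long active_objs, num_objs, objsize;
};

/* Parse the leading fields of one /proc/slabinfo data line.
 * Returns 1 on success, 0 for header lines or malformed input. */
int parse_slabinfo_line(const char *line, struct slab_stat *out)
{
    assert(line != NULL && out != NULL);
    if (line[0] == '#' || strncmp(line, "slabinfo", 8) == 0)
        return 0;                               /* version or column header */
    return sscanf(line, "%63s %lu %lu %lu", out->name,
                  &out->active_objs, &out->num_objs, &out->objsize) == 4;
}

/* Approximate bytes held by the cache: objects times object size. */
unsigned long slab_bytes(const struct slab_stat *s)
{
    return s->num_objs * s->objsize;
}
```

Feeding it a line such as "dentry 1000 1200 192 21 1 : tunables 0 0 0 : slabdata 58 58 0" (sample values) gives 1200 objects of 192 bytes, about 225 KB. Sampling the dentry line periodically and alerting on sustained growth would catch a leak like the one in section 6.1 early.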

7.3 Architecture‑Level Optimizations

NUMA‑aware allocation using numactl and enabling CONFIG_NUMA to keep memory local to CPUs.

Hugepages (2 MB/1 GB) reduce the number of page‑table entries and TLB pressure, which can substantially lower address‑translation overhead and latency for databases or Redis.
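A quick calculation shows the scale of the reduction. The helper below counts how many pages, and therefore how many leaf page‑table entries, are needed to map a region at a given page size; the function name is illustrative.

```c
#include <assert.h>

/* Number of pages of page_size bytes needed to cover bytes, rounding up. */
unsigned long long pages_needed(unsigned long long bytes,
                                unsigned long long page_size)
{
    assert(page_size > 0);
    return (bytes + page_size - 1) / page_size;
}
```

Mapping 1 GiB takes 262144 entries with 4 KB pages, 512 entries with 2 MB hugepages, and a single entry with a 1 GB hugepage, so far more of the working set fits within the TLB's limited number of entries.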

By combining careful layout design, kernel parameter tuning, continuous monitoring, and architecture‑specific features, operators can keep Linux systems stable even under heavy memory pressure.

Written by

Deepin Linux

Research areas: Windows & Linux platforms, C/C++ backend development, embedded systems and Linux kernel, etc.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.