
Understanding Linux Memory Mapping (mmap): API, Implementation, and Use Cases

This article explains Linux memory mapping (mmap): its purpose, API parameters, mapping types, internal kernel implementation, page‑fault handling, copy‑on‑write semantics, and practical use cases. It closes with a complete Objective‑C example that maps a file and appends to it.

Deepin Linux

Overview

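Memory mapping is an OS technique that maps a file or device directly into a process's address space, allowing the process to read and write the data as if it were regular memory. It removes the need for an explicit read/write call on each access. With a shared mapping, the kernel writes modified pages back to the underlying file, and msync can force that write‑back at a chosen point; with a private mapping, changes never reach the file.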

Typical scenarios include handling large files, inter‑process communication, and improving I/O efficiency in network programming.

1. mmap API

void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);

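The function creates a new mapping and returns its starting virtual address, or MAP_FAILED on error. If addr is non‑NULL, the kernel treats it as a hint and places the mapping at or near that address; the address is mandatory only when MAP_FIXED is set. If addr is NULL, the kernel chooses a free region.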

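length specifies the size of the region in bytes. prot defines read/write/execute permissions ( PROT_READ , PROT_WRITE , PROT_EXEC ). flags selects the sharing mode ( MAP_SHARED or MAP_PRIVATE ) and other attributes such as MAP_ANONYMOUS . For a file‑backed mapping, fd is an open file descriptor (any valid descriptor, including 0) and offset is the starting position in the file, which must be a multiple of the page size. For an anonymous mapping, MAP_ANONYMOUS is set in flags and fd is conventionally -1.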

Combining fd and flags yields four mapping types: shared file, private file, shared anonymous, and private anonymous.

2. Implementation Details

The mmap workflow consists of three main steps:

Obtain an unmapped virtual area with get_unmapped_area .

Set appropriate vm_flags based on file‑backed vs. anonymous and shared vs. private.

Call mmap_region to allocate a vm_area_struct (VMA) and link it into the process's VMA tree (a red‑black tree in older kernels, a maple tree since Linux 6.1).

The kernel does not allocate physical pages at this point; it only records the process's demand for memory. Actual pages are provided lazily on a page‑fault.

3. Page‑Fault Handling

When a process accesses a virtual address that has no valid page‑table entry, the CPU raises a page fault and the kernel enters its architecture‑specific handler (such as do_page_fault ). The handler locates the VMA that covers the address, checks access permissions, and then calls handle_mm_fault , which eventually invokes handle_pte_fault .

handle_pte_fault distinguishes several cases:

If the PTE is not present and is pte_none , it handles anonymous pages ( do_anonymous_page ) or file‑backed pages ( do_linear_fault in older kernels, do_fault in newer ones).

If the PTE encodes a swap entry, it calls do_swap_page to swap the page back in.

If the fault is caused by a write to a read‑only COW page, it triggers do_wp_page to perform copy‑on‑write.

For file‑backed mappings, the VMA’s vm_ops is set to generic_file_vm_ops , whose fault method points to filemap_fault . This routine loads the required file data into memory.

4. Copy‑On‑Write (COW)

During fork , the parent and child initially share the same physical pages; for private mappings, the page‑table entries in both processes are marked read‑only. When either process writes to such a page, a page fault occurs and do_wp_page copies the page for the writer, allowing the two processes to diverge.

5. Example Code (Objective‑C)

// ViewController.m
// TestCode
// Created by zhangdasen on 2020/5/24.

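#import "ViewController.h"
#import <sys/mman.h>
#import <sys/stat.h>
#import <fcntl.h>
#import <unistd.h>
#import <errno.h>
#import <string.h>

// Forward declarations for the C helpers defined below.
int MapFile(const char *inPathName, void **outDataPtr, size_t *outDataLength, size_t appendSize);
void ProcessFile(const char *inPathName);

@interface ViewController ()
@end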

@implementation ViewController

- (void)viewDidLoad {
    [super viewDidLoad];
    NSString *path = [NSHomeDirectory() stringByAppendingPathComponent:@"test.data"];
    NSLog(@"path: %@", path);
    NSString *str = @"test str2";
    [str writeToFile:path atomically:YES encoding:NSUTF8StringEncoding error:nil];
    ProcessFile(path.UTF8String);
    NSString *result = [NSString stringWithContentsOfFile:path encoding:NSUTF8StringEncoding error:nil];
    NSLog(@"result:%@", result);
}

int MapFile(const char *inPathName, void **outDataPtr, size_t *outDataLength, size_t appendSize) {
    int outError = 0;
    int fileDescriptor;
    struct stat statInfo;
    *outDataPtr = NULL;
    *outDataLength = 0;
    fileDescriptor = open(inPathName, O_RDWR, 0);
    if (fileDescriptor < 0) {
        outError = errno;
    } else {
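        if (fstat(fileDescriptor, &statInfo) != 0) {
            outError = errno;
        } else if (ftruncate(fileDescriptor, statInfo.st_size + (off_t)appendSize) != 0) {
            // Growing the file failed; touching a mapping past EOF would fault.
            outError = errno;
        } else {
            fsync(fileDescriptor);
            // Map the whole file, including the newly added tail.
            *outDataPtr = mmap(NULL, statInfo.st_size + appendSize,
                               PROT_READ|PROT_WRITE,
                               MAP_FILE|MAP_SHARED,
                               fileDescriptor, 0);
            if (*outDataPtr == MAP_FAILED) {
                outError = errno;
                *outDataPtr = NULL;
            } else {
                // Report the original length so the caller knows where to append.
                *outDataLength = statInfo.st_size;
            }
        }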
        close(fileDescriptor);
    }
    return outError;
}

void ProcessFile(const char *inPathName) {
    size_t dataLength;
    void *dataPtr;
    const char *appendStr = " append_key2";
    size_t appendSize = strlen(appendStr);
    if (MapFile(inPathName, &dataPtr, &dataLength, appendSize) == 0) {
        // Write the new bytes just past the original end of the file.
        memcpy((char *)dataPtr + dataLength, appendStr, appendSize);
        // munmap needs the address returned by mmap, not the offset pointer.
        munmap(dataPtr, dataLength + appendSize);
    }
}

@end

The example demonstrates mapping a file, appending data via the mapped region, and then unmapping.

6. Kernel Data Structures Involved

Key structures include file , dentry , inode , and address_space . The inode’s i_mapping points to an address_space that indexes the file's cached page objects in a radix tree (an XArray in newer kernels); together these form the page cache. Shared mappings of the same file resolve to the same physical pages through these structures.
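
One consequence of this design is observable without any kernel code: two separate shared mappings of the same file are backed by the same page‑cache page, so a write through one is immediately readable through the other. The sketch below illustrates this; shared_mappings_alias is a hypothetical helper name, and note that it resizes the file to exactly one page.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: maps the first page of `path` twice with MAP_SHARED,
 * writes through one mapping, and returns 1 if the other mapping sees the
 * write, 0 if not, -1 on error.
 * Warning: the file is resized to exactly one page. */
int shared_mappings_alias(const char *path)
{
    long page = sysconf(_SC_PAGESIZE);
    int result = -1;
    int fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd < 0)
        return -1;
    if (ftruncate(fd, page) == 0) {
        char *a = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        char *b = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
        if (a != MAP_FAILED && b != MAP_FAILED) {
            strcpy(a, "hello");   /* write through the first mapping */
            /* different virtual addresses, same underlying page */
            result = (a != b && strcmp(b, "hello") == 0);
        }
        if (a != MAP_FAILED)
            munmap(a, (size_t)page);
        if (b != MAP_FAILED)
            munmap(b, (size_t)page);
    }
    close(fd);
    return result;
}
```

The same mechanism is what makes a shared file mapping usable for inter‑process communication: mappings in different processes reach the same pages through the file's address_space .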

Swap management is represented by struct swap_info_struct , which tracks swap devices, slot counts, and usage maps. A swap entry ( swp_entry_t ) encodes the swap device index and offset, allowing the kernel to retrieve swapped‑out pages.

Conclusion

Linux’s mmap mechanism provides a powerful, lazy‑loaded way to access file or anonymous memory, enabling efficient I/O, inter‑process communication, and memory‑conserving techniques such as copy‑on‑write. Understanding the API, kernel pathways, and page‑fault handling is essential for systems programmers and performance‑critical application developers.

Written by

Deepin Linux

Research areas: Windows & Linux platforms, C/C++ backend development, embedded systems and Linux kernel, etc.
