
DPDK Memory Management: Architecture, Hugepage Initialization, and Allocation Mechanisms

This article explains DPDK's memory management architecture, covering the hierarchical memory layout, hugepage discovery and mapping, shared configuration structures, NUMA‑aware allocation, custom malloc‑heap implementation, memzone and mempool creation, and the mbuf buffer model, with detailed code examples.

Deepin Linux

Introduction: This article consolidates earlier notes on the DPDK 17.11 source code, focusing on the memory-management subsystem that underpins high-performance packet processing.

1. Overview

DPDK's memory management is a core component that enables other DPDK modules and user applications to achieve optimal performance. It relies on hugepages, shared memory, and NUMA‑aware structures, primarily on Linux.

The memory hierarchy consists of layers built during rte_eal_init (hugepage discovery, memory-segment mapping, and malloc-heap setup) and layers created later by the user through API calls (memzones, mempools, and mbufs). Each layer exposes specific APIs to the layers above it and to applications.

2. Standard Hugepages

Modern CPUs manage memory in pages rather than bytes. On Intel® 64 and IA‑32 architectures the default page size is 4 KB, but DPDK uses hugepages (2 MB or 1 GB) to reduce TLB misses and improve throughput.

DPDK collects available hugepages by scanning /sys/kernel/mm/hugepages and creates a shared configuration file /var/run/.rte_config that stores a struct rte_mem_config structure.

2.1 eal_hugepage_info_init – Collecting Hugepages

int eal_hugepage_info_init(void)
{
    const char dirent_start_text[] = "hugepages-";
    const size_t dirent_start_len = sizeof(dirent_start_text) - 1;
    unsigned i, num_sizes = 0;
    DIR *dir;
    struct dirent *dirent;

    /* Scan /sys/kernel/mm/hugepages: one subdirectory per page size. */
    dir = opendir(sys_dir_path);
    if (dir == NULL) {
        RTE_LOG(ERR, EAL, "Cannot open directory %s\n", sys_dir_path);
        return -1;
    }
    for (dirent = readdir(dir); dirent != NULL; dirent = readdir(dir)) {
        struct hugepage_info *hpi;

        if (strncmp(dirent->d_name, dirent_start_text, dirent_start_len) != 0)
            continue;
        if (num_sizes >= MAX_HUGEPAGE_SIZES)
            break;

        /* Parse the page size from a name such as "hugepages-2048kB". */
        hpi = &internal_config.hugepage_info[num_sizes];
        hpi->hugepage_sz = rte_str_to_size(&dirent->d_name[dirent_start_len]);

        /* Find a hugetlbfs mount point for this page size. */
        hpi->hugedir = get_hugepage_dir(hpi->hugepage_sz);
        if (hpi->hugedir == NULL) {
            uint32_t num_pages = get_num_hugepages(dirent->d_name);
            if (num_pages > 0)
                RTE_LOG(NOTICE, EAL,
                    "%" PRIu32 " hugepages of size %" PRIu64 " reserved, "
                    "but no mounted hugetlbfs found for that size\n",
                    num_pages, hpi->hugepage_sz);
            continue;
        }

        /* Lock the mount point and remove stale files from earlier runs. */
        hpi->lock_descriptor = open(hpi->hugedir, O_RDONLY);
        flock(hpi->lock_descriptor, LOCK_EX);
        clear_hugedir(hpi->hugedir);

        hpi->num_pages[0] = get_num_hugepages(dirent->d_name);
        num_sizes++;
    }
    closedir(dir);
    internal_config.num_hugepage_sizes = num_sizes;

    /* Sort the discovered page sizes (largest first). */
    qsort(&internal_config.hugepage_info[0], num_sizes,
          sizeof(internal_config.hugepage_info[0]), compare_hpi);

    /* Succeed only if at least one usable page size was found. */
    for (i = 0; i < num_sizes; i++)
        if (internal_config.hugepage_info[i].hugedir != NULL &&
            internal_config.hugepage_info[i].num_pages[0] > 0)
            return 0;
    return -1;
}

3. Configuration Mapping

rte_config_init – Mapping Shared Config

static void rte_config_init(void)
{
    rte_config.process_type = internal_config.process_type;

    switch (rte_config.process_type) {
    case RTE_PROC_PRIMARY:
        /* Create /var/run/.rte_config and map rte_mem_config into it. */
        rte_eal_config_create();
        break;
    case RTE_PROC_SECONDARY:
        /* Attach to the primary's config, wait until it is complete,
         * then remap it at the virtual address the primary recorded. */
        rte_eal_config_attach();
        rte_eal_mcfg_wait_complete(rte_config.mem_config);
        rte_eal_config_reattach();
        break;
    default:
        rte_panic("Invalid process type\n");
    }
}

The primary process creates the shared memory file, writes the struct rte_mem_config into it, and stores the mapping's virtual address (mem_cfg_addr) so that secondary processes can map the same region at the same address.

rte_eal_memory_init – Mapping Hugepages

int rte_eal_memory_init(void)
{
    const int retval = rte_eal_process_type() == RTE_PROC_PRIMARY ?
        rte_eal_hugepage_init() : rte_eal_hugepage_attach();
    if (retval < 0)
        return -1;
    if (internal_config.no_shconf == 0 && rte_eal_memdevice_init() < 0)
        return -1;
    return 0;
}

The primary process maps each hugepage, determines its physical address, NUMA socket, and then remaps them to obtain contiguous virtual‑physical pairs. Unneeded pages are unmapped and the final layout is stored in memseg[] structures.

rte_eal_memzone_init – Initialising Malloc Heap

int rte_eal_memzone_init(void)
{
    struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
    if (rte_eal_process_type() == RTE_PROC_SECONDARY)
        return 0;
    const struct rte_memseg *memseg = rte_eal_get_physmem_layout();
    if (memseg == NULL)
        return -1;
    rte_rwlock_write_lock(&mcfg->mlock);
    mcfg->memzone_cnt = 0;
    memset(mcfg->memzone, 0, sizeof(mcfg->memzone));
    rte_rwlock_write_unlock(&mcfg->mlock);
    return rte_eal_malloc_heap_init();
}

The heap is built by inserting each memseg into the per-socket malloc_heap. Each segment becomes one large free element, and a sentinel malloc_elem placed at the end of the segment marks its boundary so that element merges and scans cannot run past it.

malloc_heap_add_memseg – Adding Segments to the Heap

static void malloc_heap_add_memseg(struct malloc_heap *heap, struct rte_memseg *ms)
{
    /* The whole segment becomes one free element, minus room for the
     * end sentinel, aligned down to a cache line. */
    struct malloc_elem *start_elem = (struct malloc_elem *)ms->addr;
    struct malloc_elem *end_elem = RTE_PTR_ADD(ms->addr,
            ms->len - MALLOC_ELEM_OVERHEAD);
    end_elem = RTE_PTR_ALIGN_FLOOR(end_elem, RTE_CACHE_LINE_SIZE);
    const size_t elem_size = (uintptr_t)end_elem - (uintptr_t)start_elem;

    malloc_elem_init(start_elem, heap, ms, elem_size);
    malloc_elem_mkend(end_elem, start_elem);       /* boundary sentinel */
    malloc_elem_free_list_insert(start_elem);      /* ready for allocation */
    heap->total_size += elem_size;
}

4. Memory Allocation APIs

DPDK does not use the standard malloc() . Instead it provides its own allocator that works on hugepages, guarantees cache‑line alignment, and is NUMA‑aware.

malloc_heap_alloc – Core Allocation Function

void *malloc_heap_alloc(struct malloc_heap *heap,
        const char *type __attribute__((unused)),
        size_t size, unsigned flags, size_t align, size_t bound)
{
    /* Round the requested size and alignment up to a cache line. */
    size = RTE_CACHE_LINE_ROUNDUP(size);
    align = RTE_CACHE_LINE_ROUNDUP(align);

    rte_spinlock_lock(&heap->lock);

    /* Search the heap's free lists for a large-enough element. */
    struct malloc_elem *elem = find_suitable_element(heap, size, flags, align, bound);
    if (elem != NULL) {
        elem = malloc_elem_alloc(elem, size, align, bound);
        heap->alloc_count++;
    }
    rte_spinlock_unlock(&heap->lock);

    /* Return the address just past the element header. */
    return elem == NULL ? NULL : (void *)(&elem[1]);
}

rte_memzone_reserve – Reserving a Named Zone

const struct rte_memzone *rte_memzone_reserve(const char *name, size_t len,
        int socket_id, unsigned flags)
{
    return rte_memzone_reserve_thread_safe(name, len, socket_id,
        flags, RTE_CACHE_LINE_SIZE, 0);
}

This function allocates a contiguous block from the heap on a specific NUMA node and registers it under a user‑defined name.

rte_mempool_create – Fixed‑Size Object Pools

DPDK provides a high‑performance mempool library for objects of identical size (e.g., packet buffers). The mempool is built on top of the generic allocator and can span multiple memzones if a single zone cannot satisfy the request.

rte_pktmbuf_pool_create – Creating an mbuf Pool

struct rte_mempool *rte_pktmbuf_pool_create(const char *name, unsigned n,
    unsigned cache_size, uint16_t priv_size, uint16_t data_room_size,
    int socket_id)
{
    struct rte_pktmbuf_pool_private mbp_priv;

    /* Each element holds the mbuf header, private area, and data room. */
    unsigned elt_size = sizeof(struct rte_mbuf) + priv_size + data_room_size;

    struct rte_mempool *mp = rte_mempool_create_empty(name, n, elt_size,
        cache_size, sizeof(struct rte_pktmbuf_pool_private), socket_id, 0);
    if (mp == NULL)
        return NULL;

    /* Select the backing ops (ring-based by default). */
    const char *mp_ops_name = rte_eal_mbuf_default_mempool_ops();
    if (rte_mempool_set_ops_byname(mp, mp_ops_name, NULL) != 0) {
        rte_mempool_free(mp);
        return NULL;
    }

    /* Record the data-room and private sizes in the pool's private area. */
    mbp_priv.mbuf_data_room_size = data_room_size;
    mbp_priv.mbuf_priv_size = priv_size;
    rte_pktmbuf_pool_init(mp, &mbp_priv);

    /* Allocate the backing memzones and enqueue all objects. */
    if (rte_mempool_populate_default(mp) < 0) {
        rte_mempool_free(mp);
        return NULL;
    }

    /* Run per-object mbuf initialisation. */
    rte_mempool_obj_iter(mp, rte_pktmbuf_init, NULL);
    return mp;
}

The pool stores struct rte_mbuf objects, each containing metadata, optional private data, and a data buffer for packet payloads.

rte_pktmbuf_alloc – Getting an mbuf from the Pool

static inline struct rte_mbuf *rte_pktmbuf_alloc(struct rte_mempool *mp)
{
    struct rte_mbuf *m;
    if ((m = rte_mbuf_raw_alloc(mp)) != NULL)
        rte_pktmbuf_reset(m);
    return m;
}

The allocation first checks the per‑CPU cache; if empty it falls back to the shared pool.

5. Summary

DPDK's memory subsystem combines hugepage‑based physical memory, shared configuration files, NUMA‑aware segment layout, and a custom malloc‑heap to deliver deterministic, low‑latency allocation. The design enables zero‑copy packet processing across primary and secondary processes, supports IOMMU/IOVA translation, and provides high‑throughput packet buffers via the mbuf mempool infrastructure.

Tags: Memory Management, shared memory, DPDK, NUMA, hugepages, malloc heap