
Design and Implementation of a High‑Performance C++ Memory Pool

This article explains the problems of frequent new/delete in C++, introduces memory‑pool concepts, details design principles, outlines a multi‑layer architecture (Thread Cache, Central Cache, Page Cache), provides a complete C++ implementation compatible with std::allocator, and discusses common pitfalls and performance considerations.


1. Introduction

In C++ programming, frequent memory allocation and deallocation with new and delete incurs heavy CPU overhead and causes memory fragmentation, which can severely degrade performance in real‑time graphics engines or high‑frequency trading systems.

1.1 The Memory‑Management Dilemma

Using the default heap allocator requires the OS to search free blocks, possibly split them, and update internal tables—operations comparable to a courier searching a chaotic warehouse for a package.

Fragmentation (both internal and external) leaves unusable gaps between allocated blocks, reducing overall memory utilization and slowing down subsequent large allocations.

Memory leaks occur when allocated memory is never released, eventually exhausting system resources and causing crashes.

1.2 Memory Pools as a Remedy

A memory pool pre‑allocates a large contiguous region at program start and divides it into fixed‑size chunks, allowing fast allocation by simply handing out a chunk without invoking the OS.

This largely eliminates external fragmentation, improves allocation speed, and provides a deterministic reclamation mechanism that reduces leaks and dangling pointers.

2. Memory‑Pool Technical Details

2.1 What Is a Memory Pool?

A memory pool (or object pool) pre‑allocates a region of memory and partitions it into equal‑sized blocks. Allocation retrieves a block from the pool; deallocation returns it to the pool.

Advantages include reduced allocation overhead, lower fragmentation, and higher cache locality. Drawbacks are limited support for large or variable‑size allocations and management overhead when the pool must grow or shrink.

2.2 Why Use a Memory Pool?

Fragmentation : Fixed‑size blocks eliminate external fragmentation; internal fragmentation is bounded by the difference between the requested size and the block size.

Allocation Efficiency : Pre‑allocated blocks bypass costly system calls.

Leak Prevention : Centralized tracking makes it easier to detect unreleased memory.

3. C++ Memory Management Mechanisms

3.1 Allocation and Deallocation

Typical allocation with int* p = new int; requests memory from the runtime heap, which may in turn call into the OS and can be slow. Deallocation with delete p; returns the memory to the heap. Both operations carry bookkeeping overhead, and allocation fails (throwing std::bad_alloc by default) when memory is exhausted.

3.2 Problems

Memory fragmentation

Memory leaks

Dangling pointers

Allocation/deallocation overhead

3.3 How a Memory Pool Solves These Problems

Fixed‑size blocks eliminate external fragmentation.

Fast allocation because blocks are taken from a pre‑allocated list.

Centralized bookkeeping reduces leaks and dangling pointers.

4. Memory‑Pool Design Principles

Minimize the number of system allocations by pre‑allocating large chunks.

Reduce fragmentation by using fixed‑size blocks.

Provide fast lookup (e.g., free‑list or bitmap) for available blocks.

Support flexible operations such as growth, shrinkage, and thread‑safety.

5. Implementation Plan

5.1 Core Structure

The pool consists of Memory Blocks and a Memory Block List (a doubly‑linked list of blocks).

5.2 Working Principle

At startup the pool reserves a large region, splits it into fixed‑size blocks, and maintains a free‑list. Allocation removes a block from the list; deallocation returns it to the head of the list.

5.3 Thread‑Safety

In multithreaded environments a mutex (or lock‑free techniques) protects the free‑list during concurrent accesses.

5.4 C++ Implementation (excerpt)

#ifndef PPX_BASE_MEMORY_POOL_H_
#define PPX_BASE_MEMORY_POOL_H_

#include <climits>
#include <cstddef>
#include <mutex>
#include <type_traits> // std::true_type / std::false_type
#include <utility>     // std::forward

namespace ppx {
    namespace base {
        template <typename T, size_t BlockSize = 4096, bool ZeroOnDeallocate = true>
        class MemoryPool {
        public:
            typedef T               value_type;
            typedef T*              pointer;
            typedef T&              reference;
            typedef const T*        const_pointer;
            typedef const T&        const_reference;
            typedef size_t          size_type;
            typedef ptrdiff_t       difference_type;
            typedef std::false_type propagate_on_container_copy_assignment;
            typedef std::true_type  propagate_on_container_move_assignment;
            typedef std::true_type  propagate_on_container_swap;

            template <typename U> struct rebind { typedef MemoryPool<U, BlockSize, ZeroOnDeallocate> other; };

            MemoryPool() noexcept;
            MemoryPool(const MemoryPool&) noexcept;
            MemoryPool(MemoryPool&&) noexcept;
            template <class U> MemoryPool(const MemoryPool<U>&) noexcept;
            ~MemoryPool() noexcept;

            MemoryPool& operator=(const MemoryPool&) = delete;
            MemoryPool& operator=(MemoryPool&&) noexcept;

            pointer address(reference x) const noexcept { return &x; }
            const_pointer address(const_reference x) const noexcept { return &x; }

            pointer allocate(size_type n = 1, const_pointer hint = 0);
            void deallocate(pointer p, size_type n = 1);
            size_type max_size() const noexcept;
            template <class U, class... Args> void construct(U* p, Args&&... args) { new (p) U(std::forward<Args>(args)...); }
            template <class U> void destroy(U* p) { p->~U(); }

            template <class... Args> pointer newElement(Args&&... args);
            void deleteElement(pointer p);
        private:
            struct Element_ { Element_* pre; Element_* next; };
            typedef char* data_pointer;
            typedef Element_ element_type;
            typedef Element_* element_pointer;
            element_pointer data_element_;
            element_pointer free_element_;
            std::recursive_mutex m_;
            size_type padPointer(data_pointer p, size_type align) const noexcept;
            void allocateBlock();
            static_assert(BlockSize >= 2 * sizeof(element_type), "BlockSize too small.");
        };
        // ... (implementation omitted for brevity) ...
    }
}

#endif // PPX_BASE_MEMORY_POOL_H_

5.5 Usage Example

#include <iostream>
#include <thread>
using namespace std;
class Apple {
public:
    Apple() { cout << "Apple()" << endl; }
    Apple(int id) { cout << "Apple(" << id << ")" << endl; }
    ~Apple() { cout << "~Apple()" << endl; }
    void SetId(int id) { id_ = id; }
    int GetId() { return id_; }
private:
    int id_;
};

void ThreadProc(ppx::base::MemoryPool<char> *mp) {
    for (int i = 0; i < 100000; ++i) {
        char* p0 = mp->allocate(); // allocate() already returns char* for MemoryPool<char>
        char* p1 = mp->allocate();
        mp->deallocate(p0);
        char* p2 = mp->allocate();
        mp->deallocate(p1);
        mp->deallocate(p2);
    }
}

int main() {
    ppx::base::MemoryPool<char> mp;
    // single‑thread test omitted for brevity
    std::thread th0(ThreadProc, &mp);
    std::thread th1(ThreadProc, &mp);
    std::thread th2(ThreadProc, &mp);
    th0.join(); th1.join(); th2.join();
    // object‑pool test with Apple
    Apple* apple = nullptr;
    {
        ppx::base::MemoryPool<Apple> mp2;
        apple = mp2.newElement(10);
        apple->SetId(12);
        mp2.deleteElement(apple);
    }
    // note: using the pointer after the pool is destroyed is undefined
    return 0;
}

6. Common Pitfalls and Best Practices

Incorrect manipulation of the free‑list can cause memory leaks.

Improper lock ordering may lead to deadlocks in multithreaded scenarios.

Over‑pre‑allocating memory wastes resources; size the pool based on profiling data.

Zero memory on deallocation only when actually required (e.g. for sensitive data), as it adds CPU overhead.

7. Concurrency Design (Thread Cache, Central Cache, Page Cache)

The system uses a three‑layer architecture:

Thread Cache : per‑thread lock‑free freelists for allocations ≤ 64 KB.

Central Cache : shared pool protected by a mutex; supplies batches to Thread Caches.

Page Cache : manages memory in page‑size spans (4 KB) and grows the pool via mmap / brk / VirtualAlloc .

Mappings from objects to their originating Span are kept in an unordered_map<PageID, Span*> to enable correct return of memory to the appropriate span.

8. Limitations and Future Work

The current implementation still relies on new for its internal data structures; a dedicated object pool for those structures would remove the residual dependency on the system allocator.

Portability improvements are needed for Linux (replace VirtualAlloc with brk / mmap ) and 64‑bit systems (replace std::map with a radix tree for better performance).

Further benchmarking and lock‑free refinements could boost multithreaded throughput.

Written by Deepin Linux

Research areas: Windows & Linux platforms, C/C++ backend development, embedded systems and Linux kernel, etc.