
High-Performance Object Pool Design and Implementation

The article presents a high‑performance, thread‑safe object pool that combines thread‑local freelists, a global pool, and cache‑line‑aligned structures to reuse expensive objects, dynamically expand capacity, and achieve 50‑70% lower allocation latency than standard malloc/free and existing pool implementations.

Tencent Cloud Developer

This article discusses the motivation, goals, design, implementation, and performance evaluation of a high‑performance object pool for server‑side applications. Object pools complement memory pools by reusing objects whose construction and destruction are expensive, thereby reducing allocation overhead in systems with massive object churn.

Background: Memory pools manage frequent allocations but lack a reuse mechanism for objects with high creation/destruction costs. An object pool provides object reuse, high performance, thread safety, dynamic capacity, and preferential allocation of previously used objects.

Goals:

Object reuse to avoid frequent malloc/free and constructor/destructor calls.

Very low allocation/deallocation overhead.

Thread‑safe access for concurrent threads.

Dynamic expansion of pool capacity.

Preferential allocation of already used objects.

Research on Existing Solutions:

1. brpc object pool – batch allocation and return to reduce global contention. Allocation flow: check thread‑local free block, try global pool, request a large block from the system.

2. Go object pool (sync.Pool) – each scheduler processor (P) has a private and a shared local pool; the private pool is lock‑free, the shared pool uses locks.

3. Netty recycler – each thread has a thread‑local Stack with WeakOrderQueue for cross‑thread recycling.

4. tcmalloc – per‑thread ThreadCache with central cache and page heap.

5. jemalloc – arena‑based allocation with per‑thread caches (tcache) to reduce lock contention.

Overall Design: Combine a freelist, thread‑local storage (TLS), and multiple resource pools. The structure consists of a Local Pool (per‑thread) and a Global Pool (shared). Objects are first taken from the local free list; if empty, the local pool fetches a free list from the global pool; if that fails, it allocates from its current block; if the block is exhausted, it requests a new block from the global pool.

Data Structures:

union Slot { Slot *next_ = NULL; T val_; };

Slot represents a memory unit that can be either an allocated object or a node in the free list.

struct Block { Slot slots_[kBlockSize]; size_t idx_ = 0; };

Block is the basic allocation unit for a local pool.

struct BlockChunk { Block blocks_[kBlockChunkSize]; size_t idx_ = 0; };

BlockChunk groups multiple blocks for the global pool.

struct FreeSlots { Slot *head_ = NULL; size_t length_ = 0; };

FreeSlots stores a singly‑linked list of free slots.

struct BlockManager { std::vector<BlockChunk*> block_chunks_; };

Manages all block chunks.

struct FreeSlotsManager { size_t free_num_ = 0; std::vector<FreeSlots> freeslots_ptrs; };

Manages multiple free‑slot lists.

struct __attribute__((aligned(64))) LocalPool { GlobalPool* global_pool_; Block* block_; FreeSlots freeslots_; };

Local pool is cache‑line aligned to avoid false sharing.

class __attribute__((aligned(64))) GlobalPool { BlockManager block_manager_; FreeSlotsManager freeslots_manager_; pthread_spinlock_t freeslots_lck_; pthread_mutex_t block_mtx_; };

Global pool uses a spin lock for free‑slot operations and a mutex for block management.

class ObjectPool { GlobalPool global_pool_[kGlobalPoolNum]; thread_local static LocalPool* local_pool_; std::atomic<size_t> pool_idx_; };

ObjectPool provides the public interface and distributes local pools across multiple global pools using round‑robin.

Allocation Algorithm (simplified):

T* GetObject() {
  if (freeslots_.head_ != NULL) {
    Slot* res = freeslots_.head_;
    freeslots_.head_ = res->next_;
    freeslots_.length_--;
    return (T*)res;
  } else if (global_pool_->PopFreeSlots(freeslots_)) {
    ...
  } else if (block_->idx_ < kBlockSize) {
    return (T*)&block_->slots_[block_->idx_++];
  } else if ((block_ = global_pool_->PopBlock())) {
    return (T*)&block_->slots_[block_->idx_++];
  } else {
    return NULL;
  }
}

Recycling Algorithm:

void ReturnObject(T* obj) {
  ((Slot*)obj)->next_ = freeslots_.head_;
  freeslots_.head_ = (Slot*)obj;
  freeslots_.length_++;
  if (freeslots_.length_ == kFreeSlotsSize) {
    global_pool_->PushFreeSlots(freeslots_);
  }
}

Memory Alignment: Both LocalPool and GlobalPool are aligned to 64‑byte cache lines to eliminate false sharing, reducing latency by about 5%.

Lock Optimization: Use spin locks for the lightweight FreeSlotsManager and mutexes for the heavier BlockManager, achieving roughly 9% latency reduction.

Branch Prediction: Apply __builtin_expect to hint unlikely paths (e.g., allocation failure), yielding a modest 2% improvement.

Object Construction/Destruction on Reused Memory:

template <class... Args>
void Construct(T* p, Args&&... args) { new (p) T(std::forward<Args>(args)...); }

void Destroy(T* p) { p->~T(); }

Testing:

Validate effective allocation by reading/writing allocated objects.

Verify reuse by measuring process memory before and after repeated allocation‑release cycles.

Detect memory leaks with valgrind --tool=memcheck --leak-check=full.

Profile overhead using perf and compare against glibc malloc/free, jemalloc, and brpc object pool.

Results show the custom object pool reduces allocation latency by over 50% compared to glibc malloc/free in low‑thread scenarios, and up to 69% compared to brpc’s pool in high‑thread workloads. Memory consumption is comparable to brpc’s pool (≈120 MiB vs 132 MiB for 16 threads).

Tags: performance optimization, Memory Management, C++, lock optimization, Object pool, Thread Local Storage
Written by Tencent Cloud Developer

Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.
