
High-Performance Object Pool Design and Implementation

The article presents a high‑performance, thread‑safe object pool that combines thread‑local freelists, a global pool, and cache‑line‑aligned structures to reuse expensive objects, dynamically expand capacity, and achieve 50‑70% lower allocation latency than standard malloc/free and existing pool implementations.

Tencent Cloud Developer

This article discusses the motivation, goals, design, implementation, and performance evaluation of a high‑performance object pool for server‑side applications. Object pools complement memory pools by reusing objects whose construction and destruction are expensive, thereby reducing allocation overhead in systems with massive object churn.

Background: Memory pools manage frequent allocations but lack a reuse mechanism for objects with high creation/destruction costs. An object pool provides object reuse, high performance, thread safety, dynamic capacity, and preferential allocation of previously used objects.

Goals:

Object reuse to avoid frequent malloc/free and constructor/destructor calls.

Very low allocation/deallocation overhead.

Thread‑safe access for concurrent threads.

Dynamic expansion of pool capacity.

Preferential allocation of already used objects.

Research on Existing Solutions:

1. brpc object pool – batch allocation and return to reduce global contention. Allocation flow: check thread‑local free block, try global pool, request a large block from the system.

2. Go object pool (sync.Pool) – each scheduler processor (P) has a private and a shared local pool; the private pool is lock‑free, the shared pool uses locks.

3. Netty recycler – each thread has a thread‑local Stack with WeakOrderQueue for cross‑thread recycling.

4. tcmalloc – per‑thread ThreadCache with central cache and page heap.

5. jemalloc – arena‑based allocation with per‑thread caches (tcache) to reduce lock contention.

Overall Design: Combine a freelist, thread‑local storage (TLS), and multiple resource pools. The structure consists of a Local Pool (per‑thread) and a Global Pool (shared). Objects are first taken from the local free list; if empty, the local pool fetches a free list from the global pool; if that fails, it allocates from its current block; if the block is exhausted, it requests a new block from the global pool.

Data Structures:

union Slot { Slot *next_ = NULL; T val_; };

Slot represents a memory unit that can be either an allocated object or a node in the free list.

struct Block { Slot slots_[kBlockSize]; size_t idx_ = 0; };

Block is the basic allocation unit for a local pool.

struct BlockChunk { Block blocks_[kBlockChunkSize]; size_t idx_ = 0; };

BlockChunk groups multiple blocks for the global pool.

struct FreeSlots { Slot *head_ = NULL; size_t length_ = 0; };

FreeSlots stores a singly‑linked list of free slots.

struct BlockManager { std::vector<BlockChunk*> block_chunks_; };

Manages all block chunks.

struct FreeSlotsManager { size_t free_num_ = 0; std::vector<FreeSlots> freeslots_ptrs; };

Manages multiple free‑slot lists.

struct __attribute__((aligned(64))) LocalPool { GlobalPool* global_pool_; Block* block_; FreeSlots freeslots_; };

Local pool is cache‑line aligned to avoid false sharing.

class __attribute__((aligned(64))) GlobalPool { BlockManager block_manager_; FreeSlotsManager freeslots_manager_; pthread_spinlock_t freeslots_lck_; pthread_mutex_t block_mtx_; };

Global pool uses a spin lock for free‑slot operations and a mutex for block management.

class ObjectPool { GlobalPool global_pool_[kGlobalPoolNum]; thread_local static LocalPool* local_pool_; std::atomic<size_t> pool_idx_; };

ObjectPool provides the public interface and distributes local pools across multiple global pools using round‑robin.

Allocation Algorithm (simplified):

T* GetObject() {
  if (freeslots_.head_ != NULL) {
    Slot* res = freeslots_.head_;
    freeslots_.head_ = res->next_;
    freeslots_.length_--;
    return (T*)res;
  } else if (global_pool_->PopFreeSlots(freeslots_)) {
    ...
  } else if (block_->idx_ < kBlockSize) {
    return (T*)&block_->slots_[block_->idx_++];
  } else if ((block_ = global_pool_->PopBlock())) {
    return (T*)&block_->slots_[block_->idx_++];
  } else {
    return NULL;
  }
}

Recycling Algorithm:

void ReturnObject(T* obj) {
  ((Slot*)obj)->next_ = freeslots_.head_;
  freeslots_.head_ = (Slot*)obj;
  freeslots_.length_++;
  if (freeslots_.length_ == kFreeSlotsSize) {
    global_pool_->PushFreeSlots(freeslots_);
  }
}

Memory Alignment: Both LocalPool and GlobalPool are aligned to 64‑byte cache lines to eliminate false sharing, reducing latency by about 5%.

Lock Optimization: Use spin locks for the lightweight FreeSlotsManager and mutexes for the heavier BlockManager, achieving roughly 9% latency reduction.

Branch Prediction: Apply __builtin_expect to hint unlikely paths (e.g., allocation failure), yielding a modest 2% improvement.

Object Construction/Destruction on Reused Memory:

template <class... Args>
void Construct(T* p, Args&&... args) { new (p) T(std::forward<Args>(args)...); }

void Destroy(T* p) { p->~T(); }

Testing:

Validate effective allocation by reading/writing allocated objects.

Verify reuse by measuring process memory before and after repeated allocation‑release cycles.

Detect memory leaks with valgrind --tool=memcheck --leak-check=full.

Profile overhead using perf and compare against glibc malloc/free, jemalloc, and brpc object pool.

Results show the custom object pool reduces allocation latency by over 50% compared to glibc malloc/free in low‑thread scenarios, and up to 69% compared to brpc’s pool in high‑thread workloads. Memory consumption is comparable to brpc’s pool (≈120 MiB vs 132 MiB for 16 threads).

Tags: performance optimization, Memory Management, C++, lock optimization, Object pool, Thread Local Storage
Written by Tencent Cloud Developer

Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.
