FastThreadLocal in Netty: Background, Design Principles, and Source Code Analysis
This article explains why Netty implements FastThreadLocal instead of using JDK ThreadLocal, describes its array‑based design, internal classes such as InternalThreadLocalMap and FastThreadLocalThread, walks through the get() and initialization logic, discusses cleanup mechanisms, performance degradation on ordinary threads, and shows its practical use for ByteBuf allocation in Netty.
FastThreadLocal Introduction
Although JDK already provides ThreadLocal, Netty introduces FastThreadLocal (ftl) to avoid the hash‑collision overhead of ThreadLocalMap by using a simple indexed array.
Each Java thread has a ThreadLocalMap that is created lazily; the map resolves hash collisions via linear probing, so a lookup may have to step through several slots when many ThreadLocal instances hash to neighboring positions.
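To make the probing cost concrete, here is a minimal sketch of open addressing with linear probing. It is not the JDK's ThreadLocalMap: the class ProbingTable and its methods are hypothetical, keys are reduced to their hash codes, and the table never resizes.

```java
// Hypothetical illustration of linear probing; not the JDK's ThreadLocalMap.
final class ProbingTable {
    private final int[] hashes;     // hash stored in each occupied slot
    private final Object[] values;  // value stored in each slot
    private final boolean[] used;   // slot occupancy

    ProbingTable(int capacity) {    // capacity is assumed to be a power of two
        hashes = new int[capacity];
        values = new Object[capacity];
        used = new boolean[capacity];
    }

    /** Stores value under hash and returns how many extra slots were probed. */
    int put(int hash, Object value) {
        int mask = hashes.length - 1;
        int i = hash & mask;
        int probes = 0;
        while (used[i] && hashes[i] != hash) {
            i = (i + 1) & mask;     // collision: step to the next slot
            probes++;
            if (probes > mask) {
                throw new IllegalStateException("table full");
            }
        }
        used[i] = true;
        hashes[i] = hash;
        values[i] = value;
        return probes;
    }

    /** Returns the value stored under hash, or null if it is absent. */
    Object get(int hash) {
        int mask = hashes.length - 1;
        int i = hash & mask;
        for (int n = 0; n <= mask && used[i]; n++) {
            if (hashes[i] == hash) {
                return values[i];
            }
            i = (i + 1) & mask;
        }
        return null;
    }
}
```

With a capacity of 16, the hashes 3, 19, and 35 all map to slot 3, so the second and third insertions need one and two extra probes respectively.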
FastThreadLocal avoids this by assigning each ftl instance a unique index (generated by an AtomicInteger) and storing values directly in an Object[] array, eliminating hash lookups.
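The indexing idea can be sketched in a few lines of plain Java. This is an illustration under stated assumptions, not Netty's code: IndexedSlot and IndexedSlotDemo are hypothetical names, and the Object[] is passed in explicitly, whereas in Netty each thread owns its own array.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the indexing scheme; names are not Netty's.
final class IndexedSlot<V> {
    private static final AtomicInteger NEXT_INDEX = new AtomicInteger();

    // Each instance claims one array position for its whole lifetime.
    private final int index = NEXT_INDEX.getAndIncrement();

    int index() {
        return index;
    }

    void set(Object[] table, V value) {
        table[index] = value;       // direct array write, no hashing
    }

    @SuppressWarnings("unchecked")
    V get(Object[] table) {
        return (V) table[index];    // direct array read, no probing
    }
}

final class IndexedSlotDemo {
    public static void main(String[] args) {
        Object[] table = new Object[32]; // one such array would exist per thread
        IndexedSlot<String> name = new IndexedSlot<>();
        IndexedSlot<Integer> count = new IndexedSlot<>();
        name.set(table, "netty");
        count.set(table, 7);
        System.out.println(name.get(table) + " " + count.get(table)); // prints "netty 7"
    }
}
```

The trade-off is memory: every thread's array needs a slot for every instance ever created, whether or not that thread uses it.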
When ftl.get() is called, the value is retrieved from the array via the stored index:
return array[index];

Source Code Overview
The implementation involves three main classes: InternalThreadLocalMap, FastThreadLocalThread, and FastThreadLocal. The analysis starts with InternalThreadLocalMap.
2.1 UnpaddedInternalThreadLocalMap Fields
static final ThreadLocal<InternalThreadLocalMap> slowThreadLocalMap = new ThreadLocal<>();
static final AtomicInteger nextIndex = new AtomicInteger();
Object[] indexedVariables;

The indexedVariables array stores ftl values; nextIndex provides a unique slot for each ftl instance.
2.2 InternalThreadLocalMap Details
// Sentinel marking a slot that has never been assigned (null is a legal value).
public static final Object UNSET = new Object();
private BitSet cleanerFlags;

private InternalThreadLocalMap() {
    super(newIndexedVariableTable());
}

private static Object[] newIndexedVariableTable() {
    Object[] array = new Object[32];
    Arrays.fill(array, UNSET);
    return array;
}

public Object indexedVariable(int index) {
    Object[] lookup = indexedVariables;
    // An index beyond the current array length simply reads as UNSET.
    return index < lookup.length ? lookup[index] : UNSET;
}

Values are stored directly in the array, not as map entries, which differentiates ftl from JDK ThreadLocal.
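The role of the UNSET sentinel and the bounds check is easier to see in a stripped-down table. The sketch below is hypothetical (SlotTable is not a Netty class), starts with 4 slots instead of 32 to make growth visible, and uses a simple doubling policy that is an assumption rather than Netty's actual expansion logic.

```java
import java.util.Arrays;

// Hypothetical sketch of a sentinel-filled slot table; not Netty's class.
final class SlotTable {
    static final Object UNSET = new Object();

    private Object[] slots = filled(4); // small on purpose, to show growth

    private static Object[] filled(int size) {
        Object[] array = new Object[size];
        Arrays.fill(array, UNSET);
        return array;
    }

    Object get(int index) {
        // Out-of-range reads behave like never-assigned slots.
        return index < slots.length ? slots[index] : UNSET;
    }

    void set(int index, Object value) {
        if (index >= slots.length) {
            // Growth policy is an assumption made for this sketch.
            int newSize = Math.max(slots.length * 2, index + 1);
            Object[] grown = Arrays.copyOf(slots, newSize);
            Arrays.fill(grown, slots.length, newSize, UNSET);
            slots = grown;
        }
        slots[index] = value;
    }

    int capacity() {
        return slots.length;
    }
}
```

Because unassigned slots hold UNSET rather than null, a stored null stays distinguishable from "no value yet", so an initializer that returns null is not re-run on every read.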
2.3 FastThreadLocalThread
public class FastThreadLocalThread extends Thread {
    private final boolean cleanupFastThreadLocals;
    private InternalThreadLocalMap threadLocalMap;

    // constructors omitted

    public final InternalThreadLocalMap threadLocalMap() {
        return threadLocalMap;
    }

    public final void setThreadLocalMap(InternalThreadLocalMap map) {
        this.threadLocalMap = map;
    }
}

FastThreadLocalThread holds its InternalThreadLocalMap in a plain field, so the map is reached with a field read rather than a ThreadLocal lookup.
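A rough sketch of how a lookup can choose between the field and a JDK ThreadLocal follows. All names here (SlotMap, MapHolderThread, SlotMaps) are hypothetical stand-ins for illustration; the real dispatch lives in InternalThreadLocalMap.get().

```java
// Hypothetical sketch; SlotMap, MapHolderThread and SlotMaps are illustrative names.
final class SlotMap {
    final Object[] slots = new Object[32];
}

class MapHolderThread extends Thread {
    SlotMap map; // plain field: reached without any hashing

    MapHolderThread(Runnable task) {
        super(task);
    }
}

final class SlotMaps {
    // Fallback storage for threads that are not MapHolderThread instances.
    private static final ThreadLocal<SlotMap> SLOW = new ThreadLocal<>();

    static SlotMap current() {
        Thread thread = Thread.currentThread();
        if (thread instanceof MapHolderThread) {
            // Fast path: read (or lazily create) the field on the thread object.
            MapHolderThread holder = (MapHolderThread) thread;
            if (holder.map == null) {
                holder.map = new SlotMap();
            }
            return holder.map;
        }
        // Slow path: one JDK ThreadLocal lookup to find the map.
        SlotMap map = SLOW.get();
        if (map == null) {
            map = new SlotMap();
            SLOW.set(map);
        }
        return map;
    }
}
```

Either way the caller ends up with the same kind of map; only the cost of finding it differs.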
2.4 FastThreadLocal Implementation
private final int index;

public FastThreadLocal() {
    index = InternalThreadLocalMap.nextVariableIndex();
}

public final V get() {
    InternalThreadLocalMap map = InternalThreadLocalMap.get();
    Object v = map.indexedVariable(index);
    if (v != InternalThreadLocalMap.UNSET) {
        return (V) v; // fast path: the slot already holds a value
    }
    V value = initialize(map);
    registerCleaner(map);
    return value;
}

private V initialize(InternalThreadLocalMap map) {
    V v = null;
    try {
        v = initialValue();
    } catch (Exception e) {
        PlatformDependent.throwException(e);
    }
    map.setIndexedVariable(index, v);
    addToVariablesToRemove(map, this); // remembered so removeAll() can clear it later
    return v;
}

private void registerCleaner(InternalThreadLocalMap map) { /* simplified in Netty 4.1.34 */ }

The get() method first reads the slot at index; if the slot still holds UNSET, it calls initialValue(), stores the result, and calls registerCleaner().
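The lazy-initialization flow can be reproduced without Netty on the classpath. The sketch below is a simplified, hypothetical LazySlot class: it keeps each thread's table in a JDK ThreadLocal, uses a fixed 32-slot table with no growth, and has no cleaner registration.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the get()/initialize() flow; not Netty's FastThreadLocal.
class LazySlot<V> {
    private static final Object UNSET = new Object();
    private static final AtomicInteger NEXT_INDEX = new AtomicInteger();

    // Simplification: a JDK ThreadLocal holds each thread's fixed-size table.
    private static final ThreadLocal<Object[]> TABLE = ThreadLocal.withInitial(() -> {
        Object[] array = new Object[32];
        Arrays.fill(array, UNSET);
        return array;
    });

    private final int index = NEXT_INDEX.getAndIncrement();

    /** Override to supply the per-thread initial value. */
    protected V initialValue() {
        return null;
    }

    @SuppressWarnings("unchecked")
    public final V get() {
        Object[] table = TABLE.get();
        Object v = table[index];
        if (v != UNSET) {
            return (V) v;             // already initialized on this thread
        }
        V value = initialValue();     // first access on this thread
        table[index] = value;
        return value;
    }

    /** Resets the slot so the next get() initializes again. */
    public final void remove() {
        TABLE.get()[index] = UNSET;
    }
}
```

A subclass that overrides initialValue() sees it invoked once per thread, and again only after remove() has reset the slot.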
2.5 Degradation on Ordinary Threads
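When a thread is not a FastThreadLocalThread, InternalThreadLocalMap.get() falls back to a slow path using a regular JDK ThreadLocal (slowThreadLocalMap), which re‑introduces the hash‑collision overhead. The values themselves are still read by index; the extra cost is the ThreadLocalMap lookup needed to find the InternalThreadLocalMap.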
private static InternalThreadLocalMap slowGet() {
    ThreadLocal<InternalThreadLocalMap> slowThreadLocalMap = UnpaddedInternalThreadLocalMap.slowThreadLocalMap;
    InternalThreadLocalMap ret = slowThreadLocalMap.get();
    if (ret == null) {
        ret = new InternalThreadLocalMap();
        slowThreadLocalMap.set(ret);
    }
    return ret;
}

Resource Reclamation
Netty provides three cleanup strategies for ftl:
Automatic: wrapping a task with FastThreadLocalRunnable clears the thread's ftl values after the task finishes.
Manual: users call remove() on ftl or its map when appropriate.
Cleaner‑based: registers a Cleaner to release ftl values when the thread is garbage‑collected (the registration is commented out in Netty 4.1.34).
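The automatic strategy boils down to a try/finally wrapper around the task. The sketch below shows that pattern with a JDK ThreadLocal so it runs without Netty; CleaningRunnable and CleaningRunnableDemo are hypothetical names, and the cleanup action is passed in rather than hard-wired.

```java
// Hypothetical sketch of the "automatic" strategy; not Netty's FastThreadLocalRunnable.
final class CleaningRunnable implements Runnable {
    private final Runnable task;
    private final Runnable cleanup;

    CleaningRunnable(Runnable task, Runnable cleanup) {
        this.task = task;
        this.cleanup = cleanup;
    }

    @Override
    public void run() {
        try {
            task.run();
        } finally {
            cleanup.run(); // runs even if the task throws
        }
    }
}

final class CleaningRunnableDemo {
    static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    public static void main(String[] args) {
        Runnable task = () -> CONTEXT.set("request-42");
        new CleaningRunnable(task, CONTEXT::remove).run();
        System.out.println(CONTEXT.get()); // prints "null"
    }
}
```

Wrapping matters most for pooled threads, which outlive individual tasks and would otherwise carry one task's values into the next.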
FastThreadLocal Usage in Netty
The most important use case is allocating ByteBuf objects. Each thread holds a PoolThreadCache through a FastThreadLocal; the cache is bound to the least‑used heap and direct PoolArena. When a thread needs a buffer, it first tries its own cache and goes to the shared arena only when the cache cannot serve the request.
final class PoolThreadLocalCache extends FastThreadLocal<PoolThreadCache> {
    @Override
    protected synchronized PoolThreadCache initialValue() {
        final PoolArena<byte[]> heapArena = leastUsedArena(heapArenas);
        final PoolArena<ByteBuffer> directArena = leastUsedArena(directArenas);
        Thread current = Thread.currentThread();
        if (useCacheForAllThreads || current instanceof FastThreadLocalThread) {
            return new PoolThreadCache(heapArena, directArena, tinyCacheSize, smallCacheSize,
                    normalCacheSize, DEFAULT_MAX_CACHED_BUFFER_CAPACITY, DEFAULT_CACHE_TRIM_INTERVAL);
        }
        // No caching for this thread: all cache sizes are 0.
        return new PoolThreadCache(heapArena, directArena, 0, 0, 0, 0, 0);
    }
}

By keeping per‑thread caches, Netty reduces contention and improves allocation performance.
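The contention argument can be illustrated with a toy pool that puts a per-thread cache in front of a lock-protected shared store. This is a hypothetical sketch, not Netty's allocator: BufferPool, its fixed buffer size, and the per-thread limit are all assumptions made for the example, and a JDK ThreadLocal stands in for FastThreadLocal.

```java
import java.util.ArrayDeque;

// Hypothetical sketch; BufferPool is not a Netty class.
final class BufferPool {
    private final int bufferSize;
    private final int maxCachedPerThread;

    // Shared store: every access must hold the lock on "this".
    private final ArrayDeque<byte[]> shared = new ArrayDeque<>();

    // Per-thread store: touched only by its owner, so no lock is needed.
    private final ThreadLocal<ArrayDeque<byte[]>> cache =
            ThreadLocal.withInitial(ArrayDeque::new);

    BufferPool(int bufferSize, int maxCachedPerThread) {
        this.bufferSize = bufferSize;
        this.maxCachedPerThread = maxCachedPerThread;
    }

    byte[] acquire() {
        byte[] buffer = cache.get().pollFirst();
        if (buffer != null) {
            return buffer;              // served without synchronization
        }
        synchronized (this) {
            buffer = shared.pollFirst();
        }
        return buffer != null ? buffer : new byte[bufferSize];
    }

    void release(byte[] buffer) {
        ArrayDeque<byte[]> local = cache.get();
        if (local.size() < maxCachedPerThread) {
            local.addFirst(buffer);     // keep it close to the releasing thread
            return;
        }
        synchronized (this) {
            shared.addFirst(buffer);    // overflow goes to the shared store
        }
    }
}
```

A thread that repeatedly acquires and releases buffers of the same size never touches the lock after its first allocation, which is the effect the per-thread cache is meant to achieve.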