Design and Optimization of Massive Push Services Using Netty
This article analyzes common push‑service questions, presents a real‑world IoT case of memory leakage, and outlines key design points—including kernel limits, CLOSE_WAIT handling, heartbeat configuration, buffer management, memory pooling, logging pitfalls, TCP tuning, and JVM settings—to build a scalable, stable Netty‑based push server for millions of concurrent connections.
1. Background
1.1. Source of the Topic
Many developers working on mobile Internet and IoT have asked me about push‑service issues, such as whether Netty can be used as a push server, how many clients a single server can support, and various technical problems encountered when developing push services with Netty.
Because the questions are numerous and focus on similar concerns, this article summarizes them, analyzes a real case, and extracts design guidelines to help practitioners avoid common pitfalls.
1.2. Push Service
In the mobile Internet era, push services are essential for app engagement and retention. Most notifications and advertisements on smartphones are delivered via push.
With the growth of IoT, smart‑home devices also rely on mobile push, and future IoT devices will become push‑service clients, leading to massive numbers of endpoints.
1.3. Characteristics of Push Services
Key characteristics include:
Unstable wireless networks (e.g., poor signal in subways) cause frequent disconnections.
Massive long‑lived connections consume significant resources on both client and server.
Android devices maintain multiple long connections, generating substantial heartbeat traffic, which increases data usage and power consumption.
Unreliable delivery: message loss, duplicate pushes, latency, and expiration are common.
Spam and lack of unified governance.
Some vendors, such as JD Cloud, provide solutions like a single‑service‑single‑connection model with AlarmManager‑based heartbeat reduction.
2. Real IoT Case Study
2.1. Problem Description
An MQTT middleware for a smart‑home platform kept 100,000 users online with long connections and 20,000 concurrent message requests. After running for a while, memory leakage was observed, suspected to be a Netty bug.
Server: 16 GB RAM, 8‑core CPU.
Netty boss thread pool size = 1, worker pool size = 6 (later changed to 11, issue persisted).
Netty version 4.0.8.Final.
2.2. Root Cause Analysis
Heap dump revealed a 9,076% increase in ScheduledFutureTask instances (≈1.1 million). The cause was the use of IdleStateHandler with a 15‑minute idle timeout, creating a scheduled task per connection.
Each task held business fields, preventing GC and causing apparent memory leakage. Reducing the idle timeout to 45 seconds allowed normal memory reclamation and solved the issue.
2.3. Problem Summary
At a few hundred connections the problem never appeared; it surfaces only at hundred‑thousand scale, where a small misconfiguration is amplified a hundred‑thousand‑fold.
The following sections describe how to design a Netty‑based push service that can handle millions of clients.
3. Design Points for Massive Netty Push Services
Netty is a high‑performance NIO framework, but building a stable, high‑throughput push service requires careful architectural decisions.
3.1. Adjusting Maximum File Handles
Linux’s default limit of 1024 open files per process is insufficient for millions of connections. Use ulimit -a to view limits and edit /etc/security/limits.conf:
* soft nofile 1000000
* hard nofile 1000000
After logging out and back in, verify the new limits. Note that extremely high handle counts can degrade performance, so balance against hardware capacity.
3.2. Beware of CLOSE_WAIT
Unstable mobile networks cause frequent client reconnects, leading to many sockets stuck in CLOSE_WAIT if the server does not promptly close them, eventually exhausting file handles.
Properly handle reconnection intervals, reject duplicate logins, and ensure I/O and decode exceptions are processed to avoid handle leaks.
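The cleanup discipline above can be sketched as a tail handler in the pipeline that always closes the channel when something goes wrong (a minimal illustration; the class name and logging are mine, not from the original system):

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

// Sketch: close the channel on any I/O or decode exception so the
// server side never leaves a socket lingering in CLOSE_WAIT.
public class CloseOnErrorHandler extends ChannelInboundHandlerAdapter {

    @Override
    public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
        // Log for diagnosis, then release the file handle promptly.
        System.err.println("Closing channel after error: " + cause.getMessage());
        ctx.close();
    }

    @Override
    public void channelInactive(ChannelHandlerContext ctx) throws Exception {
        // Remove any per-connection session state here so a dropped
        // client does not leak memory alongside its socket.
        ctx.fireChannelInactive();
    }
}
```

Placed last in the pipeline, this catches exceptions that earlier decoders and business handlers let propagate.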
3.3. Reasonable Heartbeat Interval
Mobile networks often drop idle connections after a few minutes; a heartbeat interval around 180 seconds (e.g., WeChat uses 300 seconds) balances connection stability and signaling load.
In Netty, configure IdleStateHandler in the pipeline:
public void initChannel(Channel channel) {
    channel.pipeline().addLast("idleStateHandler", new IdleStateHandler(0, 0, 180));
    channel.pipeline().addLast("myHandler", new MyHandler());
}
public class MyHandler extends ChannelInboundHandlerAdapter {
    @Override
    public void userEventTriggered(ChannelHandlerContext ctx, Object evt) throws Exception {
        if (evt instanceof IdleStateEvent) {
            // No read or write for 180 seconds: assume the peer is gone
            // and close the channel to reclaim its resources.
            ctx.close();
        } else {
            super.userEventTriggered(ctx, evt);
        }
    }
}
3.4. Proper Buffer Sizes
Each connection maintains receive and send buffers, so allocating a fixed, worst‑case‑sized buffer per connection wastes enormous memory at scale. Netty’s ByteBuf can expand dynamically, and two receive‑buffer allocators control how large each read buffer starts out:
FixedRecvByteBufAllocator – allocates a fixed‑size receive buffer; the underlying ByteBuf can still expand if a message outgrows it.
AdaptiveRecvByteBufAllocator – adjusts the next buffer size based on recent reads, shrinking after small messages and growing after large ones to reduce waste.
Example configuration:
Bootstrap b = new Bootstrap();
b.group(group)
    .channel(NioSocketChannel.class)
    .option(ChannelOption.TCP_NODELAY, true)
    .option(ChannelOption.RCVBUF_ALLOCATOR, AdaptiveRecvByteBufAllocator.DEFAULT);
3.5. Memory Pooling
Creating and releasing ByteBuf for every message generates GC pressure. Netty’s pooled allocator (e.g., PooledByteBufAllocator ) reuses buffers, dramatically reducing allocation and GC overhead.
Enable pooling:
Bootstrap b = new Bootstrap();
b.group(group)
    .channel(NioSocketChannel.class)
    .option(ChannelOption.TCP_NODELAY, true)
    .option(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT);
Remember to release buffers with ReferenceCountUtil.release(msg) to avoid leaks.
3.6. Logging Pitfalls
Synchronous logging (e.g., log4j without proper async configuration) can block I/O threads when the log queue is full, leading to socket closure delays and resource exhaustion.
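As one example of a safer setup, Log4j 2’s async appender can be told to drop events rather than block the calling thread when its queue fills (a sketch; the appender names, file name, and buffer size are illustrative):

```xml
<Configuration>
  <Appenders>
    <File name="File" fileName="push-server.log">
      <PatternLayout pattern="%d %p %c - %m%n"/>
    </File>
    <!-- blocking="false" discards events instead of stalling Netty I/O
         threads when the async queue is full -->
    <Async name="Async" bufferSize="8192" blocking="false">
      <AppenderRef ref="File"/>
    </Async>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="Async"/>
    </Root>
  </Loggers>
</Configuration>
```

Losing a log line under burst load is far cheaper than stalling an I/O thread that serves tens of thousands of connections.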
3.7. TCP Parameter Tuning
Adjust SO_SNDBUF and SO_RCVBUF (e.g., 32 KB) to match typical message sizes. Enable Receive Packet Steering (RPS) on Linux ≥ 2.6.35 to distribute soft interrupts across CPUs, improving throughput.
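A sketch of how the socket buffer sizes might be applied on the server side (the 32 KB values follow the text; the event‑loop group sizes and backlog are illustrative):

```java
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelOption;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.nio.NioServerSocketChannel;

// Size per-connection socket buffers to the typical push-message
// size instead of relying on OS defaults.
ServerBootstrap b = new ServerBootstrap();
b.group(new NioEventLoopGroup(1), new NioEventLoopGroup())
    .channel(NioServerSocketChannel.class)
    .option(ChannelOption.SO_BACKLOG, 1024)
    .childOption(ChannelOption.SO_SNDBUF, 32 * 1024)   // 32 KB send buffer
    .childOption(ChannelOption.SO_RCVBUF, 32 * 1024);  // 32 KB receive buffer
```

Note that childOption applies to each accepted connection, while option applies to the listening socket itself.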
3.8. JVM Settings
Set appropriate -Xmx based on workload and tune GC (young/old generation ratios, collector choice) to minimize Full GC pauses.
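As one illustrative starting point (not the article’s exact settings), a push server on JDK 7/8 might begin with flags along these lines and then tune from GC logs:

```
# Fixed heap to avoid resize pauses; large young generation so that
# short-lived push messages die before promotion; CMS to bound
# old-generation pause times; GC logging for later tuning.
-Xms8g -Xmx8g
-Xmn4g
-XX:+UseConcMarkSweepGC
-XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70
-XX:+PrintGCDetails -Xloggc:gc.log
```

The right values depend entirely on message sizes and connection counts; the point is to measure Full GC frequency and pause time under realistic load before fixing any numbers.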
4. Author Biography
Li Linfeng graduated from Northeastern University in 2007, joined Huawei in 2008, and has six years of NIO experience, specializing in Netty and Mina. He is the founder of the Netty China community and author of "Netty Authority Guide".
Contact: Sina Weibo Nettying, WeChat Nettying.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.