
Design and Optimization of Massive Push Services Using Netty

This article analyzes common push‑service questions, presents a real‑world IoT case of memory leakage, and outlines key design points—including kernel limits, CLOSE_WAIT handling, heartbeat configuration, buffer management, memory pooling, logging pitfalls, TCP tuning, and JVM settings—to build a scalable, stable Netty‑based push server for millions of concurrent connections.

Qunar Tech Salon

1. Background

1.1. Source of the Topic

Many developers working on mobile Internet and IoT have asked me about push‑service issues, such as whether Netty can be used as a push server, how many clients a single server can support, and various technical problems encountered when developing push services with Netty.

Because the questions are numerous and focus on similar concerns, this article summarizes them, analyzes a real case, and extracts design guidelines to help practitioners avoid common pitfalls.

1.2. Push Service

In the mobile Internet era, push services are essential for app engagement and retention. Most notifications and advertisements on smartphones are delivered via push.

With the growth of IoT, smart‑home devices also rely on mobile push, and future IoT devices will become push‑service clients, leading to massive numbers of endpoints.

1.3. Characteristics of Push Services

Key characteristics include:

Unstable wireless networks (e.g., poor signal in subways) cause frequent disconnections.

Massive long‑lived connections consume significant resources on both client and server.

Android devices maintain multiple long connections, generating substantial heartbeat traffic, which increases data usage and power consumption.

Unreliable delivery: message loss, duplicate pushes, latency, and expiration are common.

Spam and lack of unified governance.

Some vendors, such as JD Cloud, provide solutions like a single‑service‑single‑connection model with AlarmManager‑based heartbeat reduction.

2. Real IoT Case Study

2.1. Problem Description

An MQTT middleware for a smart‑home platform kept 100,000 users online with long connections and 20,000 concurrent message requests. After running for a while, memory leakage was observed, suspected to be a Netty bug.

Server: 16 GB RAM, 8‑core CPU.

Netty boss thread pool size = 1, worker pool size = 6 (later changed to 11, issue persisted).

Netty version 4.0.8.Final.

2.2. Root Cause Analysis

Heap dump revealed a 9,076% increase in ScheduledFutureTask instances (≈1.1 million). The cause was the use of IdleStateHandler with a 15‑minute idle timeout, creating a scheduled task per connection.

Each task held business fields, preventing GC and causing apparent memory leakage. Reducing the idle timeout to 45 seconds allowed normal memory reclamation and solved the issue.

2.3. Problem Summary

Even a few hundred connections do not cause leakage; the problem appears only at the hundred‑thousand‑scale where small misconfigurations are amplified.

The following sections describe how to design a Netty‑based push service that can handle millions of clients.

3. Design Points for Massive Netty Push Services

Netty is a high‑performance NIO framework, but building a stable, high‑throughput push service requires careful architectural decisions.

3.1. Adjusting Maximum File Handles

Linux’s default limit of 1024 open files per process is insufficient for millions of connections. Use ulimit -a to view current limits and edit /etc/security/limits.conf:

* soft nofile 1000000
* hard nofile 1000000

After logging out and back in, verify the new limits. Note that extremely high handle counts can degrade performance, so balance against hardware capacity.

3.2. Beware of CLOSE_WAIT

Unstable mobile networks cause frequent client reconnects, leading to many sockets stuck in CLOSE_WAIT if the server does not promptly close them, eventually exhausting file handles.

Properly handle reconnection intervals, reject duplicate logins, and ensure I/O and decode exceptions are processed to avoid handle leaks.
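As a sketch of the last point, a handler at the tail of the pipeline can close the channel on any I/O or decode exception so the socket does not linger in CLOSE_WAIT and leak a file handle (the class name is illustrative; real code would also log the cause and clean up per-connection state):

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

// Closes the connection on any exception reaching the tail of the
// pipeline, so the file handle is reclaimed promptly instead of the
// socket sitting in CLOSE_WAIT.
public class ConnectionGuardHandler extends ChannelInboundHandlerAdapter {

    @Override
    public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
        // Log the cause here if desired, then close without delay.
        ctx.close();
    }

    @Override
    public void channelInactive(ChannelHandlerContext ctx) throws Exception {
        // Release per-connection resources (session maps, timers) here,
        // then let the event continue down the pipeline.
        ctx.fireChannelInactive();
    }
}
```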

3.3. Reasonable Heartbeat Interval

Mobile networks often drop idle connections after a few minutes; a heartbeat interval around 180 seconds (e.g., WeChat uses 300 seconds) balances connection stability and signaling load.

In Netty, configure IdleStateHandler in the pipeline:

public void initChannel(Channel channel) {
    // Fires an all-idle event when neither read nor write happens for 180 s.
    channel.pipeline().addLast("idleStateHandler", new IdleStateHandler(0, 0, 180));
    channel.pipeline().addLast("myHandler", new MyHandler());
}

public class MyHandler extends ChannelInboundHandlerAdapter {
    @Override
    public void userEventTriggered(ChannelHandlerContext ctx, Object evt) throws Exception {
        if (evt instanceof IdleStateEvent) {
            // 180 s with no traffic: send a heartbeat probe, or close the
            // channel if the peer has already missed one.
        } else {
            ctx.fireUserEventTriggered(evt);
        }
    }
}

3.4. Proper Buffer Sizes

Each connection maintains receive and send buffers. Allocating a fixed‑size ByteBuffer per connection wastes memory at scale. Netty ships two receive‑buffer allocators:

FixedRecvByteBufAllocator – always allocates the same, fixed size, regardless of actual traffic.

AdaptiveRecvByteBufAllocator – sizes the next allocation based on recent reads, shrinking for mostly small messages and growing for large ones, which greatly reduces waste.

Example configuration:

Bootstrap b = new Bootstrap();
 b.group(group)
  .channel(NioSocketChannel.class)
  .option(ChannelOption.TCP_NODELAY, true)
  .option(ChannelOption.RCVBUF_ALLOCATOR, AdaptiveRecvByteBufAllocator.DEFAULT);

3.5. Memory Pooling

Creating and releasing ByteBuf for every message generates GC pressure. Netty’s pooled allocator (e.g., PooledByteBufAllocator ) reuses buffers, dramatically reducing allocation and GC overhead.

Enable pooling:

Bootstrap b = new Bootstrap();
 b.group(group)
  .channel(NioSocketChannel.class)
  .option(ChannelOption.TCP_NODELAY, true)
  .option(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT);

Remember to release buffers with ReferenceCountUtil.release(msg) to avoid leaks.
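For example, a terminal handler that consumes a message without passing it downstream must release it itself, or the pooled buffer leaks (the class name is illustrative):

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.util.ReferenceCountUtil;

// Terminal handler: it consumes inbound messages rather than forwarding
// them, so it is responsible for releasing them back to the pool.
public class ConsumingHandler extends ChannelInboundHandlerAdapter {
    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        try {
            // process msg here ...
        } finally {
            ReferenceCountUtil.release(msg);
        }
    }
}
```

If a handler only inspects the message and forwards it with ctx.fireChannelRead(msg), it must not release it; extending SimpleChannelInboundHandler is another option, since that class releases consumed messages automatically.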

3.6. Logging Pitfalls

Synchronous logging (e.g., log4j without proper async configuration) can block I/O threads when the log queue is full, leading to socket closure delays and resource exhaustion.
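One mitigation, assuming log4j 2 is used, is to route I/O‑thread logging through an async appender configured to drop events rather than block the caller when its queue fills up (appender names and sizes below are illustrative):

```xml
<!-- log4j2.xml sketch: blocking="false" means events are discarded,
     not queued synchronously, when the async buffer is full -->
<Configuration>
  <Appenders>
    <RollingFile name="File" fileName="push.log" filePattern="push-%i.log">
      <PatternLayout pattern="%d %p %c - %m%n"/>
      <SizeBasedTriggeringPolicy size="100 MB"/>
    </RollingFile>
    <Async name="AsyncFile" blocking="false" bufferSize="8192">
      <AppenderRef ref="File"/>
    </Async>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="AsyncFile"/>
    </Root>
  </Loggers>
</Configuration>
```

Dropping a log line is usually a better failure mode for a push server than stalling an I/O thread that serves thousands of connections.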

3.7. TCP Parameter Tuning

Adjust SO_SNDBUF and SO_RCVBUF (e.g., 32 KB) to match typical message sizes. Enable Receive Packet Steering (RPS) on Linux ≥ 2.6.35 to distribute soft interrupts across CPUs, improving throughput.
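On the server side, the socket buffers can be set per accepted connection; a sketch, with 32 KB as an illustrative value to be validated against your own message sizes:

```java
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.ChannelOption;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.NioServerSocketChannel;
import io.netty.channel.socket.SocketChannel;

public class TcpTuning {
    public static ServerBootstrap configure() {
        ServerBootstrap b = new ServerBootstrap();
        b.group(new NioEventLoopGroup(1), new NioEventLoopGroup())
         .channel(NioServerSocketChannel.class)
         // Size socket buffers to the typical push message; 32 KB here
         // is an assumption, not a recommendation.
         .childOption(ChannelOption.SO_SNDBUF, 32 * 1024)
         .childOption(ChannelOption.SO_RCVBUF, 32 * 1024)
         .childHandler(new ChannelInitializer<SocketChannel>() {
             @Override
             protected void initChannel(SocketChannel ch) {
                 // add business handlers here
             }
         });
        return b;
    }
}
```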

3.8. JVM Settings

Set appropriate -Xmx based on workload and tune GC (young/old generation ratios, collector choice) to minimize Full GC pauses.
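A hedged starting point for a 16 GB machine of this article's era (CMS‑generation JDK); every number below is an assumption to be tuned against GC logs, not a recommendation:

```shell
# Fixed heap avoids resizing pauses; a large young generation suits
# push workloads where most message objects die young.
java -Xms8g -Xmx8g -Xmn4g \
     -XX:+UseConcMarkSweepGC \
     -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly \
     -verbose:gc -XX:+PrintGCDetails \
     -jar push-server.jar
```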

4. Author Biography

Li Linfeng graduated from Northeastern University in 2007, joined Huawei in 2008, and has six years of NIO experience, specializing in Netty and Mina. He is the founder of the Netty China community and author of "Netty: The Authoritative Guide".

Contact: Sina Weibo Nettying, WeChat Nettying.

Written by Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
