Backend Development · 19 min read

Design and Optimization of a High‑Performance IM Instant Messaging Platform

This article details the architectural decisions, network protocol choices, message framing strategies, and server‑level optimizations—including Netty adoption, TCP handling, token management, load balancing, NIC queue configuration, and CPU affinity—that enable a scalable, low‑latency instant messaging service supporting millions of concurrent connections.


1. Introduction

The early consumer-facing (C-end) product relied on a third-party SaaS for instant messaging, which limited extensibility and security and prompted the development of a self-controlled IM platform.

2. Network Communication Framework and Protocol

We chose Netty as the network communication framework for its rich protocol support, high performance, and active community. Its advantages include a gentle learning curve, built-in codecs, high throughput, a flexible threading model, and proven stability.

TCP is a stream protocol with no inherent message boundaries, so application messages can be split across or coalesced into TCP segments (the "sticky packet" and "half packet" problem). Common framing solutions are fixed-length messages, delimiter-based framing, length-prefixed headers, or a custom protocol. A typical message format consists of a fixed-length header (message type and body length) followed by a body encoded in JSON, Protobuf, etc.
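A minimal sketch of length-prefixed framing in plain `java.nio` (illustrative only; the platform itself uses Netty's codecs, and the 1-byte type field and 4-byte length field here are assumptions, not the production wire format):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LengthPrefixedFraming {

    // Encode one frame: 1-byte message type + 4-byte big-endian body length + body.
    static ByteBuffer encode(byte type, byte[] body) {
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + body.length);
        buf.put(type).putInt(body.length).put(body);
        buf.flip();
        return buf;
    }

    // Decode: consume as many complete frames as the buffer holds;
    // a trailing partial frame (half packet) stays buffered for the next read.
    static List<byte[]> decode(ByteBuffer buf) {
        List<byte[]> bodies = new ArrayList<>();
        while (buf.remaining() >= 5) {       // header is 5 bytes
            buf.mark();
            buf.get();                       // message type (unused in this sketch)
            int len = buf.getInt();
            if (buf.remaining() < len) {     // body not fully arrived yet
                buf.reset();
                break;
            }
            byte[] body = new byte[len];
            buf.get(body);
            bodies.add(body);
        }
        buf.compact();                       // keep any partial frame
        return bodies;
    }

    public static void main(String[] args) {
        // Two messages coalesced into one TCP read ("sticky packet").
        ByteBuffer a = encode((byte) 1, "hello".getBytes(StandardCharsets.UTF_8));
        ByteBuffer b = encode((byte) 1, "world".getBytes(StandardCharsets.UTF_8));
        ByteBuffer wire = ByteBuffer.allocate(a.remaining() + b.remaining());
        wire.put(a).put(b).flip();

        for (byte[] body : decode(wire)) {
            System.out.println(new String(body, StandardCharsets.UTF_8));
        }
    }
}
```

Netty provides the same behavior out of the box via `LengthFieldBasedFrameDecoder`, which is the idiomatic choice when Netty is already the transport layer.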

3. Architecture Design

The system is divided into three layers:

Middleware layer: handles token acquisition, caching, and renewal.

Core layer: connection management, heartbeat (ping every 50 s), reconnection strategy (exponential back-off), API wrappers, logging, and a local database for sessions and messages.

Protocol layer: encoding and decoding of messages, sessions, and commands.
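The core layer's reconnection strategy can be sketched as capped exponential back-off with jitter (an illustration only; the base delay, cap, and jitter values here are assumptions, not the SDK's actual parameters):

```java
import java.util.concurrent.ThreadLocalRandom;

public class ReconnectBackoff {

    // Delay before the n-th reconnect attempt (attempt starts at 0):
    // base * 2^attempt, capped, plus random jitter so that many clients
    // dropped at once do not all reconnect in the same instant.
    static long delayMillis(int attempt, long baseMillis, long capMillis) {
        long exp = baseMillis << Math.min(attempt, 16); // clamp shift to avoid overflow
        long capped = Math.min(exp, capMillis);
        long jitter = ThreadLocalRandom.current().nextLong(capped / 10 + 1);
        return capped + jitter;
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 6; attempt++) {
            System.out.printf("attempt %d: wait ~%d ms%n",
                    attempt, delayMillis(attempt, 1_000, 60_000));
        }
    }
}
```

In practice the loop resets `attempt` to zero once a connection survives long enough to pass a heartbeat round-trip.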

Message types include text, image, emoji, voice, and video. Media files are uploaded separately and referenced by URI, with optional base64-encoded thumbnails.
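For example, an image message body referencing uploaded media might look like the following (an illustrative sketch only; these field names are assumptions, not the platform's actual schema):

```json
{
  "type": "image",
  "uri": "https://cdn.example.com/media/a1b2c3.jpg",
  "thumbnail": "<base64-encoded thumbnail bytes>",
  "width": 1280,
  "height": 720
}
```

Keeping the bulk media out of the message body keeps frames small, so the persistent TCP channel carries only lightweight envelopes.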

Message diffusion uses two models:

Read diffusion: a single copy of each message is stored; readers fetch it, saving storage but increasing read complexity.

Write diffusion: each recipient gets a copy; this simplifies reads but increases storage and write load.

A hybrid approach applies write diffusion to active users and read diffusion to inactive ones. The SDK architecture includes token caching, automatic reconnection, and heartbeat mechanisms to maintain persistent TCP connections.

4. Server Optimization

To support millions of connections and high QPS, we raised the file descriptor and process limits (in /etc/security/limits.conf):

```
* soft nproc 1500000
* hard nproc 1500000
* soft nofile 1500000
* hard nofile 1500000
```

and the corresponding kernel parameters:

```
fs.nr_open = 3000000
fs.file-max = 3000000
```

Nginx is used as a Layer-7 load balancer with TLS termination; port multiplexing alleviates local port exhaustion.

Network card tuning includes enlarging the ring buffers and configuring multiple queues:

```
ethtool -G em1 rx 4096
ethtool -G em1 tx 4096
ethtool -L em1 combined 16
```

CPU affinity is set per IRQ to spread interrupt handling across cores:

```
echo 0 > /proc/irq/107/smp_affinity_list
echo 1 > /proc/irq/108/smp_affinity_list
```

Intel Flow Director is enabled to steer packets to specific queues based on destination port:

```
ethtool --features em1 ntuple on
ethtool --config-ntuple em1 flow-type tcp4 dst-port 9500 action 0 loc 1
ethtool --config-ntuple em1 flow-type tcp4 dst-port 9501 action 1 loc 2
```

Additional optimizations include dedicated servers for latency-sensitive users, backup domains, and a robust token-cache mechanism.

5. Conclusion

The IM platform has been in production for over two years, serving millions of daily active users across single-chat, group-chat, chatroom, and public-account scenarios, demonstrating the effectiveness of the design and optimization strategies described above.

Tags: backend, network, IM, Netty, protocol, instant messaging, server optimization
Written by HomeTech (HomeTech tech sharing)