Design and Implementation of a Scalable Long‑Connection Gateway
This article details the architecture, protocol design, permission control, reliability mechanisms, and scaling strategies of a long‑connection gateway built with OpenResty, Kafka, and Redis, illustrating how to share persistent connections across multiple business services while ensuring high performance and fault tolerance.
Real‑time interactions such as instant messaging, live‑stream comments, and online gaming rely on long‑connection technology, and many Internet companies operate dedicated long‑connection systems for notifications, push, location sharing, and more. When multiple services need to share a single long‑connection platform, challenges arise in authentication, data isolation, protocol extensibility, and capacity management.
After a year of development and iteration, the team distilled a generic long‑connection gateway solution that decouples business data, efficiently distributes messages, and provides a degree of reliability. The gateway adopts a publish‑subscribe model where clients and backend services communicate via topics, allowing flexible permission checks through ACL rules and HTTP callbacks.
Permission control is achieved by configuring ACL templates that embed user identifiers in topic names, enabling the gateway to independently verify subscription rights without contacting business services. This reduces coupling and simplifies integration for private user topics.
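The ACL-template idea can be sketched as follows. The article does not give the gateway's actual template syntax, so the `{uid}` placeholder format, class name, and method names below are assumptions, not the real protocol: the gateway expands each template with the authenticated user's ID and grants the subscription only on an exact match, with no callback to the business service.

```java
import java.util.List;

// Sketch of gateway-side ACL checking for private user topics.
// The "{uid}" template syntax and all names here are illustrative.
public class AclChecker {
    private final List<String> templates;

    public AclChecker(List<String> templates) {
        this.templates = templates;
    }

    /** True if the authenticated user may subscribe to the topic
     *  without contacting the business service. */
    public boolean canSubscribe(String uid, String topic) {
        for (String template : templates) {
            // Expand the template with the authenticated user's ID.
            String allowed = template.replace("{uid}", uid);
            if (allowed.equals(topic)) {
                return true;
            }
        }
        return false;
    }
}
```

Because the user ID is embedded in the topic name itself, the check is a pure string comparison the gateway can make locally, which is exactly what decouples it from the business services.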
To guarantee message reliability, the gateway implements acknowledgments and retransmission. Important messages are marked with QoS 1, stored in Redis until the client acknowledges receipt, and resent if necessary. For high‑throughput scenarios, the system leverages Kafka as a message hub, supporting various routing patterns such as publish‑only, publish‑and‑consume, consume‑only, and filtered pipelines.
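The QoS-1 bookkeeping can be sketched like this. A `ConcurrentHashMap` stands in for Redis here, and all class and method names are illustrative rather than the gateway's real API: a QoS-1 message is stored when sent, removed when acknowledged, and whatever remains is the retransmission set.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of QoS-1 ack tracking. In the real gateway the pending set
// lives in Redis so any broker can resend; a map stands in for it here.
public class QoS1Store {
    // messageId -> payload, pending acknowledgment
    private final Map<Long, String> pending = new ConcurrentHashMap<>();

    /** Called when a QoS-1 message is sent to the client. */
    public void onSend(long messageId, String payload) {
        pending.put(messageId, payload);   // keep until acked
    }

    /** Called when the client acknowledges receipt. */
    public void onAck(long messageId) {
        pending.remove(messageId);         // safe to forget
    }

    /** Messages still awaiting an ack, to be resent if necessary. */
    public Map<Long, String> unacked() {
        return Map.copyOf(pending);
    }
}
```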
The overall architecture consists of four core components: an OpenResty access layer for load balancing and session stickiness; a containerized long‑connection broker that handles protocol parsing, authentication, and pub‑sub logic; Redis for persisting session state; and Kafka for decoupled message distribution. Building on these mature components keeps the design modular, reliable, and horizontally scalable.
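The access layer's consistent routing might look roughly like the Nginx stream configuration below. This is a sketch under stated assumptions: the upstream addresses, the `gateway.js` script, and its `extract_id`/`preread` functions are hypothetical, and a real deployment would need an njs handler that actually parses the client identifier out of the buffered bytes.

```nginx
# Illustrative access-layer config: the preread phase buffers the first
# client bytes, an njs script exposes the client ID as $client_id, and
# consistent hashing pins that client to the same broker instance.
stream {
    js_import gateway from conf/gateway.js;   # hypothetical: defines extract_id(), preread()
    js_set $client_id gateway.extract_id;

    upstream brokers {
        hash $client_id consistent;           # same client -> same broker
        server 10.0.0.11:9000;                # illustrative broker addresses
        server 10.0.0.12:9000;
    }

    server {
        listen 8000;
        js_preread gateway.preread;           # inspect initial bytes before routing
        proxy_pass brokers;
    }
}
```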
Load balancing is performed in the access layer using Nginx’s stream preread mechanism: the initial bytes of each connection are inspected to extract a client identifier, which is hashed to select a broker, so a client lands on the same broker even after network changes. Inside the broker, the subscription map originally used a single HashMap guarded by one lock, which became a bottleneck under load; it was refactored into sharded maps, each with its own lock, to reduce contention and improve performance.
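The sharded-map refactor can be sketched as follows. The shard count, class name, and method names are assumptions for illustration: instead of one globally locked table of topic → subscribers, the table is split into N shards, each guarded by its own lock, so operations on unrelated topics no longer contend.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of a sharded subscription map: per-shard locking in place of
// one global lock. Shard count and names are illustrative.
public class ShardedSubscriptions {
    private static final int SHARDS = 16;

    @SuppressWarnings("unchecked")
    private final Map<String, Set<String>>[] shards = new Map[SHARDS];

    public ShardedSubscriptions() {
        for (int i = 0; i < SHARDS; i++) {
            shards[i] = new HashMap<>();
        }
    }

    private Map<String, Set<String>> shardFor(String topic) {
        // Mask the hash to keep the index non-negative.
        return shards[(topic.hashCode() & 0x7fffffff) % SHARDS];
    }

    public void subscribe(String topic, String sessionId) {
        Map<String, Set<String>> shard = shardFor(topic);
        synchronized (shard) {   // per-shard lock, not a global one
            shard.computeIfAbsent(topic, t -> new HashSet<>()).add(sessionId);
        }
    }

    public Set<String> subscribers(String topic) {
        Map<String, Set<String>> shard = shardFor(topic);
        synchronized (shard) {
            return new HashSet<>(shard.getOrDefault(topic, Set.of()));
        }
    }
}
```

A `ConcurrentHashMap` would give similar relief; explicit shards are shown here because they mirror the refactor the article describes.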
Session persistence stores unacknowledged messages in Redis, allowing clients to reconnect to a different broker without losing state. A sliding‑window mechanism, inspired by TCP, permits multiple in‑flight messages, improving throughput while preserving order; only when a client reconnects are unacknowledged messages retransmitted.
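The sliding-window behavior described above can be sketched like this, with a fixed window size and illustrative names (the real gateway keeps this state in Redis so it survives a broker switch): up to `windowSize` messages may be in flight unacknowledged, acks free slots, and on reconnect everything still unacked is retransmitted in send order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch of the TCP-inspired send window. Names and the fixed window
// size are illustrative; in the gateway this state lives in Redis.
public class SendWindow {
    private final int windowSize;
    private final Deque<Long> inFlight = new ArrayDeque<>();  // oldest first

    public SendWindow(int windowSize) {
        this.windowSize = windowSize;
    }

    /** Try to occupy a window slot; false when the window is full. */
    public synchronized boolean trySend(long messageId) {
        if (inFlight.size() >= windowSize) {
            return false;   // caller queues the message for later
        }
        inFlight.addLast(messageId);
        return true;
    }

    /** The client acknowledged this message; free its slot. */
    public synchronized void ack(long messageId) {
        inFlight.remove(messageId);
    }

    /** On reconnect, retransmit everything still unacked, in order. */
    public synchronized List<Long> toRetransmit() {
        return new ArrayList<>(inFlight);
    }
}
```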
In summary, the gateway combines proven technologies (OpenResty, Kafka, Redis) with custom protocol extensions and robust concurrency controls to deliver a reliable, scalable long‑connection service suitable for diverse real‑time business scenarios.
Java Architect Essentials