Understanding Instant Messaging (IM) Architecture, Protocol Design, and Real‑Time Web Chat Implementation
This article explains the fundamentals of instant messaging, its system characteristics, protocol layers (application, security, transport), practical protocol examples, and a detailed real‑time web chatroom design using HTTP long‑polling and backend architecture considerations.
1. What is IM
IM stands for instant messaging, a communication method where messages are delivered in real time according to a defined protocol, exemplified by services such as ICQ, WhatsApp, Skype, and MSN.
1.1 IM Overview
"Instant" means a response perceived as immediate (seconds or milliseconds), while "communication" refers to the exchange of information based on an agreed protocol.
1.2 IM System Characteristics
Real‑time : latency is measured against user expectations rather than absolute time.
Push‑based delivery : messages are actively pushed to clients instead of being fetched via request/response.
Message reachability : reliable delivery is constrained by the SMC theorem – a protocol cannot guarantee both no loss and no duplication simultaneously.
State consistency : user presence (online/offline) and group membership must be consistently propagated to potentially thousands of peers.
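For the message reachability point above, one common way to live with the trade-off is to let the sender retry until it gets an acknowledgement and have the receiver discard repeats. The following is a minimal sketch of that idea, assuming every message carries a unique `msg_id` (a hypothetical field, not something the article specifies):

```python
class DedupReceiver:
    """Receiver that tolerates redelivery by remembering message IDs."""

    def __init__(self):
        self.seen = set()      # IDs that have already been processed
        self.delivered = []    # payloads handed to the application, in order

    def receive(self, msg_id, payload):
        """Return True if the message was new, False if it was a duplicate."""
        if msg_id in self.seen:
            return False       # a retransmission: ignore the payload
        self.seen.add(msg_id)
        self.delivered.append(payload)
        return True


receiver = DedupReceiver()
receiver.receive(1, "hello")
receiver.receive(1, "hello")   # sender retried after a lost acknowledgement
receiver.receive(2, "world")
```

The sender side accepts duplication in order to avoid loss, and the receiver hides the duplicates from the user.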
2. Protocol Design
A network protocol consists of semantics (what to do), syntax (how to do it), and timing (order of operations). IM protocols are typically divided into three layers: application, security, and transport.
2.1 Application Layer
Common IM application‑layer protocols include:
Text Protocol
Human‑readable formats such as HTTP or the legacy MSN protocol. Advantages: good readability, easy debugging, extensible key/value pairs. Disadvantages: moderate parsing cost and poor binary support.
Binary Protocol
Fixed‑length headers with extensible variable‑length bodies (e.g., IP, QQ). Advantages: high parsing efficiency, compact representation. Disadvantages: poor readability and limited extensibility.
Streaming XML (XMPP)
Uses continuous XML streams; examples include Gtalk and many campus IM systems. Advantages: standardised, extensible, readable. Disadvantages: high parsing overhead and low effective data throughput due to verbose tags.
Example XML stanza
<message to='[email protected]' from='[email protected]' type='chat' xml:lang='en'>
Wherefore art thou, Romeo?
</message>
Binary Header Example
A typical 16‑byte fixed header contains:
Version (4 bytes)
Magic number (4 bytes) for alignment and corruption detection
Command identifier
Length of the variable‑length body
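A header of this shape could be packed and parsed roughly as follows. This is a sketch under stated assumptions: the article gives sizes only for the version and magic number, so the 4-byte command and 4-byte length fields, the network byte order, and the `MAGIC` value are all illustrative choices.

```python
import struct

# Assumed layout: four unsigned 32-bit fields in network byte order.
HEADER = struct.Struct("!IIII")   # version, magic, command, body length
MAGIC = 0x494D5031                # hypothetical constant, not from a real protocol


def pack_frame(version, command, body):
    """Prefix the variable-length body with the 16-byte fixed header."""
    return HEADER.pack(version, MAGIC, command, len(body)) + body


def unpack_frame(data):
    """Parse one frame; raise ValueError on corruption or truncation."""
    if len(data) < HEADER.size:
        raise ValueError("incomplete header")
    version, magic, command, length = HEADER.unpack_from(data)
    if magic != MAGIC:
        raise ValueError("bad magic number")
    body = data[HEADER.size:HEADER.size + length]
    if len(body) != length:
        raise ValueError("incomplete body")
    return version, command, body


frame = pack_frame(1, 7, b"hello")
```

Because the header has a fixed size, the receiver can always read exactly 16 bytes first and then knows how many more bytes belong to the body.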
Variable‑length Body
Can be XML, protobuf, mcpack, etc. Many companies (including 58.com) use protobuf for its rich language support, compact binary encoding, and widespread industry adoption.
2.2 Security Layer
Message confidentiality is essential. Options include:
HTTPS (high cost, strong security)
Custom encryption, in increasing order of strength:
Fixed key shared between client and server
Per‑user key derived from user‑specific attributes
Dynamic session keys negotiated per connection (similar to SSL/TLS)
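To make the difference between a per‑user key and a per‑connection session key concrete, here is a small derivation sketch using only the Python standard library. It is an illustration, not a vetted scheme: `MASTER_SECRET`, the use of the user ID as the user‑specific attribute, and the nonce exchange are all assumptions, and a real system would rely on an established key‑exchange protocol.

```python
import hashlib
import hmac
import os

MASTER_SECRET = b"demo-only-secret"   # hypothetical; never hard-code in practice


def per_user_key(uid):
    """Derive a stable 32-byte key from a user-specific attribute."""
    return hmac.new(MASTER_SECRET, uid.encode("utf-8"), hashlib.sha256).digest()


def session_key(user_key, client_nonce, server_nonce):
    """Mix fresh nonces from both sides so each connection gets its own key."""
    return hmac.new(user_key, client_nonce + server_nonce, hashlib.sha256).digest()


user_key = per_user_key("alice")
key_a = session_key(user_key, os.urandom(16), os.urandom(16))
key_b = session_key(user_key, os.urandom(16), os.urandom(16))
```

The per‑user key is the same on every login, so leaking it exposes all of that user's traffic, whereas a leaked session key exposes only one connection.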
2.3 Transport Layer
Most IM systems use TCP for reliable delivery; with epoll/kqueue, a single server can handle hundreds of thousands of concurrent connections. Some legacy systems (e.g., early QQ) employed UDP with custom reliability mechanisms.
3. Web Chatroom Implementation
3.1 Requirements
Users must be able to set a name, see all participants, view chat history, and broadcast messages to the room.
3.2 System Architecture
A classic three‑tier LAMP‑style architecture is used, with a simple data layer consisting of two tables:
user( name varchar(16) unique );
message( time timestamp, name varchar(16), msg varchar(140) );
Insertion and deletion from these tables drive user presence and message history.
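The two tables can be exercised with a few lines of code. The sketch below swaps in SQLite (via Python's built‑in `sqlite3`) for the MySQL of a LAMP stack so that it runs standalone; the helper names `join`, `leave`, `post`, `participants`, and `history` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (name VARCHAR(16) UNIQUE);
    CREATE TABLE message (time TIMESTAMP, name VARCHAR(16), msg VARCHAR(140));
""")


def join(name):
    """A user enters the room; the UNIQUE constraint rejects taken names."""
    try:
        with conn:
            conn.execute("INSERT INTO user (name) VALUES (?)", (name,))
        return True
    except sqlite3.IntegrityError:
        return False


def leave(name):
    """A user leaves the room; deleting the row marks them as gone."""
    with conn:
        conn.execute("DELETE FROM user WHERE name = ?", (name,))


def post(name, msg):
    """Append one chat line to the history."""
    with conn:
        conn.execute(
            "INSERT INTO message (time, name, msg) VALUES (CURRENT_TIMESTAMP, ?, ?)",
            (name, msg),
        )


def participants():
    return [row[0] for row in conn.execute("SELECT name FROM user ORDER BY name")]


def history():
    return list(conn.execute("SELECT name, msg FROM message ORDER BY rowid"))
```

Note that SQLite does not enforce the `VARCHAR` length limits, so the 16 and 140 character caps would need an explicit check there.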
3.3 Core Technique – HTTP Long‑Polling (Message Connection)
Instead of issuing frequent short polls, the client opens a long‑lived HTTP request that the server holds until a new message arrives or a timeout expires (typically 90 s). When a message arrives, the server returns it immediately and the client at once opens a new request. This forms an observer pattern in which the chatroom is the subject and each user is an observer.
If a message arrives while no connection exists, it is placed in a per‑user message pool (a map keyed by UID) and delivered as soon as the next connection is established.
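The held request and the per‑user message pool can be sketched together with a condition variable. This is an in‑process model rather than an HTTP server: `ChatRoom`, `join`, `broadcast`, and `poll` are hypothetical names, and each `poll` call stands in for one held HTTP request.

```python
import threading
from collections import deque


class ChatRoom:
    """Subject of the observer pattern; each pending poll() is an observer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pools = {}                     # uid -> deque of undelivered messages

    def join(self, uid):
        with self._cond:
            self._pools.setdefault(uid, deque())

    def broadcast(self, msg):
        """Queue the message for every user and wake all held requests."""
        with self._cond:
            for pool in self._pools.values():
                pool.append(msg)
            self._cond.notify_all()

    def poll(self, uid, timeout=90.0):
        """Hold until this user's pool is non-empty or the timeout expires.

        Returns the pending messages; an empty list means the poll timed out.
        """
        with self._cond:
            pool = self._pools.setdefault(uid, deque())
            self._cond.wait_for(lambda: len(pool) > 0, timeout)
            pending = list(pool)
            pool.clear()
            return pending
```

Messages broadcast while a user has no request open simply stay in that user's pool, so the next `poll` returns them without waiting.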
4. Typical IM Business Scenarios
Adding user B to group G initiated by user A involves multiple validation steps: user existence checks, group existence, friendship status, blacklist checks, rate‑limit enforcement, message content anti‑spam filtering, and policy verification. Each step may require database or remote service calls, illustrating the complexity of IM business logic.
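The chain of checks might look like the sketch below. Everything here is illustrative: the `state` dictionary stands in for the databases and remote services mentioned above, the check order and failure reasons are assumptions, and the anti‑spam and policy steps are omitted for brevity.

```python
def add_to_group(state, inviter, invitee, group):
    """Return (ok, reason); checks run in order and stop at the first failure."""
    checks = [
        ("unknown user", lambda: {inviter, invitee} <= state["users"]),
        ("unknown group", lambda: group in state["groups"]),
        ("not friends", lambda: frozenset((inviter, invitee)) in state["friends"]),
        ("blacklisted", lambda: (invitee, inviter) not in state["blacklist"]),
        ("rate limited",
         lambda: state["invites_sent"].get(inviter, 0) < state["invite_limit"]),
    ]
    for reason, passes in checks:
        if not passes():          # in production each call may hit a DB or RPC
            return False, reason
    state["groups"][group].add(invitee)
    state["invites_sent"][inviter] = state["invites_sent"].get(inviter, 0) + 1
    return True, "ok"


state = {
    "users": {"A", "B", "C"},
    "groups": {"G": {"A"}},
    "friends": {frozenset(("A", "B"))},
    "blacklist": set(),           # (blocker, blocked) pairs
    "invites_sent": {},
    "invite_limit": 1,
}
```

Even in this toy form, a single "add to group" request touches five separate data sources before any write happens.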
5. Q&A Highlights
Selected questions cover XMPP cross‑domain reliability, header ordering rationale, purpose of the magic number, reliability mechanisms, encryption scope, reasons for custom protocols versus SIP/BOSH, long‑polling vs. WebSocket trade‑offs, handling client disconnections, scaling to 100 k users, and the nature of the message pool.
High Availability Architecture
Official account for High Availability Architecture.