
Understanding Instant Messaging (IM) Architecture, Protocol Design, and Real‑Time Web Chat Implementation

This article explains the fundamentals of instant messaging, its system characteristics, protocol layers (application, security, transport), practical protocol examples, and a detailed real‑time web chatroom design using HTTP long‑polling and backend architecture considerations.

High Availability Architecture

1. What is IM

IM stands for instant messaging: communication in which messages are delivered in real time according to a defined protocol. Well-known examples include ICQ, WhatsApp, Skype, and MSN.

1.1 IM Overview

"Instant" means a response perceived as immediate (seconds or milliseconds), while "communication" refers to the exchange of information based on an agreed protocol.

1.2 IM System Characteristics

Real‑time : latency is measured against user expectations rather than absolute time.

Push‑based delivery : messages are actively pushed to clients instead of being fetched via request/response.

Message reachability : reliable delivery is constrained by the SMC theorem – a protocol cannot guarantee both no loss and no duplication simultaneously.

State consistency : user presence (online/offline) and group membership must be consistently propagated to potentially thousands of peers.
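The reachability trade-off above is commonly handled by retrying until an acknowledgement arrives (so nothing is lost) and discarding repeats on the receiving side (so duplicates stay invisible). The following is a minimal sketch of that idea, not the design of any particular IM system; `Receiver`, `send_with_retry`, and the `ack_lost_on_attempts` parameter (which simulates acks lost in transit) are hypothetical names introduced for illustration.

```python
class Receiver:
    """Deduplicates redelivered messages by message ID (hypothetical scheme)."""

    def __init__(self):
        self.seen_ids = set()
        self.delivered = []

    def on_message(self, msg_id, text):
        # Deliver each ID at most once, but acknowledge every copy so the
        # sender stops retrying.
        if msg_id not in self.seen_ids:
            self.seen_ids.add(msg_id)
            self.delivered.append(text)
        return msg_id  # the acknowledgement


def send_with_retry(receiver, msg_id, text, ack_lost_on_attempts=(), max_attempts=3):
    """Resend until an ack arrives; returns the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        ack = receiver.on_message(msg_id, text)
        if attempt in ack_lost_on_attempts:
            continue  # simulated lost ack: the sender cannot tell, so it retries
        if ack == msg_id:
            return attempt
    raise TimeoutError(f"no ack for message {msg_id}")
```

With the first ack lost, the message is transmitted twice but shown to the user once.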

2. Protocol Design

A network protocol consists of semantics (what to do), syntax (how to do it), and timing (order of operations). IM protocols are typically divided into three layers: application, security, and transport.

2.1 Application Layer

Common IM application‑layer protocols include:

Text Protocol

Human‑readable formats such as HTTP or the legacy MSN protocol. Advantages: good readability, easy debugging, extensible key/value pairs. Disadvantages: moderate parsing cost and poor binary support.

Binary Protocol

Fixed‑length headers with extensible variable‑length bodies (e.g., IP, QQ). Advantages: high parsing efficiency, compact representation. Disadvantages: poor readability and limited extensibility.

Streaming XML (XMPP)

Uses a continuous XML stream; examples include Google Talk (Gtalk) and many campus IM systems. Advantages: standardised, extensible, readable. Disadvantages: high parsing overhead and low effective data throughput due to verbose tags.

Example XML stanza

<message to='romeo@example.net' from='juliet@example.com/balcony' type='chat' xml:lang='en'>
<body>Wherefore art thou, Romeo?</body>
</message>
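To make the "verbose tags" cost concrete, here is a small sketch that parses one complete message stanza with Python's standard `xml.etree.ElementTree` and measures how much of it is actual payload. The addresses are placeholders, and `parse_message` and `payload_ratio` are hypothetical helper names; a real XMPP client parses the stream incrementally rather than one stanza at a time.

```python
import xml.etree.ElementTree as ET

# Placeholder addresses; the stanza shape follows the example above.
STANZA = (
    "<message to='bob@example.com' from='alice@example.com' "
    "type='chat' xml:lang='en'>"
    "<body>Wherefore art thou, Romeo?</body>"
    "</message>"
)


def parse_message(xml_text):
    """Extract routing attributes and text from a single message stanza."""
    root = ET.fromstring(xml_text)
    body = root.find("body")
    # Fall back to the element's own text if there is no <body> child.
    text = body.text if body is not None else (root.text or "").strip()
    return {
        "to": root.get("to"),
        "from": root.get("from"),
        "type": root.get("type"),
        "body": text,
    }


def payload_ratio(xml_text):
    """Fraction of the stanza's bytes that carry the user's text."""
    text = parse_message(xml_text)["body"]
    return len(text.encode("utf-8")) / len(xml_text.encode("utf-8"))
```

For this short chat line, well under half of the bytes on the wire are the message itself, which is the throughput cost the section refers to.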

Binary Header Example

A typical 16‑byte fixed header contains:

Version (4 bytes)

Magic number (4 bytes) for alignment and corruption detection

Command identifier

Length of the variable‑length body
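A header like this can be packed and parsed with Python's `struct` module. The sketch below assumes details the list does not state: that the command and length fields are also 4 bytes each (which makes the total 16), that all fields are unsigned and big-endian, and that the magic number is the hypothetical constant `MAGIC`. It illustrates the framing idea only and is not any product's actual wire format.

```python
import struct

# Assumed layout: version, magic, command, body length.
# Four unsigned 32-bit big-endian integers = 16 bytes.
HEADER = struct.Struct("!IIII")
MAGIC = 0x494D5631  # hypothetical value (ASCII "IMV1")


def pack_frame(version, command, body):
    """Prefix a body (bytes) with the fixed 16-byte header."""
    return HEADER.pack(version, MAGIC, command, len(body)) + body


def unpack_frame(data):
    """Validate the header and return (version, command, body)."""
    if len(data) < HEADER.size:
        raise ValueError("incomplete header")
    version, magic, command, length = HEADER.unpack_from(data)
    if magic != MAGIC:
        # A wrong magic number means the stream is misaligned or corrupted.
        raise ValueError("bad magic number")
    body = data[HEADER.size:HEADER.size + length]
    if len(body) < length:
        raise ValueError("incomplete body")
    return version, command, body
```

Because the header has a fixed size, the receiver always knows how many bytes to read before it can learn the body length, which is what makes binary parsing cheap compared with scanning text for delimiters.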

Variable‑length Body

Can be XML, protobuf, mcpack, etc. Many companies (including 58.com) use protobuf for its rich language support, built‑in compression, and widespread industry adoption.

2.2 Security Layer

Message confidentiality is essential. Options include:

HTTPS (high cost, strong security)

Custom encryption, using one of:

Fixed key shared between client and server

Per-user key derived from user-specific attributes

Dynamic session keys negotiated per connection (similar to SSL/TLS)
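The difference between a per-user key and a per-connection session key can be sketched with the standard `hashlib` and `hmac` modules. Everything specific here is an assumption made for illustration: the function names, the choice of PBKDF2-HMAC-SHA256, the iteration count, and the use of exchanged nonces. It is not a vetted cryptographic design, and unlike a real TLS handshake it provides no forward secrecy.

```python
import hashlib
import hmac
import os


def derive_user_key(uid, secret, salt):
    """Per-user key: derived from user-specific attributes (assumed inputs)."""
    # PBKDF2-HMAC-SHA256; the iteration count is an illustrative choice.
    return hashlib.pbkdf2_hmac(
        "sha256", secret.encode("utf-8"), salt + uid.encode("utf-8"), 100_000
    )


def derive_session_key(user_key, client_nonce, server_nonce):
    """Per-connection key: fresh nonces make every session key different."""
    return hmac.new(user_key, client_nonce + server_nonce, hashlib.sha256).digest()


def new_nonce():
    """Random value each side would contribute when a connection opens."""
    return os.urandom(16)
```

A fixed key protects every user with the same secret, a per-user key limits a leak to one account, and a session key limits it to one connection.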

2.3 Transport Layer

Most IM systems use TCP for reliable delivery; with epoll/kqueue, a single server can handle hundreds of thousands of concurrent connections. Some legacy systems (e.g., early QQ) employed UDP with custom reliability mechanisms.
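The reason one process can hold so many connections is that epoll/kqueue report only the sockets that are ready, so a single thread never blocks on an idle client. Python's `selectors.DefaultSelector` picks the most efficient mechanism available on the platform (epoll on Linux, kqueue on BSD and macOS). The `EchoLoop` class below is a hypothetical, minimal sketch of that event-driven pattern; it echoes bytes back instead of implementing an IM protocol.

```python
import selectors
import socket


class EchoLoop:
    """Single-threaded readiness loop over many sockets (illustrative only)."""

    def __init__(self):
        self.sel = selectors.DefaultSelector()

    def add_connection(self, conn):
        conn.setblocking(False)
        self.sel.register(conn, selectors.EVENT_READ)

    def poll_once(self, timeout=1.0):
        """Handle every socket that is ready right now; return how many."""
        handled = 0
        for key, _events in self.sel.select(timeout):
            conn = key.fileobj
            data = conn.recv(4096)
            if data:
                # A production server would buffer bytes that send() could
                # not write immediately; omitted to keep the sketch short.
                conn.send(data)
            else:
                # Empty read means the peer closed the connection.
                self.sel.unregister(conn)
                conn.close()
            handled += 1
        return handled
```

A usage sketch with a local socket pair: register one end with `add_connection`, write to the other end, and call `poll_once()` to see the bytes echoed back.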

3. Web Chatroom Implementation

3.1 Requirements

Users must be able to set a name, see all participants, view chat history, and broadcast messages to the room.

3.2 System Architecture

A classic three‑tier LAMP‑style architecture is used, with a simple data layer consisting of two tables:

CREATE TABLE user (name VARCHAR(16) UNIQUE);
CREATE TABLE message (time TIMESTAMP, name VARCHAR(16), msg VARCHAR(140));

Inserting and deleting rows in these tables drives user presence and message history: a row in user means that person is in the room, and the rows in message are the chat history.
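The four requirements map onto these two tables quite directly. The sketch below uses Python's built-in `sqlite3` with an in-memory database as a stand-in for the MySQL layer a LAMP stack would use; the function names (`join`, `leave`, `participants`, `post`, `history`) and the `DEFAULT CURRENT_TIMESTAMP` clause are assumptions added for illustration.

```python
import sqlite3


def open_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE user (name VARCHAR(16) UNIQUE)")
    db.execute(
        "CREATE TABLE message ("
        "time TIMESTAMP DEFAULT CURRENT_TIMESTAMP, "
        "name VARCHAR(16), msg VARCHAR(140))"
    )
    return db


def join(db, name):
    """Set a name; the UNIQUE constraint rejects a name already in use."""
    try:
        with db:
            db.execute("INSERT INTO user (name) VALUES (?)", (name,))
        return True
    except sqlite3.IntegrityError:
        return False


def leave(db, name):
    with db:
        db.execute("DELETE FROM user WHERE name = ?", (name,))


def participants(db):
    return [row[0] for row in db.execute("SELECT name FROM user ORDER BY name")]


def post(db, name, msg):
    with db:
        db.execute("INSERT INTO message (name, msg) VALUES (?, ?)", (name, msg))


def history(db):
    # rowid preserves insertion order even when timestamps collide.
    rows = db.execute("SELECT name, msg FROM message ORDER BY rowid")
    return [(name, msg) for name, msg in rows]
```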

3.3 Core Technique – HTTP Long‑Polling (Message Connection)

Instead of issuing frequent short polls, the client opens a long‑lived HTTP request that the server holds until a new message arrives or a timeout expires (typically 90 s). When a message arrives, the server returns it immediately and the client at once opens a new request. This forms an observer pattern in which the chatroom is the subject and each user is an observer.

If a message arrives while no connection exists, it is placed in a per‑user message pool (a map keyed by UID) and delivered as soon as the next connection is established.
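The held request and the per-user message pool can be sketched with a condition variable. This is a minimal in-process model, assuming a threaded server where each long-poll request runs `poll` on its own thread; `ChatRoom` and its method names are hypothetical, and the HTTP layer itself is left out.

```python
import threading
from collections import defaultdict


class ChatRoom:
    """Subject in the observer pattern; each polling user is an observer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pools = defaultdict(list)  # per-user message pool, keyed by UID
        self._members = set()

    def join(self, uid):
        with self._cond:
            self._members.add(uid)

    def broadcast(self, sender, text):
        with self._cond:
            # Messages land in the pool whether or not a request is waiting.
            for uid in self._members:
                self._pools[uid].append((sender, text))
            self._cond.notify_all()  # wake every held request

    def poll(self, uid, timeout=90.0):
        """Body of one long-poll request: hold until messages or timeout."""
        with self._cond:
            self._cond.wait_for(lambda: self._pools[uid], timeout=timeout)
            messages, self._pools[uid] = self._pools[uid], []
            return messages  # empty list means the request timed out
```

Because `broadcast` always writes to the pool first, a message sent in the gap between two requests is simply picked up by the next `poll` call, which returns without waiting.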

4. Typical IM Business Scenarios

When user A adds user B to group G, the request passes through multiple validation steps: user existence checks, a group existence check, friendship status, blacklist checks, rate‑limit enforcement, anti‑spam filtering of message content, and policy verification. Each step may require a database or remote service call, which illustrates the complexity of IM business logic.
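One common way to organise such a chain is an ordered list of checks that stops at the first failure. The sketch below covers only four of the steps, and replaces the database and remote-service calls with in-memory sets; `Directory`, the check functions, and `add_to_group` are hypothetical names, and the blacklist direction (B blocking A) is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Directory:
    """In-memory stand-in for the databases and services a real system calls."""
    users: set = field(default_factory=set)
    groups: dict = field(default_factory=dict)     # group -> set of members
    friendships: set = field(default_factory=set)  # frozenset({a, b})
    blacklist: set = field(default_factory=set)    # (blocker, blocked)


def users_exist(d, a, b, g):
    return a in d.users and b in d.users


def group_exists(d, a, b, g):
    return g in d.groups


def are_friends(d, a, b, g):
    return frozenset((a, b)) in d.friendships


def not_blacklisted(d, a, b, g):
    return (b, a) not in d.blacklist  # assumed: B has not blocked A


# Cheap checks first, so an invalid request fails before costlier lookups.
CHECKS = [users_exist, group_exists, are_friends, not_blacklisted]


def add_to_group(d, a, b, g):
    """User a adds user b to group g; returns "ok" or the failed check."""
    for check in CHECKS:
        if not check(d, a, b, g):
            return f"rejected: {check.__name__}"
    d.groups[g].add(b)
    return "ok"
```

Rate limiting, anti-spam filtering, and policy verification would be further entries in `CHECKS`, each with its own backing service.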

5. Q&A Highlights

Selected questions cover XMPP cross‑domain reliability, header ordering rationale, purpose of the magic number, reliability mechanisms, encryption scope, reasons for custom protocols versus SIP/BOSH, long‑polling vs. WebSocket trade‑offs, handling client disconnections, scaling to 100 k users, and the nature of the message pool.
