How Redis Stream Replaced MQ in High‑Throughput Traffic Processing
This article explains why the traffic team switched from traditional MQ to Redis Stream, covering the underlying concepts, design choices, implementation details, load‑balancing strategies, cross‑datacenter handling, monitoring metrics, performance benchmarks, and practical lessons learned.
Background
The traffic‑condition team processes massive AMAP trajectory data in real time. Rapid growth in data volume and processing frequency caused the existing MQ middleware to become cost‑prohibitive, prompting a migration to Redis Stream as a lower‑cost, high‑throughput alternative.
Redis Stream Concept
Redis Stream, introduced in Redis 5.0, implements a FIFO log where each entry consists of an id (default UNIX‑timestamp_sequence) and a content payload. It supports multiple consumer groups, each tracking a last_delivered_id, and provides an ACK mechanism for confirming consumption. Streams can be trimmed lazily to avoid immediate deletions.
Design and Implementation
Load Balancing
Redis Stream does not have a native tag concept and may be deployed across sharded instances. To retain the original MQ semantics, the logical topic used by producers/consumers is treated as a prefix, while the actual Redis key is formed as topic_tag.
Topic Splitting
Producers send messages with a topic and a tag. The SDK concatenates them into a Redis key topic_tag. Consumers specify the same topic and tag; the SDK resolves the full key and reads from the corresponding stream, thereby emulating tag‑based routing.
Sharding Hash
When N total shards are available, the hash is calculated as:
globalIdx = tag % N
instanceIdx = globalIdx / shardsPerInstance
localIdx = globalIdx % shardsPerInstance
This mapping aligns with Redis Cluster’s 16384‑slot mechanism and guarantees an even distribution of keys across instances.
Cross‑Datacenter Read/Write
Two deployment models were compared:
Asynchronous hiredis client – producer operates asynchronously, consumer synchronously. Average end‑to‑end latency: 22‑23 ms.
Global‑active‑active Redis – both producer and consumer use synchronous connections across data centers. Average latency: 51‑57 ms and requires additional Redis instances.
Engineering Implementation (C++ SDK)
The SDK supports multiple Redis instances, configurable producer/consumer thread pools, synchronous or asynchronous modes, consumer‑group offset reset, load‑balancing, real‑time monitoring, and automatic reconnection.
Producer configuration
Instance list – one or more Redis endpoints.
Thread count – multiple threads round‑robin messages to balance load.
Consumer configuration
Instance list – same as producer.
Tag list – tags to subscribe.
Max tags per thread – limits the number of tags fetched in a single request to avoid backlog.
Processing thread count – number of threads that invoke the user‑defined callback.
Real‑time Monitoring
Redis itself does not expose queue lag metrics, so the following indicators are collected:
Production‑consumption volume – should match when there is no backlog.
Batch fetch size – maximum messages per pull; approaches the limit under backlog.
Write/read latency – spikes indicate possible network‑induced backlog.
Performance Test
Single‑threaded producer/consumer throughput drops as message size increases:
10 KB messages → >3000 TPS.
100 KB messages → ~1500 TPS.
Practical Experience
Online Performance
After full migration, cost and latency decreased by >90 %. A peak link processes ~20 million messages per minute (≈1 KB each) using four 64 GB, 64‑shard Redis instances.
Applicable Scenarios
Redis Stream is well‑suited for workloads with high message volume and strong cost constraints. It provides basic MQ features (consumer groups, offset reset, ACK) with high availability and throughput. Limitations include potential data loss on server failure, lack of a dedicated operations platform, and limited C++ client ecosystem.
Pitfalls & Tips
Use hiredis ≥ 1.2.0 for async connections; it supports connection‑timeout configuration.
Keep individual message size below 100 KB; larger payloads degrade single‑thread TPS.
Ensure a sufficient number of distinct tags to avoid shard skew.
CPU capacity, not memory or bandwidth, is the primary scaling factor for Redis instances.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
