
From ActiveMQ to RocketMQ: My Journey Through Message Queues and Lessons Learned

This article chronicles the author's four‑stage evolution with message queues—from early experiments with ActiveMQ, through Redis and RabbitMQ, to MetaQ and finally RocketMQ—highlighting practical challenges, architectural decisions, performance tuning, and insights for building robust, high‑throughput backend systems.

macrozheng

1. First Encounter with ActiveMQ

1.1 Asynchrony & Decoupling

In early 2011 I worked on a user‑center system at an online lottery company. After a user registered, we needed to send an SMS, but the registration and SMS modules were tightly coupled, causing high latency and instability.

Problems included:

SMS channel latency of about 5 seconds, degrading user experience.

Changes to the SMS channel required modifications in the core user‑center code.

The first issue could be mitigated with a thread pool for asynchronous sending, but the second required a more robust solution.

Introducing a message queue allowed us to decouple SMS sending by creating an independent job service that consumes messages and invokes the SMS provider.
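The decoupling pattern above can be sketched in a few lines. This is a minimal illustration using an in-process queue and a worker thread standing in for the independent job service; the names (register_user, sms_queue) are hypothetical, and a real deployment would put a broker such as ActiveMQ between the two sides.

```python
import queue
import threading

sms_queue = queue.Queue()
sent = []

def sms_worker():
    # Stand-in for the independent job service: consumes messages
    # and invokes the SMS provider.
    while True:
        msg = sms_queue.get()
        if msg is None:  # shutdown signal
            break
        sent.append(f"SMS to {msg['phone']}: {msg['text']}")

def register_user(phone):
    # Registration returns immediately; the SMS is sent asynchronously
    # by the consumer, so channel latency no longer blocks the user.
    sms_queue.put({"phone": phone, "text": "Welcome!"})
    return {"phone": phone, "status": "registered"}

worker = threading.Thread(target=sms_worker)
worker.start()
register_user("13800000000")
sms_queue.put(None)  # ask the worker to stop after draining
worker.join()
print(sent)
```

Swapping the SMS provider now only touches the consumer side; the user-center code never changes.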

The core functions of a message queue are asynchrony and decoupling.

1.2 Scheduling Center

The lottery order lifecycle involves many stages (creation, sub‑order splitting, ticket issuing, prize calculation, etc.). A scheduling center maintains the order state machine and uses a message queue to exchange information with ticket‑issuing and prize‑calculation services.
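A scheduling center of this kind is, at its core, a state machine over the order lifecycle. The sketch below uses hypothetical stage names and a transition table; the real system had many more stages and exchanged transition events over the message queue.

```python
# Illustrative order state machine; states and transitions are simplified.
TRANSITIONS = {
    "created": {"split"},
    "split": {"issuing"},
    "issuing": {"issued"},
    "issued": {"prize_calculated"},
}

def advance(state, target):
    # Reject transitions the state machine does not allow,
    # e.g. calculating prizes for an order that was never issued.
    if target not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition {state} -> {target}")
    return target

state = "created"
for target in ["split", "issuing", "issued", "prize_calculated"]:
    state = advance(state, target)
print(state)
```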

When daily transaction volume reached tens of millions, the scheduling center was maintained by only two engineers, yet its code quality, logging, and conventions remained excellent.

1.3 Restart Strategy

During a peak betting deadline, the scheduling center could not consume messages; the message bus could only produce, not consume, causing severe anxiety.

Deploying additional instances did not help; the service hung after processing a few thousand messages. The team resorted to repeatedly restarting the service—over twenty times—to finally clear the backlog.

Investigation with jstack revealed that large Oracle transactions (some running over 30 minutes) were blocking the scheduling threads.

Mitigation steps:

Split oversized messages into smaller batches before sending.

Configure the data source to abort transactions that exceed a certain duration.
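The first mitigation, splitting oversized messages, amounts to chunking a large payload before sending. A minimal sketch (batch size illustrative; a real sender would publish each batch as its own message):

```python
def split_into_batches(items, batch_size):
    # Split a large list of work items into fixed-size batches so that
    # no single message carries a transaction too large to process quickly.
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

orders = list(range(10))
batches = split_into_batches(orders, 4)
print(batches)  # three batches: 4 + 4 + 2 items
```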

1.4 Retrospective

Spring’s ActiveMQ API is concise and pleasant to use, but high‑throughput scenarios can cause message backlog and occasional hangs.

High‑Throughput Backlog

When the ticket‑issuing gateway generated a large volume of messages, some messages were persisted to local disk as XML files and consumed asynchronously, greatly improving consumption speed while introducing a risk of data loss if the disk failed.

High‑Availability Issues

The master/slave deployment allowed only one slave to connect to the master, requiring manual intervention when the master failed. Occasionally, messages appeared on the slave console but not on the master, and consumers could not read from the slave without manual steps.

2. Advancing with Redis & RabbitMQ

2.1 Can Redis Be a Message Queue?

At eLong, the coupon calculation service used Storm. Redis lists served as a push/pop queue for streaming data.

Data flow:

Hotel information service sends data to Redis clusters A/B.

Storm spout reads from Redis A/B and emits tuples to bolts.

Bolt cleans data according to business rules.

Processed data is written back to Redis cluster C.

Ingestion service reads from Redis C and stores data in a database.

Search team scans the database to build indexes.

While Redis worked, concerns included occasional message loss during topology upgrades, memory pressure from queue buildup, and the desire to replace Redis C with Kafka for better decoupling.
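The Redis-as-queue pattern in this pipeline is just LPUSH on the producer side and a blocking pop on the consumer side. Below is a pure-Python stand-in for that semantics (a real deployment would use a Redis client such as redis-py against clusters A/B/C); it shows why a list behaves as a FIFO queue when pushed at one end and popped at the other.

```python
from collections import deque

class RedisListQueue:
    # Simulates the Redis list queue pattern: producers LPUSH onto the
    # head, consumers RPOP from the tail, giving FIFO ordering overall.
    def __init__(self):
        self._items = deque()

    def lpush(self, value):
        self._items.appendleft(value)

    def rpop(self):
        return self._items.pop() if self._items else None

q = RedisListQueue()
for hotel in ["hotel-1", "hotel-2", "hotel-3"]:
    q.lpush(hotel)
consumed = [q.rpop() for _ in range(3)]
print(consumed)  # consumed in the order produced
```

Note the limitation the text describes: once an item is popped it is gone, so a consumer crash between pop and processing loses the message, and unconsumed items sit in Redis memory.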

2.2 RabbitMQ Is a Pipe, Not a Pool

RabbitMQ, written in Erlang, provided high‑availability via mirrored queues and handled millions of messages per day in the coupon system.

In a stress test sending 10 million messages, the queue accumulated over 5 million messages, causing producer latency to rise from 2 ms to about 500 ms and triggering alarms.

RabbitMQ excels as a pipe but performs poorly when messages pile up, leading to sharp performance degradation.

3. Elevating with MetaQ

MetaQ’s design, based on a pull mechanism and heavy use of Zookeeper for service discovery and offset storage, impressed me greatly.

3.1 Impressive Consumer Model

MetaQ supports two consumption models: cluster consumption and broadcast consumption.

Orders are sent to MetaQ; both the dispatch service and BI service can consume them.

Broadcast consumption is used for driver push notifications.
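The two consumption models can be illustrated with a small simulation (group and message names are made up): in cluster consumption each message is delivered to exactly one consumer within the group, while in broadcast consumption every consumer in the group receives every message.

```python
import itertools

class Topic:
    def __init__(self):
        # group name -> (mode, per-consumer logs, round-robin cursor)
        self.groups = {}

    def subscribe(self, group, mode, consumers):
        logs = [[] for _ in range(consumers)]
        self.groups[group] = (mode, logs, itertools.cycle(range(consumers)))

    def publish(self, msg):
        for mode, logs, rr in self.groups.values():
            if mode == "broadcast":
                for log in logs:            # every consumer gets the message
                    log.append(msg)
            else:                           # "cluster": one consumer per group
                logs[next(rr)].append(msg)

orders = Topic()
orders.subscribe("dispatch", "cluster", consumers=2)
orders.subscribe("driver-push", "broadcast", consumers=2)
for i in range(4):
    orders.publish(f"order-{i}")

cluster_logs = orders.groups["dispatch"][1]
broadcast_logs = orders.groups["driver-push"][1]
print(cluster_logs)    # each order consumed once across the group
print(broadcast_logs)  # every consumer sees every order
```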

3.2 Aggressive Peak Smoothing

In 2015, Shenzhou ridesharing faced massive order growth. We introduced an order cache that writes to MetaQ; the order persistence service consumes messages, validates order sequence, and stores data in the database.
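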

Key details:

Messages are routed to a partition based on order number, so all messages for the same order land on one partition and are consumed in sequence.

A watchdog task periodically reconciles cache and database inconsistencies and raises alerts.
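The ordered-consumption detail above can be sketched as follows; the stable hash and the order numbers are illustrative, but the idea matches the text: hashing the order number modulo the partition count pins each order to one partition, so its events keep their relative order.

```python
import zlib

def select_partition(order_no: str, num_partitions: int) -> int:
    # Stable hash (crc32) so the same order number always maps to the
    # same partition across processes and restarts.
    return zlib.crc32(order_no.encode()) % num_partitions

partitions = {i: [] for i in range(4)}
events = [("A1001", "created"), ("A1002", "created"),
          ("A1001", "paid"), ("A1001", "stored")]
for order_no, event in events:
    partitions[select_partition(order_no, 4)].append((order_no, event))

# All of A1001's events land in one partition, preserving their sequence.
p = select_partition("A1001", 4)
print(partitions[p])
```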

3.3 Message SDK Wrapping

We wrapped the MetaQ client in an SDK to provide a simple API (topic and group only), hide environment details, and enable hot configuration changes without client restarts.
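A hypothetical sketch of that SDK shape: the caller supplies only topic and group, while broker/environment details come from a config source that can be updated at runtime. The class and key names here are invented for illustration; a real implementation would watch a config center rather than a dict.

```python
class MqConfig:
    # Stand-in for a config center; values can be changed at runtime.
    def __init__(self):
        self._conf = {"nameserver": "ns-test:9876"}

    def get(self, key):
        return self._conf[key]

    def hot_update(self, key, value):
        # Applied without restarting any client.
        self._conf[key] = value

class SimpleConsumer:
    # The SDK surface: topic and group only; environment is hidden.
    def __init__(self, topic, group, config):
        self.topic, self.group, self._config = topic, group, config

    def endpoint(self):
        # Resolved at call time, so hot config changes take effect immediately.
        return self._config.get("nameserver")

config = MqConfig()
consumer = SimpleConsumer("ORDER_TOPIC", "order-group", config)
first = consumer.endpoint()
config.hot_update("nameserver", "ns-prod:9876")
second = consumer.endpoint()
print(first, second)
```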

3.4 Refactoring MetaQ

MetaQ suffered from occasional RPC hangs and weak operational tooling. The underlying network framework was switched from Gecko to Netty, and a dedicated Zookeeper cluster was introduced to reduce load on the shared ZK ensemble.

4. Falling in Love with RocketMQ

4.1 Open‑Source Feast

After RocketMQ open‑sourced, I explored its Netty‑based remoting module, built a toy RPC, and later used RocketMQ for an SMS service.

SMS service architecture:

Design a simple SDK‑style API.

API endpoint receives SMS requests and pushes messages to RocketMQ.

Worker services consume messages, load‑balance across channel providers, and invoke the actual SMS gateways.

A dashboard displays sending records and channel configurations.
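The worker's channel selection in step three can be sketched as a simple round-robin balancer (provider names are invented; a production worker would also weight providers and fail over on errors):

```python
import itertools

class ChannelBalancer:
    # Rotates SMS sends across channel providers so no single
    # gateway takes all the traffic.
    def __init__(self, providers):
        self._cycle = itertools.cycle(providers)

    def next_provider(self):
        return next(self._cycle)

balancer = ChannelBalancer(["provider-a", "provider-b"])
picks = [balancer.next_provider() for _ in range(4)]
print(picks)  # alternates between the two providers
```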

4.2 Kafka: Essential for Big Data

Kafka provides high‑throughput, persistent, horizontally scalable streaming, widely used for log collection, stream processing, and real‑time monitoring.

Key components of log synchronization:

Log‑collecting client batches and asynchronously sends logs to Kafka.

Kafka persists logs in message files.

Log processing applications (e.g., Logstash) consume Kafka messages for indexing or downstream big‑data pipelines.

Kafka also acts as a data hub, feeding the same data into multiple specialized systems such as HBase, Elasticsearch, or time‑series databases.
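The batching behavior of the log-collecting client in the pipeline above can be sketched as a buffer that flushes whenever it fills (a real client would also flush on a timer and hand batches to an async Kafka producer; batch size here is illustrative):

```python
class BatchingSender:
    # Buffers log lines and flushes them in batches to a transport,
    # trading a little latency for far fewer network round-trips.
    def __init__(self, batch_size, transport):
        self.batch_size, self.transport = batch_size, transport
        self._buffer = []

    def append(self, line):
        self._buffer.append(line)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self._buffer:
            self.transport(list(self._buffer))
            self._buffer.clear()

batches = []
sender = BatchingSender(batch_size=3, transport=batches.append)
for i in range(7):
    sender.append(f"log-{i}")
sender.flush()  # flush the trailing partial batch
print([len(b) for b in batches])  # 7 lines sent as 3 + 3 + 1
```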

4.3 How to Choose a Message Queue

Selection should start from the scenario, then consider technical reserve, cost, and people factors.

"Databases are specializing; the 'one size fits all' approach no longer applies." (MongoDB design philosophy)

Technical reserve includes team experience and existing SDKs; cost covers development, testing, operations, hardware, and hiring.

5. Final Thoughts

I believe that without massive accumulation and reflection, nothing great can be achieved. Continuous learning, even a little each day, is the key to progress.

The journey through various message‑queue technologies has taught me the importance of curiosity, simplicity, and relentless improvement.
