Investigation and Resolution of Partial Queue Consumption after RocketMQ Topic Expansion
This article details a real‑world RocketMQ case where expanding a topic's queue count caused two consumer groups to miss messages on one broker, explains the root cause of missing subscription metadata after cluster scaling, and outlines the manual steps taken to restore full consumption.
The message team received a report that after expanding a RocketMQ topic, some queues were not being consumed, leading to message backlog and impacting online services.
To avoid exposing production data, the issue was reproduced in a virtual machine environment.
Cluster status: The cluster consists of two brokers (broker-a and broker-b); broker-a's configuration is shown below:
brokerClusterName = DefaultCluster
brokerName = broker-a
brokerId = 0
deleteWhen = 04
fileReservedTime = 48
brokerRole = ASYNC_MASTER
flushDiskType = ASYNC_FLUSH
brokerIP1=192.168.0.220
brokerIP2=192.168.0.220
namesrvAddr=192.168.0.221:9876;192.168.0.220:9876
storePathRootDir=/opt/application/rocketmq-all-4.5.2-bin-release/store
storePathCommitLog=/opt/application/rocketmq-all-4.5.2-bin-release/store/commitlog
autoCreateTopicEnable=false
autoCreateSubscriptionGroup=false
Because automatic topic and subscription group creation are disabled, any new resources must be provisioned manually.
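A quick way to confirm what a broker actually has configured is the stock mqadmin getBrokerConfig subcommand. A minimal sketch, assuming broker-a listens on the default port 10911 at the brokerIP1 address above:

```shell
# Query the live configuration of broker-a (address from brokerIP1 above;
# the default listen port 10911 is an assumption).
sh ./mqadmin getBrokerConfig -b 192.168.0.220:10911 -n 192.168.0.221:9876
```

The output should include autoCreateTopicEnable=false and autoCreateSubscriptionGroup=false, confirming that topics and subscription groups must be created by hand.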
Online queue expansion: Using the internal operations platform, the team executed the RocketMQ updateTopic command to increase the queue count from 4 to 8 on all brokers in the DefaultCluster:
sh ./mqadmin updateTopic -n 192.168.0.220:9876 -c DefaultCluster -t topic_dw_test_by_order_01 -r 8 -w 8
The command succeeded, and the console confirmed that each broker now hosts 8 queues for the topic.
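The expansion result can also be verified from the command line rather than the console. A sketch using two stock mqadmin subcommands against the lab name server above:

```shell
# Per-queue offsets for the topic on every broker; 8 queues should now
# appear under both broker-a and broker-b.
sh ./mqadmin topicStatus -n 192.168.0.220:9876 -t topic_dw_test_by_order_01

# Route data as recorded in the name server, including the read/write
# queue counts per broker.
sh ./mqadmin topicRoute -n 192.168.0.220:9876 -t topic_dw_test_by_order_01
```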
Message sending after expansion: Subsequent traffic showed that all 16 queues (8 per broker) were actively receiving messages, confirming that the online expansion did not require a restart of producers or consumers.
Problem emergence: Two of the five consumer groups subscribed to the topic reported that a subset of queues was not being consumed, causing downstream systems to miss processing.
Analysis: Inspection of the consumer status revealed that consumers were only assigned the 8 queues on broker-a, while the corresponding queues on broker-b had no active consumers. Further investigation showed that the problematic consumer groups lacked subscription entries on broker-b: the earlier cluster expansion had not copied the existing topics.json and subscriptionGroup.json files to the new broker.
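The lopsided assignment is visible with mqadmin consumerProgress. A sketch with an illustrative group name (group_dw_test_01 is a placeholder, not from the incident):

```shell
# Per-queue consumption lag for one affected group; queues hosted on
# broker-b show no owning client and a growing diff (backlog) when no
# consumer has been assigned to them.
sh ./mqadmin consumerProgress -n 192.168.0.220:9876 -g group_dw_test_01
```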
Resolution: The operations engineer manually created the missing subscription groups on broker-b via the RocketMQ console. After the subscription entries were added, the previously idle consumer processes began pulling messages, and the backlog cleared.
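The same fix can be applied from the command line with the stock updateSubGroup subcommand instead of the console. A sketch, assuming broker-b listens on 192.168.0.221:10911 and using the same placeholder group name:

```shell
# Create the missing subscription group directly on broker-b
# (broker-b's address and the default port 10911 are assumptions;
# group_dw_test_01 is a placeholder group name).
sh ./mqadmin updateSubGroup -n 192.168.0.220:9876 \
    -b 192.168.0.221:10911 -g group_dw_test_01
```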
Root cause and lessons: The expansion added a new broker but failed to synchronize topic and subscription metadata, and with autoCreateSubscriptionGroup set to false, the new broker could not serve the queues. The fix is to ensure that topics.json and subscriptionGroup.json are replicated across all brokers during scaling, or to enable automatic creation if appropriate.
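One way to replicate the metadata during scaling is to copy the store-config files from an existing broker before starting the new one. A sketch, assuming the storePathRootDir from the configuration above and that "broker-a" resolves to the existing broker's host; run on the new broker before its first start:

```shell
# Copy topic and subscription metadata from an existing broker.
# topics.json and subscriptionGroup.json live under ${storePathRootDir}/config.
STORE=/opt/application/rocketmq-all-4.5.2-bin-release/store
scp broker-a:${STORE}/config/topics.json            ${STORE}/config/
scp broker-a:${STORE}/config/subscriptionGroup.json ${STORE}/config/
```

After copying, start the new broker normally; it will load the metadata at boot and register the topic's queues with the name server.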