
Understanding RocketMQ Storage Architecture: CommitLog, ConsumeQueue, and Index Files

This article explains the core storage design of RocketMQ, covering the CommitLog, ConsumeQueue, and Index files, their organization, sequential write strategy, memory‑mapped I/O, and flexible flushing policies that together provide high‑throughput, low‑latency messaging for backend systems.

Full-Stack Internet Architecture

1. Storage Overview

RocketMQ persists messages on disk using three main file types: CommitLog, ConsumeQueue, and Index files. All topics share a single CommitLog, so every incoming message is appended to the same log, which keeps disk writes sequential and write throughput high. ConsumeQueue provides efficient per-topic retrieval for consumers, and Index files provide hash‑based lookup of messages by key.

2. Storage File Organization

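Messages are appended sequentially to the CommitLog and are not modified once written. Each message is identified by its physical offset in the CommitLog. Because the log is split into fixed‑size files, each named after the offset at which it starts, the file that holds a given offset and the position inside that file can be computed directly instead of being found by scanning. ConsumeQueue acts as a topic‑based index with fixed‑size entries (CommitLog offset, message length, tag hash). Since every entry has the same size, the Nth entry can be read by direct random access, and the stored offset and length then point to the message body in the CommitLog.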
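The addressing described above can be sketched in a few lines of Java. This is an illustration rather than RocketMQ's own code: the class and method names are hypothetical, and the 1 GiB file size, the 20‑digit file names, and the 8 + 4 + 8 byte entry layout are assumptions based on commonly cited defaults, not details given in this article.

```java
import java.nio.ByteBuffer;
import java.util.Locale;

/** Sketch of CommitLog addressing and the fixed-size ConsumeQueue entry. */
final class StorageLayoutSketch {
    // Assumed sizes, not taken from the article.
    static final long COMMIT_LOG_FILE_SIZE = 1024L * 1024 * 1024; // 1 GiB per file
    static final int CQ_ENTRY_SIZE = 8 + 4 + 8;                   // offset, length, tag hash

    /** Name of the CommitLog file that contains the given physical offset. */
    static String commitLogFileName(long physicalOffset) {
        long fileStart = physicalOffset - (physicalOffset % COMMIT_LOG_FILE_SIZE);
        return String.format(Locale.ROOT, "%020d", fileStart);
    }

    /** Byte position of the message inside that file. */
    static int positionInFile(long physicalOffset) {
        return (int) (physicalOffset % COMMIT_LOG_FILE_SIZE);
    }

    /** Byte position of the Nth entry in a ConsumeQueue file (N starts at 0). */
    static long entryPosition(long logicalIndex) {
        return logicalIndex * CQ_ENTRY_SIZE;
    }

    /** Encodes one ConsumeQueue entry: CommitLog offset, message length, tag hash. */
    static ByteBuffer encodeEntry(long commitLogOffset, int length, long tagHash) {
        ByteBuffer entry = ByteBuffer.allocate(CQ_ENTRY_SIZE);
        entry.putLong(commitLogOffset);
        entry.putInt(length);
        entry.putLong(tagHash);
        entry.flip(); // make the buffer readable from the start
        return entry;
    }
}
```

Both lookups are plain arithmetic, which is why neither the CommitLog nor the ConsumeQueue needs to be scanned to find a message.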

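The Index file consists of a 40‑byte header, 5 million hash slots, and up to 20 million index entries. Each slot holds the position of the most recent entry whose key hash maps to it. Each entry stores the key's hash code, the message's physical offset, a timestamp (kept as a difference from the file's start time), and a pointer to the previous entry in the same slot, so hash collisions form a chain that is walked from newest to oldest.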
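A small in‑memory model may make the slot‑and‑chain structure easier to follow. The sketch below is hypothetical and is not RocketMQ's IndexFile class: it assumes 4‑byte slots and 20‑byte entries (4‑byte key hash, 8‑byte physical offset, 4‑byte time difference, 4‑byte previous‑entry pointer), and it reserves entry number 0 to mean "empty slot".

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** In-memory sketch of a slot table followed by chained index entries. */
final class HashIndexSketch {
    static final int HEADER_SIZE = 40;
    static final int SLOT_SIZE = 4;   // assumed: one int per slot
    static final int ENTRY_SIZE = 20; // assumed: 4 + 8 + 4 + 4 bytes

    private final int slotCount;
    private final int maxEntries;
    private final ByteBuffer buf;
    private int entryCount = 1; // entry 0 is reserved to mean "empty slot"

    HashIndexSketch(int slotCount, int maxEntries) {
        this.slotCount = slotCount;
        this.maxEntries = maxEntries;
        this.buf = ByteBuffer.allocate(fileSize(slotCount, maxEntries));
    }

    /** Total size in bytes of a file with the given slot and entry counts. */
    static int fileSize(int slots, int entries) {
        return HEADER_SIZE + slots * SLOT_SIZE + entries * ENTRY_SIZE;
    }

    private int slotPos(String key) {
        int slot = (key.hashCode() & 0x7fffffff) % slotCount;
        return HEADER_SIZE + slot * SLOT_SIZE;
    }

    private int entryPos(int index) {
        return HEADER_SIZE + slotCount * SLOT_SIZE + index * ENTRY_SIZE;
    }

    /** Appends an entry and makes it the new head of its slot's chain. */
    boolean put(String key, long physicalOffset, int timeDiffSeconds) {
        if (entryCount >= maxEntries) {
            return false; // file is full; a real store would roll to a new file
        }
        int sp = slotPos(key);
        int previousHead = buf.getInt(sp);
        int ep = entryPos(entryCount);
        buf.putInt(ep, key.hashCode());
        buf.putLong(ep + 4, physicalOffset);
        buf.putInt(ep + 12, timeDiffSeconds);
        buf.putInt(ep + 16, previousHead); // link to the older entry in this slot
        buf.putInt(sp, entryCount++);
        return true;
    }

    /** Walks the chain from newest to oldest and collects matching offsets. */
    List<Long> lookup(String key) {
        List<Long> offsets = new ArrayList<>();
        for (int i = buf.getInt(slotPos(key)); i != 0; i = buf.getInt(entryPos(i) + 16)) {
            if (buf.getInt(entryPos(i)) == key.hashCode()) {
                offsets.add(buf.getLong(entryPos(i) + 4));
            }
        }
        return offsets;
    }
}
```

Under these assumed sizes, `HashIndexSketch.fileSize(5_000_000, 20_000_000)` comes to 420,000,040 bytes, roughly 400 MB per index file. Because only the hash is stored, a lookup returns candidate offsets; the actual key still has to be checked against the message read from the CommitLog.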

3. Sequential Write

RocketMQ relies on sequential disk writes to maximize I/O performance, because appending to the end of a file avoids the seek cost of random writes. The approach is similar to the redo log in MySQL InnoDB, which buffers changes in memory and then appends them to the log on disk sequentially.
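As a rough illustration of the append‑only pattern, the following sketch writes length‑prefixed records for several topics into one file and returns the offset at which each record starts. The class name, the record format (a 4‑byte length followed by "topic|body"), and the temp‑file demo are all assumptions made for this example and bear no relation to RocketMQ's actual message encoding.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Minimal append-only log: every write lands at the current end of the file. */
final class AppendOnlyLogSketch implements AutoCloseable {
    private final FileChannel channel;
    private long writeOffset; // next append position; it only ever grows

    AppendOnlyLogSketch(Path file) throws IOException {
        channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
        writeOffset = channel.size();
    }

    /** Appends one length-prefixed record and returns its starting offset. */
    long append(String topic, String body) throws IOException {
        byte[] payload = (topic + "|" + body).getBytes(StandardCharsets.UTF_8);
        ByteBuffer record = ByteBuffer.allocate(4 + payload.length);
        record.putInt(payload.length);
        record.put(payload);
        record.flip();
        long start = writeOffset;
        while (record.hasRemaining()) {
            writeOffset += channel.write(record, writeOffset);
        }
        return start;
    }

    /** Reads the record that starts at the given offset. */
    String read(long offset) throws IOException {
        ByteBuffer length = ByteBuffer.allocate(4);
        readFully(length, offset);
        ByteBuffer payload = ByteBuffer.allocate(length.getInt(0));
        readFully(payload, offset + 4);
        return new String(payload.array(), StandardCharsets.UTF_8);
    }

    private void readFully(ByteBuffer dst, long start) throws IOException {
        while (dst.hasRemaining()) {
            if (channel.read(dst, start + dst.position()) < 0) {
                throw new EOFException("offset past end of log");
            }
        }
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }

    /** Appends three messages for two topics, then reads the second one back. */
    static String demo() {
        try {
            Path file = Files.createTempFile("append-only-sketch", ".log");
            file.toFile().deleteOnExit();
            try (AppendOnlyLogSketch log = new AppendOnlyLogSketch(file)) {
                long a = log.append("TopicA", "m1");
                long b = log.append("TopicB", "m2");
                long c = log.append("TopicA", "m3");
                return a + "," + b + "," + c + ":" + log.read(b);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Messages from different topics end up interleaved in one file, and the returned offsets are exactly what a separate per‑topic index would need to store.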

4. Memory‑Mapped I/O

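To avoid the extra copy between kernel and user space that conventional read and write calls incur, RocketMQ maps its storage files into memory with Java NIO's FileChannel.map. Reads and writes then go through the resulting MappedByteBuffer and operate directly on the OS page cache.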
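The following is a minimal sketch of memory‑mapped I/O using only standard Java NIO calls (`FileChannel.map` and `MappedByteBuffer`). The class name, the 4 KB mapping size, and the temp file are assumptions for the example; a real broker maps far larger files and manages their lifecycle itself.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Writes through a memory mapping, then reads the same bytes via the channel. */
final class MappedFileSketch {
    static final int MAPPED_SIZE = 4096; // tiny on purpose

    static String demo() {
        try {
            Path file = Files.createTempFile("mapped-sketch", ".log");
            file.toFile().deleteOnExit();
            try (FileChannel channel = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Mapping in READ_WRITE mode grows the empty file to MAPPED_SIZE.
                MappedByteBuffer mapped =
                        channel.map(FileChannel.MapMode.READ_WRITE, 0, MAPPED_SIZE);

                // Writing is a plain memory copy into the mapped region.
                byte[] message = "hello mmap".getBytes(StandardCharsets.UTF_8);
                mapped.putInt(message.length);
                mapped.put(message);

                // force() asks the OS to write the dirty pages to the device.
                mapped.force();

                // Read the record back through the ordinary channel API.
                ByteBuffer readBack = ByteBuffer.allocate(4 + message.length);
                while (readBack.hasRemaining()) {
                    if (channel.read(readBack, readBack.position()) < 0) {
                        break;
                    }
                }
                readBack.flip();
                byte[] body = new byte[readBack.getInt()];
                readBack.get(body);
                return new String(body, StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The write path never issues an explicit write call; the only system interaction after mapping is the optional `force()`, which is where flushing policy comes in.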

5. Flexible Flushing Strategies

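RocketMQ supports both synchronous and asynchronous flushing. With synchronous flushing, the broker acknowledges a message only after it has been flushed to disk; pending messages are grouped and flushed together to amortize the cost, which gives strong durability at the price of higher latency. With asynchronous flushing, the broker writes the message to the page cache and returns success immediately, then flushes to disk periodically (every 500 ms by default), so messages that have not yet been flushed can be lost if the machine fails.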
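The difference between the two policies can be shown with a deliberately simplified sketch. Everything here is hypothetical: the class, the `FlushMode` enum (named to echo the two modes), and the explicit `flush()` call that stands in for a background timer. Grouping of concurrent writers is not modeled; the sketch only shows when `force()` happens relative to `append()` returning.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Shows when data is forced to disk under each flushing policy. */
final class FlushPolicySketch {
    enum FlushMode { SYNC_FLUSH, ASYNC_FLUSH } // hypothetical enum for this sketch

    private final MappedByteBuffer mapped;
    private final FlushMode mode;
    private int writePos;   // bytes written into the mapping (page cache)
    private int flushedPos; // bytes known to have been forced to disk

    FlushPolicySketch(MappedByteBuffer mapped, FlushMode mode) {
        this.mapped = mapped;
        this.mode = mode;
    }

    /** Writes the message; in sync mode it also flushes before returning. */
    void append(byte[] message) {
        mapped.put(message);
        writePos = mapped.position();
        if (mode == FlushMode.SYNC_FLUSH) {
            flush();
        }
    }

    /** In async mode a background thread would call this on a timer. */
    void flush() {
        if (flushedPos < writePos) {
            mapped.force();
            flushedPos = writePos;
        }
    }

    int unflushedBytes() {
        return writePos - flushedPos;
    }

    private static MappedByteBuffer mapTempFile() throws IOException {
        Path file = Files.createTempFile("flush-sketch", ".log");
        file.toFile().deleteOnExit();
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // The mapping stays valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
        }
    }

    /** Returns {sync unflushed, async unflushed, async unflushed after flush()}. */
    static int[] demo() {
        try {
            FlushPolicySketch sync = new FlushPolicySketch(mapTempFile(), FlushMode.SYNC_FLUSH);
            FlushPolicySketch async = new FlushPolicySketch(mapTempFile(), FlushMode.ASYNC_FLUSH);
            for (int i = 0; i < 3; i++) {
                sync.append(new byte[10]);
                async.append(new byte[10]);
            }
            int asyncBefore = async.unflushedBytes();
            async.flush(); // stands in for the periodic background flush
            return new int[] { sync.unflushedBytes(), asyncBefore, async.unflushedBytes() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The unflushed byte count in async mode is the window of data at risk between two periodic flushes, which is the durability cost paid for the lower latency.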

6. Memory‑Level Read‑Write Separation

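When transientStorePoolEnable is enabled, RocketMQ writes messages to off‑heap memory first, then asynchronously commits them to the page cache and finally flushes them to disk. Reads continue to be served from the page cache, so writes and reads touch different memory, which reduces page‑cache pressure and improves throughput. The trade‑off is that data still sitting in the off‑heap buffer has not reached the page cache and is lost if the broker process crashes.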
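The three stages (off‑heap write, commit to the page cache, flush to disk) can be sketched with standard NIO primitives. This is a hypothetical model, not the broker's implementation: the class name, the single 4 KB direct buffer, and the explicit `commit()` and `flush()` calls are assumptions, and in a real broker both steps would run on background threads.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Two-stage write path: direct buffer first, file channel second. */
final class TransientStoreSketch implements AutoCloseable {
    private final ByteBuffer writeBuffer = ByteBuffer.allocateDirect(4096); // off-heap
    private final FileChannel channel;
    private int committedPos; // bytes already handed to the file channel

    TransientStoreSketch(Path file) throws IOException {
        channel = FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE);
    }

    /** Stage 1: the producer's write only touches off-heap memory. */
    void append(byte[] message) {
        writeBuffer.put(message);
    }

    /** Stage 2: copy everything not yet committed into the file (page cache). */
    int commit() throws IOException {
        ByteBuffer pending = writeBuffer.duplicate();
        pending.position(committedPos);
        pending.limit(writeBuffer.position());
        int written = 0;
        while (pending.hasRemaining()) {
            written += channel.write(pending, committedPos + written);
        }
        committedPos += written;
        return written;
    }

    /** Stage 3: force the committed data from the page cache to disk. */
    void flush() throws IOException {
        channel.force(false);
    }

    long fileSize() throws IOException {
        return channel.size();
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }

    /** Returns the file size before and after commit(). */
    static long[] demo() {
        try {
            Path file = Files.createTempFile("transient-store-sketch", ".log");
            file.toFile().deleteOnExit();
            try (TransientStoreSketch store = new TransientStoreSketch(file)) {
                store.append("hello".getBytes(StandardCharsets.UTF_8));
                long beforeCommit = store.fileSize(); // nothing visible to readers yet
                store.commit();
                store.flush();
                return new long[] { beforeCommit, store.fileSize() };
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Before `commit()` the file is still empty, which illustrates both sides of the design: writers never compete with readers for page‑cache pages, but uncommitted data exists only in the process's own memory.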

Overall, these design choices give RocketMQ high throughput, low latency, and configurable durability, making it a robust backend messaging solution.

Tags: backend, Message Queue, rocketmq, storage design, CommitLog, ConsumeQueue