
Inside Pika’s Write2File Binlog: Formats, Classes, and Operations Explained

This article provides a detailed walkthrough of Pika’s write2file binlog mechanism, covering manifest and binlog file structures, record formats, core implementation classes, and step‑by‑step procedures for constructing, updating, producing, and consuming logs in a large‑capacity Redis‑compatible storage system.

360 Zhihui Cloud Developer

Jun 6, 2017

Overview

Building on the previous introduction to Pika—a large‑capacity Redis‑compatible storage system—this article dives into the write2file data storage method used in Pika’s replication mechanism, explaining file formats, core classes, and essential operations.

What Is Pika?

Pika is a high-capacity, Redis-compatible storage system developed by the 360 Web Platform DBA and infrastructure team. It is not meant to replace Redis but to complement it with persistent storage that addresses Redis's limitations in large-scale scenarios: slow recovery, costly master-slave synchronization, single-threaded bottlenecks, limited data capacity, and high memory costs.

Binlog Replication Principle

The binlog system consists of two kinds of files: manifest and write2file. The manifest records meta-information (the current log file number and offset), while each write2file+num file (write2file followed by a file number, for example write2file0) stores Redis write commands and their parameters.

File Formats

Manifest file format:

Log offset (8 bytes) | con_offset (8 bytes, unused) | element count (4 bytes, unused) | log file number (4 bytes).
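To make the 24-byte layout concrete, here is a minimal sketch that packs and unpacks the four manifest fields in the order listed above. The struct name, the field names, and the use of host byte order are assumptions made for illustration; they are not taken from Pika's source.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical in-memory view of the manifest (names are illustrative).
struct ManifestFields {
  uint64_t pro_offset = 0;  // log offset within the current binlog file
  uint64_t con_offset = 0;  // unused
  uint32_t item_num = 0;    // element count, unused
  uint32_t pro_num = 0;     // current log file number
};

constexpr size_t kManifestSize = 8 + 8 + 4 + 4;  // 24 bytes

// Serialize the fields in the documented order.
// Assumption: values are stored in host byte order.
std::array<char, kManifestSize> EncodeManifest(const ManifestFields& m) {
  std::array<char, kManifestSize> buf{};
  char* p = buf.data();
  std::memcpy(p, &m.pro_offset, 8); p += 8;
  std::memcpy(p, &m.con_offset, 8); p += 8;
  std::memcpy(p, &m.item_num, 4);   p += 4;
  std::memcpy(p, &m.pro_num, 4);
  return buf;
}

// Inverse of EncodeManifest.
ManifestFields DecodeManifest(const std::array<char, kManifestSize>& buf) {
  ManifestFields m;
  const char* p = buf.data();
  std::memcpy(&m.pro_offset, p, 8); p += 8;
  std::memcpy(&m.con_offset, p, 8); p += 8;
  std::memcpy(&m.item_num, p, 4);   p += 4;
  std::memcpy(&m.pro_num, p, 4);
  return m;
}
```

Because the layout is fixed-width, a reader can recover the producer position from the manifest alone, without scanning any binlog file.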

Binlog file format:

Each binlog file has a nominal size of 100 MB (the default, configurable) and is divided into 64 KB blocks. A record may span multiple blocks but never crosses into another binlog file, so the last record written can push a file slightly past 100 MB.

Record format: Header | Cmd

Header: Record Length (3 bytes) | Timestamp (4 bytes) | Record Type (1 byte).

Cmd: The Redis command (partial or full) depending on the remaining space in the current block.
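The 8-byte header can be sketched as a pair of encode/decode helpers. This is an illustration only: the little-endian byte order, the numeric values of the record types, and the names RecordHeader, EncodeHeader, and DecodeHeader are assumptions, not details confirmed by the article.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr size_t kHeaderSize = 3 + 4 + 1;  // length + timestamp + type

// Assumption: numeric values are illustrative; the article only names the types.
enum RecordType : uint8_t {
  kFullType = 1,
  kFirstType = 2,
  kMiddleType = 3,
  kLastType = 4
};

struct RecordHeader {
  uint32_t length;     // payload length, must fit in 3 bytes
  uint32_t timestamp;  // seconds since the epoch
  RecordType type;
};

// Write the header into buf (kHeaderSize bytes), low byte first (assumed order).
void EncodeHeader(const RecordHeader& h, unsigned char* buf) {
  assert(h.length < (1u << 24));
  buf[0] = static_cast<unsigned char>(h.length & 0xff);
  buf[1] = static_cast<unsigned char>((h.length >> 8) & 0xff);
  buf[2] = static_cast<unsigned char>((h.length >> 16) & 0xff);
  for (int i = 0; i < 4; ++i) {
    buf[3 + i] = static_cast<unsigned char>((h.timestamp >> (8 * i)) & 0xff);
  }
  buf[7] = static_cast<unsigned char>(h.type);
}

// Inverse of EncodeHeader.
RecordHeader DecodeHeader(const unsigned char* buf) {
  RecordHeader h;
  h.length = static_cast<uint32_t>(buf[0]) |
             (static_cast<uint32_t>(buf[1]) << 8) |
             (static_cast<uint32_t>(buf[2]) << 16);
  h.timestamp = 0;
  for (int i = 0; i < 4; ++i) {
    h.timestamp |= static_cast<uint32_t>(buf[3 + i]) << (8 * i);
  }
  h.type = static_cast<RecordType>(buf[7]);
  return h;
}
```

A 3-byte length field caps a single physical record at just under 16 MB, which is ample given that a fragment never exceeds one 64 KB block.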

Implementation Classes

Version: Metadata class that maps the manifest file via mmap.

Binlog: Log class that maps the write2file file via mmap.

PikaBinlogSenderThread: Consumer thread that reads log files sequentially and processes the logs.

Basic Operations

Constructing a Binlog

//file_size can be set in the config, default 100MB
Binlog::Binlog(const std::string& binlog_path, const int file_size)

1.1 Create the binlog directory.
1.2 Check whether the manifest file exists; create it if absent.
1.3 Initialize the Version class from the manifest.
1.4 Locate the log file indicated by filenum in the manifest, seek to pro_offset, and initialize the log pointers, log length, and block count.
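The positioning in step 1.4 comes down to simple arithmetic on the manifest values. The sketch below shows one way to derive the file name and block position from pro_num and pro_offset. WriterPosition and LocateWriter are hypothetical names, and the sketch assumes binlog_path ends with a path separator.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

constexpr uint64_t kBlockSize = 64 * 1024;  // 64 KB blocks, as described above

// Hypothetical helper type: where the producer resumes writing.
struct WriterPosition {
  std::string filename;   // write2file + file number
  uint64_t block_index;   // number of complete blocks before pro_offset
  uint64_t block_offset;  // offset inside the current block
};

// Assumption: binlog_path already ends with '/'.
WriterPosition LocateWriter(const std::string& binlog_path,
                            uint32_t pro_num, uint64_t pro_offset) {
  WriterPosition pos;
  pos.filename = binlog_path + "write2file" + std::to_string(pro_num);
  pos.block_index = pro_offset / kBlockSize;
  pos.block_offset = pro_offset % kBlockSize;
  return pos;
}
```

For example, a manifest holding file number 3 and offset 65546 resolves to write2file3, block 1, with 10 bytes already used in that block.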

Updating Production Status

//pro_num: log file number
//pro_offset: log file offset
//Used when a full sync updates the slave’s binlog info
Status Binlog::SetProducerStatus(uint32_t pro_num, uint64_t pro_offset)

2.1 Delete write2file0.
2.2 Delete write2file+pro_num.
2.3 Create a new write2file+pro_num, pad it with pro_offset spaces, set version->pro_num and version->pro_offset, and flush them to the manifest.
2.4 Initialize the current filesize and block_offset.

//filenum: current log number
//pro_offset: current log offset
Status Binlog::GetProducerStatus(uint32_t* filenum, uint64_t* pro_offset)

3.1 Read pro_num and pro_offset from Version and return them.

Producing Logs

//Put->Produce->EmitPhysicalRecord
Status Binlog::Put(const std::string &item)

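4.1 Check whether the current log file needs to be split; if so, create a new file and reset the counters.
4.2 If the space remaining in the current block is less than kHeaderSize (8 bytes), pad it with '\x00' and continue in the next block.
4.3 Loop, emitting physical records until the entire item is written. Here left is the number of bytes of the item still to be written and avail is the payload space remaining in the current block:
• left ≤ avail and first fragment: the whole item fits, use kFullType.
• left > avail and first fragment: use kFirstType.
• left > avail and not the first fragment: use kMiddleType.
• left ≤ avail and not the first fragment: this is the final fragment, use kLastType (the type the consumer looks for in step 5.1).
4.4 EmitPhysicalRecord builds the record header (3-byte length, 4-byte timestamp, 1-byte type), writes the data, and updates block_offset and pro_offset.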
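The fragmentation loop in steps 4.2 and 4.3 can be sketched without any file I/O by planning which physical records an item of a given size would produce. This follows the LevelDB-style log writer that the scheme resembles; PlanFragments and Fragment are hypothetical names, the type values are assumed, and the kLastType case is included because the consumer expects it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr size_t kBlockSize = 64 * 1024;
constexpr size_t kHeaderSize = 8;

// Assumption: numeric values are illustrative.
enum RecordType { kFullType = 1, kFirstType = 2, kMiddleType = 3, kLastType = 4 };

struct Fragment {
  RecordType type;
  size_t length;  // payload bytes carried by this physical record
};

// Plan the physical records for an item of item_size bytes.
// block_offset is updated in place; padding returns the '\x00' bytes added.
std::vector<Fragment> PlanFragments(size_t item_size, size_t& block_offset,
                                    size_t& padding) {
  std::vector<Fragment> plan;
  padding = 0;
  size_t left = item_size;
  bool begin = true;
  do {
    size_t leftover = kBlockSize - block_offset;
    if (leftover < kHeaderSize) {
      // Step 4.2: too small for a header, pad and start a new block.
      padding += leftover;
      block_offset = 0;
    }
    size_t avail = kBlockSize - block_offset - kHeaderSize;
    size_t fragment_length = std::min(left, avail);
    bool end = (left == fragment_length);

    RecordType type;
    if (begin && end) {
      type = kFullType;
    } else if (begin) {
      type = kFirstType;
    } else if (end) {
      type = kLastType;
    } else {
      type = kMiddleType;
    }
    plan.push_back({type, fragment_length});

    block_offset += kHeaderSize + fragment_length;
    left -= fragment_length;
    begin = false;
  } while (left > 0);
  return plan;
}
```

Starting from an empty block, a 200,000-byte item yields one first fragment, two middle fragments of 65,528 bytes each, and a last fragment of 3,416 bytes.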

Consuming Logs

//scratch: buffer for the complete Redis command
//Consume->ReadPhysicalRecord reads one full Record; multiple Records form a command
Status PikaBinlogSenderThread::Consume(std::string &scratch)

5.1 Loop, reading records until a kFullType or kLastType record is encountered.
5.1.1 If the remaining space in the block is less than or equal to kHeaderSize, it is padding and should be skipped.
5.1.2 Read the data and update last_record_offset_ and con_offset.
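The reassembly loop in step 5.1 can be illustrated on an in-memory list of already-parsed physical records. This is a simplified sketch, not Pika's reader: PhysicalRecord, ConsumeOne, and IsPadding are hypothetical names, the type values are assumed, and file access and offset bookkeeping are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

constexpr size_t kBlockSize = 64 * 1024;
constexpr size_t kHeaderSize = 8;

// Assumption: numeric values are illustrative.
enum RecordType { kFullType = 1, kFirstType = 2, kMiddleType = 3, kLastType = 4 };

struct PhysicalRecord {
  RecordType type;
  std::string payload;
};

// Step 5.1.1: a block tail no larger than a header is padding to skip.
bool IsPadding(size_t block_offset) {
  return kBlockSize - block_offset <= kHeaderSize;
}

// Step 5.1: append fragments to scratch until a full or last record is seen.
// Returns false if the records run out before a command is complete.
bool ConsumeOne(const std::vector<PhysicalRecord>& records, size_t& pos,
                std::string& scratch) {
  scratch.clear();
  while (pos < records.size()) {
    const PhysicalRecord& r = records[pos++];
    switch (r.type) {
      case kFullType:
        scratch = r.payload;
        return true;
      case kFirstType:
        scratch = r.payload;
        break;
      case kMiddleType:
        scratch += r.payload;
        break;
      case kLastType:
        scratch += r.payload;
        return true;
    }
  }
  return false;
}
```

Each successful call leaves exactly one complete Redis command in scratch, however many physical records it was split across.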

Conclusion

This article has walked through Pika's write2file binlog: its file formats, core classes, and basic operations. The next piece will break down the overall replication workflow, continuing the deep dive into Pika's architecture.
