RocksDB Fundamentals and Its Application in Vivo Message Push System
The article explains RocksDB’s LSM‑based architecture, column‑family isolation, and snapshot features, and shows how Vivo’s VPUSH MappingTransformServer uses these capabilities with C++ code to store billions of registerId‑to‑ClientId mappings across multiple replicated servers for high‑concurrency, low‑latency, and fast service expansion.
This article introduces the basic principles of RocksDB and demonstrates how Vivo's message‑push system (VPUSH) leverages RocksDB for high‑concurrency mapping between registerId and ClientId. The goal is to share practical insights for readers who use RocksDB.
Background
In the VPUSH service, a client device is identified by a registerId; internally, the service uses a separate identifier, ClientId. A mapping service called MappingTransformServer (MT) stores the registerId ↔ ClientId mapping in RocksDB, which provides fast reads and writes at low storage cost.
RocksDB Overview
RocksDB is a fork of LevelDB that adds high‑concurrency write support, optimized SST file layout, and multiple compression strategies. It is widely used as the storage engine for distributed databases such as TiDB.
2.1 LSM Design
RocksDB is built on the Log‑Structured Merge‑Tree (LSM) design. LSM avoids random disk writes by first writing data to memory, then flushing to disk in sorted files (SSTs) that are organized into multiple levels (L0 … Ln). The write path is:
Write data to the in‑memory memtable and simultaneously record a write‑ahead log (WAL).
When the memtable reaches a size threshold, it becomes an immutable memtable.
A flush thread persists the immutable memtable as an SST file in level L0.
A compaction thread merges L0 files into higher levels (L1‑Ln).
2.2 Internal Structure
RocksDB stores data in Column Families (CF), each acting as a namespace. A CF consists of three components:
memtable: the in-memory write buffer.
sstfile: the persistent on-disk files.
WAL: the write-ahead log, shared across column families, used for crash recovery.
Additional metadata files include the Manifest (which records the LSM tree structure) and Meta files for snapshots.
2.3 Write Flow
The write flow follows the LSM steps described above, ensuring high throughput and low latency.
2.4 Read Flow
Read operations start from the memtable, then check immutable memtables, and finally search SST files level by level using binary search.
2.5 Summary
RocksDB achieves high performance by writing first to memory, flushing to sorted SST files, and organizing files into multiple levels. Hot data stays in lower levels, while cold data moves to higher levels.
Business Scenario
The MT service stores billions of registerId → ClientId mappings. To achieve high availability, each application’s data is cached on multiple MT servers (e.g., MT1, MT2, MT3). This multi‑replica design reduces the risk of a single point of failure compared with a centralized Redis cache.
3.1 Column Family Usage
Each application is assigned its own column family, allowing independent management (e.g., copying, snapshotting). The default column family is used when no explicit CF is specified.
Example code for initializing RocksDB with column families:
#include "rocksdb/db.h"
#include "rocksdb/slice.h"
#include "rocksdb/options.h"
#include "rocksdb/utilities/checkpoint.h"
#include "rocksdb/metadata.h"
#include "rocksdb/cache.h"
#include "rocksdb/table.h"
#include "rocksdb/slice_transform.h"
#include "rocksdb/filter_policy.h"
#include <fstream>
#include <map>
#include <string>
#include <vector>
using namespace rocksdb;
int32_t RocksDBCache::init(){
    // db and handleMap are class members, used by the read/write code below
    std::string m_dbPath = "/rocksdb";
    Options options;
    options.IncreaseParallelism();
    options.OptimizeLevelStyleCompaction();
    options.create_if_missing = true;
    options.create_missing_column_families = true;
    std::vector<std::string> column_families_list;
    // yields an empty list on a fresh database directory
    DB::ListColumnFamilies(options, m_dbPath, &column_families_list);
    if (column_families_list.empty()) {
        column_families_list.push_back("default");
    }
    std::vector<ColumnFamilyDescriptor> column_families;
    for (const auto &cfName : column_families_list) {
        column_families.push_back(ColumnFamilyDescriptor(cfName, ColumnFamilyOptions()));
    }
    std::vector<ColumnFamilyHandle*> handles;
    Status s = DB::Open(options, m_dbPath, column_families, &handles, &db);
    if (!s.ok() || column_families_list.size() != handles.size()) {
        return FAILURE;
    }
    for (unsigned int i = 0; i < column_families_list.size(); i++) {
        handleMap[column_families_list[i]] = handles[i];
    }
    return SUCCESS;
}
Creating a new column family:
int32_t RocksDBCache::createCF(const std::string &cfName) {
ColumnFamilyHandle *cf = nullptr;
if(handleMap.find(cfName) != handleMap.end()) {
return FAILURE; // already exists
}
Status s = db->CreateColumnFamily(ColumnFamilyOptions(), cfName, &cf);
if (!s.ok()) {
return FAILURE;
}
handleMap[cfName] = cf;
return SUCCESS;
}
Read and write examples (simplified):
int32_t RocksDBCache::get(const std::string &cf, const std::string &key, std::string &value){
auto it = handleMap.find(cf);
if (it == handleMap.end()) return FAILURE;
Status s = db->Get(ReadOptions(), it->second, key, &value);
    return s.ok() ? SUCCESS : FAILURE; // NotFound and hard errors both map to FAILURE here
}
int32_t RocksDBCache::put(const std::string &cf, const std::string &key, const std::string &value){
auto it = handleMap.find(cf);
if (it == handleMap.end()) return FAILURE;
Status s = db->Put(WriteOptions(), it->second, key, value);
return s.ok() ? SUCCESS : FAILURE;
}
Batch write example:
int32_t RocksDBCache::writeBatch(const std::string &cfName, const std::string &file){
    if (handleMap.find(cfName) == handleMap.end()) return FAILURE;
    std::ifstream fin(file);  // the parameter is a file path
    if (!fin.is_open()) return FAILURE;
    WriteBatch batch;
    ColumnFamilyHandle *handle = handleMap[cfName];
    std::string line;
    int count = 0;
    while (std::getline(fin, line)) {
        std::string key, value;
        parseLine(line, key, value);  // placeholder for the real line → key/value parsing
        batch.Put(handle, key, value);
        if (++count >= 1000) {        // flush the batch every 1000 entries
            db->Write(WriteOptions(), &batch);
            batch.Clear();
            count = 0;
        }
    }
    db->Write(WriteOptions(), &batch); // write any remaining entries
    return SUCCESS;
}
3.2 Snapshot Usage
To bring up a new MT server, the team copies only the required column‑family data using RocksDB snapshots. The snapshot is generated via Checkpoint::ExportColumnFamily, serialized to a JSON meta file, transferred with rsync/scp, and imported on the target machine with CreateColumnFamilyWithImport.
Snapshot export example:
int32_t RocksDBCache::createCfSnapshot(const std::string &cfName){
    if (handleMap.find(cfName) == handleMap.end()) return FAILURE;
    ColumnFamilyHandle* cfHandle = handleMap[cfName];
    std::string exportDir = "/rocksdb_app_snapshot";
    ExportImportFilesMetaData* meta = nullptr;
    Checkpoint* checkpoint = nullptr;
    Status s = Checkpoint::Create(db, &checkpoint);
    if (!s.ok()) return FAILURE;
    s = checkpoint->ExportColumnFamily(cfHandle, exportDir, &meta);
    delete checkpoint;
    if (!s.ok()) return FAILURE;
    // serialize meta to JSON so it can be shipped alongside the exported SST files
    std::string jsonMeta;
    metaToJson(meta, jsonMeta);
    std::ofstream ofs(exportDir + "/meta.json");
    if (ofs.is_open()) {
        ofs << jsonMeta << std::endl;
        ofs.close();
    }
    return SUCCESS;
}
Importing the snapshot on a new MT server:
int32_t RocksDBCache::importSnapshot(const std::string &cfName, const std::string &path){
    if (handleMap.find(cfName) != handleMap.end()) return FAILURE; // already exists
    std::string metaPath = path + "/meta.json";
    std::ifstream fin(metaPath, std::ios::binary);
    if (!fin.is_open()) return FAILURE;
    ExportImportFilesMetaData meta;
    jsonToMeta(fin, meta);  // deserialize the metadata written during export
    fin.close();
    ColumnFamilyHandle* cfHandle = nullptr;
    Status s = db->CreateColumnFamilyWithImport(ColumnFamilyOptions(), cfName,
                                                ImportColumnFamilyOptions(), meta, &cfHandle);
    if (!s.ok()) return FAILURE;
    handleMap[cfName] = cfHandle;
    return SUCCESS;
}
The overall expansion process consists of exporting snapshots from existing MT nodes, copying them to the new node, and loading them via the import API, allowing a new server to come online in one to two hours.
Conclusion
The article demonstrates how RocksDB’s LSM architecture, column families, and snapshot capabilities enable a scalable, high‑availability mapping service for massive registerId → ClientId datasets. It also provides concrete C++ code snippets for initialization, column‑family management, read/write operations, batch writes, and cross‑machine snapshot import.
vivo Internet Technology