Fundamentals of RocksDB and Its Application in Vivo Message Push System
The article explains RocksDB’s LSM‑based architecture, column‑family isolation, and snapshot features, and shows how Vivo’s VPUSH mapping service uses these capabilities to store billions of registerId‑to‑ClientId mappings with high concurrency, low storage cost, and fault tolerance across multiple replicated servers.
This article, authored by the Vivo Internet Server Team (Zeng Luobin), introduces the basic principles of RocksDB and demonstrates its practical use in Vivo's message push system (VPUSH). It aims to inspire readers who use RocksDB by sharing explorations of its native capabilities.
Background : In the VPUSH architecture, client devices are identified by a registerId (provided by the business side) and an internal ClientId. A mapping service called MappingTransformServer (MT) caches these identifiers using RocksDB as the storage engine, providing high‑concurrency read/write performance while keeping storage costs low.
RocksDB Overview : RocksDB originated from LevelDB; Facebook (Meta) enhanced it to support high‑concurrency writes, optimized SST file layout, and added multiple compression strategies. It inherits all LevelDB features and adds memory and disk optimizations, making it suitable for distributed, high‑reliability storage scenarios. Many databases (e.g., TiDB) use RocksDB as the underlying engine.
2.1 LSM Design Philosophy : RocksDB is built on the Log‑Structured Merge‑Tree (LSM) concept, which avoids random disk writes by first writing data to memory, then flushing to disk in ordered layers (L0‑Ln). This design improves write throughput and enables efficient binary search on disk.
2.2 Internal Structure : RocksDB consists of Column Families (namespaces), each containing a memtable, immutable memtable, SST files, and a shared Write‑Ahead Log (WAL). The memtable holds recent writes in memory; when full, it becomes immutable and is later flushed to an SST file (L0). SST files are immutable, ordered, and stored in multiple levels.
2.3 Write Path :
Data is written to the memtable and simultaneously logged to the WAL.
When the memtable reaches a size threshold, it is converted to an immutable memtable.
A flush thread persists the immutable memtable to an SST file (L0).
A compaction thread merges SST files from lower to higher levels.
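The write path above can be illustrated with a toy in‑memory model. This is a sketch, not RocksDB code: `std::map` stands in for the memtable, a plain log vector stands in for the WAL, and a vector of sorted runs stands in for L0 SST files; all names here are illustrative.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy LSM write path: a mutable memtable receives writes (the WAL is modeled
// as an append-only log vector), becomes "full" at a size limit, and is then
// flushed as a sorted run (an "SST file" at L0).
struct ToyLsm {
    size_t memtable_limit;
    std::map<std::string, std::string> memtable;  // kept sorted in memory
    std::vector<std::string> wal;                 // append-only write-ahead log
    std::vector<std::vector<std::pair<std::string, std::string>>> l0_runs;

    explicit ToyLsm(size_t limit) : memtable_limit(limit) {}

    void put(const std::string &key, const std::string &value) {
        wal.push_back(key + "=" + value);  // step 1: log to the WAL
        memtable[key] = value;             // step 1: write to the memtable
        if (memtable.size() >= memtable_limit) {
            flush();                       // steps 2-3: memtable full -> flush
        }
    }

    void flush() {
        // std::map iterates in key order, so the flushed run is sorted,
        // just like an SST file produced from an immutable memtable.
        std::vector<std::pair<std::string, std::string>> run(memtable.begin(),
                                                             memtable.end());
        l0_runs.push_back(std::move(run));
        memtable.clear();
        wal.clear();  // the logged entries are now persisted in the run
    }
};
```

Compaction (step 4) is omitted here; it would merge overlapping runs into larger sorted runs at higher levels.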
2.4 Read Path : Reads start from the memtable, then check immutable memtables, and finally traverse SST files level by level using binary search.
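The same lookup order can be sketched as a small function. Again this is illustrative only: each `std::map` stands in for one sorted structure (memtable, immutable memtable, or a level of SST files), and `find` stands in for the binary search RocksDB performs on disk.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Table = std::map<std::string, std::string>;

// Toy LSM read path: check the mutable memtable first, then immutable
// memtables from newest to oldest, then each level of sorted runs.
// The first hit wins, because newer writes shadow older ones.
bool lsmGet(const Table &memtable,
            const std::vector<Table> &immutables,  // newest last
            const std::vector<Table> &levels,      // L0 first
            const std::string &key, std::string &value) {
    auto hit = memtable.find(key);
    if (hit != memtable.end()) { value = hit->second; return true; }
    for (auto it = immutables.rbegin(); it != immutables.rend(); ++it) {
        auto h = it->find(key);
        if (h != it->end()) { value = h->second; return true; }
    }
    for (const auto &level : levels) {
        auto h = level.find(key);  // stands in for binary search in an SST
        if (h != level.end()) { value = h->second; return true; }
    }
    return false;
}
```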
2.5 Summary : RocksDB achieves high write performance by first storing data in memory, then persisting ordered keys to disk, and organizing files into hierarchical levels to keep hot data in lower levels.
Business Scenario : The system stores billions of registerId → ClientId mappings in RocksDB. To ensure high concurrency and availability, each application’s cache is replicated across multiple MT servers (e.g., MT1‑MT3 for app1, MT2‑MT4 for app2). This multi‑replica design reduces cost compared to Redis, improves fault tolerance, and offers flexible custom logic.
3.1 Using Column Families : Each application’s data is placed in its own column family, simplifying management and enabling operations such as copying or snapshotting per app. The default column family is used when no explicit family is specified.
Code example – initializing RocksDB with column families:
#include "rocksdb/db.h"
#include "rocksdb/slice.h"
#include "rocksdb/options.h"
#include "rocksdb/utilities/checkpoint.h"
#include "rocksdb/metadata.h"
#include "rocksdb/cache.h"
#include "rocksdb/table.h"
#include "rocksdb/slice_transform.h"
#include "rocksdb/filter_policy.h"
#include <fstream>
#include <map>
#include <string>
#include <vector>
using namespace rocksdb;
int32_t RocksDBCache::init() {
    std::string m_dbPath = "/rocksdb";
    Options options;
    options.IncreaseParallelism();
    options.OptimizeLevelStyleCompaction();
    options.create_missing_column_families = true;

    // List the column families that already exist in the DB directory.
    std::vector<std::string> column_families_list;
    DB::ListColumnFamilies(options, m_dbPath, &column_families_list);
    if (column_families_list.empty()) {
        column_families_list.push_back("default");
    }

    // All existing column families must be opened together, so build a
    // descriptor for each of them.
    std::vector<ColumnFamilyDescriptor> column_families;
    for (const auto &cfName : column_families_list) {
        column_families.push_back(ColumnFamilyDescriptor(cfName, ColumnFamilyOptions()));
    }

    std::vector<ColumnFamilyHandle*> handles;
    // db and handleMap are class members used by the read/write methods below.
    Status s = DB::Open(options, m_dbPath, column_families, &handles, &db);
    if (!s.ok() || column_families_list.size() != handles.size()) {
        return FAILURE;
    }
    // Cache each column family's handle for later reads and writes.
    for (size_t i = 0; i < column_families_list.size(); i++) {
        handleMap[column_families_list[i]] = handles[i];
    }
    return SUCCESS;
}

Creating a column family:
int32_t RocksDBCache::createCF(const std::string &cfName) {
ColumnFamilyHandle *cf = nullptr;
Status s;
if(handleMap.find(cfName) != handleMap.end()) {
return FAILURE; // already exists
}
s = db->CreateColumnFamily(ColumnFamilyOptions(), cfName, &cf);
if (!s.ok()) {
return FAILURE;
}
handleMap[cfName] = cf;
return SUCCESS;
}

Read and write operations (simplified):
int32_t RocksDBCache::get(const std::string &cf, const std::string &key, std::string &value) {
    auto it = handleMap.find(cf);
    if (it == handleMap.end()) return FAILURE;
    Status s = db->Get(ReadOptions(), it->second, key, &value);
    // A missing key and a read error both map to FAILURE here.
    return s.ok() ? SUCCESS : FAILURE;
}
int32_t RocksDBCache::put(const std::string &cf, const std::string &key, const std::string &value){
auto it = handleMap.find(cf);
if (it == handleMap.end()) return FAILURE;
Status s = db->Put(WriteOptions(), it->second, key, value);
return s.ok() ? SUCCESS : FAILURE;
}

3.2 Using Snapshots : To accelerate data loading for new MT servers, snapshots of specific column families are exported via RocksDB’s Checkpoint API, serialized to JSON metadata, transferred with rsync / scp , and imported on the target server using CreateColumnFamilyWithImport . This approach reduces load time from days to a few hours.
Export snapshot code:
int32_t RocksDBCache::createCfSnapshot(const std::string &cfName) {
    if (handleMap.find(cfName) == handleMap.end()) return FAILURE;
    ColumnFamilyHandle *app_cf_handle = handleMap[cfName];
    std::string export_dir = "/rocksdb_app_snapshot";
    ExportImportFilesMetaData *metadata_ptr = nullptr;
    Checkpoint *checkpoint = nullptr;
    Status s = Checkpoint::Create(db, &checkpoint);
    if (!s.ok()) return FAILURE;
    // Export the column family's SST files plus metadata into export_dir.
    s = checkpoint->ExportColumnFamily(app_cf_handle, export_dir, &metadata_ptr);
    if (!s.ok()) return FAILURE;
    // Serialize the metadata to JSON so the importing server can rebuild it.
    std::string jsonMetaInfo;
    metaToJson(metadata_ptr, jsonMetaInfo);
    std::ofstream ofs(export_dir + "/meta.json");
    if (!ofs.is_open()) return FAILURE;
    ofs << jsonMetaInfo << std::endl;
    ofs.close();
    return SUCCESS;
}

Import snapshot code:
int32_t RocksDBCache::importSnapshot(const std::string &cfName, const std::string &path) {
    if (handleMap.find(cfName) != handleMap.end()) return FAILURE; // already present
    std::string metaJsonPath = path + "/meta.json";
    std::ifstream fin(metaJsonPath, std::ios::binary);
    if (!fin.is_open()) return FAILURE;
    // Rebuild the export metadata from the transferred JSON file.
    ExportImportFilesMetaData meta;
    jsonToMeta(fin, meta);
    fin.close();
    ColumnFamilyHandle *app_cf_handle = nullptr;
    Status s = db->CreateColumnFamilyWithImport(ColumnFamilyOptions(), cfName,
                                                ImportColumnFamilyOptions(), meta,
                                                &app_cf_handle);
    if (!s.ok()) return FAILURE;
    handleMap[cfName] = app_cf_handle; // register the imported column family
    return SUCCESS;
}

Conclusion : RocksDB’s LSM‑based design, column‑family isolation, and snapshot capabilities make it well‑suited for high‑concurrency, high‑availability services such as Vivo’s message push system. By leveraging these features, the system achieves low latency, cost‑effective storage, and flexible scaling across multiple MT servers.
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.