How BaikalDB Tackles OLAP Challenges with Vectorized and MPP Engines
BaikalDB, Baidu's distributed storage system, evolves from an OLTP‑focused engine to a hybrid HTAP architecture by introducing a vectorized query engine and a massively parallel processing (MPP) layer, addressing compute and resource bottlenecks for large‑scale analytical workloads while preserving transactional guarantees.
BaikalDB is a distributed storage system built for Baidu's commercial products, unifying dozens of internal storage solutions and supporting massive ad‑material storage and complex OLTP queries. To meet growing offline analytical demands, BaikalDB adds OLAP capabilities on the same data set.
MySQL‑compatible protocol with distributed transactions: Raft‑based three‑replica strong consistency and two‑phase commit ensure atomic cross‑node operations.
Rich search capabilities: Structured, full‑text, and built‑in vector indexes enable LLM‑driven memory and retrieval use cases.
High availability and elastic scaling: Automatic scaling, data balancing, fault recovery, and support for billions of rows across thousands of tables.
Challenges for OLAP in BaikalDB
Compute‑performance bottleneck: The traditional volcano model processes one row at a time, which destroys cache locality, incurs a virtual‑function call per row, and leaves multi‑core parallelism unexploited, so performance degrades super‑linearly as data grows.
Compute‑resource bottleneck: A single node's CPU and memory saturate when processing massive datasets, capping throughput regardless of per‑node efficiency.
To transform BaikalDB into an HTAP system, these bottlenecks must be eliminated through vectorized execution, MPP parallelism, and columnar storage.
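The gap between the two execution styles can be sketched in a few lines. This is an illustrative comparison, not BaikalDB code: the "volcano" function mimics per‑row operator calls, while the "vectorized" function consumes contiguous column slices in batches, which is what amortizes call overhead and keeps data cache‑resident.

```python
def volcano_sum(rows):
    """Volcano style: each operator pulls one row at a time (one call per row)."""
    total = 0
    for row in rows:  # stands in for a per-row next() call in a real engine
        total += row["amount"]
    return total

def vectorized_sum(amount_column, batch_size=1024):
    """Vectorized style: operators consume contiguous batches of one column."""
    total = 0
    for start in range(0, len(amount_column), batch_size):
        batch = amount_column[start:start + batch_size]  # contiguous slice
        total += sum(batch)  # one call amortized over the whole batch
    return total

rows = [{"amount": i} for i in range(10_000)]
column = [r["amount"] for r in rows]
assert volcano_sum(rows) == vectorized_sum(column)
```

In a real engine the batched path additionally benefits from SIMD and tight inner loops over typed arrays, which plain Python cannot show.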
Vectorized Query Engine
Columnar storage + SIMD acceleration: Data of the same type are stored contiguously, enabling 128/256/512‑bit vector instructions to process multiple values per cycle.
Batch processing for cache affinity: Operators handle blocks of rows, improving CPU D‑Cache/I‑Cache hit rates and reducing pipeline stalls.
Multi‑core parallelism: Morsel‑driven parallelism splits scans into small chunks (morsels) that are dispatched across many cores; a push‑based pipeline replaces the pull model, allowing adjacent operators to execute concurrently.
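Morsel‑driven scheduling can be sketched with a thread pool: the scan range is cut into fixed‑size morsels, each worker produces a partial aggregate, and the partials are merged. The morsel size and worker count here are illustrative, not BaikalDB's values.

```python
from concurrent.futures import ThreadPoolExecutor

MORSEL_SIZE = 1000  # illustrative; real engines tune this to cache sizes

def scan_and_aggregate(column, lo, hi):
    """One morsel: scan a small chunk of the column, emit a partial aggregate."""
    return sum(column[lo:hi])

def morsel_driven_sum(column, workers=4):
    """Split the scan into morsels and dispatch them across a pool of cores."""
    ranges = [(lo, min(lo + MORSEL_SIZE, len(column)))
              for lo in range(0, len(column), MORSEL_SIZE)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda r: scan_and_aggregate(column, *r), ranges)
    return sum(partials)

data = list(range(100_000))
assert morsel_driven_sum(data) == sum(data)
```

Small morsels are what let idle cores steal work near the end of a scan instead of waiting on one large, unevenly sized partition.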
MPP Parallel Processing
Distributed compute architecture: Data are hash‑ or range‑partitioned across nodes, enabling parallel query execution and drastically reducing response time.
Linear scalability: A shared‑nothing design with high‑speed networking allows horizontal scaling by adding nodes.
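The hash‑partitioned execution pattern can be sketched as follows. The shard assignment and per‑shard aggregation are illustrative stand‑ins for node‑local work; in a real MPP deployment each shard lives on a different machine and only the partial results cross the network.

```python
def hash_partition(rows, key, num_nodes):
    """Shard rows across nodes by hashing the partition key."""
    shards = [[] for _ in range(num_nodes)]
    for row in rows:
        shards[hash(row[key]) % num_nodes].append(row)
    return shards

def mpp_count(rows, key, num_nodes=4):
    """Each node aggregates its own shard; a coordinator merges the partials."""
    shards = hash_partition(rows, key, num_nodes)
    partials = [len(shard) for shard in shards]  # node-local in a real cluster
    return sum(partials)  # coordinator-side merge

rows = [{"user_id": i} for i in range(1_000)]
assert mpp_count(rows, "user_id") == 1_000
```

Because the shards share nothing, adding a node just adds one more independent partial, which is where the near‑linear scaling comes from.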
Integration with Apache Arrow and Acero
BaikalDB leverages the open‑source Arrow columnar format and the Acero streaming execution engine. Row‑based RocksDB data are converted to Arrow columns, all compute functions are expressed as Arrow compute expressions, and the native BaikalDB execution plan is translated into an Acero ExecNode tree, replacing the volcano model.
RPC between BaikalDB and BaikalStore also uses Arrow columnar payloads, cutting serialization overhead and network traffic.
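The row‑to‑column conversion step can be sketched as a simple transposition. This stdlib sketch shows only the shape of the transformation; in BaikalDB the output is an Arrow RecordBatch with typed, contiguous buffers rather than Python lists.

```python
def rows_to_columns(rows, schema):
    """Transpose row records (as read from a row store) into column arrays."""
    return {name: [row[name] for row in rows] for name in schema}

# Rows as a row-oriented store would hand them back:
rows = [{"id": 1, "clicks": 10}, {"id": 2, "clicks": 7}]
cols = rows_to_columns(rows, ["id", "clicks"])
assert cols == {"id": [1, 2], "clicks": [10, 7]}
```

Shipping the columnar form over RPC is also what enables the serialization savings mentioned above: fixed‑width columns can be copied as whole buffers instead of encoding field by field.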
Exchange Operator for Cross‑Node Shuffle
The ExchangeSender/Receiver pair enables data redistribution across fragments. Three partitioning modes support different scenarios:
SinglePartition: All data are sent to a single downstream fragment (e.g., final result aggregation).
HashPartition: Rows are hashed on a key and repartitioned to multiple receivers, used for parallel Agg/HashJoin.
BroadcastPartition: Data are broadcast to all receivers, useful for small‑table broadcast joins.
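The three routing modes reduce to one dispatch decision per batch, which a minimal sketch can make concrete. The function below is illustrative, not the ExchangeSender API; it routes a batch of rows to numbered receivers under each mode.

```python
def route(batch, mode, receivers, key=None):
    """Route a batch of rows to downstream receivers under one of three modes."""
    out = {r: [] for r in range(receivers)}
    if mode == "single":        # SinglePartition: everything to one receiver
        out[0].extend(batch)
    elif mode == "hash":        # HashPartition: repartition on a key
        for row in batch:
            out[hash(row[key]) % receivers].append(row)
    elif mode == "broadcast":   # BroadcastPartition: a full copy to everyone
        for r in out:
            out[r].extend(batch)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return out

batch = [{"k": i} for i in range(10)]
assert sum(len(v) for v in route(batch, "hash", 4, key="k").values()) == 10
assert all(len(v) == 10 for v in route(batch, "broadcast", 3).values())
```

Note the cost asymmetry the modes imply: hash moves each row once, broadcast multiplies traffic by the receiver count, which is why it is reserved for small tables.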
Adaptive Execution Strategy
BaikalDB dynamically selects the optimal engine based on query size and statistics: small OLTP queries (< 1 K rows) use the lightweight volcano model; medium‑size analytical queries switch to the vectorized engine; large‑scale scans trigger the MPP engine. Statistics‑driven thresholds decide when to invoke MPP, and the system can hot‑swap engines mid‑execution as data volume grows.
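The selection logic amounts to a threshold ladder over the optimizer's row estimate. The small‑query cutoff below comes from the text (< 1 K rows); the MPP cutoff is a made‑up illustrative constant, since the article says that boundary is statistics‑driven rather than fixed.

```python
SMALL_LIMIT = 1_000        # from the text: OLTP-size queries stay on volcano
MPP_LIMIT = 10_000_000     # illustrative only; really decided from statistics

def choose_engine(estimated_rows):
    """Pick an execution engine from the optimizer's cardinality estimate."""
    if estimated_rows < SMALL_LIMIT:
        return "volcano"      # lightweight row-at-a-time path
    if estimated_rows < MPP_LIMIT:
        return "vectorized"   # single-node batched execution
    return "mpp"              # distributed shuffle-based execution

assert choose_engine(100) == "volcano"
assert choose_engine(50_000) == "vectorized"
assert choose_engine(100_000_000) == "mpp"
```

The hot‑swap behavior described above corresponds to re‑evaluating this decision mid‑query as the observed row count overtakes the estimate.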
Results and Future Work
Vectorized execution reduces large‑scale query latency by up to 97 % (e.g., 11 s → 300 ms) and cuts peak memory usage by up to 56 %.
Adding MPP on top of vectorization yields an additional 54 % latency reduction on average and up to 80 % memory savings.
Future directions include native columnar storage (eliminating row‑to‑column conversion) and cost‑based optimization (CBO) to improve MPP decision making.
Baidu Geek Talk