MatrixOne HTAP Technical Overview: Architecture, Storage Engine, and Implementation Paths
This article presents a technical overview of MatrixOne's HTAP journey: the challenges of hybrid transactional/analytical processing, a classification of existing HTAP approaches, MatrixOne's storage architecture and its AOE and TAE engines, and the roadmap and performance considerations.
MatrixOrigin, a database startup, is building MatrixOne, an open-source, cloud-native HTAP database that serves both OLTP and OLAP workloads with a single unified engine. The project has accumulated over 500,000 lines of code, and the team plans to release a full HTAP engine in the second half of 2022.
HTAP Technical Routes – The article first discusses the fundamental problems of HTAP, such as data freshness versus transaction performance, and compares various academic and industry solutions. It classifies HTAP implementations into five categories: (1) middleware‑based encapsulation, (2) log‑based replication, (3) storage‑level integration using separate replicas for row and column stores, (4) tightly integrated row‑column engines with automatic engine selection, and (5) single‑engine designs where the same data serves both TP and AP workloads.
Each route is illustrated with examples (e.g., HyPer, MemSQL, SQL Server, TiDB, Google F1 Lightning, SingleStore, SAP HANA) and analyzed for real‑time capability, isolation, cost, and complexity.
MatrixOne Storage Architecture – The core of MatrixOne is MatrixCube, a replicated state machine layer that manages multiple storage engines. Catalog data is stored row-wise in Pebble, while the columnar engine AOE (Append-Only Engine) holds analytical data. AOE lays data out as Segments composed of Blocks, with sparse indexes and zone maps that let AP queries skip data they do not need.
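To make the role of zone maps concrete, here is a minimal Python sketch of block pruning inside a segment. It is an illustration only, not MatrixOne's actual code: the `Block` and `Segment` classes, their fields, and `scan_range` are hypothetical names, and each zone map is assumed to be a simple min/max pair over a single integer column.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Block:
    """Hypothetical block: one column's values plus a min/max zone map."""
    values: List[int]
    zone_min: int = field(init=False)
    zone_max: int = field(init=False)

    def __post_init__(self) -> None:
        # The zone map is computed once, when the (append-only) block is built.
        self.zone_min = min(self.values)
        self.zone_max = max(self.values)


@dataclass
class Segment:
    """Hypothetical segment: an ordered list of blocks."""
    blocks: List[Block]

    def scan_range(self, lo: int, hi: int) -> Tuple[List[int], int]:
        """Return the values in [lo, hi] and the number of blocks actually read."""
        hits: List[int] = []
        blocks_read = 0
        for block in self.blocks:
            # Zone-map check: skip the block if [zone_min, zone_max]
            # cannot overlap the query range [lo, hi].
            if block.zone_max < lo or block.zone_min > hi:
                continue
            blocks_read += 1
            hits.extend(v for v in block.values if lo <= v <= hi)
        return hits, blocks_read


segment = Segment([Block([1, 3, 5]), Block([10, 12, 14]), Block([20, 25, 30])])
hits, blocks_read = segment.scan_range(11, 21)
print(hits, blocks_read)  # [12, 14, 20] 2
```

In this example the first block is never read, because its maximum (5) is below the lower bound of the query. The pruning is most effective when data is sorted or clustered on the filtered column, since the min/max ranges of different blocks then overlap less.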
Replication is built on Raft. The design aims to cut the number of copies persisted for each write from four (two Raft log writes plus two data writes) to two (one Raft log write and one storage write) by using an external log store as the WAL.
Future Directions – AOE → TAE – MatrixOne plans to evolve AOE into TAE (Transactional Analytical Engine), a single storage engine that can serve both TP and AP workloads. TAE will expose a KV‑like interface while internally using a columnar layout with segment‑level sorting, delta‑main structures, and flexible schema handling.
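The delta-main idea mentioned above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not TAE's design: `DeltaMainStore` and its methods are hypothetical names, the "main" part is modeled as two sorted, aligned lists standing in for a sorted columnar segment, and the "delta" is a plain in-memory dictionary.

```python
import bisect
from typing import Dict, List, Optional

_TOMBSTONE = object()  # marks a key deleted in the delta


class DeltaMainStore:
    """Hypothetical KV-like store with a sorted main part and a write-optimized delta."""

    def __init__(self) -> None:
        self.main_keys: List[int] = []      # sorted; only rewritten by merge()
        self.main_vals: List[str] = []      # aligned with main_keys
        self.delta: Dict[int, object] = {}  # recent writes and deletes

    def put(self, key: int, value: str) -> None:
        self.delta[key] = value             # writes never touch main directly

    def delete(self, key: int) -> None:
        self.delta[key] = _TOMBSTONE

    def get(self, key: int) -> Optional[str]:
        # The delta holds the freshest version, so it is consulted first.
        if key in self.delta:
            value = self.delta[key]
            return None if value is _TOMBSTONE else value  # type: ignore[return-value]
        # Fall back to a binary search over the sorted main part.
        i = bisect.bisect_left(self.main_keys, key)
        if i < len(self.main_keys) and self.main_keys[i] == key:
            return self.main_vals[i]
        return None

    def merge(self) -> None:
        """Fold the delta into main and rebuild the sorted layout."""
        merged = dict(zip(self.main_keys, self.main_vals))
        for key, value in self.delta.items():
            if value is _TOMBSTONE:
                merged.pop(key, None)
            else:
                merged[key] = value
        self.main_keys = sorted(merged)
        self.main_vals = [merged[k] for k in self.main_keys]
        self.delta = {}


store = DeltaMainStore()
store.put(2, "b")
store.put(1, "a")
store.merge()                # main now holds keys [1, 2]
store.put(2, "B")            # update lands in the delta
store.delete(1)              # delete is recorded as a tombstone
print(store.get(2), store.get(1))  # B None
store.merge()
print(store.main_keys, store.main_vals)  # [2] ['B']
```

The point of the split is that TP-style point writes stay cheap, since they only touch the delta, while AP-style scans mostly read the sorted main part. A real engine would also have to handle concurrency, transactions, and durability, all of which this sketch leaves out.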
The roadmap includes a single-node HTAP release (mid-2022), an open-source distributed TAE (Q3 2022), and a stream engine (end of 2022). The article also cites Star Schema Benchmark (SSB) results, claiming that MatrixOne 0.2.0 outperforms ClickHouse and StarRocks on pure PK workloads.
Finally, the article invites readers to join the MatrixOne community via GitHub and WeChat, and promotes related data‑science material downloads.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.