
MatrixOne Architecture and OLAP Engine Design Overview

This article presents an in‑depth overview of MatrixOne, an open‑source hyper‑converged cloud‑native database, detailing its three‑tier architecture of compute, data and file services, and explains the design and implementation of its OLAP engine, including parser, planner, optimizer, and push‑based execution pipeline.

DataFunTalk

MatrixOne is an open‑source hyper‑converged heterogeneous cloud‑native database management system that supports OLTP, OLAP, and streaming workloads across public clouds, private data centers, and edge nodes.

The original shared‑nothing architecture based on a Multi‑Raft cluster suffered from scalability, performance, and cost issues. The new architecture separates the system into three layers: a compute layer composed of Compute Nodes (CN), a data layer of Data Nodes (DN) that store only metadata, and a File Service that abstracts various storage back‑ends (local disk, NFS, HDFS, object storage). An HA Keeper component provides cluster membership and health information, similar to ZooKeeper.

Key advantages of the redesigned shared‑storage architecture include: DN nodes no longer hold data, which simplifies elastic scaling and reduces data migration; data integrity and consistency are ensured by the underlying storage (e.g., S3) at lower cost; and compute tasks can be fully decoupled from storage, allowing separate CN clusters for TP and AP workloads.

The OLAP engine is built from four stages: Parser (SQL → AST), Planner (AST → logical plan), Optimizer (rule‑based and cost‑based transformations), and Execution (logical plan → executable pipeline). The optimizer reduces I/O through column pruning, predicate push‑down, predicate inference, and runtime filters, and reduces computation via join‑order selection, aggregation push‑down, and other cost‑based rewrites.
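To make the I/O-reducing rewrites concrete, here is a minimal sketch of two of the rules named above, column pruning and predicate push-down, applied to a toy logical plan. The node classes and fields are illustrative and do not reflect MatrixOne's actual plan representation:

```python
# Toy logical plan: Project -> Filter -> Scan. The optimizer pushes a
# single-column predicate into the scan and prunes columns the query
# never references, so less data is read from storage.
from dataclasses import dataclass, field

@dataclass
class Scan:
    table: str
    columns: list                                  # columns read from storage
    pushed_filters: list = field(default_factory=list)

@dataclass
class Filter:
    predicate: tuple                               # e.g. ("b", ">", 10)
    child: object = None

@dataclass
class Project:
    columns: list
    child: object = None

def optimize(plan):
    """Apply predicate push-down, then column pruning, to a Project plan."""
    if isinstance(plan, Project):
        needed = set(plan.columns)
        child = plan.child
        if isinstance(child, Filter):
            needed.add(child.predicate[0])         # filter column is still needed
            scan = child.child
            scan.pushed_filters.append(child.predicate)  # push-down into the scan
            child = scan                           # Filter node is eliminated
        if isinstance(child, Scan):
            # column pruning: read only columns the query actually uses
            child.columns = [c for c in child.columns if c in needed]
        plan.child = child
    return plan

plan = Project(columns=["a"],
               child=Filter(predicate=("b", ">", 10),
                            child=Scan("t", columns=["a", "b", "c"])))
opt = optimize(plan)
print(opt.child.columns)         # ["a", "b"]: column c pruned
print(opt.child.pushed_filters)  # [("b", ">", 10)]: filter evaluated at scan time
```

Real optimizers apply such rules bottom-up over arbitrary plan shapes; this sketch handles only the one shape needed to show the idea.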

For join ordering, MatrixOne combines a greedy pruning step (identifying fact and dimension tables, performing early fact‑dimension joins) with dynamic programming for the remaining sub‑trees, extending efficient planning beyond ten tables.
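The two-phase strategy above can be sketched as follows. The greedy phase treats the largest table as the fact table and joins the small (dimension) tables with it early; Selinger-style dynamic programming then orders whatever remains. The row counts, the one-million-row dimension threshold, and the 0.1% join-selectivity model are illustrative assumptions, not MatrixOne's actual cost model:

```python
def dp_order(sizes):
    """Left-deep DP over table subsets; cost = sum of estimated intermediate rows."""
    names = list(sizes)
    n = len(names)
    # best[mask] = (total cost, output rows, join order) for that subset
    best = {1 << i: (0.0, float(sizes[names[i]]), [names[i]]) for i in range(n)}
    for mask in range(1, 1 << n):
        if mask in best:
            continue
        cand = None
        for i in range(n):
            bit = 1 << i
            if mask & bit:
                cost, rows, order = best[mask ^ bit]
                out = rows * sizes[names[i]] * 0.001   # toy selectivity model
                if cand is None or cost + out < cand[0]:
                    cand = (cost + out, out, order + [names[i]])
        best[mask] = cand
    return best[(1 << n) - 1][2]

def order_joins(sizes, dim_threshold=1_000_000):
    """sizes: dict of table name -> estimated row count; returns a join order."""
    fact = max(sizes, key=sizes.get)                   # largest table as fact
    dims = sorted((t for t in sizes if t != fact and sizes[t] < dim_threshold),
                  key=sizes.get)                       # greedy: small dims first
    rest = {t: s for t, s in sizes.items() if t != fact and t not in dims}
    return [fact] + dims + (dp_order(rest) if rest else [])

plan = order_joins({"lineorder": 10_000_000, "date": 2_556,
                    "customer": 30_000, "part": 5_000_000,
                    "supplier": 2_000_000})
print(plan)   # starts with ["lineorder", "date", "customer"]
```

The point of the hybrid is cost containment: pure DP is exponential in the number of tables, so pruning fact-dimension joins greedily keeps the DP subset small enough to stay tractable beyond ten tables.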

The execution engine adopts a push‑based model rather than the traditional volcano pull model. Data flows in blocks (8192 rows) through a pipeline of operators, improving cache locality and reducing function‑call overhead. Operators are grouped into pipelines that can be parallelized across CPUs and nodes, with connectors and dispatchers handling one‑to‑one and one‑to‑many data movement, respectively.
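The push model above can be sketched in a few lines: the source drives execution by pushing fixed-size blocks (8192 rows, as stated in the article) into a chain of operators, the inverse of the volcano model's per-row `next()` pulls. The operator names and batch representation here are illustrative, not MatrixOne's API:

```python
BLOCK_SIZE = 8192   # rows per block, per the article

class Filter:
    """Keeps rows matching a predicate and pushes the survivors downstream."""
    def __init__(self, pred, next_op):
        self.pred, self.next_op = pred, next_op
    def push(self, batch):
        self.next_op.push([row for row in batch if self.pred(row)])
    def finish(self):
        self.next_op.finish()

class SumAgg:
    """Terminal operator: accumulates a running sum, one block at a time."""
    def __init__(self):
        self.total = 0
    def push(self, batch):
        self.total += sum(batch)   # work done per block, not per row
    def finish(self):
        pass

def run_pipeline(rows, root):
    # The source slices the input into blocks and pushes them downstream;
    # no operator ever pulls, so each call processes a whole cache-friendly block.
    for i in range(0, len(rows), BLOCK_SIZE):
        root.push(rows[i:i + BLOCK_SIZE])
    root.finish()

agg = SumAgg()
pipe = Filter(lambda x: x % 2 == 0, agg)
run_pipeline(list(range(20_000)), pipe)
print(agg.total)   # 99990000: the sum of even numbers below 20000
```

In a real engine the connectors and dispatchers mentioned above would sit between pipelines as queues or channels, letting separate pipelines run on different CPUs or nodes; this single-threaded sketch shows only the push-based data flow.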

Overall, the article details how MatrixOne’s architecture and OLAP engine address scalability, performance, and cost challenges for modern analytical workloads.

Cloud Native · Query Optimization · Database Architecture · OLAP · Execution Engine · MatrixOne
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
