
OceanBase HTAP Capabilities: Architecture, Performance, and Core Technologies

This article provides a comprehensive overview of OceanBase's HTAP capabilities, detailing its core features, technical architecture, execution and storage engines, resource isolation, fast import mechanisms, performance benchmarks, and future enhancements, supported by diagrams and real‑world deployment data.

DataFunSummit

OceanBase is an open‑source, distributed, integrated database that began in 2010 as an internal project at Taobao. It has since evolved through several major versions (0.x, 1.0, 2.x, 3.x, 4.x) into a high‑performance HTAP system used by Ant Group and by organizations in banking, insurance, telecom, and energy.

Its key features are:

- extremely high OLTP performance (7.07 × 10⁸ tpmC in the TPC‑C benchmark);
- recovery in under 8 seconds for high availability;
- horizontal and vertical scalability to thousands of nodes and petabyte‑scale data;
- dual compatibility with MySQL and Oracle;
- strong HTAP support for both transactional (TP) and analytical (AP) workloads;
- low storage cost, thanks to an LSM‑Tree engine with columnar compression that often shrinks data to 1/3–1/4 of its size in a traditional row store;
- native multi‑tenant isolation;
- built‑in security features such as transparent encryption and auditing.
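As a back‑of‑envelope check on the figures above: TPC‑C's tpmC metric counts New‑Order transactions per minute, so the quoted result works out to roughly 11.8 million New‑Order transactions per second. The 1/3–1/4 compression range can be applied the same way to a row‑store footprint. The sketch below does only that arithmetic; the 1,200 TB input is an arbitrary example, not a reported deployment.

```python
def tpmc_to_per_second(tpmc: float) -> float:
    """tpmC counts New-Order transactions per minute; divide by 60 for a per-second rate."""
    return tpmc / 60


def compressed_range_tb(row_store_tb: float) -> tuple[float, float]:
    """Apply the quoted 1/4 to 1/3 size ratio to a row-store footprint given in TB."""
    return row_store_tb / 4, row_store_tb / 3


if __name__ == "__main__":
    print(f"{tpmc_to_per_second(7.07e8):,.0f} New-Order transactions/s")
    low, high = compressed_range_tb(1200)  # hypothetical 1,200 TB row store
    print(f"{low:.0f}-{high:.0f} TB after compression")
```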

The architecture is organized into multiple zones, each containing many OBServer nodes. Data is sharded into partitions, and each partition is replicated across zones with Paxos, so a write is committed once a majority of its replicas acknowledge it. This design has supported clusters of over 1,500 nodes, peak processing of 61 million transactions per second, and tables exceeding 3.2 billion rows.
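To make the replication rule concrete, here is a toy model of a three‑zone cluster in which every partition keeps one replica per zone and a write counts as committed once a majority of those replicas acknowledge it, which is the standard Paxos quorum condition. The zone names, partition count, and CRC32‑based routing are illustrative assumptions, not OceanBase's actual placement or routing logic.

```python
import zlib

ZONES = ["zone1", "zone2", "zone3"]  # hypothetical three-zone deployment
NUM_PARTITIONS = 8                   # hypothetical partition count


def partition_of(key: str) -> int:
    """Route a row key to a partition with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def replicas_of(partition: int) -> list[str]:
    """In this toy layout every partition has exactly one replica in each zone."""
    return list(ZONES)


def quorum(replica_count: int) -> int:
    """Smallest majority of a replica set."""
    return replica_count // 2 + 1


def is_committed(partition: int, acked_zones: set[str]) -> bool:
    """A write is durable once a majority of the partition's replicas have acknowledged it."""
    replicas = set(replicas_of(partition))
    return len(acked_zones & replicas) >= quorum(len(replicas))


if __name__ == "__main__":
    p = partition_of("order:42")
    print(p, is_committed(p, {"zone1", "zone2"}))  # two of three zones: majority reached
    print(p, is_committed(p, {"zone1"}))           # one of three zones: not yet committed
```

With three zones, losing any single zone still leaves a two‑replica majority, which is why one zone failure need not stop writes.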

The execution engine supports both serial and parallel execution. For parallel execution, the query plan is split into fragments called DFOs (Data Flow Operations) that are scheduled across nodes. Adaptive execution decides at runtime whether building a hash table for a GROUP BY aggregation is worthwhile, based on how much the aggregation is estimated to reduce the data.
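The adaptive GROUP BY decision can be sketched as a partial (per‑worker) aggregation that samples its input first. If the sample shows that hashing collapses many rows into few groups, it keeps building the hash table; if nearly every key is unique, it stops hashing and streams the remaining rows through for the final aggregation stage to merge. The sample size and reduction threshold are hypothetical parameters; the source does not give OceanBase's actual heuristics.

```python
from collections import defaultdict
from collections.abc import Hashable, Iterable, Iterator


def adaptive_partial_sum(
    rows: Iterable[tuple[Hashable, int]],
    sample_size: int = 1000,     # hypothetical: rows to inspect before deciding
    min_reduction: float = 0.5,  # hypothetical: required fraction of rows folded away
) -> Iterator[tuple[Hashable, int]]:
    """Partial SUM(value) GROUP BY key that abandons hashing when it does not pay off."""
    it = iter(rows)
    table = defaultdict(int)
    seen = 0
    for key, value in it:  # phase 1: aggregate a sample
        table[key] += value
        seen += 1
        if seen == sample_size:
            break
    reduction = 1 - len(table) / seen if seen else 0.0
    if reduction >= min_reduction:
        for key, value in it:  # data shrinks well: keep aggregating in the hash table
            table[key] += value
        yield from table.items()
    else:
        yield from table.items()  # flush the sampled groups
        yield from it             # pass remaining rows through unaggregated


def final_sum(partials: Iterable[tuple[Hashable, int]]) -> dict[Hashable, int]:
    """Final stage: merge partial results (aggregated or pass-through) into exact sums."""
    out = defaultdict(int)
    for key, value in partials:
        out[key] += value
    return dict(out)
```

Either branch yields correct totals after `final_sum`; the choice only affects how much work and memory the partial stage spends.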

The optimizer has moved from two‑stage plan generation in 3.x (a serial plan is built first and then parallelized) to one‑stage generation in later versions, and it adds parallel push‑down techniques that handle complex queries, distinct aggregates, and window functions efficiently.
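One way to picture distinct‑aggregate push‑down is a `COUNT(DISTINCT value) GROUP BY group` query split across workers. Each worker first removes duplicates locally (the pushed‑down step), the surviving pairs are redistributed by group so every group lands on exactly one final worker, and that worker counts what remains. This is a generic three‑step sketch under those assumptions, not a description of the plans OceanBase's optimizer actually emits.

```python
from collections import defaultdict
from collections.abc import Hashable, Iterable

Pair = tuple[Hashable, Hashable]  # (group key, value being counted)


def local_distinct(rows: Iterable[Pair]) -> set[Pair]:
    """Pushed-down step: each worker drops duplicate (group, value) pairs before shuffling."""
    return set(rows)


def shuffle_by_group(partials: list[set[Pair]], num_workers: int) -> list[set[Pair]]:
    """Redistribute pairs so that all pairs of one group reach the same final worker."""
    buckets: list[set[Pair]] = [set() for _ in range(num_workers)]
    for part in partials:
        for group, value in part:
            # The set also removes duplicates that appeared on different workers.
            buckets[hash(group) % num_workers].add((group, value))
    return buckets


def count_per_group(bucket: set[Pair]) -> dict[Hashable, int]:
    """Final step: every remaining pair is a distinct value for its group."""
    counts = defaultdict(int)
    for group, _ in bucket:
        counts[group] += 1
    return dict(counts)


def parallel_count_distinct(worker_inputs: list[list[Pair]], num_workers: int = 2) -> dict[Hashable, int]:
    partials = [local_distinct(rows) for rows in worker_inputs]
    result: dict[Hashable, int] = {}
    for bucket in shuffle_by_group(partials, num_workers):
        result.update(count_per_group(bucket))  # groups never span buckets, so keys never collide
    return result
```

Local deduplication pays off because it shrinks what crosses the network: duplicate pairs are dropped before the shuffle rather than after it.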

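The storage engine uses a streamlined LSM‑Tree structure with a columnar layout. Because writes are buffered in memory and flushed to disk sequentially, the engine sustains high write throughput without random I/O, and the columnar format compresses very well. Columnar storage also lets filters be pushed down and evaluated without first decompressing the data, which improves OLAP query performance.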
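Filtering without decompression is easiest to see with dictionary encoding, one common columnar encoding: the predicate is evaluated once per distinct dictionary entry, and the scan then compares small integer codes instead of decoding every row. The source does not say which encodings OceanBase applies, so treat dictionary encoding here as an illustrative stand‑in.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class DictColumn:
    dictionary: list[str]  # each distinct value stored once, sorted
    codes: list[int]       # one small integer per row, indexing into dictionary


def encode(values: list[str]) -> DictColumn:
    dictionary = sorted(set(values))
    index = {value: code for code, value in enumerate(dictionary)}
    return DictColumn(dictionary, [index[v] for v in values])


def filter_rows(col: DictColumn, predicate: Callable[[str], bool]) -> list[int]:
    """Return matching row positions without materializing the decoded column."""
    # Evaluate the predicate once per distinct value...
    matching = {code for code, value in enumerate(col.dictionary) if predicate(value)}
    # ...then scan only the integer codes.
    return [row for row, code in enumerate(col.codes) if code in matching]


if __name__ == "__main__":
    country = encode(["cn", "us", "cn", "de", "us"])
    print(filter_rows(country, lambda v: v == "cn"))  # [0, 2]
```

The saving grows with the ratio of rows to distinct values: a billion‑row column with a few hundred countries needs only a few hundred predicate evaluations.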

Resource isolation is built on native multi‑tenant containers plus resource groups that cap CPU, memory, and IOPS per tenant, so heavy analytical workloads do not raise the latency of critical transactional queries. For stronger, physical isolation, AP workloads can be served from read‑only replicas.
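Per‑group IOPS caps can be modeled with a token bucket, a generic rate‑limiting technique used here as a stand‑in because the source does not describe OceanBase's internal I/O scheduler. The group names and limits below are hypothetical; the point is that an AP group exhausting its own budget is throttled without consuming the TP group's tokens.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    rate_per_sec: float  # sustained I/O operations allowed per second
    burst: float         # maximum tokens that can accumulate
    tokens: float        # tokens currently available
    last: float          # timestamp (seconds) of the previous refill

    @classmethod
    def start(cls, rate_per_sec: float, burst: float, now: float = 0.0) -> "TokenBucket":
        return cls(rate_per_sec, burst, burst, now)  # begins with a full bucket

    def try_acquire(self, now: float, n: float = 1) -> bool:
        """Refill for the elapsed time, then admit the request only if enough tokens remain."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_per_sec)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False


# Hypothetical resource groups with independent IOPS budgets.
groups = {
    "tp_group": TokenBucket.start(rate_per_sec=8000, burst=8000),
    "ap_group": TokenBucket.start(rate_per_sec=2000, burst=2000),
}

if __name__ == "__main__":
    ap_admitted = sum(groups["ap_group"].try_acquire(now=0.0) for _ in range(5000))
    tp_ok = groups["tp_group"].try_acquire(now=0.0)
    print(ap_admitted, tp_ok)  # AP is capped at its burst; TP is unaffected
```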

OceanBase 4.0 introduced a Direct‑Path fast‑import feature that bypasses the SQL layer and writes directly to SSTables, achieving 4–5× higher ingestion speed for bulk loads while keeping the table readable.
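The core idea of a direct‑path load, as described above, is that rows skip the row‑by‑row SQL write path and go straight into the sorted, immutable format an SSTable uses. The sketch below shows that idea in miniature under simplifying assumptions: it sorts a batch by primary key, rejects duplicates, cuts the result into fixed‑size blocks, and records each block's key range so point lookups can binary‑search. The block size, the `primary_key` helper, and the layout are hypothetical and far simpler than OceanBase's real SSTable format.

```python
import bisect
from collections.abc import Callable, Iterable, Sequence
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Block:
    min_key: Any
    max_key: Any
    rows: tuple


def primary_key(row: Sequence[Any]) -> Any:
    return row[0]  # hypothetical: the first column is the primary key


def build_sstable(rows: Iterable[Sequence[Any]], block_rows: int = 4,
                  key: Callable[[Sequence[Any]], Any] = primary_key) -> list[Block]:
    ordered = sorted(rows, key=key)  # one bulk sort instead of per-row index maintenance
    for prev, cur in zip(ordered, ordered[1:]):
        if key(prev) == key(cur):
            raise ValueError(f"duplicate primary key {key(cur)!r}")
    blocks = []
    for start in range(0, len(ordered), block_rows):
        chunk = tuple(ordered[start:start + block_rows])
        blocks.append(Block(key(chunk[0]), key(chunk[-1]), chunk))  # immutable, key-ranged block
    return blocks


def lookup(blocks: list[Block], k: Any,
           key: Callable[[Sequence[Any]], Any] = primary_key) -> Any:
    maxes = [b.max_key for b in blocks]
    i = bisect.bisect_left(maxes, k)  # first block whose max key >= k
    if i == len(blocks):
        return None
    return next((row for row in blocks[i].rows if key(row) == k), None)
```

Because the output is written once in sorted order, the load is a sequential write, which is consistent with why bypassing the SQL layer speeds up bulk ingestion.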

The article concludes that OceanBase delivers a unified platform for both TP and AP workloads, offering high availability, strong security, extensive ecosystem tools, and ongoing enhancements such as external table support in upcoming releases.
