
Building a Unified Real‑time and Offline OLAP Platform with DorisDB at Yuanfudao

Yuanfudao's data middle platform leverages the MPP database DorisDB to create a unified OLAP system that supports both real‑time and batch analytics, handling millions of queries daily with sub‑second latency while meeting complex business requirements across its education services.

Yuanfudao Tech

Yuanfudao's data middle platform provides standardized data sets (OneData) and unified data services (OneService) for multiple business lines. The core OLAP platform, built on the next‑generation MPP database DorisDB, unifies real‑time and offline analytics, handling millions of queries daily with sub‑second latency.

Business Background and Requirements

As a leading online education provider, Yuanfudao generates large volumes of data every day and needs both real‑time and batch analytics for metrics such as channel conversion, user retention, and live‑stream quality. The platform must support low‑latency queries, complex multi‑table joins, high concurrency, streaming and batch ingestion, standardized SQL, deduplication, and easy horizontal scaling.

OLAP Engine Requirements

Second‑ or millisecond‑level query latency.

Efficient handling of wide tables and multi‑table joins.

High‑concurrency support.

Streaming and batch data ingestion for real‑time and offline ETL.

Standardized SQL to lower user learning cost.

Effective deduplication.

Good online scalability with low operational overhead.

Technology Selection

The team evaluated MOLAP, ROLAP, and HOLAP engines. MOLAP engines (e.g., Druid, Kylin) pre‑aggregate data, which speeds up fixed query patterns but limits flexibility. ROLAP engines (e.g., Presto, ClickHouse) compute results at query time, which is flexible but can be unstable for complex queries. HOLAP combines both approaches, and DorisDB emerged as the best fit, offering strong query performance, MySQL compatibility, and suitability for both streaming and batch workloads.
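The trade-off between pre‑aggregation and query‑time computation can be illustrated with a small sketch. This is not how Druid, Kylin, Presto, ClickHouse, or DorisDB are implemented; the rows, dimension names, and helper functions below are hypothetical and only show why a pre‑built cube answers fixed questions with a lookup, while a scan can answer arbitrary ones at higher cost.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (channel, grade, orders)
ROWS = [
    ("app", "g1", 3),
    ("app", "g2", 5),
    ("web", "g1", 2),
    ("web", "g2", 4),
]
DIMS = ("channel", "grade")


def build_cube(rows):
    """MOLAP style: pre-aggregate the measure for every subset of dimensions."""
    cube = defaultdict(int)
    for row in rows:
        values = dict(zip(DIMS, row[:2]))
        for size in range(len(DIMS) + 1):
            for dims in combinations(DIMS, size):
                key = tuple((d, values[d]) for d in dims)
                cube[key] += row[2]
    return dict(cube)


def query_cube(cube, **filters):
    """Answer an equality-filter query with one dictionary lookup."""
    key = tuple((d, filters[d]) for d in DIMS if d in filters)
    return cube[key]


def query_scan(rows, predicate):
    """ROLAP style: scan raw rows at query time, so any predicate is allowed."""
    return sum(row[2] for row in rows if predicate(row))


cube = build_cube(ROWS)
orders_from_app = query_cube(cube, channel="app")
# A range predicate on the measure is not in the cube, so it needs a scan.
big_app_orders = query_scan(ROWS, lambda r: r[0] == "app" and r[2] > 3)
```

The cube grows with the number of dimension combinations, which is the flexibility cost mentioned above; the scan has no such limit but touches every row on every query.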

Typical Use Cases

Typical scenarios include real‑time live‑stream quality monitoring, offline interactive queries and BI reporting, near‑real‑time order and renewal data, and real‑time advertising effectiveness analysis. All of them ingest data into DorisDB via Flink SQL or Stream/Broker Load, and achieve several‑fold to hundred‑fold performance improvements over MySQL.
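As a rough illustration of the Stream Load path, the sketch below builds the HTTP request for loading a small CSV batch. It assumes the Stream Load endpoint has the form `/api/{db}/{table}/_stream_load` on the FE HTTP port (8030 is assumed here) and takes a `label` header for idempotent retries; the host, database, table, and credentials are hypothetical, and the exact headers should be checked against the documentation for the deployed version.

```python
import base64
import urllib.request
import uuid


def build_stream_load_request(fe_host, db, table, payload, user, password,
                              http_port=8030, label=None):
    """Build (but do not send) a Stream Load PUT request for a CSV payload."""
    url = f"http://{fe_host}:{http_port}/api/{db}/{table}/_stream_load"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        "Expect": "100-continue",
        # A unique label lets the server reject an accidental duplicate load.
        "label": label or f"load_{uuid.uuid4().hex}",
        "column_separator": ",",
    }
    return urllib.request.Request(url, data=payload, headers=headers, method="PUT")


# Hypothetical host, database, table, and account.
request = build_stream_load_request(
    "fe.example.internal", "demo_db", "orders",
    payload=b"1,app\n2,web\n", user="loader", password="secret",
    label="demo_label",
)
```

The request is only constructed here. In practice the FE usually redirects the load to a BE node, so the HTTP client that sends it needs to follow that redirect while keeping the credentials and body.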

Operations and Monitoring

The team monitors key cluster health metrics (FE/BE node loss, disk failures, CPU usage, memory pressure) and raises query‑level alerts (large scans, slow queries, connection spikes). An audit platform tracks large and slow queries and sends alerts to the users who issued them so they can optimize their SQL.
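A minimal sketch of the audit step might look like the following. The record fields, thresholds, and function names are assumptions made for illustration; the article does not describe the audit platform's actual schema or limits.

```python
from dataclasses import dataclass


@dataclass
class QueryRecord:
    """One audited query (hypothetical fields)."""
    user: str
    sql: str
    duration_ms: int
    scan_rows: int


# Hypothetical thresholds; real values depend on the cluster and workload.
SLOW_MS = 5_000
LARGE_SCAN_ROWS = 100_000_000


def classify(record):
    """Return the alert tags that apply to a single query."""
    tags = []
    if record.duration_ms >= SLOW_MS:
        tags.append("slow_query")
    if record.scan_rows >= LARGE_SCAN_ROWS:
        tags.append("large_scan")
    return tags


def alerts_by_user(records):
    """Group flagged queries by the user who should receive the alert."""
    alerts = {}
    for rec in records:
        tags = classify(rec)
        if tags:
            alerts.setdefault(rec.user, []).append((rec.sql, tags))
    return alerts


records = [
    QueryRecord("alice", "SELECT ... FROM orders", 7_200, 1_000),
    QueryRecord("bob", "SELECT ... FROM events", 800, 250_000_000),
    QueryRecord("carol", "SELECT 1", 3, 1),
]
alerts = alerts_by_user(records)
```

Grouping by user mirrors the idea of routing each alert to the person who can fix the query, rather than to the platform team alone.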

Ecosystem Integration

The team developed a custom Flink connector, batch ingestion paths built on Stream Load and Broker Load, and a Presto DorisDB catalog to enable cross‑source queries and seamless BI integration.

Future Plans

The team plans to explore bitmap‑based multidimensional analysis, build a generic event analysis platform, and further automate operations, testing, and scaling scripts.
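To show why bitmaps suit multidimensional deduplication, here is a small sketch that uses Python integers as bitmaps. It is an illustration of the general technique, not DorisDB's bitmap implementation; the event data and helper names are hypothetical, and user IDs are assumed to be small non-negative integers.

```python
from collections import defaultdict


def build_bitmaps(events):
    """Map each dimension value to a bitmap with one bit per distinct user ID."""
    bitmaps = defaultdict(int)
    for key, user_id in events:
        bitmaps[key] |= 1 << user_id  # setting the same bit twice has no effect
    return dict(bitmaps)


def cardinality(bitmap):
    """Count the set bits, i.e. the number of distinct users."""
    return bin(bitmap).count("1")


# Hypothetical (channel, user_id) events; user 1 appears twice under "app".
events = [("app", 1), ("app", 2), ("app", 1), ("web", 2), ("web", 3)]
bitmaps = build_bitmaps(events)

app_users = cardinality(bitmaps["app"])
either_channel = cardinality(bitmaps["app"] | bitmaps["web"])  # union
both_channels = cardinality(bitmaps["app"] & bitmaps["web"])   # intersection
```

Because union and intersection are plain bitwise operations, distinct counts can be combined across dimensions without rescanning the raw events, which is what makes the approach attractive for retention and funnel style analysis.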

Overall, introducing DorisDB has created a one‑stop storage and query engine that unifies streaming and batch data, delivering consistent, easy‑to‑use data services and laying a solid foundation for Yuanfudao's data platform evolution.

big data, real-time analytics, Database, data warehouse, OLAP, DorisDB