
Building a Unified Real‑Time and Offline OLAP Platform with DorisDB at Yuanfudao

The article describes how Yuanfudao's data middle platform built a high‑performance OLAP service using the MPP HOLAP engine DorisDB to unify real‑time and batch analytics, meet low‑latency and high‑concurrency requirements, and support diverse education‑industry use cases such as live‑stream monitoring, advertising, and order analytics.

DataFunTalk

Yuanfudao's data middle platform provides standardized data sets (OneData) and unified services (OneService) for multiple education products, requiring a reliable OLAP platform that can serve both real‑time and offline queries.

Business Background and Requirements

The platform must process large daily data volumes, support metrics such as user activity, order revenue, channel conversion, and renewal rates, and deliver low‑latency, real‑time insights. It needs to ingest both streaming and batch data and to support complex multi‑table joins, high concurrency, and easy‑to‑use SQL.

OLAP Engine Requirements

Second‑ or millisecond‑level query latency.

Efficient handling of wide tables and multi‑table joins.

High‑concurrency support.

Streaming and batch data ingestion.

Standardized SQL with low learning cost.

Accurate deduplication.

Scalable online expansion with low ops cost.
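
Several of these requirements (standard SQL, multi‑table joins, accurate deduplication) can be illustrated with a single query. The following is a minimal sketch, not the platform's actual schema: the `orders` and `channels` tables, their columns, and the sample rows are hypothetical, and in‑memory SQLite is used only as a stand‑in so the SQL can run anywhere.

```python
import sqlite3

# In-memory SQLite is only a stand-in engine for this sketch;
# the tables, columns, and rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, user_id INTEGER,
                         channel_id INTEGER, amount REAL);
    CREATE TABLE channels (channel_id INTEGER, channel_name TEXT);
    INSERT INTO channels VALUES (1, 'search'), (2, 'social');
    INSERT INTO orders VALUES
        (100, 1, 1, 50.0),
        (101, 1, 1, 30.0),
        (102, 2, 1, 20.0),
        (103, 3, 2, 70.0);
""")

# Join a fact table to a dimension table, then count each paying user
# exactly once per channel (exact, not approximate, deduplication).
QUERY = """
    SELECT c.channel_name,
           COUNT(DISTINCT o.user_id) AS paying_users,
           SUM(o.amount)             AS revenue
    FROM orders o
    JOIN channels c ON c.channel_id = o.channel_id
    GROUP BY c.channel_name
    ORDER BY c.channel_name
"""

rows = conn.execute(QUERY).fetchall()
for channel_name, paying_users, revenue in rows:
    print(channel_name, paying_users, revenue)
```

User 1 placed two orders through the same channel but is counted once, which is the behavior the "accurate deduplication" requirement asks for.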

Technology Selection and Comparison

The team evaluated MOLAP (e.g., Druid, Kylin), ROLAP (e.g., Presto, ClickHouse), and HOLAP solutions. MOLAP offers pre‑aggregation but lacks flexibility; ROLAP is flexible but can be unstable for complex queries. HOLAP combines the advantages of both, and DorisDB emerged as the best fit thanks to its strong performance, MySQL compatibility, and low operational overhead.

Application Scenarios

Real‑time Live‑Stream Quality Monitoring: Minute‑level metrics such as network quality, packet loss, and audio/video availability are served by DorisDB.

Offline Interactive Queries and BI Reports: Migrating from MySQL to DorisDB reduced query latency by several orders of magnitude and simplified JDBC integration.

Near‑real‑time Order and Renewal Data: Hive historical data and binlog streams are ingested via Flink SQL into DorisDB, enabling fast cross‑team analytics.

Real‑time Advertising Strategy: Minute‑level ad performance data is streamed into DorisDB for unified reporting.

Monitoring and Operations

The team tracks key cluster health metrics (FE/BE node loss, disk failures, CPU usage, memory pressure) and query‑level alerts (large scans, slow queries running longer than 2 minutes, connection spikes). An audit platform captures DDL operations and slow queries and feeds the logs into Elasticsearch for analysis.
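
The query‑level rules can be sketched as a small classifier over audit records. This is an illustrative sketch only: the `AuditRecord` fields, the `classify` function, and the large‑scan row threshold are assumptions, and only the 2‑minute slow‑query limit comes from the description above.

```python
from dataclasses import dataclass

SLOW_QUERY_SECONDS = 120            # the "> 2 min" threshold from the text
LARGE_SCAN_ROWS = 1_000_000_000     # assumed value, not from the article
DDL_KEYWORDS = {"CREATE", "ALTER", "DROP", "TRUNCATE"}


@dataclass
class AuditRecord:
    """Hypothetical shape of one audit-log entry."""
    user: str
    statement: str
    duration_seconds: float
    scanned_rows: int


def classify(record):
    """Return the alert tags that apply to a single audit record."""
    alerts = []
    if record.duration_seconds > SLOW_QUERY_SECONDS:
        alerts.append("slow_query")
    if record.scanned_rows > LARGE_SCAN_ROWS:
        alerts.append("large_scan")
    words = record.statement.split()
    if words and words[0].upper() in DDL_KEYWORDS:
        alerts.append("ddl")
    return alerts


slow = AuditRecord("bi_user", "SELECT * FROM orders", 185.0, 2_500_000_000)
ddl = AuditRecord("admin", "alter table orders add column note varchar(32)", 0.4, 0)
fast = AuditRecord("bi_user", "SELECT 1", 0.01, 1)

for rec in (slow, ddl, fast):
    print(rec.user, classify(rec))
```

Records with a non‑empty tag list would be the ones forwarded to a log store such as Elasticsearch for later analysis.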

Ecosystem Integration

Custom Flink connectors, Stream Load/Broker Load pipelines, and a Presto‑DorisDB catalog were built to enable cross‑source queries and seamless data ingestion.
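
To make the Stream Load path more concrete, the sketch below builds (but does not send) an HTTP request of the kind Stream Load accepts: a `PUT` to the FE's `/api/{db}/{table}/_stream_load` endpoint with basic authentication and a unique `label` header. The host name, port 8030, database, table, credentials, and sample rows are all placeholders, and the header set reflects the commonly documented JSON options rather than the team's actual pipeline configuration.

```python
import base64
import json
import urllib.request


def build_stream_load_request(fe_host, http_port, database, table,
                              rows, user, password, label):
    """Build a Stream Load PUT request for a batch of JSON rows (not sent)."""
    url = f"http://{fe_host}:{http_port}/api/{database}/{table}/_stream_load"
    body = json.dumps(rows).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"Basic {token}",
        "Expect": "100-continue",
        "label": label,               # unique per batch, so retries can be deduplicated
        "format": "json",
        "strip_outer_array": "true",  # the body is a JSON array of row objects
    }
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


# Placeholder values throughout; none of these come from the article.
sample_rows = [
    {"ad_id": 1, "minute": "12:00", "clicks": 42},
    {"ad_id": 2, "minute": "12:00", "clicks": 7},
]
request = build_stream_load_request(
    "fe.example.internal", 8030, "demo_db", "ad_metrics",
    sample_rows, "loader", "secret", "ad-metrics-0001",
)
print(request.get_method(), request.full_url)
```

Sending is left out on purpose: the FE typically redirects the load to a BE node, so a real client has to follow that redirect and resend credentials (the behavior `curl --location-trusted` provides), which plain `urllib` does not do for `PUT` requests.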

Future Outlook

Planned extensions include bitmap‑based multi‑dimensional analysis, a generic event‑analysis platform, and further automation of cluster scaling and upgrades.
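
The idea behind bitmap‑based multi‑dimensional analysis can be sketched in a few lines: each dimension value holds a bitmap of user IDs, and exact distinct counts across dimensions reduce to bitwise AND/OR followed by a bit count. The sketch uses plain Python integers as bitmaps; the helper names and sample IDs are hypothetical, and a production engine would use a compressed bitmap type instead.

```python
def to_bitmap(user_ids):
    """Encode non-negative integer user IDs as bits of a Python int."""
    bitmap = 0
    for uid in user_ids:
        bitmap |= 1 << uid
    return bitmap


def cardinality(bitmap):
    """Exact number of distinct users in a bitmap."""
    return bin(bitmap).count("1")


# Hypothetical data: one bitmap per dimension value.
active_day1 = to_bitmap([1, 2, 3, 5])
active_day2 = to_bitmap([2, 3, 8])
paid_users = to_bitmap([2, 5, 8, 13])

retained = cardinality(active_day1 & active_day2)        # active on both days
active_either = cardinality(active_day1 | active_day2)   # active on at least one day
paid_and_active = cardinality(paid_users & active_day1)  # paid and active on day 1

print(retained, active_either, paid_and_active)
```

Because the bitmaps can be combined after aggregation, the same stored data answers retention, union, and intersection questions without rescanning the detail rows.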

Conclusion

By adopting DorisDB, Yuanfudao achieved a unified streaming‑batch OLAP engine that delivers low‑latency, high‑throughput analytics, strengthens the OneData/OneService ecosystem, and provides a solid foundation for future data‑platform evolution.
