
Practical Experience of Using DorisDB for Real-Time and Offline Analytics in KuJiaLe's Big Data Platform

This article details how KuJiaLe's big data team replaced their legacy ADB and Presto clusters with a DorisDB MPP database, achieving sub‑second query latency, unified real‑time and offline analytics, simplified ETL pipelines, and significant cost savings while supporting billion‑row tables and high‑QPS workloads.

DataFunTalk

KuJiaLe, a leading brand of Qunhe Technology, focuses on cloud design systems and 3D content production, providing design rendering, marketing display, construction, and geometric modeling solutions for enterprise customers across home, real‑estate, and public‑construction domains.

The big data team built a data platform that supports BI, commercial data products, online analytics, and user profiling. In production, the team replaced an Alibaba Cloud ADB cluster and an EMR Presto cluster with a 10‑node DorisDB cluster. Query performance is comparable to ADB, P95 latency for the workloads formerly served by Presto fell from seconds to 500 ms, and overall cost‑effectiveness more than doubled.

DorisDB unifies real‑time and offline analysis, eliminating the complexity of multiple systems, simplifying ETL processes, and dramatically improving ad‑hoc query efficiency.

As the business and its data volumes grew, raw logs, business databases, and third‑party API data were ingested into a Hadoop‑based offline warehouse and then synchronized to serving stores such as MySQL, Elasticsearch, Presto, and HBase. This multi‑system chain created latency and scalability challenges.

The team aimed to provide an open data‑service layer that balances data scale, QPS, latency, and operational cost, evaluating ROLAP engines such as Impala, Druid, ClickHouse, and DorisDB.

Key challenges included excessive offline/real‑time ETL jobs, the need for sub‑200 ms query response for medium‑scale workloads, and support for real‑time insert/update scenarios like user profiling and monitoring dashboards.

After evaluation, DorisDB was selected for its MPP architecture, native storage, primary‑key updates, materialized views, high concurrency, and ad‑hoc query capabilities, which directly addressed the identified pain points.

In production, DorisDB serves as the ROLAP engine for both offline and real‑time data. Offline data is batch‑loaded from ODPS via DataX, while real‑time data streams from Kafka. DataX and Flink jobs write through a Doris Proxy using HTTP Stream Load. The platform now handles daily increments of over a hundred million rows, supports aggregation over billions of rows with 500 ms latency, and executes multi‑table joins in seconds.
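The Stream Load write path described above can be illustrated with a small request builder. This is a minimal sketch, assuming the `/api/{db}/{table}/_stream_load` endpoint and the `label` and `column_separator` headers that Stream Load documents; the host name, port, database, table, and credentials are hypothetical, and the Doris Proxy that sits in front of the cluster is not modeled. The function only builds the request and does not send it.

```python
import base64
import urllib.request


def build_stream_load_request(host, port, db, table, rows, label,
                              user, password, sep=","):
    """Build (but do not send) an HTTP PUT request for one Stream Load batch."""
    # One line per row, columns joined by the separator (CSV-style body).
    body = "\n".join(sep.join(str(col) for col in row) for row in rows)
    url = f"http://{host}:{port}/api/{db}/{table}/_stream_load"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")

    req = urllib.request.Request(url, data=body.encode("utf-8"), method="PUT")
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Expect", "100-continue")
    # The label identifies this batch; reusing a label lets the server
    # reject a duplicate load, so retries stay idempotent.
    req.add_header("label", label)
    req.add_header("column_separator", sep)
    return req


# Hypothetical values, for illustration only.
req = build_stream_load_request(
    host="doris-proxy.example.internal", port=8030,
    db="demo_db", table="user_events",
    rows=[(1, "login"), (2, "render")],
    label="user_events_batch_0001",
    user="loader", password="secret",
)
```

A writer such as a DataX plugin or a Flink sink would send one such request per batch. The client that actually sends it must also handle the redirect from the frontend node to a backend node, which this sketch leaves out.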

Exploring the real‑time data pipeline highlighted three DorisDB strengths: fast streaming writes (minute‑level micro‑batches via HTTP Stream Load, and second‑level latency when consuming from Kafka), unified offline and online analysis with dynamic partitions that clean up expired data, and online SQL serving that adheres to industry standards.
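The minute‑level micro‑batching mentioned above can be sketched as a buffer that flushes when it is full or when its oldest row is too old. This is an illustrative sketch, not KuJiaLe's implementation: the `MicroBatcher` class, its thresholds, and the `flush` callback (which in practice would issue one Stream Load request) are all hypothetical names.

```python
import time


class MicroBatcher:
    """Buffer rows and flush when the batch is full or too old."""

    def __init__(self, flush, max_rows=10000, max_age_s=60.0,
                 clock=time.monotonic):
        self._flush = flush          # callback that receives a list of rows
        self._max_rows = max_rows
        self._max_age_s = max_age_s
        self._clock = clock          # injectable so tests can control time
        self._rows = []
        self._first_at = None        # arrival time of the oldest buffered row

    def add(self, row):
        if not self._rows:
            self._first_at = self._clock()
        self._rows.append(row)
        self._maybe_flush()

    def tick(self):
        """Call periodically so a quiet stream still flushes old rows."""
        self._maybe_flush()

    def _maybe_flush(self):
        if not self._rows:
            return
        full = len(self._rows) >= self._max_rows
        stale = self._clock() - self._first_at >= self._max_age_s
        if full or stale:
            batch, self._rows = self._rows, []
            self._flush(batch)


# Demo with a fake clock: one flush by size, one flush by age.
now = [0.0]
batches = []
batcher = MicroBatcher(batches.append, max_rows=3, max_age_s=60.0,
                       clock=lambda: now[0])
for i in range(4):
    batcher.add(i)      # the third row fills the first batch
now[0] = 61.0
batcher.tick()          # the fourth row is flushed because it is stale
```

Bounding both size and age trades freshness against load overhead: larger batches mean fewer load requests, while the age limit caps how long a row waits before it becomes queryable.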

In summary, adopting DorisDB closed the gap for billion‑row, high‑QPS scenarios, opened real‑time querying on detail tables, reduced ETL and aggregation table maintenance, and lowered operational costs. Future plans include migrating more models to aggregation tables and materialized views, exploring DorisDB on Elasticsearch, expanding real‑time use cases, deep integration with KuJiaLe’s data integration platform, and adopting a multi‑cloud architecture for greater flexibility.

The team thanks the DorisDB technical team for their enthusiastic support and reliable assistance.
