
Optimizing Apache Doris Performance: A Case Study in Query Processing

Youzan replaced ClickHouse and Druid with Apache Doris and refined its vectorized engine by eliminating deserialization overhead in the merge-aggregation phase, cutting query time by roughly 30%. The team validated compatibility through SQL rewriting and traffic replay, and plans further SIMD-based optimizations and broader adoption.

Youzan Coder

This article discusses the implementation and optimization of Apache Doris as a replacement for ClickHouse and Druid in the OLAP (Online Analytical Processing) system at Youzan, a merchant service company. The authors explain that Apache Kylin has been stable for MOLAP (Multidimensional OLAP) use cases, whereas ClickHouse, used for ROLAP (Relational OLAP), presents challenges with scaling and join performance. To address these issues, Youzan explored using ClickHouse on Apache Doris but later focused on Doris's vectorized engine after Doris became an Apache top-level project.

The article details performance testing and optimization efforts, focusing on the two-phase aggregation process. The authors identified a bottleneck in the merge-aggregation phase, where deserialization and temporary object creation caused significant overhead. Their optimization converts aggregation results directly to the required data types, eliminating unnecessary deserialization steps and reducing memory allocation overhead. These changes reduced query response time by roughly 30%.
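The difference between the two merge strategies can be illustrated with a small Python sketch. This is not Doris code (Doris's engine is written in C++), and every name here (SumState, partial_aggregate, merge_with_temporaries, merge_direct) is hypothetical. It assumes a SUM over 64-bit integers whose partial states are shipped as a packed byte buffer, and only shows the general idea: the baseline merge builds a temporary state object per row, while the direct merge reads typed values straight out of the buffer.

```python
import struct
from collections import defaultdict

STATE_SIZE = 8  # one little-endian int64 per partial SUM state (assumed layout)


class SumState:
    """Hypothetical aggregation state for SUM over 64-bit integers."""
    __slots__ = ("total",)

    def __init__(self, total=0):
        self.total = total

    def serialize(self):
        return struct.pack("<q", self.total)

    @classmethod
    def deserialize(cls, blob):
        return cls(struct.unpack("<q", blob)[0])

    def merge(self, other):
        self.total += other.total


def partial_aggregate(rows):
    """Phase 1: pre-aggregate (key, value) rows into keys plus a packed state buffer."""
    states = defaultdict(SumState)
    for key, value in rows:
        states[key].total += value
    keys = list(states)
    buffer = b"".join(states[k].serialize() for k in keys)
    return keys, buffer


def merge_with_temporaries(partials):
    """Baseline phase 2: deserialize every row into a temporary state, then merge it."""
    final = defaultdict(SumState)
    for keys, buffer in partials:
        for i, key in enumerate(keys):
            blob = buffer[i * STATE_SIZE:(i + 1) * STATE_SIZE]
            tmp = SumState.deserialize(blob)  # one temporary object per row
            final[key].merge(tmp)
    return {key: state.total for key, state in final.items()}


def merge_direct(partials):
    """Phase 2 without temporaries: read typed values from the buffer and fold them in."""
    final = defaultdict(int)
    for keys, buffer in partials:
        for key, (value,) in zip(keys, struct.iter_unpack("<q", buffer)):
            final[key] += value
    return dict(final)


node_a = [("a", 1), ("b", 2), ("a", 3)]
node_b = [("b", 10), ("c", 5)]
partials = [partial_aggregate(node_a), partial_aggregate(node_b)]
baseline_result = merge_with_temporaries(partials)
direct_result = merge_direct(partials)
```

Both functions return the same totals. The direct version simply skips the per-row slice, deserialize call, and temporary object, which is the kind of overhead the authors describe removing.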

Additionally, the article covers compatibility testing with Druid workloads through SQL rewriting and traffic replay mechanisms. The team developed a system to capture Druid queries, convert them to Doris-compatible SQL, and execute them to identify compatibility issues and performance gaps. This approach allowed them to systematically address differences in built-in functions and other SQL dialect variations between the systems.
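A replay harness of the kind described above might be sketched as follows. This is an illustrative outline, not Youzan's tool: rewrite_sql, replay, ReplayResult, and the execute callback are hypothetical names, and the function mapping is assumed to be supplied by the user. SRC_FN and DST_FN in the usage note are placeholders, not real Druid or Doris functions. The rewrite is a plain regex rename of function calls, which is far simpler than a real dialect translator and would also touch matching text inside string literals.

```python
import re
import time
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class ReplayResult:
    original_sql: str
    rewritten_sql: str
    ok: bool
    elapsed_ms: float
    error: Optional[str] = None


def rewrite_sql(sql: str, function_map: Dict[str, str]) -> str:
    """Rename function calls in one pass, case-insensitively, using function_map."""
    if not function_map:
        return sql
    lookup = {name.upper(): target for name, target in function_map.items()}
    names = "|".join(re.escape(name) for name in lookup)
    # Match a mapped name followed by an opening parenthesis, i.e. a call site.
    pattern = re.compile(r"\b(" + names + r")\s*\(", re.IGNORECASE)
    return pattern.sub(lambda m: lookup[m.group(1).upper()] + "(", sql)


def replay(
    queries: Iterable[str],
    function_map: Dict[str, str],
    execute: Callable[[str], object],
) -> List[ReplayResult]:
    """Rewrite each captured query, run it via execute, and record outcome and latency."""
    results = []
    for sql in queries:
        rewritten = rewrite_sql(sql, function_map)
        start = time.perf_counter()
        try:
            execute(rewritten)
            ok, error = True, None
        except Exception as exc:  # collect failures as compatibility issues
            ok, error = False, str(exc)
        elapsed_ms = (time.perf_counter() - start) * 1000
        results.append(ReplayResult(sql, rewritten, ok, elapsed_ms, error))
    return results
```

In use, execute would wrap whatever client runs SQL against the target cluster, and the failed results form the list of compatibility gaps to work through. For example, replay(captured, {"SRC_FN": "DST_FN"}, run_query) would rename every SRC_FN(...) call before running it, where run_query is a caller-supplied function.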

The authors conclude with plans to continue Doris adoption at Youzan, aiming to consolidate their technology stack and resolve the scaling and performance limitations they experienced with ClickHouse and Druid. Future work includes further compatibility testing, performance optimization, and potentially implementing SIMD (Single Instruction, Multiple Data) optimizations for critical execution paths.

Tags: Query Optimization, Performance Tuning, ClickHouse, OLAP, Druid, big data analytics, Apache Doris, SQL Compatibility, two-phase aggregation, Vectorized Execution