Databases 24 min read

Apache Doris 2.0‑beta Release: New Query Optimizer, Pipeline Engine, Workload Management and Performance Enhancements

Apache Doris 2.0‑beta, released on July 3, 2023, introduces a modern Cascades‑based query optimizer, a data‑driven pipeline execution engine, fine‑grained workload groups, enhanced memory management, partial‑column updates, compute nodes, cold‑hot tiering and cross‑cluster replication, delivering up to tenfold speedups and significant cost reductions for real‑time analytics.

Big Data Technology Architecture
Big Data Technology Architecture
Big Data Technology Architecture
Apache Doris 2.0‑beta Release: New Query Optimizer, Pipeline Engine, Workload Management and Performance Enhancements

Apache Doris 2.0‑beta was officially released on July 3, 2023, with over 255 contributors delivering more than 3,500 improvements.

Download links and source code are provided:

Download: https://doris.apache.org/download GitHub source: https://github.com/apache/doris/tree/branch-2.0

The release follows the 2023 roadmap aiming to support unified online and offline analytics, lake‑warehouse integration, and multi‑modal data processing while reducing system complexity for users.

Key technical innovations include a brand‑new query optimizer based on the Cascades framework (Nereids) that delivers over 10× speedup on TPC‑H benchmarks without manual tuning, and a modern pipeline execution engine that makes query execution data‑driven, improves resource isolation, and boosts mixed‑load performance.

Workload groups enable fine‑grained CPU and memory quotas, query queuing, and automatic resource reclamation, with configuration examples such as:

create workload group if not exists etl_group properties ("cpu_share"="10","memory_limit"="30%","max_concurrency"="10","max_queue_size"="20","queue_timeout"="3000");

Memory management has been overhauled with a new MemTracker, soft limits, and GC mechanisms, eliminating most OOM incidents.

Partial‑column updates are now supported for Unique‑Key tables; an example using Stream Load:

curl --location-trusted -u root: -H "partial_columns:true" -H "column_separator:," -H "columns:id,balance,last_access_time" -T /tmp/test.csv http://127.0.0.1:48037/api/db1/user_profile/_stream_load

Additional enhancements include compute nodes for cloud‑native lakehouse queries, cold‑hot data tiering that reduces storage cost by up to 70 %, new Map/Struct data types, row‑column hybrid storage with point‑query shortcuts, and a CCR cross‑cluster replication feature.

Upgrade notes: the Nereids planner is enabled by default (enable_nereids_planner=true), the vectorized engine flag has been removed, and several configuration defaults have changed (e.g., enable_single_replica_compaction, default use of datev2/datetimev2/decimalv3).

Extensive documentation and links are provided for download, configuration, and detailed feature guides.

performanceDatabaseQuery OptimizationSQL engineApache DorisPipeline ExecutionWorkload Management
Big Data Technology Architecture
Written by

Big Data Technology Architecture

Exploring Open Source Big Data and AI Technologies

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.