Performance Evaluation of Dubbo 3.0 Address Push: Interface vs Application-Level Service Discovery
This article presents a performance comparison of Dubbo 2, Dubbo 3 interface‑level, and Dubbo 3 application‑level address discovery models under a million‑instance scenario, showing significant reductions in memory usage and GC frequency for the newer models.
1 Abstract
This article reports a performance test of the next-generation microservice framework Dubbo 3.0, focusing on its address push pipeline. It quantitatively compares Dubbo 2 interface-level discovery, Dubbo 3 interface-level discovery, and Dubbo 3 application-level discovery, showing that Dubbo 3 reduces resident memory by more than 50% compared with Dubbo 2, and that the application-level model cuts memory usage by a further ~40% while almost eliminating incremental memory allocation during instance scaling.
2 Background Introduction
2.1 Overview of Dubbo 3.0
Dubbo 3.0 is a fusion product of Alibaba's internal HSF framework and the open‑source Dubbo, upgraded to a cloud‑native architecture and intended to become the primary middleware for both internal and community users.
It was created to support Alibaba's overall migration to cloud services, consolidating development effort, improving product quality, and enabling seamless interaction between HSF and Dubbo ecosystems.
2.2 Performance Test and Comparison of Different Dubbo Versions
The test focuses on the address push path of the service framework (highlighted in orange in the original architecture diagram), comparing consumer-side behavior across Dubbo 2, Dubbo 3 interface-level, and Dubbo 3 application-level models when pushing addresses at the million-instance scale.
Test scenarios:
Dubbo 2 – baseline reference.
Dubbo 3 interface‑level discovery (same model as Dubbo 2).
Dubbo 3 application‑level discovery (new cloud‑native model).
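The two Dubbo 3 models above are selected through configuration rather than code changes. A minimal sketch of the relevant settings follows; the property names reflect Dubbo 3's dual-registration and migration options, but treat the exact keys and values as an assumption to verify against your Dubbo version's documentation:

```properties
# Provider side: choose what gets registered.
#   interface – Dubbo 2-compatible interface-level registration only
#   instance  – application-level (cloud-native) registration only
#   all       – register both forms, useful while migrating a cluster
dubbo.application.register-mode=all

# Consumer side: choose which address model to consume.
#   FORCE_INTERFACE | APPLICATION_FIRST | FORCE_APPLICATION
dubbo.application.service-discovery.migration=APPLICATION_FIRST
```

Registering both forms during migration lets old Dubbo 2 consumers and new application-level consumers coexist against the same providers.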
Test Environment and Method
Test data: a simulated push of 2.2 million (interface-level) cluster instance addresses, i.e., a single consumer process subscribes to 2.2 M addresses.
Environment: a Linux machine with an 8-core CPU and 16 GB of RAM; the JVM heap is set to 10 GB.
Method: the consumer process subscribes to 700 interfaces; ConfigServer acts as the registry, continuously simulating address changes for over an hour while GC and memory metrics are collected.
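The GC and memory metrics mentioned above can be sampled from inside the consumer process. A minimal sketch using the JDK's standard management beans is shown below; this is generic JMX instrumentation, not the harness the authors used, and the sampling loop is illustrative:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class GcSampler {
    public static void main(String[] args) throws InterruptedException {
        // Poll cumulative GC counts/time and heap usage once per second,
        // similar to what `jstat -gcutil <pid> 1000` reports externally.
        for (int i = 0; i < 3; i++) {
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                // getCollectionCount/-Time are cumulative since JVM start;
                // diffing successive samples gives per-interval GC activity.
                System.out.printf("%s: count=%d timeMs=%d%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
            MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
            System.out.printf("heap used: %d MB%n", heap.getUsed() / (1024 * 1024));
            Thread.sleep(1000);
        }
    }
}
```

Diffing successive samples during the one-hour push run yields the GC frequency and incremental memory curves discussed in the next section.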
3 Optimization Results and Comparison
3.1 GC Time and Distribution
Images show GC behavior for each model (Dubbo 2 interface, Dubbo 3 interface, Dubbo 3 application).
3.2 Incremental Memory Allocation
Images illustrate memory allocation trends for the three models, highlighting the reduction achieved by Dubbo 3.
3.3 Old Generation (OLD) Space and Resident Memory
Images compare OLD space usage and resident memory across the models, demonstrating that the application‑level model reduces resident memory by nearly 40% compared with the interface‑level model.
3.4 Consumer Load
Images depict consumer load metrics for Dubbo 3 interface‑level and application‑level models.
4 Detailed Comparison and Analysis
4.1 Dubbo 2 Interface Model vs Dubbo 3 Interface Model
Under 2 M address scale, Dubbo 2 quickly exhausts heap memory and triggers frequent GC, causing the process to become unresponsive. Optimized Dubbo 3 experiences only three Full GCs in one hour and reduces unreleased memory by about 1.7 GB.
4.2 Dubbo 3 Interface Model vs Dubbo 3 Application Model
Switching to the application‑level discovery model yields further resource reductions: incremental memory growth during instance scaling is minimal, and resident memory drops another ~40% to around 900 MB.
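The reduction follows from the data model: interface-level discovery pushes one address record per (interface, instance) pair, while application-level discovery pushes one record per instance. A back-of-the-envelope sketch with hypothetical cluster numbers (the 700-interface figure comes from the test setup; the application and instance counts are invented purely for illustration):

```java
public class AddressScale {
    public static void main(String[] args) {
        int interfaces = 700;      // interfaces subscribed by one consumer (from the test setup)
        int providerApps = 100;    // hypothetical number of provider applications
        int instancesPerApp = 50;  // hypothetical instances per application

        // Interface-level: each interface carries the full instance list of the
        // application exposing it (assuming one owning application per interface).
        long interfaceLevelRecords = (long) interfaces * instancesPerApp;       // 35,000

        // Application-level: one record per instance, independent of how many
        // interfaces that instance exposes.
        long applicationLevelRecords = (long) providerApps * instancesPerApp;   // 5,000

        System.out.println("interface-level records:   " + interfaceLevelRecords);
        System.out.println("application-level records: " + applicationLevelRecords);
    }
}
```

With these assumed numbers the application-level model holds 7x fewer address records, and the gap widens as more interfaces are packed into each application, which is consistent with the near-flat incremental memory curve observed during instance scaling.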
Future work includes optimizing metadata and URL object reuse in the application‑level implementation.
5 Conclusion
Dubbo 3.0 has successfully merged Dubbo and HSF, with cloud‑native features progressing rapidly. It proved stable during Alibaba’s recent Double‑11 shopping festival and is being piloted in other e‑commerce services. Ongoing efforts will focus on finalizing the application‑level service discovery, new governance rules, the next‑generation Triple protocol, and meeting the non‑functional goals of resource usage, performance, and cluster scalability.
The address‑push performance test validates the substantial resource savings of the application‑level model, reinforcing confidence in its scalability for future large‑scale clusters.