CDubbo Upgrade Journey: From 2.5.10 to 2.7.3 – Issues, Fixes, and Performance Evaluation
This article details CTrip's migration of its internal Dubbo‑based RPC framework (CDubbo) from version 2.5.10 to 2.7.3, describing the motivations, encountered incompatibilities, step‑by‑step resolutions, performance regressions, and the comprehensive compatibility, stress, and integration testing performed to ensure a stable production rollout.
CDubbo, CTrip's customized version of Apache Dubbo, was first introduced in November 2017 and officially released in April 2018 (v0.1.1). By the time of writing, the framework had evolved to version 0.13.3, serving roughly 156 server applications, 170 client applications, and about 2,000 production instances.
Reasons for upgrading to 2.7.3
Asynchronous support in 2.5.10 was limited to a client‑side Future modeled on JDK 1.6, whose get() still blocked the calling thread.
Need for true server‑side asynchronous calls to improve CPU utilization and latency.
Minor incompatibility issues between client and server versions were manageable.
Preparation for future upgrades (e.g., 3.0 Reactive).
Splitting the monolithic registry into registration, metadata, and configuration centers (three‑center model).
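The server‑side asynchrony that motivated the upgrade rests on service methods returning CompletableFuture, which Dubbo 2.7 invokes without tying up a provider thread. A minimal JDK‑only sketch of the pattern (the DemoService interface and names are illustrative, not CDubbo APIs):

```java
import java.util.concurrent.CompletableFuture;

public class AsyncDemo {
    // Illustrative service interface: in Dubbo 2.7, a method returning
    // CompletableFuture is treated as an asynchronous provider method.
    interface DemoService {
        CompletableFuture<String> sayHello(String name);
    }

    static class DemoServiceImpl implements DemoService {
        @Override
        public CompletableFuture<String> sayHello(String name) {
            // The calling (I/O) thread returns immediately; the business
            // logic runs on the ForkJoinPool common pool and completes
            // the future later.
            return CompletableFuture.supplyAsync(() -> "Hello, " + name);
        }
    }

    public static void main(String[] args) {
        // join() blocks only this caller, not the provider's worker threads.
        System.out.println(new DemoServiceImpl().sayHello("CDubbo").join());
    }
}
```

In the framework itself the future's completion, not the method return, ends the RPC, which is what frees provider threads and improves CPU utilization.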
First‑stage upgrade to 2.7.0 – key issues and fixes
1) Package rename: com.alibaba → org.apache left classes such as DubboComponentScan unresolved; the resulting compilation errors were fixed by updating imports.
2) Constants split: Apache broke the monolithic Constants class into several (e.g., RegistryConstants, CommonConstants); developers replaced old references accordingly.
3) Router interface changes: new methods isRuntime, isForce, and getPriority required code updates.
4) ProxyFactory new method: getProxy was added; where not needed, the default behavior was retained by delegating to the existing implementation.
5) ApplicationConfig uniqueness: from 2.7.0 onward only one global ApplicationConfig is allowed; duplicate definitions now trigger errors.
6) JDK 1.8 requirement: use of CompletableFuture, Supplier, and Consumer forced all services to compile with JDK 1.8.
7) Netty version change: the default switched from Netty 3 to Netty 4, breaking monitoring hooks; temporarily reverted to Netty 3.
8) Async request hang: FutureAdapter wrapping caused mismatched futures; fixed in 2.7.2.
9) Server‑side async key removal: ASYNC_KEY was stripped during URL merge, preventing async calls; fixed in 2.7.2.
10) @Service parameters conversion: the missing conversion from String[] to Map caused bean definition errors; conversion logic was added (see code snippet below).
private AbstractBeanDefinition buildServiceBeanDefinition() {
    BeanDefinitionBuilder builder = rootBeanDefinition(ServiceBean.class);
    // Convert the String[] parameters from the @Service annotation into a Map
    builder.addPropertyValue("parameters",
            convertParameters(serviceAnnotationAttributes.getStringArray("parameters")));
    return builder.getBeanDefinition();
}

11) Client discovery OOM risk: URLs cached with timestamps grew without bound; the cache is now cleared before each proxy creation (fixed in 2.7.2).
12) Client‑side @Reference null proxy: referencing the Alibaba‑package annotation produced null proxies; switched to the Apache package for compatibility (fixed in 2.7.3).
13) Server‑side executes limit bug: the active‑call counter was decremented twice on error, silently disabling rate limiting; corrected by decrementing exactly once (fixed in 2.7.3).
14) ListenableFuture vs CompletableFuture: existing SOA services used Guava's ListenableFuture; CDubbo chose to support only CompletableFuture, requiring manual migration.
15) Netty version mismatch between client and server: in 2.7.x the Netty 3 transporter was renamed from netty to netty3; when the server pushed the new name through dynamic configuration, 2.5.10 clients could not locate the extension. Resolved by no longer pushing the transporter name (fixed in 2.7.4).
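The ListenableFuture migration in item 14 is largely mechanical: Guava's Futures.addCallback maps onto CompletableFuture.whenComplete. A JDK‑only sketch of the migrated client‑side idiom (names are illustrative):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

public class CallbackMigration {
    // Before: Futures.addCallback(listenableFuture, new FutureCallback<String>() { ... });
    // After: callbacks attach directly to the CompletableFuture returned by the RPC.
    static CompletableFuture<String> decorate(CompletableFuture<String> rpcResult,
                                              AtomicReference<String> log) {
        return rpcResult.whenComplete((value, error) -> {
            if (error != null) {
                log.set("onError: " + error.getMessage());   // was onFailure(Throwable)
            } else {
                log.set("onSuccess: " + value);              // was onSuccess(V)
            }
        });
    }

    public static void main(String[] args) {
        AtomicReference<String> log = new AtomicReference<>();
        decorate(CompletableFuture.completedFuture("pong"), log).join();
        System.out.println(log.get()); // onSuccess: pong
    }
}
```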
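The executes‑limit bug in item 13 is the classic double‑release mistake: decrementing the active counter in both a catch block and a finally block drives it negative and disables the limit. A simplified JDK‑only sketch of the corrected accounting (this is a stand‑in, not Dubbo's actual RpcStatus code):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ExecuteLimitSketch {
    static final AtomicInteger ACTIVE = new AtomicInteger();
    static final int LIMIT = 2;

    static String invoke(Runnable business) {
        if (ACTIVE.incrementAndGet() > LIMIT) {
            ACTIVE.decrementAndGet();   // roll back the failed admission
            return "rejected";
        }
        try {
            business.run();
            return "ok";
        } catch (RuntimeException e) {
            return "error";             // do NOT decrement here as well
        } finally {
            // The fix: release the slot exactly once, in finally, for both
            // the success and the error path.
            ACTIVE.decrementAndGet();
        }
    }

    public static void main(String[] args) {
        System.out.println(invoke(() -> {}));
        System.out.println(invoke(() -> { throw new RuntimeException("boom"); }));
        System.out.println(ACTIVE.get()); // back to 0, not negative
    }
}
```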
Second‑stage upgrade to 2.7.2
Performance testing revealed a ~40% throughput drop (from ~80k QPS to ~48k QPS), traced to JDK 1.8's CompletableFuture.get(), which spin‑waits up to 256 times before parking the calling thread. The issue was tracked (GitHub #4279) and mitigated.
Third‑stage upgrade to 2.7.3‑SNAPSHOT
Additional problems discovered and fixed before the final release:
Method‑level timeout ignored (service‑level timeout applied) – corrected by delegating timeout handling to CompletableFuture only.
Async error handling did not invoke listener onError – filter chain updated to call onError for exceptions.
@Reference annotation not initializing client – resolved by adding Apache compatibility for Alibaba annotations.
Server‑side executes limit failure on exception – fixed by preventing double decrement.
Netty3 missing error when client older than server – addressed by adjusting dynamic config push rules.
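Method‑level timeouts (the first fix above) are declared through Dubbo's annotation configuration; after the fix, the per‑method value takes precedence over the service‑level one. A configuration sketch, assuming Dubbo 2.7.x on the classpath (DemoService and slowQuery are illustrative names):

```java
// Configuration sketch only; not runnable without a Dubbo runtime.
import org.apache.dubbo.config.annotation.Method;
import org.apache.dubbo.config.annotation.Reference;

public class DemoConsumer {
    // Service-level timeout of 5000 ms, overridden to 1000 ms for slowQuery.
    // Before the fix, the 5000 ms service-level value was applied everywhere.
    @Reference(timeout = 5000,
               methods = {@Method(name = "slowQuery", timeout = 1000)})
    private DemoService demoService;
}
```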
Compatibility testing
All test cases passed for mixed versions (server 2.7.3, client 2.5.10), covering registration, synchronous and asynchronous calls, monitoring, and exception handling (including ClassNotFoundException for new RPC exception types).
Performance and stress testing
Throughput remained stable below 40k QPS; above 50k QPS response latency increased due to Young GC pauses. Detailed charts (omitted) compared 2.5.10 vs 2.7.3 on both server and client sides.
Integration testing
All CDubbo components passed their respective test suites, but additional challenges were identified:
ApplicationConfig conflicts across multiple components – resolved by enforcing identical name values.
Open‑source vs customized Dubbo version clashes – requires coordination between teams.
Default protocol port mismatch leading to connection failures – mitigated by adjusting protocol configuration defaults; a complete fix is planned for 2.7.4.
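The ApplicationConfig conflicts above follow from the 2.7.0 uniqueness rule: every component in a process must resolve to the same application. A configuration sketch of the convention, assuming Dubbo 2.7.x on the classpath (the application name is illustrative):

```java
// Configuration sketch only; not runnable without a Dubbo runtime.
import org.apache.dubbo.config.ApplicationConfig;

public class SharedAppConfig {
    // All components share this single definition; a second ApplicationConfig
    // with a different name is rejected from 2.7.0 onward.
    public static ApplicationConfig application() {
        return new ApplicationConfig("cdubbo-demo-app");
    }
}
```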
Additional topics of interest
The article concludes with a brief overview of registration, configuration, and metadata centers, and explains CTrip's confidence in large‑scale version upgrades due to high test coverage (93%), extensive benchmark testing (>50k QPS for a day), and a disciplined release process.
Ctrip Technology