
Root Cause Analysis of Zookeeper‑Dubbo Failure and Automated Migration of Dubbo Services to Spring Cloud

The article diagnoses a 2023 Dubbo outage caused by thousands of temporary (ephemeral) nodes under one parent pushing a Zookeeper packet past the jute.maxbuffer limit, explains why Zookeeper’s CP model is ill‑suited for service registries, and outlines an automated migration to Spring Cloud using generated REST controllers and proxies that achieve comparable performance, especially for larger payloads.

Sohu Tech Products

On 2023‑12‑14 at 14:26 a massive Dubbo outage occurred when many Dubbo services threw exceptions because they could not connect to the Zookeeper cluster. The Zookeeper ensemble was completely unavailable and logged a series of errors.

Session 0x0 for server dubboZk.xxx.com/10.x.x.x:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException.

The Zookeeper logs showed a java.io.IOException: ZooKeeperServer not running and later an Unreasonable length = 10053968 exception during data recovery:

Caused by: java.io.IOException: Unreasonable length = 10053968 at org.apache.jute.BinaryInputArchive.checkLength(BinaryInputArchive.java:166) ...

Investigation revealed that a large number of temporary (ephemeral) nodes had been created under a single parent node (/dubbo-test/com.sohu.xxxService/consumers/): 4096 nodes, each with a path of about 2450 bytes, totaling roughly 9.58 MB. The default Zookeeper limit for a single data packet (jute.maxbuffer) is 0xfffff bytes, just under 1 MB, so the packet exceeded the limit and triggered the exception.

Adjusting jute.maxbuffer to 20 MB allowed the cluster to restart, confirming that the failure was caused by the oversized data packet generated when the leader tried to close a session that contained a huge list of child nodes.
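The figures above can be checked with a few lines of arithmetic. The following is a minimal sketch, not Zookeeper code: the class and method names are made up for illustration, and 20 MB is assumed to mean 20 × 1024 × 1024 bytes. It also shows that a larger limit only moves the threshold, since the number of children that fit is simply the limit divided by the path size.

```java
// Back-of-the-envelope check of the numbers quoted in the article.
// Class and method names are illustrative, not part of Zookeeper.
final class ChildListSizeEstimate {

    /** Approximate size of a child list: one path per node, ignoring serialization overhead. */
    static long estimateBytes(long nodeCount, long bytesPerPath) {
        return nodeCount * bytesPerPath;
    }

    /** Largest number of children of the given path size that still fit within the limit. */
    static long maxChildrenWithin(long limitBytes, long bytesPerPath) {
        return limitBytes / bytesPerPath;
    }

    public static void main(String[] args) {
        long bytesPerPath = 2450;                 // approximate path length from the article
        long estimated = estimateBytes(4096, bytesPerPath);
        long raisedLimit = 20L * 1024 * 1024;     // assumption: 20 MB = 20 * 1024 * 1024 bytes

        System.out.println("estimated bytes: " + estimated);                 // 10035200
        System.out.println("fits in 20 MB: " + (estimated <= raisedLimit));  // true
        System.out.println("children that fit in 20 MB: "
                + maxChildrenWithin(raisedLimit, bytesPerPath));             // 8559
    }
}
```

The estimate of 10,035,200 bytes is slightly below the 10,053,968 bytes reported in the exception, presumably because the simple product ignores per-entry serialization overhead. With the same path length, roughly 8,559 children would exceed the 20 MB limit again.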

Fault Diagnosis Steps

Download Zookeeper transaction and snapshot logs.

Modify FileTxnLog.next() to print the oversized packet.

Identify the offending node paths (e.g., consumer%253A%252F%252F10.x.x.x%252Fcom.sohu.xxxService%253Fapplication%253D…).

Discover that a load‑testing script created an unbounded number of temporary nodes because it repeatedly instantiated Dubbo consumers without proper cleanup.

The root cause was not the data size of any single node but the cumulative size of the many child nodes under one parent, which Zookeeper does not cap.
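The node names found during diagnosis appear to be URL-encoded twice (%253A decodes to %3A, which decodes to a colon), which is part of why each path is so long. A small sketch using only the JDK's URLDecoder makes them readable; the class name is illustrative, and the elided tail of the path is left out rather than guessed.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Illustrative helper for reading encoded node names; not part of Dubbo or Zookeeper.
final class NodePathDecoder {

    /** Applies URL decoding the given number of times. */
    static String decode(String encoded, int rounds) {
        String current = encoded;
        try {
            for (int i = 0; i < rounds; i++) {
                current = URLDecoder.decode(current, "UTF-8");
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
        return current;
    }

    public static void main(String[] args) {
        // Prefix of the node name quoted in the article; the elided remainder is omitted.
        String nodeName =
                "consumer%253A%252F%252F10.x.x.x%252Fcom.sohu.xxxService%253Fapplication%253D";
        System.out.println(decode(nodeName, 1));
        System.out.println(decode(nodeName, 2)); // consumer://10.x.x.x/com.sohu.xxxService?application=
    }
}
```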

Discussion on Zookeeper Suitability

Zookeeper’s CP design makes it refuse service whenever consistency cannot be guaranteed, which makes it a poor choice for a service registry in highly available micro‑service environments. Increasing jute.maxbuffer mitigates the immediate issue but introduces latency and stability risks.

Migration Strategy from Dubbo to Spring Cloud

The article proposes a step‑by‑step migration:

Define clear migration goals: no code intrusion, coexistence with Dubbo, interface compatibility, and eventual removal of Dubbo.

Expose Dubbo interfaces as Spring MVC controllers using simple annotations (e.g., @RestController, @RequestMapping).

Generate proxy classes for consumers that translate method calls into HTTP requests via an AutoWebClient.

Automate code generation during Maven compilation using JavaPoet, producing static .class files without polluting source control.
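To make the generation step concrete, here is a minimal sketch that derives controller source from a service interface by reflection. It deliberately uses plain string building instead of JavaPoet so that it runs on the JDK alone, and it is not the article's generator: the VideoService interface, the positional parameter names, and the "<Interface>Controller" naming are assumptions made for illustration. Methods returning void are not handled.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Comparator;
import java.util.StringJoiner;

// Sketch of a controller generator; string building stands in for JavaPoet here.
final class ControllerSourceGenerator {

    /** Hypothetical service interface standing in for a real Dubbo interface. */
    interface VideoService {
        String getVideo(long id);
    }

    /** Builds controller source text with one endpoint per method declared on the interface. */
    static String generate(Class<?> serviceInterface) {
        String service = serviceInterface.getSimpleName();
        String field = Character.toLowerCase(service.charAt(0)) + service.substring(1);
        StringBuilder src = new StringBuilder();
        src.append("@RestController\n");
        src.append("@RequestMapping(\"/").append(service).append("\")\n");
        src.append("public class ").append(service).append("Controller {\n");
        src.append("    @Autowired\n");
        src.append("    private ").append(service).append(' ').append(field).append(";\n");

        Method[] methods = serviceInterface.getDeclaredMethods();
        // Reflection does not guarantee method order, so sort for stable output.
        Arrays.sort(methods, Comparator.comparing(Method::getName));
        for (Method m : methods) {
            StringJoiner params = new StringJoiner(", ");
            StringJoiner args = new StringJoiner(", ");
            Class<?>[] types = m.getParameterTypes();
            for (int i = 0; i < types.length; i++) {
                // Real parameter names may be missing at runtime, so use positional names.
                params.add(types[i].getSimpleName() + " arg" + i);
                args.add("arg" + i);
            }
            src.append("    @RequestMapping(\"/").append(m.getName()).append("\")\n");
            src.append("    public ").append(m.getReturnType().getSimpleName()).append(' ')
               .append(m.getName()).append('(').append(params).append(") {\n");
            src.append("        return ").append(field).append('.').append(m.getName())
               .append('(').append(args).append(");\n");
            src.append("    }\n");
        }
        src.append("}\n");
        return src.toString();
    }

    public static void main(String[] args) {
        System.out.print(generate(VideoService.class));
    }
}
```

A builder library such as JavaPoet replaces the manual string handling with typed builders and takes care of imports and formatting, which matters once generic types and multiple parameters are involved.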

Example generated controller:

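import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

// Generated wrapper that exposes the existing VideoService over HTTP.
@RestController
@RequestMapping("/VideoService")
public class VideoController {
    @Autowired
    private VideoService videoService;

    // Delegates to the unchanged service implementation.
    @RequestMapping("/getVideo")
    public Video getVideo(long id) {
        return videoService.getVideo(id);
    }
}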

Example generated consumer proxy:

@AutoWebClientProxy
public class VideoServiceProxy implements VideoService {
    private AutoWebClient autoWebClient;
    @Override
    public Video getVideo(long id) {
        AutoWebRequest req = new AutoWebRequest();
        req.setUri("/VideoService/getVideo/long");
        req.setReturnType(Video.class);
        req.addRequestBody("id", String.valueOf(id));
        return autoWebClient.invoke(req);
    }
}
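The URI in the proxy above ends in /long, which matches the parameter type of getVideo. One plausible reading, which the article does not confirm, is that the trailing segment encodes the parameter types so that overloaded methods map to distinct endpoints. The sketch below derives such a URI by reflection; the VideoService interface, the overload, and the underscore separator for multiple parameters are all assumptions.

```java
import java.lang.reflect.Method;
import java.util.StringJoiner;

// Illustrative URI derivation; the naming scheme is an assumption, not the article's rule.
final class ProxyUriBuilder {

    /** Hypothetical service interface with an overloaded method. */
    interface VideoService {
        String getVideo(long id);
        String getVideo(long id, String quality);
    }

    /** Builds "/<Interface>/<method>/<paramTypes>" with parameter types joined by '_'. */
    static String uriFor(Method method) {
        StringJoiner types = new StringJoiner("_");
        for (Class<?> type : method.getParameterTypes()) {
            types.add(type.getSimpleName());
        }
        String uri = "/" + method.getDeclaringClass().getSimpleName() + "/" + method.getName();
        return types.length() == 0 ? uri : uri + "/" + types;
    }

    /** Convenience overload that looks the method up and wraps the checked exception. */
    static String uriFor(Class<?> iface, String name, Class<?>... paramTypes) {
        try {
            return uriFor(iface.getMethod(name, paramTypes));
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(uriFor(VideoService.class, "getVideo", long.class));
        // /VideoService/getVideo/long
        System.out.println(uriFor(VideoService.class, "getVideo", long.class, String.class));
        // /VideoService/getVideo/long_String
    }
}
```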

Performance Comparison

Benchmarks using Apache Bench compared native Dubbo calls with the auto‑generated HTTP proxy (AutoWebService). The test environment used Spring Cloud 2021.0.8, Spring Boot 2.6.15, and Dubbo 2.6.13. Key findings:

For small string payloads (≈10 bytes) Dubbo outperformed the HTTP proxy.

For larger objects (>1 KB) the HTTP proxy achieved higher throughput (up to 1.6 × 10⁴ req/s) because Dubbo’s performance degrades with large data volumes.

Both frameworks showed comparable performance for moderate payload sizes.

Additional issues addressed include handling multiple object parameters, generic return types, custom exceptions, and enabling asynchronous calls without changing existing interface signatures (using dynamic proxies and ThreadLocal‑stored CompletableFuture instances).
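The asynchronous technique mentioned above can be sketched with JDK classes alone. This is an illustration of the general idea, not the article's implementation: the proxy starts the real call in the background, stores the resulting CompletableFuture in a ThreadLocal, and returns a placeholder, so the interface keeps its synchronous signature. The names AsyncCallSketch, asyncProxy, and lastCall are hypothetical, and primitive return types are not handled.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.concurrent.CompletableFuture;

// Sketch of async calls behind an unchanged synchronous interface.
final class AsyncCallSketch {

    /** Hypothetical synchronous interface whose signature must not change. */
    interface VideoService {
        String getVideo(long id);
    }

    // Holds the future of the most recent proxied call made on the current thread.
    private static final ThreadLocal<CompletableFuture<Object>> LAST_CALL = new ThreadLocal<>();

    /** Wraps a target so each call runs in the background and returns immediately. */
    @SuppressWarnings("unchecked")
    static <T> T asyncProxy(Class<T> type, T target) {
        InvocationHandler handler = (proxy, method, args) -> {
            CompletableFuture<Object> future = CompletableFuture.supplyAsync(() -> {
                try {
                    return method.invoke(target, args);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
            });
            LAST_CALL.set(future);
            return null; // placeholder; only valid for reference return types
        };
        return (T) Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler);
    }

    /** Retrieves and clears the future stored by the last proxied call on this thread. */
    @SuppressWarnings("unchecked")
    static <R> CompletableFuture<R> lastCall() {
        CompletableFuture<Object> future = LAST_CALL.get();
        LAST_CALL.remove();
        return (CompletableFuture<R>) (CompletableFuture<?>) future;
    }

    public static void main(String[] args) {
        VideoService real = id -> "video-" + id;
        VideoService async = asyncProxy(VideoService.class, real);

        async.getVideo(42L);                            // returns null right away
        CompletableFuture<String> future = lastCall();  // fetch the pending result
        System.out.println(future.join());              // video-42
    }
}
```

The caller must fetch the future on the same thread that made the call and before making another proxied call, which is the main trade-off of keeping the original method signatures.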

Conclusions

Zookeeper’s strong consistency model makes it unsuitable as a high‑availability service registry for micro‑services.

Dubbo services should be gradually migrated to Spring Cloud, leveraging automated code generation to minimize developer effort.

The AutoWebService solution provides a practical bridge, allowing Dubbo and Spring Cloud services to coexist while preserving existing business logic.

Performance tests indicate that the HTTP‑based approach is competitive, especially for larger payloads.
