
Automated Bug Detection for Distributed Databases Using Statistical Code Path Analysis

The article describes a prototype system that automatically discovers bugs in large distributed databases by instrumenting code, generating massive SQL test cases, statistically analyzing execution paths, visualizing suspicious blocks, and integrating insights from academic papers to guide future debugging and testing efforts.

Qunar Tech Salon

Inspired by an Oracle engineer’s story of spending weeks debugging complex parameter interactions, the article raises the question of whether a program can automatically find bugs while developers sleep.

Testing a distributed database like Oracle 12.2, with millions of lines of C code and countless possible SQL, table, and index combinations, is extremely challenging.

The proposed solution statistically analyzes code execution paths collected during massive automated test runs. Each code block is colored by the proportion of failing test cases that cover it (a darker color means a higher failure proportion) and by its relative frequency among failures (a brighter block accounts for more of the total failures), so developers can quickly identify likely buggy regions.

The approach is inspired by the VLDB paper “APOLLO: Automatic Detection and Diagnosis of Performance Regressions in Database Systems”, which defines three modules: SQLFuzz for random SQL generation, SQLMin for minimizing a failing SQL statement to the smallest case that still fails, and SQLDebug for instrumenting source code and building statistical models of execution paths.
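As a sketch of how these three modules fit together, the loop below uses a toy in-memory oracle and a toy clause-dropping minimizer in place of the paper's real components (a real oracle would, for example, compare query results across two database versions; the function names and SQL here are illustrative, not from the article):

```go
package main

import (
	"fmt"
	"strings"
)

// failing is a toy oracle standing in for a real differential check.
// Hypothetical bug: queries touching t1 with an ORDER BY fail.
func failing(sql string) bool {
	return strings.Contains(sql, "t1") && strings.Contains(sql, "ORDER BY")
}

// sqlFuzz stands in for SQLFuzz: canned queries instead of
// grammar-driven random generation (go-randgen in the article).
func sqlFuzz() []string {
	return []string{
		"SELECT a FROM t0",
		"SELECT a FROM t1 WHERE a > 1 ORDER BY a",
	}
}

// sqlMin stands in for SQLMin: greedily drop clauses while the
// query keeps failing, yielding a smaller reproducer.
func sqlMin(sql string) string {
	for _, clause := range []string{" WHERE a > 1"} {
		reduced := strings.Replace(sql, clause, "", 1)
		if failing(reduced) {
			sql = reduced
		}
	}
	return sql
}

func main() {
	// SQLDebug would then correlate the minimized reproducer's
	// execution path with passing runs; see the metrics below.
	for _, sql := range sqlFuzz() {
		if failing(sql) {
			fmt.Println("minimized reproducer:", sqlMin(sql))
		}
	}
}
```

The greedy minimizer is the simplest possible reduction strategy; APOLLO's SQLMin is more systematic, but the invariant is the same: every reduction step must preserve the failure.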

Implementation details include using the go-randgen framework for SQL fuzzing; a custom wrapper, tidb-wrapper, that instruments TiDB source code and exposes an HTTP trace API; and a block scanner derived from Go's own source to identify basic blocks (straight-line code segments with no branches).
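The core idea of the block scanner can be sketched with the standard go/parser and go/ast packages. This simplified version only reports brace-delimited blocks with their line spans; the article's scanner, derived from Go's own source, additionally splits these at branch points to obtain true basic blocks:

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

const src = `package demo

func classify(x int) string {
	if x > 0 {
		return "pos"
	}
	return "non-pos"
}`

// blockSpans parses Go source and returns the start/end line of
// every brace-delimited block, the coarse precursor to basic
// blocks that an instrumenter would wrap with trace counters.
func blockSpans(source string) [][2]int {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "demo.go", source, 0)
	if err != nil {
		panic(err)
	}
	var spans [][2]int
	ast.Inspect(f, func(n ast.Node) bool {
		if blk, ok := n.(*ast.BlockStmt); ok {
			spans = append(spans, [2]int{
				fset.Position(blk.Lbrace).Line,
				fset.Position(blk.Rbrace).Line,
			})
		}
		return true
	})
	return spans
}

func main() {
	for i, s := range blockSpans(src) {
		fmt.Printf("block %d: lines %d-%d\n", i+1, s[0], s[1])
	}
}
```

An instrumenting wrapper like tidb-wrapper would rewrite each discovered block to record its ID at runtime, which is what makes the per-block failure statistics possible.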

Metrics visualized on the frontend consist of three components: a color score indicating how strongly a block is associated with failures, a brightness score reflecting the share of all failures that hit the block, and a file-ranking score based on the density of failing cases per file.
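A minimal sketch of the first two metrics, assuming per-block coverage counts have already been collected from instrumented runs (the exact formulas and field names here are an interpretation of the article's description, not its code):

```go
package main

import "fmt"

// Block holds coverage counts collected from instrumented runs.
type Block struct {
	File       string
	Fail, Pass int // test cases covering this block, by outcome
}

// colorScore: share of failing cases among all cases hitting the
// block; higher means the block is more strongly failure-linked.
func colorScore(b Block) float64 {
	total := b.Fail + b.Pass
	if total == 0 {
		return 0
	}
	return float64(b.Fail) / float64(total)
}

// brightnessScore: this block's failures relative to all failures,
// so frequently-hit suspicious blocks stand out.
func brightnessScore(b Block, totalFails int) float64 {
	if totalFails == 0 {
		return 0
	}
	return float64(b.Fail) / float64(totalFails)
}

func main() {
	blocks := []Block{
		{File: "executor/join.go", Fail: 8, Pass: 2},
		{File: "executor/join.go", Fail: 1, Pass: 9},
		{File: "planner/core.go", Fail: 1, Pass: 99},
	}
	totalFails := 0
	for _, b := range blocks {
		totalFails += b.Fail
	}
	for _, b := range blocks {
		fmt.Printf("%s color=%.2f brightness=%.2f\n",
			b.File, colorScore(b), brightnessScore(b, totalFails))
	}
}
```

A block covered almost exclusively by failing cases gets a dark color even if it is rarely executed, while brightness separates the one hot suspicious block from many cold ones; the file-ranking score would aggregate these per file.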

Figures in the original article illustrate the visualization, the block coloring, and the ranking results.

Preliminary experiments show accurate file‑level diagnostics, though block‑level results are still coarse due to the absence of the SQLMin reduction step.

The article also explores extensions such as applying the technique to source‑code teaching, full‑link test coverage statistics, chaos engineering, and integration with distributed tracing systems like Dapper, OpenTracing, or SkyWalking.

Future work includes supporting parallel test execution, reducing instrumentation overhead, and expanding the prototype toward a production‑ready system that can continuously monitor and debug live distributed databases.

The complete source code is available at https://github.com/fuzzdebugplatform/fuzz_debug_platform, and the authors (PingCAP engineers and university researchers) are listed at the end of the article.

Tags: statistical analysis, code instrumentation, bug detection, database testing, performance regression, SQL fuzzing
Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
