Databases 15 min read

Sequence-Oriented DBMS Fuzzing with LEGO: A Novel Database Vulnerability Discovery Approach

The paper introduces LEGO, a sequence‑oriented DBMS fuzzing framework that leverages type‑affinity analysis to generate rich SQL statement sequences, achieving 44%‑198% higher coverage and uncovering numerous new vulnerabilities across MySQL, PostgreSQL, MariaDB, and Comdb2, as demonstrated at ICDE 2023.

AntTech
AntTech
AntTech
Sequence-Oriented DBMS Fuzzing with LEGO: A Novel Database Vulnerability Discovery Approach

0 Abstract

Databases are the foundational systems that store user data, making their security and stability critical; vulnerabilities can expose private information and sensitive data. Existing databases such as MySQL, Oracle, and OceanBase consist of millions of lines of code, and efficiently automating vulnerability discovery remains a pressing challenge. This work proposes a novel database vulnerability discovery method that uncovered a large number of bugs in several major DBMS, significantly raising the security baseline of the industry.

Modern large‑scale DBMS contain many statement types and dialects, which increase the difficulty of fuzz testing. Prior state‑of‑the‑art fuzzers rely on a limited set of predefined seed types, restricting code‑space exploration. Our approach introduces LEGO, a fuzzing engine that actively generates rich SQL execution sequences based on the concept of “type‑affinity” (adjacent SQL statements). During each fuzzing iteration, LEGO creates diverse SQL statement pairs, analyzes their type‑affinity via coverage feedback, and synthesizes sequences containing newly discovered valuable affinities for further mutation.

Experiments on PostgreSQL, MySQL, MariaDB, and Comdb2 show that LEGO improves coverage by 44%–198% compared to existing tools such as SQLancer, SQLsmith, and SQUIRREL, and discovers dozens of new CVEs.

1 Background

Database Management Systems (DBMS) serve as the bridge for managing user data, and their security is paramount. Traditional vulnerabilities like stack overflows can lead to data theft, forgery, or denial‑of‑service attacks.

Generating rich SQL sequences is crucial for effective DBMS fuzzing. Figure 1 illustrates the concept of an SQL sequence, where each test case comprises multiple statements (e.g., CREATE, INSERT, SELECT) arranged in a specific order.

Existing fuzzers either generate syntactically correct test cases without considering statement type combinations, or mutate existing seeds without altering statement types, limiting exploration of the program’s execution paths.

2 SQL Type Sequences

Figure 2 demonstrates how different SQL execution orders affect program paths. Two test cases with the same four statements but different orders produce divergent results: the first sequence (CREATE TABLE → INSERT → INSERT → SELECT) returns data, while the second (CREATE TABLE → SELECT → INSERT → INSERT) yields an empty result.

Challenges in generating effective SQL sequences include:

Combinatorial explosion: PostgreSQL defines 188 statement types and 349 sub‑types; a test case with 20 statements yields ~3×10^45 possible combinations, far beyond practical execution limits.

Many sequences are meaningless: Executing a SELECT before the corresponding CREATE TABLE leads to semantic errors and low coverage; unrelated statement pairs (e.g., CREATE TABLE followed by a permission change) also provide little benefit.

Randomly generated test cases may be ineffective: Overly long sequences with repetitive INSERTs do not improve coverage and slow down fuzzing.

To address the first two challenges, we introduce type‑affinity analysis to synthesize meaningful sequences; to mitigate the third, we limit the maximum sequence length and employ synthesis to generate varied seed lengths.

3 Design and Implementation

Figure 4 shows LEGO’s overall fuzzing workflow: random seeds are selected from a pool, mutated based on sequence operations, analyzed for new type‑affinities, and valuable affinities are recorded. New sequences are then synthesized from the affinity database, instantiated, further mutated, and fed to the DBMS while monitoring for bugs.

3.1 Proactive Affinity Analysis

Proactive affinity analysis consists of two parts: learning SQL statement type‑affinities and performing sequence‑oriented mutations.

3.1.1 Type‑Affinity

Type‑affinity refers to adjacent SQL statements in a test case; for example, an INSERT following a CREATE TABLE yields the (CREATE TABLE, INSERT) affinity.

3.1.2 Proactive Sequence‑Oriented Mutation

Sequence‑oriented mutation enables LEGO to actively discover valuable statement combinations. Figure 5 outlines the mutation algorithm, which includes three operations: substitution, insertion, and deletion.

Substitution: Randomly replace a statement with another type, re‑instantiate the SQL, and record any new coverage‑inducing affinities.

Insertion (InsertAfter): Insert a randomly chosen statement after a selected one, re‑instantiate, and capture new affinities.

Deletion: Remove a statement, re‑instantiate, and record any new affinities.

3.1.3 Type‑Affinity Analysis

To extract affinities, LEGO identifies statement types via an AST model that handles the many dialects of each DBMS (e.g., PostgreSQL’s 188 types). It records adjacent type pairs, ignoring consecutive identical types, and stores the discovered affinities for later synthesis.

3.2 Progressive Sequence Synthesis

After obtaining high‑quality affinities, LEGO progressively synthesizes new test cases that prioritize these affinities. Figure 8 illustrates how a new affinity (INSERT → CREATE TRIGGER) guides the generation of a sequence (2 → 3 → 5 → 4) that ultimately reveals CVE‑2021‑35643.

LEGO defines a default starting node (CREATE TABLE) and caps the maximum number of statements to avoid overly large test cases.

3.3 Implementation

LEGO is built on AFL++, with custom mutators for sequence operations. Affinity analysis and synthesis rely on AST parsing, reusing Squirrel’s IR structures and extending them to MySQL, PostgreSQL, MariaDB, and Comdb2.

4 Experimental Results

Continuous vulnerability hunting with LEGO uncovered numerous new bugs, including stack overflows, use‑after‑free, illegal address accesses, and null‑pointer dereferences, resulting in dozens of CVEs and multiple acknowledgments from database vendors.

5 Conclusion

This work presents LEGO, a DBMS‑focused fuzzing tool that leverages SQL statement type‑affinities to generate richer test case sequences. By combining proactive sequence mutation and progressive synthesis, LEGO explores more code space and achieves higher coverage and efficiency than prior state‑of‑the‑art fuzzers, discovering many new vulnerabilities across PostgreSQL, MySQL, MariaDB, and Comdb2.

6 References

[1] https://icde2023.ics.uci.edu/papers-research-track/

database fuzzingICDE 2023LEGOSQL sequencetype-affinityvulnerability discovery
AntTech
Written by

AntTech

Technology is the core driver of Ant's future creation.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.