Performance Evaluation of Multi-Table Joins in MySQL and Oracle with Large Datasets
This article investigates the Alibaba Java Development Manual's recommendation against joining more than three tables by experimentally evaluating multi-table join performance in MySQL and Oracle using massive synthetic datasets, analyzing query execution times, indexing effects, and providing data generation scripts and detailed results.
The author questions the Alibaba Java Development Manual rule that forbids joining more than three tables, and designs a series of SQL experiments to verify its validity.
Problem Statement
Why does the manual prohibit joins across more than three tables? The author sets out to test this claim by constructing a realistic schema with four tables: student , teacher , course , and sc (the relationship table).
Experimental Environment
All tests are run on a VM (VMware 10) with CentOS 7.4 and MySQL 5.7.22. The machine has 4 GB RAM, 4 CPU cores, 50 GB SSD storage. MySQL is configured with a 2 GB buffer pool.
Experiment Description
The goal is to find the student with the highest score for courses taught by teacher 'tname553' . The main query joins the four tables and includes a sub‑query to obtain the maximum score.
select Student.Sname,course.cname,score
from Student,SC,Course ,Teacher
where Student.s_id=SC.s_id and SC.c_id=Course.c_id and sc.t_id=teacher.t_id
and Teacher.Tname='tname553'
and SC.score=(select max(score) from SC where sc.t_id=teacher.t_Id);The author decomposes this into three simpler statements to illustrate the logic.
select max(score) from SC ,Teacher where sc.t_id=teacher.t_Id and Teacher.Tname='tname553';
select sc.t_id,sc.s_id,score from SC ,Teacher
where sc.t_id=teacher.t_Id and score=590 and Teacher.Tname='tname553';
select Student.Sname,course.cname,score
from Student,SC ,course
where Student.s_id=SC.s_id and sc.s_id in (20769800,48525000,26280200) and course.c_id = sc.c_id;Data Generation Scripts
To stress the system, the author generates massive synthetic data sets: 10 million enrollment records, 5 million students, 1 million teachers, and 1 000 courses for the first test; later scaling up to 100 million enrollment records, 50 million students, 10 million teachers, and the same number of courses. The MySQL scripts create tables and populate them with loops.
use stu;
create table student (
s_id int(11) not null auto_increment,
sno int(11),
sname varchar(50),
sage int(11),
ssex varchar(8),
father_id int(11),
mather_id int(11),
note varchar(500),
primary key (s_id),
unique key uk_sno (sno)
) engine=innodb default charset=utf8mb4;
-- function to insert 50 million students
create function insert_student_data() returns int deterministic
begin
declare i int;
set i=1;
while i<50000000 do
insert into student values(i,i,concat('name',i),i,
if(floor(rand()*10)%2=0,'f','m'),
floor(rand()*100000),floor(rand()*1000000),concat('note',i));
set i=i+1;
end while;
return 1;
end;Similar functions are provided for course , sc , and teacher tables, both for MySQL and Oracle (PL/SQL) environments.
Test Results on MySQL
The author presents two screenshots (omitted here) showing query times for different data volumes and indexing configurations. Key observations include:
When join keys lack indexes, queries become extremely slow, confirming the need for proper indexing in multi‑table joins.
Simplified queries (splitting the original join) remain tolerable up to 100 million rows, whereas the full four‑table join fails beyond ~15 million rows on the author's machine.
MySQL’s performance degrades noticeably with larger joins, suggesting that the “no more than three tables” rule is driven by MySQL’s limitations at high data scales.
Oracle Comparison
Running the same four‑table join on Oracle with no indexes completes in about 26 seconds for the large data set, demonstrating Oracle’s superior join handling compared to MySQL.
Conclusion
The experiment confirms that the Alibaba rule is primarily a MySQL‑centric guideline: with sufficient hardware and proper indexing, MySQL can handle joins beyond three tables, but performance quickly becomes a bottleneck at very large scales. For high‑throughput, high‑concurrency systems, it is advisable to keep SQL simple and push complex logic to the application layer.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.