Databases 13 min read

Performance Evaluation of Multi-Table Joins in MySQL and Oracle with Large Datasets

This article investigates the Alibaba Java Development Manual's recommendation against joining more than three tables by experimentally evaluating multi-table join performance in MySQL and Oracle using massive synthetic datasets, analyzing query execution times, indexing effects, and providing data generation scripts and detailed results.

Top Architect

Jun 16, 2020

Performance Evaluation of Multi-Table Joins in MySQL and Oracle with Large Datasets

The author questions the Alibaba Java Development Manual rule that forbids joining more than three tables, and designs a series of SQL experiments to verify its validity.

Problem Statement

Why does the manual prohibit joins across more than three tables? The author sets out to test this claim by constructing a realistic schema with four tables: student, teacher, course, and sc (the relationship table).

Experimental Environment

All tests are run on a VM (VMware 10) with CentOS 7.4 and MySQL 5.7.22. The machine has 4 GB RAM, 4 CPU cores, 50 GB SSD storage. MySQL is configured with a 2 GB buffer pool.

Experiment Description

The goal is to find the student with the highest score for courses taught by teacher 'tname553'. The main query joins the four tables and includes a sub‑query to obtain the maximum score.

select Student.Sname,course.cname,score 
    from Student,SC,Course ,Teacher 
    where Student.s_id=SC.s_id and SC.c_id=Course.c_id  and sc.t_id=teacher.t_id 
    and Teacher.Tname='tname553' 
    and SC.score=(select max(score) from SC where sc.t_id=teacher.t_Id);

The author decomposes this into three simpler statements to illustrate the logic.

select max(score)  from SC ,Teacher where sc.t_id=teacher.t_Id and Teacher.Tname='tname553';
select sc.t_id,sc.s_id,score   from SC ,Teacher 
where sc.t_id=teacher.t_Id and score=590  and Teacher.Tname='tname553';
select Student.Sname,course.cname,score 
from Student,SC ,course
where Student.s_id=SC.s_id and  sc.s_id  in (20769800,48525000,26280200) and course.c_id = sc.c_id;

Data Generation Scripts

To stress the system, the author generates massive synthetic data sets: 10 million enrollment records, 5 million students, 1 million teachers, and 1 000 courses for the first test; later scaling up to 100 million enrollment records, 50 million students, 10 million teachers, and the same number of courses. The MySQL scripts create tables and populate them with loops.

use stu;
create table student (
  s_id int(11) not null auto_increment,
  sno int(11),
  sname varchar(50),
  sage int(11),
  ssex varchar(8),
  father_id int(11),
  mather_id int(11),
  note varchar(500),
  primary key (s_id),
  unique key uk_sno (sno)
) engine=innodb default charset=utf8mb4;

-- function to insert 50 million students
create function insert_student_data() returns int deterministic
begin
  declare i int;
  set i=1;
  while i<50000000 do
    insert into student values(i,i,concat('name',i),i,
      if(floor(rand()*10)%2=0,'f','m'),
      floor(rand()*100000),floor(rand()*1000000),concat('note',i));
    set i=i+1;
  end while;
  return 1;
end;

Similar functions are provided for course, sc, and teacher tables, both for MySQL and Oracle (PL/SQL) environments.

Test Results on MySQL

The author presents two screenshots (omitted here) showing query times for different data volumes and indexing configurations. Key observations include:

When join keys lack indexes, queries become extremely slow, confirming the need for proper indexing in multi‑table joins.

Simplified queries (splitting the original join) remain tolerable up to 100 million rows, whereas the full four‑table join fails beyond ~15 million rows on the author's machine.

MySQL’s performance degrades noticeably with larger joins, suggesting that the “no more than three tables” rule is driven by MySQL’s limitations at high data scales.

Oracle Comparison

Running the same four‑table join on Oracle with no indexes completes in about 26 seconds for the large data set, demonstrating Oracle’s superior join handling compared to MySQL.

Conclusion

The experiment confirms that the Alibaba rule is primarily a MySQL‑centric guideline: with sufficient hardware and proper indexing, MySQL can handle joins beyond three tables, but performance quickly becomes a bottleneck at very large scales. For high‑throughput, high‑concurrency systems, it is advisable to keep SQL simple and push complex logic to the application layer.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

performance SQL Data Generation MySQL Oracle JOIN optimization

Written by

Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.