Databases 11 min read

Evaluating the ‘No Join Over Three Tables’ Rule from Alibaba Java Development Manual with MySQL and Oracle Experiments

This article investigates why the Alibaba Java Development Manual advises against joining more than three tables by designing and executing large‑scale MySQL and Oracle experiments, analyzing query performance, indexing effects, and data‑generation scripts to determine the practical limits of multi‑table joins.

Code Ape Tech Column
Code Ape Tech Column
Code Ape Tech Column
Evaluating the ‘No Join Over Three Tables’ Rule from Alibaba Java Development Manual with MySQL and Oracle Experiments

The author questions the Alibaba Java Development Manual’s rule that joins involving more than three tables should be avoided, and designs a series of experiments to verify this claim using MySQL 5.7.22 on a VMware‑CentOS 7.4 environment.

Four tables (student, teacher, course, and sc) are created, and a query is written to find the student with the highest score for courses taught by a specific teacher. The original SQL query and its decomposition into three simpler statements are shown:

select Student.Sname,course.cname,score
from Student,SC,Course ,Teacher
where Student.s_id=SC.s_id and SC.c_id=Course.c_id and sc.t_id=teacher.t_id
  and Teacher.Tname='tname553'
  and SC.score=(select max(score) from SC where sc.t_id=teacher.t_Id);

Data generation scripts are provided to create massive datasets (up to 50 million students, 10 million teachers, 1 billion sc records) in MySQL, with and without indexes, to test query performance at different scales. Sample MySQL data‑generation functions are included:

create function insert_student_data() returns int deterministic
begin
  declare i int;
  set i=1;
  while i<50000000 do
    insert into student values(i,i,concat('name',i),i,
      case when floor(rand()*10)%2=0 then 'f' else 'm' end,
      floor(rand()*100000),floor(rand()*1000000),concat('note',i));
    set i=i+1;
  end while;
  return 1;
end;

Similar scripts generate data for the course, sc, and teacher tables, and indexes are added on join keys (s_id, t_id, c_id). The experiments compare query times for 1 million, 10 million, and 100 million rows, showing that lack of indexes dramatically slows multi‑table joins, while indexed joins remain acceptable up to tens of millions of rows.

Oracle experiments are also performed, creating equivalent tablespaces, users, and data‑generation procedures. The Oracle results demonstrate that, even without indexes, Oracle can complete the same four‑table join in about 26 seconds for large datasets, highlighting MySQL’s relative weakness in handling complex joins.

From the results, the author concludes that the “no join over three tables” rule is primarily a MySQL‑specific performance guideline; with proper indexing and reasonable data volumes, joins of more than three tables are feasible, and complex logic can be safely placed in the application layer when needed.

SQLData GenerationMySQLDatabase OptimizationOracleJoin Performance
Code Ape Tech Column
Written by

Code Ape Tech Column

Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.