
Why Alibaba Bans Joins Over Three Tables – A Must‑Know Rule for SQL Engineers

Alibaba's Java Development Manual prohibits joining more than three tables in a single SQL statement. The rule stems from the rapidly growing cost of multi-table joins on a single-instance database, and it prompts engineers to rethink data modeling: adopt denormalization, wide tables, materialized views, CQRS or application-level assembly instead of relying on complex joins.

Architecture Digest

Background: a common pain point

A product manager asks for a comprehensive report that joins orders, users, coupons, logistics, reviews and refunds. The developer writes a seven-table LEFT JOIN that runs in 0.2 s locally, but in production the CPU spikes and the connection pool is exhausted. The DBA points out Alibaba's rule: “More than three tables: JOIN is prohibited.”

The “hard‑core” rule

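[Mandatory] In a database, JOIN is prohibited across more than three tables. Columns used in a JOIN must have exactly the same data type, and in multi-table association queries the joined columns must be indexed.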

This is not a style guideline on the order of “avoid SELECT *”; it is an architectural red line.

Why three tables?

In most business scenarios a single-instance relational database can keep a join of up to three tables fast with the usual tools: indexes on the join columns, letting the small table drive the join, and optimizer tweaks. Each additional table multiplies the number of candidate join orders, so the optimizer is more likely to pick a bad plan, and the only remaining remedy is more hardware, which is rarely viable.
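To get a feel for how quickly the planner's search space grows, the sketch below counts left-deep join orders, which is n! for n tables. This is a simplifying assumption: real optimizers prune the search with heuristics and cost estimates, and bushy plans add further candidates, so the numbers are a rough illustration rather than what any particular database evaluates.

```python
import math


def left_deep_join_orders(table_count: int) -> int:
    """Number of left-deep join orders for `table_count` tables (n!)."""
    if table_count < 1:
        raise ValueError("table_count must be at least 1")
    return math.factorial(table_count)


for n in range(2, 8):
    print(f"{n} tables -> {left_deep_join_orders(n)} candidate join orders")
```

Three tables give 6 candidate orders, while the seven-table report from the background story gives 5040.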

Multi‑table joins are a design problem

When a query needs more than three tables, the data model is usually at fault.

2.1 Over‑normalization syndrome

Following strict 3NF, architects split every entity into atomic tables (order, user, address, coupon, payment, refund, …). A query like “2023 orders with coupons, unfinished refunds, and East‑China delivery” then requires six tables, turning the join into a performance nightmare.

2.2 The “universal SQL” anti‑pattern

Early in a project, developers may create a “flexible” query that dynamically concatenates WHERE clauses and joins seven or eight tables based on front‑end parameters. Business logic is pushed into the database, and a single schema change can cause the query to become 100× slower due to a bad full‑table‑scan plan.

What to do instead of a >3‑table join?

3.1 Denormalized design – sacrifice purity for performance

In many internet scenarios strict 3NF is unnecessary. Store the product name, user nickname and address text directly in the order table. The snapshot remains correct even if the user later changes their nickname or address, because an order should record the values as they were at purchase time. Adding three columns (product_name, user_name, shop_name) reduced a four-table query from 1.2 s to 20 ms.
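A minimal sketch of the snapshot idea, using Python's built-in sqlite3 module and an in-memory database. The schema and sample rows are hypothetical; only the three redundant column names come from the text above. It compares the four-table join with the single-table read, then shows that the snapshot is unaffected by a later rename.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users    (id INTEGER PRIMARY KEY, nickname TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE shops    (id INTEGER PRIMARY KEY, name TEXT);
    -- Hypothetical order table: the *_name columns are redundant snapshots
    -- copied in when the order is placed.
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER, product_id INTEGER, shop_id INTEGER,
        user_name TEXT, product_name TEXT, shop_name TEXT
    );
    INSERT INTO users    VALUES (1, 'alice');
    INSERT INTO products VALUES (10, 'Keyboard');
    INSERT INTO shops    VALUES (100, 'Gadget Store');
    INSERT INTO orders   VALUES (1000, 1, 10, 100, 'alice', 'Keyboard', 'Gadget Store');
""")

four_table_sql = """
    SELECT o.id, u.nickname, p.name, s.name
    FROM orders o
    JOIN users u    ON u.id = o.user_id
    JOIN products p ON p.id = o.product_id
    JOIN shops s    ON s.id = o.shop_id
    WHERE o.id = ?
"""
single_table_sql = """
    SELECT id, user_name, product_name, shop_name
    FROM orders
    WHERE id = ?
"""

# Both queries return the same row while the source tables are unchanged.
joined = conn.execute(four_table_sql, (1000,)).fetchone()
snapshot = conn.execute(single_table_sql, (1000,)).fetchone()
print(joined == snapshot)

# The user renames themselves later; the order keeps the purchase-time name.
conn.execute("UPDATE users SET nickname = 'alice_2024' WHERE id = 1")
joined_after = conn.execute(four_table_sql, (1000,)).fetchone()
snapshot_after = conn.execute(single_table_sql, (1000,)).fetchone()
print(joined_after, snapshot_after)
```

The trade-off is that the write path must now populate the redundant columns, and any field that should follow later edits (rather than stay frozen) needs an explicit sync.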

3.2 Wide tables, materialized views, and ETL

Build a wide table that contains all required fields and refresh it hourly via offline ETL.

Use a materialized view (if the DB supports it) and schedule periodic refreshes.

Adopt a column‑store OLAP engine such as ClickHouse for analytical workloads; it naturally handles wide tables and multi‑dimensional analysis.

Report queries on these structures drop from seconds to milliseconds.
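The wide-table refresh can be sketched as a single job that runs the expensive join offline and writes the result into a flat table. The example below uses sqlite3 with a hypothetical schema (order_report_wide, coupons, refunds and their columns are illustrative assumptions); a real pipeline would run this on a scheduler and probably refresh incrementally rather than rebuilding everything.

```python
import sqlite3


def refresh_order_report(conn: sqlite3.Connection) -> int:
    """Rebuild the wide table from the normalized tables; return its row count."""
    with conn:  # commit the DELETE and INSERT together, or roll both back
        conn.execute("DELETE FROM order_report_wide")
        conn.execute("""
            INSERT INTO order_report_wide
                (order_id, user_name, coupon_code, refund_status)
            SELECT o.id, u.nickname, c.code, r.status
            FROM orders o
            JOIN users u        ON u.id = o.user_id
            LEFT JOIN coupons c ON c.order_id = o.id
            LEFT JOIN refunds r ON r.order_id = o.id
        """)
    return conn.execute("SELECT COUNT(*) FROM order_report_wide").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users   (id INTEGER PRIMARY KEY, nickname TEXT);
    CREATE TABLE orders  (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE coupons (order_id INTEGER, code TEXT);
    CREATE TABLE refunds (order_id INTEGER, status TEXT);
    CREATE TABLE order_report_wide (
        order_id INTEGER PRIMARY KEY, user_name TEXT,
        coupon_code TEXT, refund_status TEXT
    );
    INSERT INTO users   VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders  VALUES (1000, 1), (1001, 2);
    INSERT INTO coupons VALUES (1000, 'SPRING10');
    INSERT INTO refunds VALUES (1001, 'PENDING');
""")

row_count = refresh_order_report(conn)

# Report queries now read one table and never join at request time.
report_rows = conn.execute(
    "SELECT * FROM order_report_wide ORDER BY order_id"
).fetchall()
print(row_count, report_rows)
```

The multi-table join still exists, but it runs once per refresh in a batch job instead of once per report request on the online database.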

3.3 Application‑level assembly

In a micro‑service architecture, fetch IDs from Service A, call Service B’s batch API for the related data, and merge the results in memory (e.g., using a Map). Benefits include high index hit rates, easy caching, parallel calls, and independent service evolution.
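A minimal sketch of this pattern, with two in-memory functions standing in for the services. The names list_recent_orders and batch_get_users are hypothetical; in a real system they would be remote calls, each backed by a single-table, index-friendly query.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: int
    user_id: int


@dataclass(frozen=True)
class User:
    user_id: int
    nickname: str


# Hypothetical in-memory stand-ins for the two services' data stores.
_ORDERS = [Order(1000, 1), Order(1001, 2), Order(1002, 1)]
_USERS = {1: User(1, "alice"), 2: User(2, "bob")}


def list_recent_orders() -> list[Order]:
    """Stand-in for Service A: a single-table query on the order side."""
    return list(_ORDERS)


def batch_get_users(user_ids: set[int]) -> dict[int, User]:
    """Stand-in for Service B's batch API: one call for many IDs."""
    return {uid: _USERS[uid] for uid in user_ids if uid in _USERS}


def orders_with_nicknames() -> list[tuple[int, str | None]]:
    orders = list_recent_orders()
    # De-duplicate IDs so each user is fetched once, in a single batch call.
    users_by_id = batch_get_users({o.user_id for o in orders})
    # In-memory "join": one dictionary lookup per order.
    result = []
    for o in orders:
        user = users_by_id.get(o.user_id)
        result.append((o.order_id, user.nickname if user else None))
    return result


assembled = orders_with_nicknames()
print(assembled)
```

The batch call matters: fetching users one at a time inside the loop would trade one bad join for an N+1 query problem.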

3.4 CQRS – separate read models

Write operations follow a normalized domain model, while read operations use a dedicated, often denormalized, read model (a wide table or even a NoSQL store). Many mid‑platform systems sync a “materialized aggregation table” from the write side, allowing single‑table reads for complex queries.
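The split can be sketched with a command side that publishes events and a read side that folds them into one denormalized row per order. Everything here is an illustrative assumption: the class names, the event shape, and the synchronous in-process delivery (production systems would more likely propagate changes asynchronously, for example through a message queue or change data capture).

```python
from __future__ import annotations

from typing import Callable

Event = dict  # hypothetical event shape: {"type": ..., "order_id": ..., ...}


class OrderCommands:
    """Write side: records changes in a normalized model, then publishes events."""

    def __init__(self, publish: Callable[[Event], None]) -> None:
        self._publish = publish
        self._orders = {}  # order_id -> user_id

    def place_order(self, order_id: int, user_id: int, user_name: str) -> None:
        self._orders[order_id] = user_id
        self._publish(
            {"type": "order_placed", "order_id": order_id, "user_name": user_name}
        )

    def request_refund(self, order_id: int) -> None:
        if order_id not in self._orders:
            raise KeyError(f"unknown order {order_id}")
        self._publish({"type": "refund_requested", "order_id": order_id})


class OrderReadModel:
    """Read side: one denormalized row per order, updated from events."""

    def __init__(self) -> None:
        self.rows = {}  # order_id -> {"user_name": ..., "refund_status": ...}

    def apply(self, event: Event) -> None:
        if event["type"] == "order_placed":
            self.rows[event["order_id"]] = {
                "user_name": event["user_name"],
                "refund_status": None,
            }
        elif event["type"] == "refund_requested":
            self.rows[event["order_id"]]["refund_status"] = "PENDING"

    def orders_with_pending_refund(self) -> list[int]:
        # A "complex" query becomes a filter over a single structure.
        return sorted(
            oid for oid, row in self.rows.items()
            if row["refund_status"] == "PENDING"
        )


read_model = OrderReadModel()
commands = OrderCommands(publish=read_model.apply)  # synchronous for the demo
commands.place_order(1000, 1, "alice")
commands.place_order(1001, 2, "bob")
commands.request_refund(1001)
pending = read_model.orders_with_pending_refund()
print(pending)
```

With asynchronous propagation the read model lags the write side slightly, so this approach fits queries that tolerate eventual consistency.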

Sharding makes even two‑table joins risky

When order_id is hashed to 64 databases and user_id to another 64, a join across shards forces either moving large amounts of data between nodes or running a Cartesian product of shard-to-shard queries, and neither is feasible. Middleware such as ShardingSphere or MyCat typically either restricts joins to tables sharded on the same key or blocks cross-shard joins entirely.

Thus, after sharding, even a two‑table join is often a death sentence.
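A small routing sketch shows why the join falls apart. It assumes a plain modulo rule in place of a real hash function, and the sample IDs are made up; the point is only that rows related by user_id end up on unrelated shards once orders are routed by order_id.

```python
SHARD_COUNT = 64


def shard_for(key: int) -> int:
    """Hypothetical routing rule: plain modulo instead of a real hash function."""
    return key % SHARD_COUNT


# (order_id, user_id) pairs: three orders that all belong to user 7.
orders = [(1000, 7), (1001, 7), (1065, 7)]

# Orders routed by order_id land on shards unrelated to the user's shard.
order_shards = {order_id: shard_for(order_id) for order_id, _ in orders}
user_shard = shard_for(7)
shards_to_scan = set(order_shards.values())
print(order_shards, user_shard, shards_to_scan)

# If orders were routed by user_id instead, the same rows would be
# co-located with the user row and a local join would be possible.
co_located_shards = {shard_for(user_id) for _, user_id in orders}
print(co_located_shards)
```

With only three orders the join already touches two order shards plus the user shard; without knowing the order IDs in advance, a query by user_id has to fan out to all 64.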

Turning the rule into a design signal

If a query needs more than three tables, the data model likely violates business boundaries.

Frequent multi‑table queries indicate a missing dedicated read model.

Growing SQL length and index complexity show business logic leaking into the database layer.

The rule is a reminder to redesign rather than keep patching a bad schema.

Checklist before writing a >3‑table join

Is real‑time association truly required? Could async sync, materialized view, or a wide table suffice?

Can redundant fields be added to collapse multiple tables into one or two?

Will the join survive a ten‑fold data growth and sharding?

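If real-time association is not truly required, if redundant fields can do the job, or if the join would not survive that growth and sharding, the correct action is to discard the join and redesign the architecture.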

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
