MySQL Execution Process and SemiJoin Optimization Techniques
This article explains MySQL's execution stages, defines SemiJoin, details how IN subqueries can be transformed into SemiJoin queries, outlines conversion constraints, provides example transformations, and describes the various execution strategies MySQL may use for SemiJoins.
MySQL's execution flow consists of several sub‑stages: syntax analysis, semantic checking, logical optimization, physical optimization, and execution. Logical optimization (Rule‑Based Optimization, RBO) performs rule‑based logical rewrites, while physical optimization (Cost‑Based Optimization, CBO) estimates costs of execution plans and chooses the cheapest one.
SemiJoin is an operation on two tables R and S, denoted R ⋉ S, which returns the subset of rows from R that have matching rows in S. For example, the semi‑join of an employee table with a department table yields the employees whose department exists in the department table.
MySQL can convert an IN (or ANY ) subquery into a SemiJoin using a generic pattern. The generic conversion is:
SELECT ...
FROM ot1 ... otN
WHERE (oe1, ... oeM) IN (
SELECT ie1, ..., ieM
FROM it1 ... itK
[WHERE inner‑cond]
)
[AND outer‑cond]
[GROUP BY ...] [HAVING ...] [ORDER BY ...]which becomes:
SELECT ...
FROM (ot1 ... otN) SJ (it1 ... itK)
ON (oe1, ... oeM) = (ie1, ..., ieM)
[AND inner‑cond]
[WHERE outer‑cond]
[GROUP BY ...] [HAVING ...] [ORDER BY ...]Conversion is allowed only when the subquery meets certain conditions: it must be an IN or ANY subquery, must not contain UNION , must not have GROUP BY , HAVING , aggregate functions, or LIMIT . If the subquery contains an ORDER BY , the clause is removed during conversion. The outer query should not use STRAIGHT_JOIN (although tests show it may still work).
Examples:
Nested IN subqueries can be flattened into a single‑layer SemiJoin.
Parallel IN conditions combined with AND can also be transformed.
Parallel IN conditions combined with OR cannot be converted.
Multi‑column IN predicates are supported by SemiJoin.
MySQL may execute a SemiJoin using one of several strategies:
Duplicate Weedout : treat the SemiJoin as an inner join, then remove duplicate rows with a temporary table.
FirstMatch : stop scanning the inner table after the first matching row is found, mirroring the semantics of IN .
LooseScan : group duplicate rows in the inner side and return only the first match per group, often using an index.
Materialize : materialize the inner table into a temporary indexed table; the outer table then joins this materialized table, using either a lookup or scan approach.
Reference material includes the MariaDB optimization strategies documentation and the MySQL manual page on SemiJoins.
Tencent Database Technology
Tencent's Database R&D team supports internal services such as WeChat Pay, WeChat Red Packets, Tencent Advertising, and Tencent Music, and provides external support on Tencent Cloud for TencentDB products like CynosDB, CDB, and TDSQL. This public account aims to promote and share professional database knowledge, growing together with database enthusiasts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.