Understanding Semi‑join Materialization in MySQL Subquery Optimization
This article explains how MySQL transforms eligible subqueries into semi‑joins, describes the four semi‑join strategies, and details the Semi‑join Materialization approach—including Materialization‑scan and Materialization‑lookup—while showing how to identify the chosen strategy via EXPLAIN output.
This is the second article in a series on MySQL subquery optimization, focusing on the concrete implementation of Semi‑join strategies.
What is a Semi‑join
In regular joins duplicate rows may appear, while a subquery can return a distinct set. For example, joining country and city with a population filter produces duplicate china rows, whereas using IN (SELECT ...) returns a single row.
select country.* from country join city on country.code=city.country_code \
and population>20000000; select * from country where code in \
(select country_code from city where population>20000000);The optimizer can recognise that the IN clause needs only one value per group and replace the subquery with a Semi‑join, improving efficiency.
Semi‑join Constraints
The subquery must appear in a top‑level WHERE or ON clause as IN or =ANY .
It must be a single SELECT , not a UNION .
No GROUP BY or HAVING (except with materialization strategy).
No implicit grouping (no aggregate functions).
No ORDER BY … LIMIT .
The outer query must not use STRAIGHT_JOIN .
The total number of tables must be below the join‑table limit.
Semi‑join Implementation Strategies
Duplicate Weedout
FirstMatch
LooseScan
Materialize
These correspond to the optimizer_switch flags semijoin=ON and the individual strategy flags, all enabled by default.
EXPLAIN output shows which strategy is used: Extra containing Start temporary / End temporary indicates Duplicate Weedout; FirstMatch(tbl_name) indicates FirstMatch; LooseScan(m..n) indicates LooseScan; and a select_type of MATERIALIZED with a temporary table name signals the Materialize strategy.
Semi‑join Materialization
This strategy materialises the subquery result into a temporary table and then joins it. It has two variants:
Materialization‑scan
Materialization‑lookup
Example query (non‑correlated subquery) that finds European countries with cities having population > 7 million:
select * from Country
where Country.code IN (select City.Country
from City
where City.Population > 7*1000*1000)
and Country.continent='Europe';With Materialization‑scan the optimizer first builds the temporary table, then scans it fully when joining to Country . The EXPLAIN output shows two rows, the second with select_type=MATERIALIZED and the temporary table referenced as <subquery2> . The total rows scanned are 15 (subquery) + 15 (temporary) + 15 × 1 = 45.
Changing the subquery to return more rows switches to Materialization‑lookup, where the outer table drives the join and the temporary table is accessed via its primary key. The EXPLAIN output reflects this with eq_ref on the temporary table. The rows scanned become 238 (subquery) + 239 × 1 = 477.
Notes
In MariaDB, Semi‑join Materialization can still be used when the subquery contains GROUP BY , but in MySQL any GROUP BY disables all Semi‑join strategies, falling back to the generic Materialization optimization controlled by materialization=ON .
References
https://mariadb.com/kb/en/semi-join-materialization-strategy/
https://dev.mysql.com/doc/refman/5.7/en/semijoins.html
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.