Understanding How MySQL Converts Uncorrelated Subqueries to Correlated Subqueries and Their Execution Plans
This article explains the process by which MySQL transforms uncorrelated IN subqueries into correlated subqueries, details the differences in EXPLAIN output columns, explores optimizer trace information, compares materialization versus correlated execution costs, and provides best‑practice recommendations for writing efficient queries.
The article begins by introducing the conversion of uncorrelated subqueries (specifically IN subqueries) into correlated subqueries in MySQL 8.0.29, using the Sakila sample database for demonstration.
1. Explain type and ref column display logic
Two example queries are shown. The first is a hand‑crafted correlated subquery, and the second is an uncorrelated IN subquery that MySQL converts. The EXPLAIN output for each query is displayed, highlighting the type and ref columns.
-- Disable semi‑join optimization
SET optimizer_switch="semijoin=off";
-- Hand‑crafted correlated subquery
EXPLAIN SELECT * FROM city
WHERE city_id < 100 AND EXISTS (
SELECT city_id FROM address
WHERE city.city_id = address.city_id
);
-- Uncorrelated IN subquery (will be converted)
EXPLAIN SELECT * FROM city
WHERE city_id < 100 AND city_id IN (
SELECT city_id FROM address
);The type column shows ref for the correlated query and index_subquery for the converted query, while the ref column shows the actual field or func when the condition is generated by the conversion.
2. Optimizer trace
Enabling optimizer_trace reveals internal decision‑making during query preparation, optimization, and execution. The trace can be queried from information_schema.OPTIMIZER_TRACE . Important fields such as QUERY , TRACE , MISSING_BYTES_BEYOND_MAX_MEM_SIZE , and INSUFFICIENT_PRIVILEGES are described.
SET optimizer_trace = "enabled=on";
SELECT * FROM information_schema.OPTIMIZER_TRACE;System variables optimizer_trace_offset and optimizer_trace_limit control how many recent statements are kept.
3. IN subquery conversion strategies
MySQL can choose among three strategies: convert to a semi‑join, materialize the subquery, or convert to a correlated subquery. When a semi‑join is not possible, the optimizer compares the cost of materialization versus correlated execution using the trace.
{
"execution_plan_for_potential_materialization": {
"subq_mat_decision": {
"parent_fanouts": [{
"select#": 1,
"subq_attached_to_table": true,
"table": "`city`",
"fanout": 99,
"cacheable": true
}],
"cost_to_create_and_fill_materialized_table": 123.849,
"cost_of_one_EXISTS": 0.349669,
"number_of_subquery_evaluations": 99,
"cost_of_materialization": 133.749,
"cost_of_EXISTS": 34.6172,
"chosen": false
}
}
}The trace shows that materialization is more expensive (cost ≈ 133.75) than executing the subquery as a correlated EXISTS (cost ≈ 34.62), so the optimizer selects the correlated approach.
Cost calculations are explained: cost_of_materialization = cost_to_create_and_fill_materialized_table + number_of_subquery_evaluations × 0.1 (where 0.1 is memory_temptable_row_cost ), and cost_of_EXISTS = cost_of_one_EXISTS × number_of_subquery_evaluations .
SELECT cost_name, cost_value, default_value
FROM mysql.server_cost
WHERE cost_name = 'memory_temptable_row_cost';4. Execution flow of a correlated subquery
The main query reads rows from city , checks the non‑subquery predicates first, and for each qualifying row executes the correlated subquery (e.g., SELECT city_id FROM address WHERE address.city_id = 8 ). This process repeats for all qualifying rows.
SELECT * FROM city
WHERE city_id < 100 AND EXISTS (
SELECT city_id FROM address
WHERE city.city_id = address.city_id
);The optimizer trace confirms that the subquery is executed once per qualifying outer row (99 times in the example).
5. Best practices
Place predicates that do not involve subqueries before those that do, so MySQL can filter rows early and avoid unnecessary subquery executions.
6. Summary
After conversion, type shows unique_subquery (alias for eq_ref ) or index_subquery (alias for ref / ref_or_null ).
The ref column may display func because the generated condition references the outer query field indirectly.
If a semi‑join is impossible, MySQL chooses between materialization and correlated execution based on estimated cost, visible via optimizer trace.
The article also outlines the execution steps of a correlated subquery and recommends ordering WHERE clauses for performance.
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.