Understanding MySQL 8.0.18 Hash Join: Features, Usage, and Performance Comparison
The article explains MySQL 8.0.18's new hash join capability, shows how to enable and view it with EXPLAIN FORMAT=TREE, provides sample SQL and table definitions, compares its performance against block nested‑loop joins under various conditions, and discusses configuration limits and tuning tips.
MySQL 8.0.18 GA introduced hash join support for inner‑join queries, allowing the optimizer to choose a hash‑based algorithm when equality conditions are present, even without indexes.
Example of a simple hash join query:
SELECT * FROM t1 JOIN t2 ON t1.c1 = t2.c1;
Three test tables can be created with:
CREATE TABLE t1 (c1 INT, c2 INT); CREATE TABLE t2 (c1 INT, c2 INT); CREATE TABLE t3 (c1 INT, c2 INT);
Running EXPLAIN FORMAT=TREE on the query shows an "Inner hash join" node, confirming that the optimizer selected the hash join algorithm.
Hash join works for multi‑table equality joins as well, e.g.:
SELECT * FROM t1 JOIN t2 ON (t1.c1 = t2.c1 AND t1.c2 < t2.c2) JOIN t3 ON t2.c1 = t3.c1;
If a join condition lacks an equality predicate, MySQL falls back to the traditional block nested‑loop algorithm.
Hash join can also be used for Cartesian products when no explicit join condition is given.
Control of hash join usage is possible via the global or session variable optimizer_switch (hash_join=on/off) or by using optimizer hints HASH_JOIN and NO_HASH_JOIN in statements.
The memory consumption of hash join is limited by join_buffer_size ; if the required memory exceeds this limit, MySQL spills to disk. Additionally, the number of open files must not exceed open_files_limit , otherwise the join may fail.
Performance tests were conducted by inserting one million rows into each table (without indexes) and measuring execution time:
• Hash join: ~12.98 s for a three‑table COUNT(*) query. • Block nested‑loop (forced with NO_HASH_JOIN ): query did not finish after many minutes, CPU at 100 %.
When indexes on the join columns were added, the block nested‑loop algorithm performed much better (≈19.56 s), while hash join remained faster when applicable.
Cross‑database comparisons showed similar advantages for hash join: Oracle 12c (1.28 s), PostgreSQL 11.5 (6.23 s), and SQL Server 2017 (5.21 s) for comparable unindexed joins.
In summary, MySQL’s hash join provides a significant speed boost for equality joins without indexes, is enabled by default, and can be tuned via optimizer switches, hints, and buffer size settings.
Laravel Tech Community
Specializing in Laravel development, we continuously publish fresh content and grow alongside the elegant, stable Laravel framework.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.