Databases 21 min read

In-Depth Analysis of Hash Join Implementation and Execution Flow in MySQL 8.0

The article explores MySQL's Hash Join algorithms, comparing Basic, On-disk, Grace, and Hybrid variants, while detailing rule-based optimizer decisions, memory management, disk spilling mechanisms, and outer join execution strategies through source code analysis.

Tencent Database Technology

Dec 29, 2021

In-Depth Analysis of Hash Join Implementation and Execution Flow in MySQL 8.0

This article provides a comprehensive technical analysis of Hash Join implementation in MySQL 8.0, detailing its evolution from Nested Loop Join and comparing core algorithms including Basic, On-disk, Grace, and Hybrid Hash Join. It explains how MySQL manages memory constraints through chunking and disk spilling, and outlines rule-based optimizer decisions for selecting join strategies.

The optimizer categorizes join conditions into equi-join keys for hash table construction and non-equi conditions as extra filters. For inner joins, non-equi predicates are pushed into separate Filter iterators, while outer joins retain them as extra conditions. Automatic selection in MySQL 8.0.22 relies on heuristics rather than cost models, with disk spilling controlled by upper-level operators like LIMIT or GROUP BY.

The execution flow utilizes a Volcano-style iterator and a state machine to manage Build and Probe phases. The Build phase populates the hash table until memory limits are reached, triggering disk partitioning if spilling is allowed. The Probe phase matches rows and employs a saving file mechanism to efficiently handle outer joins and multi-pass scenarios without full rescans.

Source code inspection highlights the internal logic for hash table construction and state transitions:

int HashJoinIterator::Read() {
for (;;) {
  switch (m_state) {
     case State::READING_ROW_FROM_PROBE_ITERATOR: ReadRowFromProbeIterator ②
     case State::READING_FIRST_ROW_FROM_HASH_TABLE:
     case State::READING_FROM_HASH_TABLE:   ReadNextJoinedRowFromHashTable(); ③
     case State::LOADING_NEXT_CHUNK_PAIR:  ReadNextHashJoinChunk()  ④
     case State::READING_ROW_FROM_PROBE_CHUNK_FILE: ReadRowFromProbeChunkFile() ⑤
     case State::READING_ROW_FROM_PROBE_ROW_SAVING_FILE: ReadRowFromProbeRowSavingFile()  ⑥
}

The implementation carefully manages buffer reuse and memory allocation to minimize copying overhead, with notable optimizations introduced in version 8.0.23. Outer join semantics are preserved by tracking matched rows and applying null flags appropriately during the probe phase.

In conclusion, MySQL's Hash Join has matured into a robust execution engine that significantly outperforms Nested Loop Join for large datasets and complex join conditions. While current implementations show strong foundational design, future enhancements in storage management, partition probing, and predicate pushdown will further elevate its efficiency.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

source-code-analysis query optimization Performance tuning MySQL Hash Join Join Algorithms Relational Databases

Written by

Tencent Database Technology

Tencent's Database R&D team supports internal services such as WeChat Pay, WeChat Red Packets, Tencent Advertising, and Tencent Music, and provides external support on Tencent Cloud for TencentDB products like CynosDB, CDB, and TDSQL. This public account aims to promote and share professional database knowledge, growing together with database enthusiasts.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.