Understanding MySQL Index Structures, Execution Plans, Transaction Isolation Levels, MVCC, and Buffer Pool Mechanisms
This article provides a comprehensive technical guide on MySQL internals, covering the underlying data structures of indexes (B‑tree, B+‑tree, hash), how execution plans are generated and interpreted, the four transaction isolation levels, MVCC implementation, and the InnoDB buffer‑pool architecture.
1. MySQL Index Data Structures and Algorithms
Indexes are essential for MySQL to retrieve data efficiently and in sorted order. MySQL mainly uses B+ trees and hash structures; hash is fast for direct lookups but unsuitable for ordering.
Tree‑based structures offer logarithmic lookup via binary search, but naive variants either degenerate or grow too tall; every extra level of height can cost an additional disk I/O.
Binary Tree : In the worst case it degrades to a linked list, increasing query time.
Red‑Black Tree : A balanced binary tree that mitigates degeneration but still faces height growth.
B‑Tree : A multi‑way tree that stores many keys per node, reducing height; however, every node (internal and leaf) carries full row data, which limits how many entries fit in one node and keeps the tree taller than necessary.
B+‑Tree : An evolution of B‑Tree where only leaf nodes store row data and internal nodes contain only keys; leaf nodes are linked (MySQL uses a double‑linked list) to support efficient range scans.
In InnoDB, an index node is a page, 16 KB by default (configurable via innodb_page_size). The page is the unit of I/O between disk and the buffer pool, so a single read brings in hundreds of index entries at once and benefits from locality and read‑ahead.
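The effect of page size on tree height can be estimated with simple arithmetic. A rough sketch in Python (the key, pointer, and row sizes below are illustrative assumptions, not InnoDB's exact on‑page layout):

```python
import math

PAGE_SIZE = 16 * 1024          # InnoDB default page size in bytes

def bplus_tree_height(rows, key_bytes=8, ptr_bytes=6, row_bytes=1024):
    """Estimate the height of a B+ tree.

    Internal pages hold (key, child-pointer) pairs; leaf pages hold
    full rows. Sizes are illustrative, not InnoDB's exact layout.
    """
    fanout = PAGE_SIZE // (key_bytes + ptr_bytes)   # children per internal page
    rows_per_leaf = PAGE_SIZE // row_bytes          # rows per leaf page
    leaves = math.ceil(rows / rows_per_leaf)
    height = 1                                      # the leaf level itself
    nodes = leaves
    while nodes > 1:                                # add internal levels
        nodes = math.ceil(nodes / fanout)
        height += 1
    return height

# With ~1170 children per internal page, even 20 million rows
# fit in a tree only 3 levels deep.
print(bplus_tree_height(20_000_000))   # → 3
```

This is why B+ tree lookups rarely touch more than three or four pages, even on very large tables.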
2. SQL Execution Plan Analysis
Execution plans are crucial for SQL tuning. Using EXPLAIN you can view columns such as:
id : identifies each SELECT in the statement; higher id values execute first, and equal ids execute top‑down.
table : the table being accessed.
type : optimization level (system > const > eq_ref > ref > range > index > ALL).
key : the actual index used.
rows : estimated rows examined.
Extra : additional info (e.g., Using index, Using where, Using temporary, Using filesort).
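A minimal example tying these columns together, assuming a hypothetical orders table with a secondary index idx_user_id on user_id:

```sql
-- Hypothetical table; output columns vary slightly by MySQL version.
EXPLAIN SELECT id, amount
FROM orders
WHERE user_id = 42
ORDER BY created_at;
-- A healthy plan would show type=ref and key=idx_user_id.
-- "Using filesort" in Extra would hint that created_at is not
-- covered by the chosen index and rows must be sorted separately.
```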
3. How a SQL Statement Executes in MySQL
Client maintains a long‑lived TCP connection to the MySQL server.
For a SELECT, MySQL (before 8.0) checks its query cache, which maps query text to cached result sets with LRU eviction; the query cache was removed entirely in MySQL 8.0.
The SQL text is parsed (lexical and syntactic analysis) into a parse tree.
The optimizer generates an execution plan, unless the user overrides index selection with a FORCE INDEX hint.
The executor invokes the appropriate storage engine (InnoDB, MyISAM, Memory, etc.).
The execution engine uses the chosen index to locate rows and updates the index as needed.
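For step 4 above, an index hint lets you override the optimizer's choice when its estimate is wrong. A sketch, assuming the same hypothetical orders table and index name:

```sql
-- idx_user_id is an assumed index name on a hypothetical table.
SELECT id, amount
FROM orders FORCE INDEX (idx_user_id)
WHERE user_id = 42;
```

FORCE INDEX is best treated as a last resort: it pins the plan even if the data distribution later changes.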
4. MySQL Locks and Transaction Isolation Levels
Concurrent transactions can cause dirty writes, dirty reads, non‑repeatable reads, and phantom reads. MySQL solves these with transaction isolation levels and locking mechanisms.
ACID properties of a transaction are:
Atomicity
Consistency
Isolation
Durability
The four isolation levels are:
Read Uncommitted : can see uncommitted changes.
Read Committed : sees only committed data.
Repeatable Read : same query returns the same rows within a transaction.
Serializable : highest isolation; plain reads implicitly acquire shared locks, so conflicting transactions effectively run one after another, at a significant concurrency cost.
Lock types include optimistic vs. pessimistic, read (shared) vs. write (exclusive), and table‑level vs. row‑level locks. InnoDB supports row‑level locks and transactions, while MyISAM does not.
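The difference between Read Committed and Repeatable Read is easiest to see with two concurrent sessions. A sketch (the accounts table and its data are hypothetical):

```sql
-- Session A, under READ COMMITTED:
SET SESSION transaction_isolation = 'READ-COMMITTED';
START TRANSACTION;
SELECT balance FROM accounts WHERE id = 1;   -- e.g. returns 100

-- Meanwhile, Session B commits an update:
--   UPDATE accounts SET balance = 50 WHERE id = 1; COMMIT;

SELECT balance FROM accounts WHERE id = 1;   -- now returns 50: a non-repeatable read
COMMIT;
-- Under REPEATABLE-READ, the second SELECT would still return 100.
```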
Example commands:
show variables like '%isolation%';
set transaction_isolation = 'REPEATABLE-READ';
5. Deep Dive into MVCC (Multi‑Version Concurrency Control)
MVCC allows reads without locking and provides high concurrency for read‑heavy workloads. It works via a ReadView and an undo_log version chain .
Each row stores hidden columns trx_id (the ID of the transaction that last modified the row) and roll_pointer (a pointer to the previous version in the undo log). Following roll_pointer yields a singly linked version chain whose head is the newest version and whose tail is the oldest.
A ReadView captures the set of transaction IDs still active (uncommitted) at the moment a consistent read starts. Under Read Committed a new ReadView is created for every query; under Repeatable Read the ReadView from the transaction's first read is reused throughout, which keeps snapshot reads stable and prevents phantoms for those reads.
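The visibility rule a ReadView applies to each version's trx_id can be sketched as follows (a simplification of InnoDB's actual algorithm; the field and function names are illustrative):

```python
from dataclasses import dataclass

@dataclass
class ReadView:
    """Snapshot taken when a consistent read starts (simplified)."""
    creator_trx_id: int   # the reading transaction's own id
    active_ids: set       # transactions uncommitted at snapshot time
    min_active_id: int    # smallest id in active_ids
    next_trx_id: int      # first id not yet assigned

    def is_visible(self, trx_id: int) -> bool:
        if trx_id == self.creator_trx_id:
            return True                       # our own change
        if trx_id < self.min_active_id:
            return True                       # committed before the snapshot
        if trx_id >= self.next_trx_id:
            return False                      # started after the snapshot
        return trx_id not in self.active_ids  # committed iff no longer active

def read_version(view, version_chain):
    """Walk the chain newest-to-oldest; return the first visible value."""
    for trx_id, value in version_chain:
        if view.is_visible(trx_id):
            return value
    return None

view = ReadView(creator_trx_id=103, active_ids={101, 102},
                min_active_id=101, next_trx_id=104)
chain = [(102, "v3"), (101, "v2"), (90, "v1")]   # head = newest version
print(read_version(view, chain))   # → v1, since 101 and 102 are still active
```

Walking the chain until a visible version is found is exactly why long‑running transactions force InnoDB to keep old undo‑log entries around.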
6. InnoDB Buffer Pool Mechanism
MySQL does not write directly to disk for each request; instead it uses a Buffer Pool to cache pages in memory, reducing random I/O.
Query loads the required page from disk into the Buffer Pool.
Before modification, the original row is written to the undo log.
The Buffer Pool page is updated.
Changes are recorded in the redo‑log buffer.
On transaction commit, the redo‑log buffer is flushed to the redo log on disk (the prepare phase).
The server then writes the transaction's changes to the binary log.
Finally, the redo‑log entry is marked committed, completing InnoDB's internal two‑phase commit with the binlog and guaranteeing durability even if the dirty page has not yet been flushed.
This design enables high‑throughput reads and writes while ensuring data consistency.
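The steps above can be modeled as a toy write path (a conceptual sketch of the bookkeeping, not InnoDB's implementation):

```python
class ToyInnoDB:
    """Toy model of the update path: undo, buffer pool, redo, commit."""
    def __init__(self):
        self.disk = {"row1": "old"}   # durable data pages
        self.buffer_pool = {}         # cached (possibly dirty) pages
        self.undo_log = []            # before-images, for rollback/MVCC
        self.redo_buffer = []         # in-memory redo records
        self.redo_log = []            # durable redo records

    def update(self, key, value):
        if key not in self.buffer_pool:                     # 1. load page
            self.buffer_pool[key] = self.disk[key]
        self.undo_log.append((key, self.buffer_pool[key]))  # 2. write undo
        self.buffer_pool[key] = value                       # 3. modify in memory
        self.redo_buffer.append((key, value))               # 4. buffer redo record

    def commit(self):
        self.redo_log.extend(self.redo_buffer)  # 5. flush redo log: the change
        self.redo_buffer.clear()                #    is durable even though the
                                                #    dirty page flush is deferred

db = ToyInnoDB()
db.update("row1", "new")
db.commit()
print(db.disk["row1"], db.buffer_pool["row1"])   # → old new
```

Note that after commit the disk page still holds the old value; crash recovery would replay the redo log to reconstruct the committed change, which is the whole point of write‑ahead logging.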
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.