Understanding B‑Tree and B+Tree Indexes in MySQL
This article explains the fundamentals of B‑Tree and B+Tree data structures, their search algorithms, and how MySQL's MyISAM and InnoDB storage engines implement these indexes to achieve efficient disk‑based query performance.
Data Structures and Algorithms
Indexes are essentially data structures that enable fast data retrieval; database designers optimize query speed by using advanced search algorithms instead of linear search, which has O(n) complexity.
1.1 B‑Tree
A record is defined as a tuple [key, data] with unique keys. A B‑Tree of degree d satisfies several conditions: each non‑leaf node contains d ≤ n ≤ 2d keys, leaf nodes contain between 1 and 2d‑1 keys, all leaves share the same depth h , keys are stored in non‑decreasing order, and pointers either point to child nodes or are null. The tree structure ensures that each pointer’s key range is well defined.
d is a positive integer greater than 1, called the degree of the B‑Tree.
h is a positive integer, the height of the B‑Tree.
Each non‑leaf node has n‑1 keys and n pointers, where d ≤ n ≤ 2d .
Each leaf node contains at least one key and two pointers, at most 2d‑1 keys and 2d pointers; leaf pointers are null.
All leaf nodes have the same depth h .
Keys and pointers alternate, with pointers at both ends of a node.
Keys within a node are ordered non‑decreasingly from left to right.
All nodes together form a tree.
Each pointer is either null or points to another node.
If the leftmost pointer of a node is non‑null, all keys in the pointed node are less than the node’s first key.
If the rightmost pointer of a node is non‑null, all keys in the pointed node are greater than the node’s last key.
If a pointer lies between two adjacent keys key_i and key_{i+1} , the pointed node’s keys lie between those two values.
An example B‑Tree with d = 2 is shown, followed by the standard search algorithm:
BTree_Search(node, key) {
if (node == null) return null;
foreach (node.key) {
if (node.key[i] == key) return node.data[i];
if (node.key[i] > key) return BTree_Search(point[i]->node);
}
return BTree_Search(point[i+1]->node);
}
data = BTree_Search(root, my_key);The height h of a B‑Tree with N keys satisfies h ≤ log_d((N+1)/2) , giving a search complexity of O(log_d N) .
1.2 B+Tree
B+Tree is the most common B‑Tree variant used by MySQL. Differences from B‑Tree include: (1) each node’s pointer limit is 2d instead of 2d+1 ; (2) internal nodes store only keys, while leaf nodes store data and no pointers.
A simple B+Tree illustration is provided; leaf and internal nodes may have different sizes, and B+Tree is better suited for external‑storage indexes.
1.3 B+Tree with Sequential Access Pointers
Adding a pointer between adjacent leaf nodes creates a linked list of leaves, enabling efficient range queries by sequentially traversing the leaves after locating the start key.
1.4 Why Use B‑Tree / B+Tree
Indexes are usually stored on disk, so minimizing disk I/O is critical. By sizing a node to a disk page, each node can be loaded with a single I/O. B‑Tree’s high fan‑out (often >100) yields a very small height (usually ≤3), making it far more I/O‑efficient than structures like red‑black trees.
Memory Access Principles
RAM is modeled as a matrix of addressable cells; access time grows linearly with the number of accesses because there is no mechanical movement.
Disk Access Principles
Disks consist of rotating platters and movable heads. Accessing a sector requires seek time (moving the head) and rotational latency (waiting for the sector to rotate under the head).
Locality Principle and Disk Prefetch
Because disk I/O is slow, disks prefetch sequential blocks, exploiting spatial locality: data near the requested byte is likely to be needed soon, reducing overall I/O.
Performance Analysis of B‑/+Tree Indexes
Searching a B‑Tree touches at most h nodes; with page‑sized nodes, each node incurs one I/O. Typical fan‑out >100 yields h ≤ 3 , so the I/O cost is minimal. B+Tree’s larger fan‑out (due to omission of data in internal nodes) further improves performance.
MySQL Implementation
2.1 MyISAM Index
MyISAM uses B+Tree where leaf nodes store the physical address of the data record. Primary and secondary indexes share the same structure; primary keys are unique, secondary keys may repeat.
2.2 InnoDB Index
InnoDB also uses B+Tree, but the table’s data file itself is a clustered B+Tree: leaf nodes contain full records, and the primary key serves as the index key. Secondary indexes store the primary key value, requiring a second lookup to retrieve the full record.
Conclusion
The article focuses on B‑Tree based indexes in MySQL, describing their structure, the differences between MyISAM and InnoDB implementations, and the performance advantages of B‑Tree and B+Tree for disk‑based indexing.
政采云技术
ZCY Technology Team (Zero), based in Hangzhou, is a growth-oriented team passionate about technology and craftsmanship. With around 500 members, we are building comprehensive engineering, project management, and talent development systems. We are committed to innovation and creating a cloud service ecosystem for government and enterprise procurement. We look forward to your joining us.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.