Databases 19 min read

Understanding B‑Tree and B+Tree Indexes in MySQL

This article explains the fundamentals of B‑Tree and B+Tree data structures, their search algorithms, and how MySQL's MyISAM and InnoDB storage engines implement these indexes to achieve efficient disk‑based query performance.

政采云技术

Jan 2, 2024

Understanding B‑Tree and B+Tree Indexes in MySQL

Data Structures and Algorithms

Indexes are essentially data structures that enable fast data retrieval; database designers optimize query speed by using advanced search algorithms instead of linear search, which has O(n) complexity.

1.1 B‑Tree

A record is defined as a tuple [key, data] with unique keys. A B‑Tree of degree d satisfies several conditions: each non‑leaf node contains d ≤ n ≤ 2d keys, leaf nodes contain between 1 and 2d‑1 keys, all leaves share the same depth h, keys are stored in non‑decreasing order, and pointers either point to child nodes or are null. The tree structure ensures that each pointer’s key range is well defined.

d is a positive integer greater than 1, called the degree of the B‑Tree.

h is a positive integer, the height of the B‑Tree.

Each non‑leaf node has n‑1 keys and n pointers, where d ≤ n ≤ 2d.

Each leaf node contains at least one key and two pointers, at most 2d‑1 keys and 2d pointers; leaf pointers are null.

All leaf nodes have the same depth h.

Keys and pointers alternate, with pointers at both ends of a node.

Keys within a node are ordered non‑decreasingly from left to right.

All nodes together form a tree.

Each pointer is either null or points to another node.

If the leftmost pointer of a node is non‑null, all keys in the pointed node are less than the node’s first key.

If the rightmost pointer of a node is non‑null, all keys in the pointed node are greater than the node’s last key.

If a pointer lies between two adjacent keys key_i and key_{i+1}, the pointed node’s keys lie between those two values.

An example B‑Tree with d = 2 is shown, followed by the standard search algorithm:

BTree_Search(node, key) {
    if (node == null) return null;
    foreach (node.key) {
        if (node.key[i] == key) return node.data[i];
        if (node.key[i] > key) return BTree_Search(point[i]->node);
    }
    return BTree_Search(point[i+1]->node);
}

data = BTree_Search(root, my_key);

The height h of a B‑Tree with N keys satisfies h ≤ log_d((N+1)/2), giving a search complexity of O(log_d N).

1.2 B+Tree

B+Tree is the most common B‑Tree variant used by MySQL. Differences from B‑Tree include: (1) each node’s pointer limit is 2d instead of 2d+1; (2) internal nodes store only keys, while leaf nodes store data and no pointers.

A simple B+Tree illustration is provided; leaf and internal nodes may have different sizes, and B+Tree is better suited for external‑storage indexes.

1.3 B+Tree with Sequential Access Pointers

Adding a pointer between adjacent leaf nodes creates a linked list of leaves, enabling efficient range queries by sequentially traversing the leaves after locating the start key.

1.4 Why Use B‑Tree / B+Tree

Indexes are usually stored on disk, so minimizing disk I/O is critical. By sizing a node to a disk page, each node can be loaded with a single I/O. B‑Tree’s high fan‑out (often >100) yields a very small height (usually ≤3), making it far more I/O‑efficient than structures like red‑black trees.

Memory Access Principles

RAM is modeled as a matrix of addressable cells; access time grows linearly with the number of accesses because there is no mechanical movement.

Disk Access Principles

Disks consist of rotating platters and movable heads. Accessing a sector requires seek time (moving the head) and rotational latency (waiting for the sector to rotate under the head).

Locality Principle and Disk Prefetch

Because disk I/O is slow, disks prefetch sequential blocks, exploiting spatial locality: data near the requested byte is likely to be needed soon, reducing overall I/O.

Performance Analysis of B‑/+Tree Indexes

Searching a B‑Tree touches at most h nodes; with page‑sized nodes, each node incurs one I/O. Typical fan‑out >100 yields h ≤ 3, so the I/O cost is minimal. B+Tree’s larger fan‑out (due to omission of data in internal nodes) further improves performance.

MySQL Implementation

2.1 MyISAM Index

MyISAM uses B+Tree where leaf nodes store the physical address of the data record. Primary and secondary indexes share the same structure; primary keys are unique, secondary keys may repeat.

2.2 InnoDB Index

InnoDB also uses B+Tree, but the table’s data file itself is a clustered B+Tree: leaf nodes contain full records, and the primary key serves as the index key. Secondary indexes store the primary key value, requiring a second lookup to retrieve the full record.

Conclusion

The article focuses on B‑Tree based indexes in MySQL, describing their structure, the differences between MyISAM and InnoDB implementations, and the performance advantages of B‑Tree and B+Tree for disk‑based indexing.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

mysql Data Structures Algorithms B+Tree database indexing B-Tree

Written by

政采云技术

ZCY Technology Team (Zero), based in Hangzhou, is a growth-oriented team passionate about technology and craftsmanship. With around 500 members, we are building comprehensive engineering, project management, and talent development systems. We are committed to innovation and creating a cloud service ecosystem for government and enterprise procurement. We look forward to your joining us.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.