Understanding B+ Tree, Hash, and Full‑Text Indexes in MySQL
This article explains the principles, structures, and operations of MySQL indexes, covering B+ tree indexes, their search, insertion, and deletion mechanisms, as well as hash indexes, adaptive hash indexing, and full‑text indexes with inverted indexing, cache handling, and practical limitations.
In a database, having too many indexes can degrade write performance, while too few indexes hurt query speed; therefore a balance is needed.
B+ Tree Index
The "B" in B+ tree stands for "balanced" rather than "binary". It is a balanced search tree designed for disk storage, with all leaf nodes on the same level linked by pointers.
Because of its high fan‑out, a typical B+ tree height is 2–4, meaning a key lookup requires only 2–4 I/O operations.
MySQL uses B+ trees for both clustered (primary) and secondary indexes; the difference is whether leaf pages store the full row (clustered) or just a bookmark to the clustered key (secondary). Each table can have only one clustered index.
Leaf pages are linked by a doubly‑linked list, preserving the logical order of rows.
Index Search
B+ tree lookup uses binary (half‑interval) search: repeatedly compare the target key with the middle key of a sorted list and narrow the search range by half.
Index Insertion
Inserting a key may require leaf page splits and possibly index page splits, which involve disk I/O. Examples:
Insert 28 – leaf page has space, insert directly.
Insert 70 – leaf page full, index page has space; split leaf at 60 and promote 60 to index.
Insert 95 – both leaf and index pages full; two splits are needed.
Index Deletion
Deletion is controlled by a fill factor (minimum 50%). After removal, under‑filled pages may be merged to keep the tree balanced.
Index Classification
Indexes can be categorized as:
Clustered index (primary key)
Secondary (auxiliary) index
Composite (multi‑column) index
Covering index (all columns needed by a query are stored in the index)
Hash Index
Hash algorithms have constant time complexity O(1) . InnoDB uses a hash table for dictionary lookups, handling collisions with chaining and employing a division‑based hash function.
Example: with a 10 MB buffer pool (640 pages), InnoDB allocates 1 280 slots, rounded up to the next prime (1399) for the hash table.
Adaptive Hash Index
Adaptive hash index builds a hash table on frequently accessed B+ tree pages, providing fast equality lookups but not supporting range queries.
Full‑Text Index
For pattern searches like SELECT * FROM blog WHERE content LIKE '%word%'; , B+ trees degrade to full table scans. Full‑text search uses an inverted index to map words to document IDs and positions.
Inverted Index Types
Two forms are used:
Inverted file index: {word → document IDs}
Full inverted index: {word → (document ID, position)}
Example tables illustrate how words are associated with document IDs and positions.
Full‑Text Index Cache
InnoDB maintains a red‑black‑tree cache (FTS Index Cache) for pending full‑text index updates. The cache size is controlled by innodb_ftcache_size (default 32 MB). On crash, unwritten cache entries are recovered.
Limitations of Full‑Text Index
Current constraints include:
Supported only for MyISAM and InnoDB engines
No support for partitioned tables
All indexed columns must share the same character set and collation
Languages without word delimiters (e.g., Chinese) require n‑gram tokenization
Search expressions must use constant strings; wildcards like % are not allowed
Full‑text operations are performed at transaction commit time
References
MySQL Technical Internals – InnoDB Storage Engine, 2nd Edition.
Architect's Tech Stack
Java backend, microservices, distributed systems, containerized programming, and more.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.