
Master MySQL Indexes: From Basics to B+Tree Optimization

This article explains what MySQL indexes are, how they work, their advantages and drawbacks, the different index types—including primary, ordinary, composite, full‑text, clustered and non‑clustered—and compares B‑Tree with B+Tree structures to help you design faster, more efficient queries.

Architect's Must-Have

1. What Is an Index

Official definition: a data structure that improves MySQL query efficiency by allowing fast retrieval of rows, similar to a book's table of contents.

1.1 How Indexes Work

Without an index, a query like

SELECT * FROM user WHERE id = 40

requires a full table scan if id is not indexed. With an index, MySQL searches the sorted index entries, much like a binary search, and locates the row quickly.
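The contrast can be sketched in a few lines of Python. This is only an illustration, assuming the index behaves like a sorted list of ids; `full_scan` and `index_lookup` are hypothetical helper names, and a real InnoDB index is a B+Tree rather than a flat array.

```python
from bisect import bisect_left

def full_scan(rows, target_id):
    """Check every row in turn, as a query without an index would."""
    comparisons = 0
    for row in rows:
        comparisons += 1
        if row["id"] == target_id:
            return row, comparisons
    return None, comparisons

def index_lookup(sorted_ids, rows_by_id, target_id):
    """Binary-search a sorted list of ids, then fetch the matching row."""
    pos = bisect_left(sorted_ids, target_id)
    if pos < len(sorted_ids) and sorted_ids[pos] == target_id:
        return rows_by_id[target_id]
    return None

# Hypothetical in-memory stand-in for the user table.
rows = [{"id": i, "name": f"user{i}"} for i in range(1, 101)]
sorted_ids = [row["id"] for row in rows]
rows_by_id = {row["id"]: row for row in rows}

row, comparisons = full_scan(rows, 40)
print(row, comparisons)
print(index_lookup(sorted_ids, rows_by_id, 40))
```

Here the scan makes 40 comparisons to reach id 40, while a binary search over 100 sorted ids needs at most 7.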

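Advantages of indexes:
1. They greatly speed up data queries.
Disadvantages of indexes:
1. Maintaining an index consumes database resources.
2. An index takes up disk space.
3. Inserts, updates, and deletes become slower, because the index must be maintained along with the table data.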

Despite the drawbacks, the speed gain for large datasets makes indexes essential, as highlighted by Alibaba's P3C development guidelines.

2. Types of Indexes

1. Primary key index – automatically created for the primary key; InnoDB uses a clustered index.
2. Ordinary index – built on regular columns without restrictions, used to speed up queries.
3. Composite index – built on multiple columns; the columns may contain NULL unless they are part of the primary key.
4. Full‑text index (MyISAM only before MySQL 5.6; InnoDB supports it from 5.6 onward) – indexes large text columns for keyword search.

3. B+Tree

Consider the SQL statement

INSERT ...

shown in the following image.

After execution, the rows are stored in primary‑key order even if they were inserted out of order, which may seem surprising.

MySQL stores rows in pages (16 KB by default). Each page contains a pointer to the next page, so the pages form a linked list.

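Every time a row is inserted, MySQL keeps the data sorted, because sorted data can be searched quickly. Each record is also linked to the next one through the pointer P. Does this look like a familiar data structure? Yes, it is a linked list.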
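The behaviour described above can be sketched as a sorted singly linked list. This is a simplified model, not InnoDB's actual record format; `Record`, `insert_sorted`, and `keys_in_order` are hypothetical names chosen for the illustration.

```python
class Record:
    """One row in a page; `next` plays the role of the pointer P in the text."""
    def __init__(self, key):
        self.key = key
        self.next = None

def insert_sorted(head, key):
    """Insert key so the list stays ordered; return the (possibly new) head."""
    node = Record(key)
    if head is None or key < head.key:
        node.next = head
        return node
    current = head
    while current.next is not None and current.next.key < key:
        current = current.next
    node.next = current.next
    current.next = node
    return head

def keys_in_order(head):
    """Follow the next pointers and collect the keys."""
    keys = []
    while head is not None:
        keys.append(head.key)
        head = head.next
    return keys

head = None
for key in [5, 1, 4, 2, 3]:   # inserted out of order
    head = insert_sorted(head, key)
print(keys_in_order(head))    # [1, 2, 3, 4, 5]
```

Whatever the insertion order, following the pointers from the head yields the keys in ascending order.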

Using the page directory, MySQL can find the page that holds a row by reading the directory instead of following the page list from the start, which dramatically speeds up lookups.

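Assume each record takes 36 bytes: a 16 KB page then holds about 455 records. A page directory holds 2048 ids, so a three‑level structure can address about 1.9 billion records (2048 × 2048 × 455).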
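The arithmetic behind these figures can be checked with a short calculation. It is a rough sketch that ignores page headers and per‑record overhead, and it assumes that "three levels" means two levels of directory pages above one level of data pages; the constant names are chosen for this example.

```python
PAGE_SIZE = 16 * 1024          # 16 KB page, as stated in the text
RECORD_SIZE = 36               # bytes per record, as assumed in the text
DIRECTORY_ENTRIES = 2048       # ids per directory page, as stated in the text

records_per_leaf = PAGE_SIZE // RECORD_SIZE
bytes_per_directory_entry = PAGE_SIZE // DIRECTORY_ENTRIES  # implied entry size

# Three levels: two levels of directory pages above one level of data pages.
three_level_capacity = DIRECTORY_ENTRIES * DIRECTORY_ENTRIES * records_per_leaf

print(records_per_leaf)            # 455
print(bytes_per_directory_entry)   # 8
print(three_level_capacity)        # 1908408320
```

Under these assumptions a three‑level tree addresses roughly 1.9 billion records.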

4. Comparing B‑Tree and B+Tree

B+Tree stores row data only in leaf nodes. Its non‑leaf nodes hold just keys and child pointers, so each page fits far more entries, which reduces tree depth and I/O compared to B‑Tree, where every node holds data.

B+Tree is an optimization of B‑Tree designed for disk‑based index structures; InnoDB uses B+Tree for its indexes.
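A back‑of‑the‑envelope comparison shows why keeping row data out of the inner nodes matters. The sketch below assumes a 16 KB page, 36‑byte rows, and 8‑byte key/pointer entries, and it ignores page overhead; the function names and the 100‑million‑row figure are hypothetical choices for the illustration.

```python
PAGE_SIZE = 16 * 1024
RECORD_SIZE = 36      # bytes per row (assumed)
POINTER_SIZE = 8      # bytes per key/child-pointer entry (assumed)

def bplus_tree_levels(total_rows):
    """Levels needed when only leaves hold rows and inner nodes hold keys."""
    fanout = PAGE_SIZE // POINTER_SIZE          # 2048 entries per inner page
    capacity = PAGE_SIZE // RECORD_SIZE         # 455 rows in a single leaf
    levels = 1
    while capacity < total_rows:
        capacity *= fanout
        levels += 1
    return levels

def b_tree_levels(total_rows):
    """Levels needed when every node holds full rows plus child pointers."""
    rows_per_node = PAGE_SIZE // (RECORD_SIZE + POINTER_SIZE)   # 372
    levels = 1
    capacity = rows_per_node
    while capacity < total_rows:
        levels += 1
        # A full tree with k rows per node and h levels holds (k+1)^h - 1 rows.
        capacity = (rows_per_node + 1) ** levels - 1
    return levels

rows = 100_000_000
print(bplus_tree_levels(rows), b_tree_levels(rows))   # 3 4
```

With these assumed sizes, 100 million rows fit in a three‑level B+Tree but need four levels in a B‑Tree, which means one more page read per lookup.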

Summary of differences between B+Tree and B‑Tree

1. B+Tree non‑leaf nodes store only key values.
2. Every B+Tree leaf node has a pointer to the next leaf.
3. B+Tree keeps data only in leaf nodes; B‑Tree stores data in all nodes.

InnoDB page size is 16 KB; primary keys are typically INT (4 bytes) or BIGINT (8 bytes).

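Auto‑incrementing primary keys are recommended for InnoDB tables because they keep inserts in key order, which avoids page splits and preserves insert performance.

Inserting records in primary‑key order avoids page splits, because each new key goes at the end of the last page. Inserting a key that falls between existing keys can force a split when the target page is already full, whereas inserting a key larger than all existing ones simply fills the last page or starts a new one.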
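The effect of insertion order can be simulated with a toy model. This is a sketch under simplifying assumptions, not InnoDB's actual algorithm: pages are plain sorted lists, `PAGE_CAPACITY` is a tiny hypothetical value so splits are easy to trigger, keys are unique, and a full page is split in half.

```python
from bisect import bisect_left, insort

PAGE_CAPACITY = 4   # tiny capacity so splits are easy to trigger (hypothetical)

def insert_keys(keys):
    """Insert unique keys into sorted pages; return (pages, number_of_splits)."""
    pages = []
    splits = 0
    for key in keys:
        if not pages:
            pages.append([key])
            continue
        # Pick the first page whose largest key is >= key, else the last page.
        largest = [page[-1] for page in pages]
        index = min(bisect_left(largest, key), len(pages) - 1)
        page = pages[index]
        if len(page) < PAGE_CAPACITY:
            insort(page, key)
        elif key > page[-1]:
            # Key is larger than everything stored: start a new page, no split.
            pages.append([key])
        else:
            # Key lands inside a full page: split it in half, then insert.
            middle = len(page) // 2
            left, right = page[:middle], page[middle:]
            insort(left if key < right[0] else right, key)
            pages[index:index + 1] = [left, right]
            splits += 1
    return pages, splits

sequential = list(range(1, 101))                                  # 1, 2, 3, ...
out_of_order = list(range(1, 101, 2)) + list(range(2, 101, 2))    # odds, then evens

_, sequential_splits = insert_keys(sequential)
_, out_of_order_splits = insert_keys(out_of_order)
print(sequential_splits, out_of_order_splits)
```

In this model the sequential run never splits a page, while the out‑of‑order run has to split pages that were already full when the even keys arrive.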

5. Clustered vs Non‑Clustered Indexes

Clustered index: data and index are stored together; leaf nodes contain the full row.

Non‑clustered (auxiliary) index: index leaf nodes store a reference to the row rather than the row itself; in InnoDB that reference is the primary key value.

In InnoDB, the table file itself is a B+Tree; the clustered index is built on the primary key, and its leaf nodes hold the actual rows. Each table can have only one clustered index.

In daily work, the indexes we add are usually auxiliary (non‑clustered) indexes that first locate the primary key and then fetch the row (a “back‑table” lookup).

Advantages of clustered indexes:

Faster data access because the index and data share the same B+Tree.

Efficient for range queries on the primary key.

Disadvantages:

Insert speed depends on insertion order; non‑sequential inserts cause page splits.

Updating the primary key is costly because rows must be moved.

Secondary index lookups require two steps: find the primary key, then fetch the row.

Auxiliary Index (Non‑Clustered Index)

Auxiliary indexes built on top of the clustered index store the primary key value in their leaf nodes; accessing data via an auxiliary index always involves a second lookup using that primary key.

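In short, the indexes we usually define are auxiliary indexes. When a query goes through an ordinary index, MySQL first uses the auxiliary index to find the primary key, then uses the primary key index to find the actual row.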
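The two‑step lookup can be modelled with two dictionaries. This is only a conceptual sketch: Python dicts are hash maps, not B+Trees, and the table contents, the `name` column, and the `find_by_name` helper are hypothetical.

```python
# Clustered index: primary key -> full row (leaf nodes hold the rows).
clustered_index = {
    1: {"id": 1, "name": "alice", "age": 30},
    2: {"id": 2, "name": "bob", "age": 25},
    3: {"id": 3, "name": "carol", "age": 35},
}

# Auxiliary index on name: indexed value -> primary key only.
name_index = {row["name"]: pk for pk, row in clustered_index.items()}

def find_by_name(name):
    """Two-step lookup: auxiliary index first, then the clustered index."""
    primary_key = name_index.get(name)     # step 1: find the primary key
    if primary_key is None:
        return None
    return clustered_index[primary_key]    # step 2: fetch the row ("back-table")

print(find_by_name("bob"))
```

The auxiliary structure never stores the row itself, so every hit costs a second lookup in the clustered structure.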

-- The article may not be exhaustive; contributions are welcome.
