Understanding MySQL Indexes: Types, B+Tree Structure, and Clustered vs. Non‑Clustered Indexes
This article explains MySQL indexes, their purpose and working principle, compares primary, ordinary, composite and full‑text indexes, describes the B+Tree storage structure versus B‑Tree, and clarifies the differences between clustered and non‑clustered (auxiliary) indexes along with their advantages and drawbacks.
MySQL indexes are data structures that improve query efficiency by allowing fast row retrieval, similar to a book's table of contents. Without an index, a query like SELECT * FROM user WHERE id = 40 requires a full table scan; with an index, the engine can perform a binary search on the index and directly locate the row.
Advantages of indexes include significantly faster query speed, while disadvantages involve additional storage space, maintenance overhead, and slower write operations due to index updates.
Indexes are classified into several types:
1. Primary key index – automatically created for the primary key (InnoDB uses a clustered index).
2. Ordinary index – built on regular columns without restrictions, used to speed up queries.
3. Composite index – built on multiple columns; none of the indexed columns may contain NULL.
4. Full‑text index (MySQL 5.7 and earlier, MyISAM) – used for searching keywords in large text fields.MySQL stores data using a B+Tree structure. When inserting rows, InnoDB automatically orders them by the primary key, creating a linked list of pages (each page is 16 KB). The page directory stores the first key of each leaf page, allowing the engine to locate the correct page with a single I/O operation.
Compared with a traditional B‑Tree, a B+Tree stores only keys in internal nodes and all row data in leaf nodes, reducing tree depth and I/O operations. This makes B+Tree the default index structure for InnoDB.
Clustered (primary) indexes combine the index and the row data in the same B+Tree, meaning the leaf nodes contain the full row. Non‑clustered (secondary) indexes store only the indexed keys and a pointer to the primary key; retrieving a row via a secondary index requires a second lookup (the so‑called “back‑table” lookup).
Key points about clustered indexes:
Data access is faster because the index and data are stored together.
Range queries on the primary key are very efficient.
Insert performance depends on sequential primary‑key order; otherwise page splits can degrade performance.
Updating the primary key is costly; it is usually defined as an auto‑incrementing, immutable column.
Secondary (auxiliary) indexes are built on top of the clustered index and always require two lookups: first to obtain the primary key, then to fetch the row data.
In practice, most user‑defined indexes are secondary indexes that ultimately reference the primary key for data retrieval.
Architect's Guide
Dedicated to sharing programmer-architect skills—Java backend, system, microservice, and distributed architectures—to help you become a senior architect.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.