Understanding MySQL Indexes: B‑Tree, B+Tree, and Index Design Principles
This article explains how MySQL indexes work, compares binary trees, B‑Tree and B+Tree structures, describes their implementation in MyISAM and InnoDB storage engines, and provides practical guidelines for creating efficient single‑column, composite, and covering indexes to improve query performance.
Creating appropriate indexes is the foundation for improving database query performance. The article starts with a simple CREATE TABLE user (id BIGINT NOT NULL COMMENT 'id' PRIMARY KEY, name VARCHAR(200) NULL COMMENT 'name', age BIGINT NULL COMMENT 'age', gender INT NULL COMMENT 'gender', KEY (name)); example to illustrate table structure.
What is an Index and How Does It Work?
An index is a separate data structure that speeds up row retrieval. Without an index, a query like SELECT * FROM user WHERE id = 40 requires a full table scan; with an index, the engine can perform a binary search on the index and then fetch the row directly.
Why MySQL Uses B+Tree for Indexes
The article first discusses why a simple binary tree is unsuitable: insertion order can create a linear chain, leading to O(n) search time, and balancing operations are costly. It then introduces multi‑way balanced trees (B‑Tree) and explains that B+Tree, a variant of B‑Tree, stores only keys in internal nodes and full records in leaf nodes, allowing higher fan‑out, better disk‑page utilization, and stable I/O performance.
MySQL Storage Engines and Index Implementation
Two storage engines are covered:
MyISAM : uses three files per table (.frm, .MYD for data, .MYI for indexes). Index files contain pointers to data file locations.
InnoDB : uses .frm and .ibd files; the .ibd file stores both data and indexes. InnoDB employs a clustered primary key index where leaf nodes hold the actual rows, while secondary indexes store the primary key values.
Key Principles for Creating Indexes
Column selectivity (discreteness) : higher distinct value ratio yields better index selectivity.
Left‑most prefix rule : index columns are matched from left to right; any leading column must be used for the index to be applicable.
Minimum space principle : smaller key size allows more keys per node, reducing I/O.
Composite (Multi‑Column) Indexes
Composite indexes store combined keys (e.g., CREATE INDEX idx_name_age ON users(name, age); ). They follow the left‑most rule, so a query on name alone can still use the composite index, but a query on age alone cannot.
Covering Indexes
A covering index contains all columns needed by a query, allowing the engine to satisfy the query using only the index without accessing the data rows. For example, SELECT id FROM users WHERE name = ? can be answered directly from a name index that stores the id.
Summary of Best Practices
Keep indexed column length as short as possible while meeting business needs.
Avoid excessive or redundant indexes; they increase storage and write overhead.
LIKE patterns starting with a wildcard (e.g., LIKE %a ) cannot use indexes; a leading constant (e.g., LIKE a% ) may be usable depending on selectivity.
IN can use indexes, NOT IN cannot.
Prefer explicit column lists over SELECT * to enable covering indexes.
Functions on indexed columns invalidate the index.
For composite indexes, queries must use the leftmost columns; range conditions on a column prevent use of subsequent columns.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.