Databases 17 min read

Understanding MySQL Indexes: B‑Tree, B+Tree, and Index Design Principles

This article explains how MySQL indexes work, compares binary trees, B‑Tree and B+Tree structures, describes their implementation in MyISAM and InnoDB storage engines, and provides practical guidelines for creating efficient single‑column, composite, and covering indexes to improve query performance.

Architecture Digest

Oct 15, 2020

Understanding MySQL Indexes: B‑Tree, B+Tree, and Index Design Principles

Creating appropriate indexes is the foundation for improving database query performance. The article starts with a simple

CREATE TABLE user (id BIGINT NOT NULL COMMENT 'id' PRIMARY KEY, name VARCHAR(200) NULL COMMENT 'name', age BIGINT NULL COMMENT 'age', gender INT NULL COMMENT 'gender', KEY (name));

example to illustrate table structure.

What is an Index and How Does It Work?

An index is a separate data structure that speeds up row retrieval. Without an index, a query like SELECT * FROM user WHERE id = 40 requires a full table scan; with an index, the engine can perform a binary search on the index and then fetch the row directly.

Why MySQL Uses B+Tree for Indexes

The article first discusses why a simple binary tree is unsuitable: insertion order can create a linear chain, leading to O(n) search time, and balancing operations are costly. It then introduces multi‑way balanced trees (B‑Tree) and explains that B+Tree, a variant of B‑Tree, stores only keys in internal nodes and full records in leaf nodes, allowing higher fan‑out, better disk‑page utilization, and stable I/O performance.

MySQL Storage Engines and Index Implementation

Two storage engines are covered:

MyISAM : uses three files per table (.frm, .MYD for data, .MYI for indexes). Index files contain pointers to data file locations.

InnoDB : uses .frm and .ibd files; the .ibd file stores both data and indexes. InnoDB employs a clustered primary key index where leaf nodes hold the actual rows, while secondary indexes store the primary key values.

Key Principles for Creating Indexes

Column selectivity (discreteness) : higher distinct value ratio yields better index selectivity.

Left‑most prefix rule : index columns are matched from left to right; any leading column must be used for the index to be applicable.

Minimum space principle : smaller key size allows more keys per node, reducing I/O.

Composite (Multi‑Column) Indexes

Composite indexes store combined keys (e.g., CREATE INDEX idx_name_age ON users(name, age);). They follow the left‑most rule, so a query on name alone can still use the composite index, but a query on age alone cannot.

Covering Indexes

A covering index contains all columns needed by a query, allowing the engine to satisfy the query using only the index without accessing the data rows. For example, SELECT id FROM users WHERE name = ? can be answered directly from a name index that stores the id.

Summary of Best Practices

Keep indexed column length as short as possible while meeting business needs.

Avoid excessive or redundant indexes; they increase storage and write overhead.

LIKE patterns starting with a wildcard (e.g., LIKE %a) cannot use indexes; a leading constant (e.g., LIKE a%) may be usable depending on selectivity.

IN can use indexes, NOT IN cannot.

Prefer explicit column lists over SELECT * to enable covering indexes.

Functions on indexed columns invalidate the index.

For composite indexes, queries must use the leftmost columns; range conditions on a column prevent use of subsequent columns.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

SQL mysql Database Optimization Indexes B+Tree Storage Engines

Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.