Databases 22 min read

Understanding MySQL Indexes: B+Tree Structure, Implementation, and Optimization

This article explains why MySQL uses B+‑tree indexes, describes the principles of B‑tree and B+‑tree structures, compares MyISAM and InnoDB index implementations, and provides practical optimization tips such as using auto‑increment primary keys, the left‑most prefix rule, and proper configuration settings.

Top Architect
Top Architect
Top Architect
Understanding MySQL Indexes: B+Tree Structure, Implementation, and Optimization

Introduction

Indexes are data structures that help MySQL retrieve rows efficiently. MySQL supports many index types (hash, B‑tree, B+‑tree, full‑text, etc.) and multiple storage engines, each with its own index capabilities. This article explores why MySQL chooses B+‑trees, the underlying principles of indexes, and how to optimise them.

Index Principles

The purpose of an index is to improve query speed by reducing the amount of data that must be scanned, similar to looking up a word in a dictionary by narrowing the search range step by step.

Why MySQL Uses B+‑Tree

Ordinary binary trees have high height for large data sets, leading to poor performance. B‑trees improve on this but still store data in internal nodes, causing more disk I/O. B+‑trees store only keys in internal nodes and all data in leaf nodes, which reduces I/O and improves search stability.

B‑Tree Overview

B‑trees are multi‑way search trees designed to minimise disk I/O. Each node contains a range of keys and pointers, with the number of keys per node bounded by a degree d (d ≤ n ≤ 2d). The height is much lower than that of a binary tree, making them suitable for storage on disk.

B+‑Tree Characteristics

Internal nodes store only keys and pointers; leaf nodes store keys and the actual row data.

Leaf nodes are linked, enabling efficient range scans.

Because leaf nodes contain no data, internal nodes can hold more keys, reducing tree depth and I/O.

Why B+‑Tree Fits MySQL Indexes

Lower disk‑read cost: more keys per page mean fewer page reads.

Stable query path: every search follows the same root‑to‑leaf path.

Easy full‑table scans: leaf nodes contain all data, so a sequential scan of leaves retrieves rows in order.

MySQL Index Implementations

MyISAM

MyISAM uses B+‑tree indexes where the leaf data field stores the physical address of the row. This is a non‑clustered index; the table data is stored separately.

InnoDB

InnoDB stores the table data itself in a clustered B+‑tree indexed by the primary key. Leaf nodes contain the full row, and secondary indexes store the primary‑key value in their data field.

Index Optimisation

Prefer Auto‑Increment Primary Keys

Using an auto‑increment integer as the primary key keeps inserts at the end of the leaf page, avoiding page splits and reducing write amplification.

Left‑Most Prefix Rule for Composite Indexes

For a composite index (A, B, C), queries can use the index only if they filter on A, or A + B, or A + B + C. Skipping the leftmost column makes the index unusable.

Other Best Practices

Choose columns with high selectivity (distinct/total ratio close to 1).

Avoid functions on indexed columns; they prevent index usage.

Prefer composite indexes over many single‑column indexes.

Index columns frequently used in WHERE, JOIN, ORDER BY, or GROUP BY clauses.

MySQL Performance Tuning

Common Causes of Slow Queries

Hardware bottlenecks (network, memory, I/O).

Missing or ineffective indexes.

Excessive data volume (need sharding).

Improper server or configuration settings.

Diagnosing Slow SQL

Enable the slow‑query log and set an appropriate threshold.

Use EXPLAIN to view the execution plan.

Use SHOW PROFILE for detailed timing.

Adjust server parameters with DBA assistance.

Configuration Recommendations

innodb_buffer_pool_size : allocate as much memory as possible for caching data and indexes.

innodb_log_file_size : larger redo logs improve write throughput.

max_connections : increase cautiously; too many active connections can degrade performance.

innodb_file_per_table : store each table in its own .ibd file for easier space reclamation.

innodb_flush_log_at_trx_commit : set to 2 for better performance when durability can be relaxed.

innodb_flush_method : use O_DIRECT with hardware RAID, otherwise fdatasync .

Example DDL and Data

CREATE TABLE `user_info` (
  `id` BIGINT(20) NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(50) NOT NULL DEFAULT '',
  `age` INT(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `name_index` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO user_info (name, age) VALUES ('xys',20),('a',21),('b',23),('c',50),('d',15),('e',20),('f',21),('g',23),('h',50),('i',15);

CREATE TABLE `order_info` (
  `id` BIGINT(20) NOT NULL AUTO_INCREMENT,
  `user_id` BIGINT(20) DEFAULT NULL,
  `product_name` VARCHAR(50) NOT NULL DEFAULT '',
  `productor` VARCHAR(30) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `user_product_detail_index` (`user_id`,`product_name`,`productor`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO order_info (user_id, product_name, productor) VALUES
  (1,'p1','WHH'),(1,'p2','WL'),(1,'p1','DX'),(2,'p1','WHH'),(2,'p5','WL'),(3,'p3','MA'),(4,'p1','WHH'),(6,'p1','WHH'),(9,'p8','TE');

Understanding EXPLAIN Output

id : execution order of rows.

select_type : type of SELECT (SIMPLE, PRIMARY, SUBQUERY, etc.).

table : table or derived table involved.

type : access method (system, const, eq_ref, ref, range, index, ALL).

possible_keys : indexes that could be used.

key : index actually used.

key_len : length of the used index in bytes.

ref : column or constant compared to the index.

rows : estimated rows examined.

extra : additional info (using filesort, using index, using temporary, using where).

Conclusion

Effective indexing in MySQL relies on understanding B+‑tree mechanics, choosing appropriate primary and secondary keys, following the left‑most prefix rule, and configuring InnoDB parameters to match workload characteristics. Proper index design dramatically reduces I/O, speeds up queries, and improves overall database performance.

SQLInnoDBMySQLDatabase OptimizationIndexMyISAMB-Tree
Top Architect
Written by

Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.