Databases 22 min read

Understanding MySQL Indexes: B+ Tree Principles and Optimization

This article explains why MySQL uses B+ trees for indexing, describes the underlying principles of various index types, compares MyISAM and InnoDB implementations, and provides practical optimization guidelines such as using auto‑increment primary keys, left‑most prefix rules, and configuration tuning.

Architect
Architect
Architect
Understanding MySQL Indexes: B+ Tree Principles and Optimization

Preface

There are many kinds of indexes—hash, B‑tree, B+‑tree, full‑text, etc. MySQL supports multiple storage engines, each with different index capabilities. This article explores why MySQL adopts B+ trees for indexes, the underlying principles of indexes, and how to optimise them in SQL.

Principles of Indexes

Purpose of Indexes

Indexes aim to improve query or retrieval efficiency. For example, searching for the word "mysql" in a dictionary is similar to traversing a tree from the root to a leaf, which is far faster than scanning a linked list sequentially.

Why MySQL Uses B+ Trees

Ordinary binary trees become too tall for large data sets, leading to poor performance. MySQL therefore prefers B+ trees over B‑trees because B+ trees reduce disk I/O and provide more balanced search paths.

B‑Tree Introduction

Binary‑tree search has O(log N) complexity, but disk I/O is the real bottleneck. When data cannot fit entirely in memory, each disk page corresponds to a tree node; a deep tree means many I/O operations. Multi‑way search trees (B‑tree/B+‑tree) were created to minimise this cost.

Each node stores keys and pointers, interleaved.

Keys within a node are sorted in ascending order.

Leftmost pointer points to keys smaller than the first key; intermediate pointers point to ranges between adjacent keys.

Node size is uniform; each non‑leaf node contains n‑1 keys and n pointers, where d ≤ n ≤ 2d.

B+ Tree

Internal nodes store only keys and pointers; leaf nodes store keys and data.

Internal and leaf nodes have different sizes because they store different information.

Each non‑leaf node can have up to 2d pointers (instead of 2d + 1).

Because leaf nodes contain no data, more space is available for keys, resulting in a higher fan‑out and shallower trees, which improves search efficiency.

Why B+ Trees Suit MySQL Indexes

Lower disk‑read cost: non‑leaf nodes contain only keys, allowing more keys per page and fewer I/O operations.

Stable query performance: every search follows the same root‑to‑leaf path, unlike B‑trees where data in internal nodes can cause varying paths.

Facilitates full‑table scans: all data resides in leaf nodes, so scanning the leaf level yields the entire dataset without traversing internal nodes.

MySQL Index Implementation

MyISAM Index Implementation

MyISAM uses B+ trees; leaf nodes store the address of the data record.

The primary index (e.g., on column Col1) stores record addresses in the leaf data field. A secondary index (e.g., on column Col2) has a similar B+‑tree structure but points to the primary key values.

InnoDB Index Implementation

InnoDB differs in two ways:

The data file itself is a B+‑tree and serves as the clustered primary index; leaf nodes contain the full row data.

Secondary indexes store the primary‑key value in their leaf nodes instead of the record address.

Index Optimization

Strong Recommendation: Use Auto‑Increment Primary Keys

When using InnoDB, an auto‑increment integer primary key keeps inserts sequential, resulting in compact leaf pages and minimal page splits, which greatly improves write performance.

Conversely, random non‑sequential keys (e.g., UUIDs or IDs based on strings) cause frequent page splits and data movement, degrading performance.

Leftmost Prefix Principle

For a composite index (A, B, C), MySQL can use the index only for queries that reference the leftmost columns in order (A; A + B; A + B + C). Skipping a column (e.g., querying B alone) prevents the index from being used.

Other Indexing Guidelines

Prefer high‑cardinality columns (distinct/total ratio close to 1).

Avoid functions on indexed columns; they invalidate the index.

Composite indexes are more cost‑effective than multiple single‑column indexes.

Index columns that are frequently filtered, joined, sorted, or grouped.

Limit the total number of indexes to avoid overhead on INSERT/UPDATE.

Use the smallest possible data type for indexed columns.

Remove unused or rarely used indexes.

MySQL Performance Tuning

Common Causes of Slow Queries

Hardware bottlenecks (network, memory, I/O, disk space).

Missing or ineffective indexes.

Excessive data volume (consider sharding).

Improper server or configuration settings.

Analyzing and Solving Slow Queries

Enable the slow‑query log and set an appropriate threshold.

Use EXPLAIN to examine execution plans and identify missing indexes.

Use SHOW PROFILE for detailed per‑statement timing.

Consult DBAs or ops teams for server‑level tuning.

Configuration Optimization

Basic Settings

innodb_buffer_pool_size : size of the InnoDB data and index cache; larger is better.

innodb_log_file_size : size of the redo log; larger values improve write throughput.

max_connections : increase cautiously; too high can cause resource exhaustion.

InnoDB‑Specific Settings

innodb_file_per_table : store each table in its own .ibd file for easier space reclamation.

innodb_flush_log_at_trx_commit : 1 for full ACID safety, 2 for better performance on replicas.

innodb_flush_method : O_DIRECT for RAID with battery‑backed cache; otherwise fdatasync.

innodb_log_buffer_size : increase if large BLOBs or TEXT are frequently written.

Explain Execution Plan

Prepare sample data and run EXPLAIN to view columns such as id, select_type, table, type, possible_keys, key, key_len, ref, rows, and extra.

CREATE TABLE `user_info` (
`id`   BIGINT(20) NOT NULL AUTO_INCREMENT,
`name` VARCHAR(50) NOT NULL DEFAULT '',
`age`  INT(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `name_index` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO user_info (name, age) VALUES ('xys', 20);
... (additional INSERT statements) ...
CREATE TABLE `order_info` (
`id`        BIGINT(20) NOT NULL AUTO_INCREMENT,
`user_id`   BIGINT(20) DEFAULT NULL,
`product_name` VARCHAR(50) NOT NULL DEFAULT '',
`productor` VARCHAR(30) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `user_product_detail_index` (`user_id`,`product_name`,`productor`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
... (additional INSERT statements) ...

EXPLAIN output shows how MySQL chooses indexes, join order, and whether it uses filesort, temporary tables, or index‑only scans.

Summary

Key indexing principles include choosing unique or high‑cardinality indexes, indexing columns used for sorting/grouping/joining, limiting the number of indexes, preferring the leftmost prefix rule, keeping indexed columns free of functions, and extending existing composite indexes instead of creating new ones.

·END·

InnoDBMySQLDatabase OptimizationIndexMyISAMB+Tree
Architect
Written by

Architect

Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.