Databases 27 min read

Understanding MySQL Indexes: Theory, Implementation, and Optimization Strategies

This article provides a comprehensive overview of MySQL indexing, covering the theoretical foundations of B‑Tree and B+Tree data structures, the differences between MyISAM and InnoDB index implementations, and practical optimization techniques such as left‑most prefix usage, selectivity analysis, and prefix indexing.

Qunar Tech Salon
Qunar Tech Salon
Qunar Tech Salon
Understanding MySQL Indexes: Theory, Implementation, and Optimization Strategies

The article examines MySQL indexes by first discussing the mathematical and algorithmic foundations of indexing, emphasizing that an index is essentially a data structure that enables efficient data retrieval. It reviews basic search algorithms, such as linear, binary, and tree‑based searches, and explains why databases maintain specialized structures like B‑Tree and B+Tree to support advanced lookup algorithms.

It then describes B‑Tree and B+Tree structures in detail, listing the formal properties of B‑Trees (degree, height, node composition, ordering, etc.) and illustrating their search algorithm with pseudocode, which is presented unchanged within a code block:

BTree_Search(node, key)
{
    if(node == null) return null;
    foreach(node.key)
    {
        if(node.key[i] == key) return node.data[i];
        if(node.key[i] > key) return BTree_Search(point[i]->node);
    }
    return BTree_Search(point[i+1]->node);
}

data = BTree_Search(root, my_key);

The article explains why modern DBMSs favor B‑Tree/B+Tree over binary‑search‑tree variants, noting their high fan‑out, low height, and suitability for disk‑based storage where I/O cost dominates.

Next, it compares MySQL storage‑engine specific index implementations. MyISAM stores only record pointers in leaf nodes of a B+Tree (non‑clustered index), while InnoDB stores complete rows in leaf nodes (clustered index) and uses secondary indexes that reference the primary key. Diagrams illustrate these differences.

Practical index‑usage strategies are then explored, focusing on the left‑most prefix rule for composite indexes, the impact of missing prefix columns, and techniques such as "gap filling" with IN clauses. The article also covers selectivity calculations, when to avoid indexes on low‑selectivity columns, and the creation of prefix indexes to balance index size and effectiveness.

Finally, it discusses primary‑key selection for InnoDB, recommending auto‑increment surrogate keys to maintain sequential inserts and avoid page splits, contrasted with random keys (e.g., ID numbers) that cause costly page reorganizations. The conclusion acknowledges the article as a learning note and suggests future extensions on ordering, covering indexes, and other index types.

OptimizationindexingDatabaseInnoDBMySQLB+TreeB-Tree
Qunar Tech Salon
Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.