Understanding Database Indexes: Storage Principles, Types, and Optimization Techniques
This article explains how computer storage works, why database indexes dramatically speed up queries, the mechanics of binary search, the differences between clustered and non‑clustered indexes, common pitfalls of over‑indexing, and practical SQL optimization strategies to avoid full table scans and index invalidation.
Overview
Human information storage has evolved from physical media to modern databases, which store data on disk but achieve fast access largely thanks to indexes that act like a book's table of contents.
Computer Storage Principles
Data persisted in a database resides on non-volatile storage devices such as hard disks; RAM holds only a temporary working copy. A hard disk consists of rotating platters divided into tracks and sectors. Reading data requires moving the head to the correct track (seek time), waiting for the platter to rotate the sector under the head (rotational latency), and then reading the sector, all of which introduces latency. RAM is far faster but volatile, so operating systems cache disk data in RAM before applications use it.
How Indexes Work
An index is analogous to a dictionary's index: it allows the database engine to locate rows without scanning the entire table. By maintaining a sorted structure, the engine can quickly navigate to the relevant data blocks.
Binary Search Method
Binary search requires sorted data and dramatically reduces the number of blocks examined. For example, take 100,000 fixed-size records of 204 bytes each, stored in blocks of 1,024 bytes. This yields 5 records per block, so 100,000 records occupy 20,000 blocks.
A full scan would examine all 20,000 blocks, whereas binary search needs only about log₂(20,000) ≈ 14.3 steps, that is, at most 15 block reads.
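The arithmetic above can be checked with a short sketch. The constant names and the count_probes helper are illustrative assumptions, not part of any database engine; the sketch simply treats the table as 20,000 sorted blocks and counts how many blocks a binary search touches.

```python
import math

RECORD_SIZE = 204      # bytes per fixed-size record (from the example)
BLOCK_SIZE = 1024      # bytes per disk block (from the example)
NUM_RECORDS = 100_000

# Whole records only: a record is not split across two blocks here.
records_per_block = BLOCK_SIZE // RECORD_SIZE
num_blocks = math.ceil(NUM_RECORDS / records_per_block)


def count_probes(total_blocks, target_block):
    """Count the blocks a binary search reads before reaching target_block."""
    lo, hi = 0, total_blocks - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if mid == target_block:
            return probes
        if mid < target_block:
            lo = mid + 1
        else:
            hi = mid - 1
    return probes


# Worst case over every possible target block.
worst_case = max(count_probes(num_blocks, b) for b in range(num_blocks))

print(records_per_block, num_blocks)      # 5 20000
print(round(math.log2(num_blocks), 2))    # 14.29
print(worst_case)                         # 15
```

The worst case is 15 rather than 14 because log₂(20,000) is not a whole number and the final probe still has to be made.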
Why Indexes Speed Up Queries
Indexes keep key values sorted, enabling binary-search-like lookups. Queries on indexed columns (especially primary keys) can therefore locate rows in O(log N) block reads instead of O(N). In the 20,000-block example this means on the order of 15 block reads instead of 20,000, so speedups of several hundred times or more are common.
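One way to watch this happen is to ask a database for its query plan before and after creating an index. The following is a minimal sketch using Python's built-in sqlite3 module; the users table, the idx_users_email index, and the plan helper are hypothetical names chosen for illustration, and the exact plan wording differs between SQLite versions and between database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(10_000)],
)


def plan(sql, params=()):
    """Return the human-readable steps of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the description


query = "SELECT * FROM users WHERE email = ?"
args = ("user42@example.com",)

plan_without_index = plan(query, args)   # expected to be a full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_with_index = plan(query, args)      # expected to search via the index

print(plan_without_index)
print(plan_with_index)
```

Without the index the plan reports a scan of the whole table; with it, the plan names idx_users_email and searches only the matching entries.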
Why Too Many Indexes Hurt Performance
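When every column is indexed, the indexes together can grow as large as the table itself. The engine then has to read and maintain almost as much index data as table data, so an index lookup loses much of its advantage over a full table scan while every write becomes more expensive.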
Drawbacks of Indexes
Each index adds write overhead, because every insert, update, or delete must modify both the row and the index.
Indexes consume additional disk space.
These costs are not a reason to avoid indexes altogether: foreign-key columns, for example, should still be indexed to support joins.
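The space cost can be measured directly. This is a rough sketch, assuming SQLite and a hypothetical orders table; it compares the number of database pages before and after an index is created. Absolute numbers depend on the engine, page size, and data, so only the direction of the change matters here.

```python
import os
import sqlite3
import tempfile


def index_space_cost(num_rows=20_000):
    """Return (pages_before, pages_after) around a CREATE INDEX statement."""
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(os.path.join(tmp, "demo.db"))
        try:
            conn.execute(
                "CREATE TABLE orders ("
                "id INTEGER PRIMARY KEY, customer_id INTEGER, note TEXT)"
            )
            conn.executemany(
                "INSERT INTO orders (customer_id, note) VALUES (?, ?)",
                [(i % 500, "x" * 50) for i in range(num_rows)],
            )
            conn.commit()
            before = conn.execute("PRAGMA page_count").fetchone()[0]

            # The index is a second on-disk structure holding every key again.
            conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
            conn.commit()
            after = conn.execute("PRAGMA page_count").fetchone()[0]
        finally:
            conn.close()
    return before, after


pages_before, pages_after = index_space_cost()
print(pages_before, pages_after)
```

Every page added by the index must also be kept up to date on later writes, which is where the write overhead comes from.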
Clustered Index
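A clustered index stores rows physically in the same order as the indexed column values, allowing range queries to read contiguous disk blocks. Only one clustered index can exist per table, typically on the primary key. A non-clustered index, by contrast, is a separate sorted structure that holds key values together with pointers to the rows, so a table can have several of them.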
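Why physical ordering helps range queries can be illustrated with a simplified in-memory model. In this sketch a Python list kept sorted by key stands in for clustered storage; the rows data and the range_query helper are assumptions made for illustration and say nothing about how a real engine lays out pages.

```python
from bisect import bisect_left, bisect_right

# Rows kept sorted by primary key, the way clustered storage keeps them.
rows = [(key, f"row-{key}") for key in range(0, 1000, 10)]
keys = [key for key, _ in rows]


def range_query(low, high):
    """Return the contiguous slice of rows with low <= key <= high."""
    start = bisect_left(keys, low)    # first position with key >= low
    end = bisect_right(keys, high)    # first position with key > high
    return start, end, rows[start:end]


start, end, matched = range_query(95, 140)
print(start, end)                     # 10 15
print([key for key, _ in matched])    # [100, 110, 120, 130, 140]
```

Two binary searches find the boundaries, and everything in between is one contiguous run, which on disk corresponds to reading neighboring blocks instead of jumping around.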
Typical Index Invalidations
Using OR conditions, functions, or type conversions on indexed columns can prevent the optimizer from using the index, leading to full scans.
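The effect of wrapping an indexed column in a function can be observed in a query plan. The sketch below assumes SQLite and a hypothetical users table; other engines, and features such as expression or function-based indexes, can behave differently, so treat it as an illustration rather than a rule for every database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.execute("CREATE INDEX idx_users_email ON users (email)")


def plan(sql):
    """Return the human-readable steps of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]


# The bare column can be matched against the sorted index entries.
direct = plan("SELECT * FROM users WHERE email = 'a@example.com'")

# lower(email) is not what the index stores, so the index is not used here.
wrapped = plan("SELECT * FROM users WHERE lower(email) = 'a@example.com'")

print(direct)
print(wrapped)
```

The first plan searches idx_users_email, while the second falls back to scanning the table, because the index is sorted by email and not by lower(email).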
Common SQL Optimization Techniques
1. Avoid Full Table Scans
Ensure WHERE and JOIN predicates reference indexed columns. Also consider table size: on a very small table a full scan can be cheaper than an index lookup, so not every scan needs to be eliminated.
2. Prevent Index Invalidations
Avoid functions, calculations, or implicit conversions on indexed columns; use covering indexes when possible.
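A covering index is one that contains every column a query needs, so the engine can answer from the index alone without visiting the table rows. The sketch below assumes SQLite and hypothetical names (users, idx_users_email_name); it compares a query that the index covers with one that it does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT, bio TEXT)"
)
conn.execute("CREATE INDEX idx_users_email_name ON users (email, name)")


def plan(sql):
    """Return the human-readable steps of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]


# Both requested columns live in the index, so the table is never read.
covered = plan("SELECT email, name FROM users WHERE email = 'a@example.com'")

# SELECT * also needs bio, which forces a lookup of each matching row.
not_covered = plan("SELECT * FROM users WHERE email = 'a@example.com'")

print(covered)
print(not_covered)
```

Both queries use the index, but only the first is reported as using a covering index; the second pays an extra row lookup for every match.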
3. Reduce Unnecessary Sorting
Prefer indexes that provide the required order instead of sorting results after retrieval.
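Whether a query needs a separate sort step is also visible in the plan. This sketch again assumes SQLite, with a hypothetical events table and idx_events_created index; the planner's choice can vary with engine, version, and statistics, so it is an illustration of the idea rather than a guarantee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT, payload TEXT)"
)


def plan(sql):
    """Return the human-readable steps of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]


query = "SELECT * FROM events ORDER BY created_at LIMIT 10"

# No index on created_at: rows are read and then sorted in a temporary structure.
plan_before = plan(query)

conn.execute("CREATE INDEX idx_events_created ON events (created_at)")

# The index already stores entries in created_at order, so no sort is needed.
plan_after = plan(query)

print(plan_before)
print(plan_after)
```

Before the index exists the plan includes a temporary B-tree for the ORDER BY; afterwards the rows are read in index order and that step disappears.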
4. Select Only Needed Columns
Avoid SELECT * to reduce I/O.
5. Minimize Temporary Table Usage
Design queries to work without creating intermediate tables when possible.
Architect's Guide
Dedicated to sharing programmer-architect skills—Java backend, system, microservice, and distributed architectures—to help you become a senior architect.