Understanding Database Indexes: How They Work and Improve Query Performance
This article explains the evolution of data storage, the fundamentals of computer storage devices, how database indexes function like a book's table of contents, the role of binary search, the benefits and drawbacks of indexes, and practical SQL optimization techniques to enhance query speed.
Overview
As information storage has evolved, modern companies have come to keep their data in databases, and how quickly that data can be accessed depends largely on indexes. This article explains why indexes accelerate database queries.
Computer Storage Principles
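Before grasping indexes, it helps to know where data lives. Computers hold data in main memory (RAM: fast but volatile) and on hard disks (slow but non-volatile), so data that must persist is kept on disk. A hard disk consists of rotating platters divided into tracks and sectors. Retrieving data means moving the head to the correct track (seeking), waiting for the platter to rotate the sector under the head, and then reading the sector. Each of these steps is mechanical, so a single random access costs milliseconds rather than the nanoseconds of a memory access. For this reason, databases move data between disk and memory in fixed-size blocks rather than row by row, and the number of blocks read is a good measure of a query's cost.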
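To put the mechanical overhead in numbers, the sketch below adds up the three phases of a random read. The drive parameters (7,200 RPM, 9 ms average seek, 0.1 ms transfer per block) are illustrative assumptions, not figures for any particular disk.

```python
# Back-of-envelope cost of one random hard-disk read.
# The drive parameters used below are illustrative assumptions.

def avg_rotational_latency_ms(rpm: float) -> float:
    """On average the wanted sector is half a revolution away from the head."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2


def random_read_ms(seek_ms: float, rpm: float, transfer_ms: float) -> float:
    """Seek to the track, wait for the sector to rotate under the head, then read it."""
    return seek_ms + avg_rotational_latency_ms(rpm) + transfer_ms


# Assumed drive: 7,200 RPM, 9 ms average seek, 0.1 ms to transfer one block.
cost = random_read_ms(seek_ms=9.0, rpm=7200, transfer_ms=0.1)
print(f"about {cost:.1f} ms per random block read")
```

Under these assumptions, each random read costs roughly 13 ms, and almost all of that time is spent waiting on mechanical movement rather than transferring data.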
How Indexes Work
Indexes act like a book's table of contents, allowing the database engine to locate data blocks without scanning the entire table. For a table with 100,000 rows, an index enables binary‑search‑style lookup instead of linear scanning.
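Here is a minimal Python sketch of the idea. It assumes a toy in-memory table: the sorted list of (key, position) pairs plays the role of the table of contents, and the lookup uses the standard-library bisect module rather than a real database engine.

```python
import bisect
import random

# A toy "table": 100,000 rows stored in arbitrary (insertion) order.
rng = random.Random(42)
ids = list(range(1, 100_001))
rng.shuffle(ids)
table = [{"id": i, "name": f"user{i}"} for i in ids]


def full_scan(rows, key):
    """Examine rows one by one; return (row, rows_examined)."""
    for examined, row in enumerate(rows, start=1):
        if row["id"] == key:
            return row, examined
    return None, len(rows)


# The "index": (key, position-in-table) pairs kept sorted by key.
index = sorted((row["id"], pos) for pos, row in enumerate(table))
index_keys = [k for k, _ in index]


def index_lookup(key):
    """Binary-search the sorted keys, then jump straight to the row."""
    i = bisect.bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        return table[index[i][1]]
    return None


row, examined = full_scan(table, 77_777)
print("full scan examined", examined, "rows")
print("index lookup found", index_lookup(77_777))
```

The full scan's cost depends on where the row happens to sit. The index lookup needs only about 17 key comparisons (log₂ 100,000) no matter where the row is stored.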
Binary Search Method
Binary search requires sorted data. Assume fixed-length records of 204 bytes stored in 1 KB (1,024-byte) blocks. Each block holds ⌊1,024 / 204⌋ = 5 records, so 100,000 rows occupy 20,000 blocks. A linear scan for a unique key reads 10,000 blocks on average and all 20,000 in the worst case. By contrast, a binary search over blocks sorted by that key needs at most ⌈log₂ 20,000⌉ = 15 block reads (log₂ 20,000 ≈ 14.3), which dramatically reduces I/O.
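The block arithmetic and both search strategies can be checked with a small simulation. It assumes the file is sorted by a unique integer key. Each block read is modeled as a call to a hypothetical read_block function; a real engine would perform disk I/O at that point.

```python
import math

RECORD_BYTES = 204
BLOCK_BYTES = 1024
ROWS = 100_000

records_per_block = BLOCK_BYTES // RECORD_BYTES   # 5 records, 1,020 of 1,024 bytes used
blocks = math.ceil(ROWS / records_per_block)      # 20,000 blocks


def read_block(b):
    """Hypothetical block read: block b holds the sorted keys 5b .. 5b+4."""
    start = b * records_per_block
    return list(range(start, min(start + records_per_block, ROWS)))


def binary_search_blocks(key):
    """Binary search over blocks of a key-sorted file; returns (found, block_reads)."""
    lo, hi, reads = 0, blocks - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = read_block(mid)
        reads += 1
        if key < block[0]:
            hi = mid - 1
        elif key > block[-1]:
            lo = mid + 1
        else:
            return key in block, reads
    return False, reads


def linear_scan_blocks(key):
    """Read blocks front to back until the key turns up; returns (found, block_reads)."""
    for reads in range(1, blocks + 1):
        if key in read_block(reads - 1):
            return True, reads
    return False, blocks


print(records_per_block, "records/block,", blocks, "blocks")
print("binary search for key 54,321:", binary_search_blocks(54_321))
print("linear scan for key 54,321:", linear_scan_blocks(54_321))
```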
Why Indexes Speed Up Queries
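An index keeps a sorted copy of the indexed column's values, each paired with a pointer to its row. The engine can therefore binary-search the index (in most engines, by descending a B+-tree) instead of scanning the table. Indexes work best on highly selective columns: because primary-key values are unique, a primary-key lookup narrows to exactly one row.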
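One common way to make "selective" concrete is the ratio of distinct values to rows. The sketch compares a unique key with a hypothetical two-valued is_active column; the column names and data are invented for illustration.

```python
from collections import Counter


def selectivity(values):
    """Distinct values / total rows: 1.0 means every value identifies exactly one row."""
    return len(set(values)) / len(values)


def worst_case_matches(values):
    """Largest number of rows a single equality lookup on this column can return."""
    return max(Counter(values).values())


rows = 100_000
user_id = list(range(rows))                    # unique, like a primary key
is_active = [i % 2 == 0 for i in range(rows)]  # hypothetical two-valued column

for name, col in [("user_id", user_id), ("is_active", is_active)]:
    print(name, selectivity(col), worst_case_matches(col))
```

An equality lookup on user_id returns at most one row. A lookup on is_active can return half the table, and at that point an index saves little over a full scan.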
Why Not Too Many Indexes?
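Every index is an additional sorted structure that must be stored and kept up to date, so excessive indexes increase storage and maintenance overhead. The effect resembles a book whose index is so detailed that it grows as large as the book itself: looking something up no longer saves much effort, and every revision of the text must also revise the index.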
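A rough size estimate shows how quickly indexes can rival the table itself. It assumes a 6-byte row pointer and invented column widths, and it ignores page overhead and fill factor, so real indexes would be larger than these figures.

```python
ROWS = 100_000
ROW_BYTES = 204      # record size from the binary-search example
POINTER_BYTES = 6    # assumed size of a row pointer


def index_bytes(key_bytes, rows=ROWS, pointer_bytes=POINTER_BYTES):
    """Raw size of one index: one (key, pointer) entry per row, no page overhead."""
    return rows * (key_bytes + pointer_bytes)


table_bytes = ROWS * ROW_BYTES

# Hypothetical schema: key widths in bytes of columns someone might index.
column_widths = {"id": 4, "email": 50, "name": 40, "city": 30, "country": 20,
                 "status": 10, "created_at": 8, "updated_at": 8, "phone": 15, "zip": 10}

all_indexes = sum(index_bytes(w) for w in column_widths.values())
print(f"table:            {table_bytes:>12,} bytes")
print(f"index on id only: {index_bytes(column_widths['id']):>12,} bytes")
print(f"every column:     {all_indexes:>12,} bytes")
```

A single index on the integer key adds about 5% to the table's size. Indexing every column in this made-up schema produces indexes that together exceed the table itself.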
Drawbacks of Indexes
While indexes boost read performance, they slow writes, because each insert, update, or delete must modify both the data row and every associated index entry. Guidelines include indexing only necessary columns, especially unique or foreign-key fields, and keeping an eye on disk space consumption.
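The write cost can be illustrated with a toy table that keeps one sorted list per indexed column as a stand-in for a B+-tree. ToyTable and its writes counter are hypothetical, and counting modified structures is a simplification of real disk I/O.

```python
import bisect


class ToyTable:
    """Rows plus one sorted list per indexed column (a stand-in for B+-trees)."""

    def __init__(self, indexed_columns):
        self.rows = []
        self.indexes = {col: [] for col in indexed_columns}
        self.writes = 0  # number of structures modified so far

    def insert(self, row):
        self.rows.append(row)        # write the data row
        self.writes += 1
        pos = len(self.rows) - 1
        for col, entries in self.indexes.items():
            bisect.insort(entries, (row[col], pos))  # keep each index sorted
            self.writes += 1


plain = ToyTable([])
indexed = ToyTable(["id", "email", "city"])
for i in range(1000):
    row = {"id": i, "email": f"u{i}@example.com", "city": f"c{i % 10}"}
    plain.insert(row)
    indexed.insert(row)

print("no indexes:   ", plain.writes, "structure writes")
print("three indexes:", indexed.writes, "structure writes")
```

With three indexes, every insert touches four structures instead of one, and the same multiplication applies to updates of indexed columns and to deletes.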
Clustered Index
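A clustered index determines how the table's rows themselves are stored: the rows are kept in the order of the index key (typically the primary key), so a table can have only one clustered index. Its leaf nodes contain the actual data rows. The leaf nodes of a non-clustered (secondary) index contain pointers to the rows instead; in MySQL's InnoDB engine, that pointer is the row's primary-key value.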
Clustered indexes are ideal for columns with many distinct values, range queries (BETWEEN, >, <=), columns frequently used in ORDER BY or GROUP BY, and foreign‑key columns. They are unsuitable for frequently updated columns because row movement can be costly.
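The difference between the two index types, and the reason range queries suit a clustered index, can be sketched with sorted Python lists. This is a simplified model loosely following InnoDB, whose secondary-index leaves store the primary-key value. The table contents and function names are hypothetical.

```python
import bisect

# Clustered index (toy): the rows themselves, kept sorted by primary key.
clustered = [(pk, {"pk": pk, "email": f"user{pk}@example.com"}) for pk in range(1, 1001)]
clustered_keys = [pk for pk, _ in clustered]

# Secondary (non-clustered) index on email: leaf entries hold (email, primary key).
secondary_email = sorted((row["email"], pk) for pk, row in clustered)
email_keys = [e for e, _ in secondary_email]


def get_by_pk(pk):
    """One search: the clustered leaf already holds the full row."""
    i = bisect.bisect_left(clustered_keys, pk)
    if i < len(clustered_keys) and clustered_keys[i] == pk:
        return clustered[i][1]
    return None


def get_by_email(email):
    """Two searches: secondary index to get the primary key, then the clustered index."""
    i = bisect.bisect_left(email_keys, email)
    if i < len(email_keys) and email_keys[i] == email:
        return get_by_pk(secondary_email[i][1])
    return None


def range_by_pk(low, high):
    """BETWEEN low AND high: locate the start once, then read contiguous rows."""
    start = bisect.bisect_left(clustered_keys, low)
    end = bisect.bisect_right(clustered_keys, high)
    return [row for _, row in clustered[start:end]]


print(get_by_email("user42@example.com"))
print([r["pk"] for r in range_by_pk(10, 14)])
```

Because neighboring keys are stored next to each other, the range query becomes a single search followed by a sequential read, which is why range predicates, ORDER BY, and GROUP BY on the clustering key are cheap.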
Typical Index Invalidations
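Applying a function or an implicit type conversion to an indexed column, or combining conditions with OR when one side references an unindexed column, can prevent the optimizer from using the index and force a full table scan. When testing several values of the same column, IN (...) is preferable to a chain of ORs.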
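These effects can be observed with SQLite's EXPLAIN QUERY PLAN through Python's built-in sqlite3 module. SQLite is used here only because it ships with Python. Its planner differs from MySQL's, so treat this as an illustration rather than a statement about MySQL. The users table and idx_users_email index are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT);
    CREATE INDEX idx_users_email ON users(email);
""")
conn.executemany("INSERT INTO users (email, name) VALUES (?, ?)",
                 [(f"user{i}@example.com", f"name{i}") for i in range(1000)])


def plan(sql, params=()):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


queries = {
    "plain equality": ("SELECT * FROM users WHERE email = ?",
                       ("user1@example.com",)),
    "function on column": ("SELECT * FROM users WHERE lower(email) = ?",
                           ("user1@example.com",)),
    "OR with unindexed column": ("SELECT * FROM users WHERE email = ? OR name = ?",
                                 ("user1@example.com", "name2")),
    "IN list": ("SELECT * FROM users WHERE email IN (?, ?)",
                ("user1@example.com", "user2@example.com")),
}
for label, (sql, params) in queries.items():
    print(f"{label:26} -> {plan(sql, params)}")
```

In SQLite, the plain equality and IN queries should report a SEARCH using idx_users_email. Wrapping the column in lower() or OR-ing it with the unindexed name column should fall back to a full SCAN of users.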
Common SQL Optimization Techniques
1. Avoid Full Table Scans
Ensure that WHERE and JOIN predicates reference indexed columns. The exception is very small tables, where a full scan is already cheap and an index adds little.
2. Prevent Index Invalidations
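Avoid applying functions or type conversions to indexed columns. Use covering indexes, which contain every column a query needs, so the engine can answer from the index alone without reading the table rows. Be aware of operators that limit index use in MySQL: a LIKE pattern with a leading wildcard ('%abc') cannot use a B-tree index range scan, and conditions such as != or IS NOT NULL often match so many rows that the optimizer prefers a full scan. IS NULL, by contrast, can use an index in MySQL.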
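A sorted list standing in for a B-tree shows why a leading wildcard defeats the index: a prefix pattern maps to one contiguous key range, while a suffix pattern can match anywhere. This is a conceptual sketch. The names are invented, and the `\uffff` upper-bound trick assumes keys contain no characters above U+FFFF.

```python
import bisect

# A sorted list of names stands in for a B-tree index on a name column.
names = sorted(["alice", "albert", "alfred", "bob", "carol", "malice", "zalia"])


def prefix_match(sorted_keys, prefix):
    """LIKE 'prefix%': matches are contiguous, so two binary searches bound them."""
    lo = bisect.bisect_left(sorted_keys, prefix)
    # Just past every key that starts with prefix (assumes no chars above U+FFFF).
    hi = bisect.bisect_left(sorted_keys, prefix + "\uffff")
    return sorted_keys[lo:hi]


def suffix_match(sorted_keys, suffix):
    """LIKE '%suffix': matches are scattered, so every key must be checked."""
    return [k for k in sorted_keys if k.endswith(suffix)]


print(prefix_match(names, "al"))    # range lookup
print(suffix_match(names, "lice"))  # full scan of the index
```

The prefix search cost grows with log n plus the number of matches. The suffix search always touches every key, which is the index equivalent of a full table scan.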
3. Minimize Sorting
Prefer index‑ordered scans to avoid explicit sort operations.
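SQLite reports an explicit sort as "USE TEMP B-TREE FOR ORDER BY" in EXPLAIN QUERY PLAN, which makes the effect easy to see. The sketch below again uses Python's sqlite3 as a stand-in for a production database. It compares an index on customer_id alone with a composite index on (customer_id, created_at); the orders table is hypothetical.

```python
import sqlite3


def order_by_plan(index_sql):
    """Build a fresh orders table with the given index and return the query plan details."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, "
                 "created_at TEXT, total REAL)")
    conn.execute(index_sql)
    sql = "SELECT * FROM orders WHERE customer_id = ? ORDER BY created_at"
    details = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (7,))]
    conn.close()
    return details


single = order_by_plan("CREATE INDEX idx_cust ON orders(customer_id)")
composite = order_by_plan("CREATE INDEX idx_cust_created ON orders(customer_id, created_at)")
print("index on customer_id:       ", single)
print("index on (customer_id, created_at):", composite)
```

With the composite index, rows for one customer are already stored in created_at order, so the planner can return them as they come and skip the separate sort step.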
4. Select Only Needed Columns
Retrieve only required fields to reduce I/O.
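Selecting only the needed columns also lets a covering index answer a query on its own. In this SQLite sketch with a hypothetical orders table, the narrow query can be served entirely from idx_orders_cust_total, while SELECT * must also read each table row. Other engines expose this differently; MySQL's EXPLAIN, for example, shows "Using index".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                         total REAL, notes TEXT);
    CREATE INDEX idx_orders_cust_total ON orders(customer_id, total);
""")


def plan(sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN for a one-parameter query."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (7,))]


narrow = plan("SELECT total FROM orders WHERE customer_id = ?")  # index has every column
wide = plan("SELECT * FROM orders WHERE customer_id = ?")        # notes lives only in the table
print("SELECT total:", narrow)
print("SELECT *:    ", wide)
```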
5. Reduce Temporary Table Usage
Avoid creating and dropping temporary tables when possible.