Why Your MySQL Indexes Fail: Cost Estimation, Fragmentation, and Slow Query Secrets
This article examines MySQL slow‑query troubleshooting, explaining how the optimizer estimates costs, why indexes may be ineffective even when present, the impact of memory fragmentation, pitfalls of prefix indexes, index merging techniques, and additional resource‑related factors that can cause seemingly healthy SQL statements to become slow.
Background
At the beginning of the year the team launched a "service stability issue governance" project, optimizing error logs, slow SQL, and interface performance to improve system reliability. This article uses real slow‑SQL cases to analyze principles and share findings.
How MySQL Evaluates Cost
A sudden alert revealed a slow SQL. Using EXPLAIN quickly produced an execution plan showing a full‑table scan (key=null). The optimizer chooses the plan it believes has the lowest cost, similar to taking an elevator when available.
The query cost consists of I/O cost and CPU cost. Data resides on disk; each disk read incurs I/O cost, which dominates overall cost. For example, reading a row from page A then another from page B adds an extra I/O.
Because the scan occurs on the clustered index, MySQL estimates the number of pages and rows, then calculates I/O and scan (CPU) costs.
<code>IO cost: pages × io_block_read_cost (default 0.25) + 1.0
CPU cost: rows × cpu_tuple_cost (default 0.1)
Total cost = IO cost + CPU cost</code>MySQL’s default I/O cost assumes random I/O (io_block_read_cost=1.0). For sequential scans the default is 0.25. Larger tables increase page count, thus raising total cost.
Indexes May Not Help Even When Added
Adding an index to a slow query does not guarantee improvement. In the example, adding an index on column_key allowed EXPLAIN to use the index in a test environment, but the production query remained slow.
If the indexed column has low selectivity, the optimizer still prefers a full‑table scan. In testing, fewer rows make the index appear beneficial, explaining the plan difference.
Index effectiveness depends on query conditions and data distribution.
How to avoid this during index design?
Columns with low cardinality (e.g., gender) often produce inefficient indexes. Use the distinct‑count ratio to assess selectivity:
<code>SELECT COUNT(DISTINCT column_name) / COUNT(*)</code>Fields with selectivity below 10% should not have standalone indexes; for composite indexes, place high‑selectivity columns first.
Memory Fragmentation Matters
When memory fragmentation is severe, statistical information may become outdated, causing the optimizer’s cost estimates to diverge from reality.
Example: a table with 200 million obsolete rows was cleaned up, but residual fragmentation inflated Data_free to over 50 GB, increasing physical I/O and reducing buffer‑pool hit rate.
Pitfalls of Prefix Indexes
A unique index that uses a prefix (e.g., key_props(10) ) stores only the first N characters in the leaf nodes. This saves space but can cause false‑positive duplicate conflicts when different values share the same prefix.
For long columns, prefix indexes are acceptable, but when combined with a UNIQUE constraint, ensure the prefix is truly unique.
Index Merge
MySQL can combine multiple indexes to satisfy a query, reducing the need for full table lookups. For example, with separate indexes on k1 and k2 , MySQL retrieves matching IDs from each index, intersects them, and then accesses the rows.
Benefits include fewer row lookups and ordered access due to ID ordering, which reduces random I/O.
When SQL Looks Fine but Still Slow
Slow queries can stem from resource bottlenecks such as disk, CPU, or network overload, especially during large data imports.
Long‑running transactions can inflate the Undo Log, decreasing scan efficiency and buffer‑pool hit rate. Monitoring SHOW ENGINE INNODB STATUS for a rising History List Length helps detect this.
Summary
The article presented several real‑world cases to illustrate MySQL cost estimation, index effectiveness, memory fragmentation, prefix index issues, index merging, and resource‑related slow‑query causes. Understanding these principles enables more effective analysis and optimization of MySQL performance.
Sanyou's Java Diary
Passionate about technology, though not great at solving problems; eager to share, never tire of learning!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.