How I Cut a 20‑Second MySQL Query to 200 ms: A Step‑by‑Step Optimization Journey
When a critical e‑commerce report took over 20 seconds and triggered a flood of user complaints, a systematic investigation using EXPLAIN, slow‑query logs, and profiling revealed missing indexes, costly joins, and temporary tables, leading to a four‑stage rewrite that reduced execution time to sub‑second performance.
Background and Problem
At 3 am an alert showed that a core business API response exceeded 20 seconds. The culprit was a seemingly simple MySQL query that aggregated order statistics for VIP merchants in Beijing over the past 30 days. The original query took 20.34 seconds, a 100× slowdown.
Schema Overview
-- Orders (5 M rows)
CREATE TABLE orders (
id BIGINT PRIMARY KEY AUTO_INCREMENT,
order_no VARCHAR(32),
merchant_id INT,
user_id INT,
amount DECIMAL(10,2),
status TINYINT,
created_at DATETIME,
updated_at DATETIME
) ENGINE=InnoDB;
-- Order items (20 M rows)
CREATE TABLE order_items (
id BIGINT PRIMARY KEY AUTO_INCREMENT,
order_id BIGINT,
product_id INT,
quantity INT,
price DECIMAL(10,2),
created_at DATETIME
) ENGINE=InnoDB;
-- Merchants (100 k rows)
CREATE TABLE merchants (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(100),
category VARCHAR(50),
city VARCHAR(50),
level TINYINT
) ENGINE=InnoDB;Original Query
SELECT
m.id,
m.name,
COUNT(DISTINCT o.id) AS order_count,
COUNT(DISTINCT o.user_id) AS user_count,
SUM(o.amount) AS total_amount,
AVG(o.amount) AS avg_amount,
SUM(oi.quantity) AS total_items
FROM merchants m
LEFT JOIN orders o ON m.id = o.merchant_id
LEFT JOIN order_items oi ON o.id = oi.order_id
WHERE m.city = '北京'
AND m.level = 3
AND o.status = 1
AND o.created_at >= DATE_SUB(NOW(), INTERVAL 30 DAY)
GROUP BY m.id, m.name
ORDER BY total_amount DESC
LIMIT 100;Problem Diagnosis
1. EXPLAIN Analysis
+----+-------------+-------+------+---------------+------+---------+------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+---------+----------------------------------------------+
| 1 | SIMPLE | m | ALL | NULL | NULL | NULL | NULL | 100000 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | o | ALL | NULL | NULL | NULL | NULL | 5000000 | Using where; Using join buffer |
| 1 | SIMPLE | oi | ALL | NULL | NULL | NULL | NULL |20000000 | Using where; Using join buffer |
+----+-------------+-------+------+---------------+------+---------+------+---------+----------------------------------------------+All three tables performed full scans (type=ALL).
No indexes were used (key=NULL).
Temporary tables and filesort were created.
Resulted in a Cartesian‑product‑like row count of 100 k × 5 M × 20 M.
2. Slow‑Query Log
# Time: 2024-03-15T03:25:41.123456Z
# Query_time: 20.342387 Lock_time: 0.000123 Rows_sent: 100 Rows_examined: 25438921Only 100 rows were returned after scanning 25 M rows, a 254 389× inefficiency.
3. Profiling
SET profiling = 1;
-- run the query
SHOW PROFILE;Sending data: 18.5 s (90%).
Creating sort index: 1.2 s (6%).
Copying to temporary table: 0.6 s (3%).
Optimization Strategy (Four‑Step Plan)
Step 1 – Add Essential Indexes
-- Merchant table index
ALTER TABLE merchants ADD INDEX idx_city_level (city, level);
-- Orders composite index (order matters)
ALTER TABLE orders ADD INDEX idx_merchant_status_created (merchant_id, status, created_at);
-- Order items index
ALTER TABLE order_items ADD INDEX idx_order_id (order_id);Index design principles:
Follow the left‑most prefix rule.
Place highly selective columns first.
Consider both WHERE and JOIN conditions.
Step 2 – Rewrite SQL (Filter First, Join Later)
SELECT
m.id,
m.name,
t.order_count,
t.user_count,
t.total_amount,
t.avg_amount,
t.total_items
FROM merchants m
INNER JOIN (
SELECT
o.merchant_id,
COUNT(DISTINCT o.id) AS order_count,
COUNT(DISTINCT o.user_id) AS user_count,
SUM(o.amount) AS total_amount,
AVG(o.amount) AS avg_amount,
SUM(items.item_count) AS total_items
FROM orders o
LEFT JOIN (
SELECT order_id, SUM(quantity) AS item_count
FROM order_items
GROUP BY order_id
) items ON o.id = items.order_id
WHERE o.status = 1
AND o.created_at >= DATE_SUB(NOW(), INTERVAL 30 DAY)
GROUP BY o.merchant_id
) t ON m.id = t.merchant_id
WHERE m.city = '北京'
AND m.level = 3
ORDER BY t.total_amount DESC
LIMIT 100;This reduces the intermediate result set dramatically by applying the date and status filters before the joins.
Step 3 – Use Covering Index
-- Covering index that includes all selected columns
ALTER TABLE orders ADD INDEX idx_covering (merchant_id, status, created_at, id, user_id, amount);The query can now be satisfied entirely from the index without a table lookup.
Step 4 – Materialized View (Pre‑Aggregated Summary Table)
-- Summary table
CREATE TABLE merchant_order_summary (
merchant_id INT,
summary_date DATE,
order_count INT,
user_count INT,
total_amount DECIMAL(10,2),
avg_amount DECIMAL(10,2),
total_items INT,
PRIMARY KEY (merchant_id, summary_date),
INDEX idx_date (summary_date)
) ENGINE=InnoDB;
-- Hourly refresh (example)
INSERT INTO merchant_order_summary
SELECT
merchant_id,
DATE(created_at) AS summary_date,
COUNT(DISTINCT id) AS order_count,
COUNT(DISTINCT user_id) AS user_count,
SUM(amount) AS total_amount,
AVG(amount) AS avg_amount,
(SELECT SUM(quantity) FROM order_items WHERE order_id IN (SELECT id FROM orders WHERE merchant_id = o.merchant_id)) AS total_items
FROM orders o
WHERE status = 1 AND created_at >= CURDATE()
GROUP BY merchant_id, DATE(created_at)
ON DUPLICATE KEY UPDATE
order_count = VALUES(order_count),
user_count = VALUES(user_count),
total_amount = VALUES(total_amount),
avg_amount = VALUES(avg_amount),
total_items = VALUES(total_items);
-- Query the summary
SELECT
m.id,
m.name,
SUM(s.order_count) AS order_count,
SUM(s.user_count) AS user_count,
SUM(s.total_amount) AS total_amount,
AVG(s.avg_amount) AS avg_amount,
SUM(s.total_items) AS total_items
FROM merchants m
INNER JOIN merchant_order_summary s ON m.id = s.merchant_id
WHERE m.city = '北京' AND m.level = 3 AND s.summary_date >= DATE_SUB(CURDATE(), INTERVAL 30 DAY)
GROUP BY m.id, m.name
ORDER BY total_amount DESC
LIMIT 100;Performance Gains
Original query: 20.34 s (no index).
After adding indexes: 8.5 s (≈2.4× faster).
After SQL rewrite: 2.3 s (≈8.8× faster).
After covering index: 0.8 s (≈25× faster).
After materialized view: 0.2 s (≈100× faster).
General Optimization Methodology
Diagnostic Three‑Step
EXPLAIN analysis – check type, key, and Extra fields.
Slow‑query log – enable and examine query_time, rows_examined, etc.
Profiling – use SET profiling=1 and SHOW PROFILE to locate bottlenecks.
Six‑Step Optimization Framework
Index optimization – create selective, left‑most, covering indexes.
SQL rewrite – filter before join, avoid SELECT *, use sub‑queries wisely.
Table structure – consider denormalization, proper data types, partitioning.
Cache strategy – use Redis or application‑level cache for hot data.
Read/write splitting – master‑slave replication and load balancing.
Data archiving – move historical data to cold storage.
Common Pitfalls
Over‑indexing – creates write overhead; prefer composite indexes.
Ignoring write performance – balance read‑optimizing indexes with write cost.
Cache stampede – randomize TTL to avoid massive simultaneous expirations.
Monitoring and Prevention
Slow‑Query Monitoring View
CREATE VIEW slow_query_monitor AS
SELECT
DATE(start_time) AS query_date,
LEFT(sql_text,100) AS query_sample,
COUNT(*) AS exec_count,
AVG(query_time) AS avg_time,
MAX(query_time) AS max_time,
SUM(rows_examined) AS total_rows_examined
FROM mysql.slow_log
GROUP BY DATE(start_time), LEFT(sql_text,100)
ORDER BY avg_time DESC;Automated Alert Script (Python)
#!/usr/bin/env python3
import MySQLdb, smtplib
from email.mime.text import MIMEText
def check_slow_queries():
db = MySQLdb.connect(host="localhost", user="monitor", passwd="password", db="mysql")
cur = db.cursor()
cur.execute("""
SELECT COUNT(*) AS slow_count, AVG(query_time) AS avg_time
FROM mysql.slow_log
WHERE start_time >= DATE_SUB(NOW(), INTERVAL 1 HOUR)
""")
slow_count, avg_time = cur.fetchone()
if slow_count > 100 or avg_time > 5:
send_alert(f"Slow query alert: count={slow_count}, avg_time={avg_time}s")
cur.close(); db.close()
def send_alert(message):
msg = MIMEText(message)
msg['Subject'] = 'MySQL Slow Query Alert'
msg['From'] = '[email protected]'
msg['To'] = '[email protected]'
s = smtplib.SMTP('localhost')
s.send_message(msg)
s.quit()
if __name__ == "__main__":
check_slow_queries()Quick Diagnostic Bash Script
#!/bin/bash
echo "=== MySQL Performance Quick Check ==="
echo -e "
[1] Slow Query Settings:"
mysql -e "SHOW VARIABLES LIKE '%slow%';"
echo -e "
[2] Recent Slow Queries:"
mysql -e "SELECT * FROM mysql.slow_log ORDER BY start_time DESC LIMIT 5\G"
echo -e "
[3] Index Usage Stats:"
mysql -e "SELECT table_schema, table_name, index_name, cardinality FROM information_schema.STATISTICS WHERE table_schema NOT IN ('mysql','information_schema','performance_schema') ORDER BY cardinality DESC LIMIT 10;"
echo -e "
[4] Table Sizes:"
mysql -e "SELECT table_schema, table_name, ROUND(data_length/1024/1024,2) AS Data_MB, ROUND(index_length/1024/1024,2) AS Index_MB, ROUND((data_length+index_length)/1024/1024,2) AS Total_MB FROM information_schema.TABLES WHERE table_schema NOT IN ('mysql','information_schema','performance_schema') ORDER BY data_length+index_length DESC LIMIT 10;"
echo -e "
[5] Current Connections:"
mysql -e "SHOW STATUS LIKE '%connect%';"
echo "=== Check Complete ==="Toolbox
MySQL built‑in tools: EXPLAIN, SHOW PROFILE, Performance Schema.
Third‑party: Percona Toolkit pt‑query‑digest, MySQLTuner, MySQL Workbench, Prometheus + Grafana.
Online analysis: mysqladmin processlist, INFORMATION_SCHEMA.INNODB_LOCKS, INFORMATION_SCHEMA.INNODB_TRX.
Best‑Practice Takeaways
Measure before you change – use logs and profiling to establish baselines.
Apply the 80/20 rule – focus on the few queries that consume most resources.
Iterative optimization – change one thing at a time, record impact, keep rollback plans.
Team Collaboration Advice
SQL Review Process – checklist for indexes, full scans, join count, SELECT *, sub‑query vs JOIN, growth considerations.
Development Standards – naming conventions, index naming, SQL style guide.
Knowledge Sharing – regular tech talks, maintain an internal case‑library, document patterns in a wiki.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Raymond Ops
Linux ops automation, cloud-native, Kubernetes, SRE, DevOps, Python, Golang and related tech discussions.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
