How a Nighttime MySQL Migration to RDS Triggered a CPU Crisis—and How We Fixed It
This article recounts a blockchain finance company's rapid user growth, the decision to move MySQL from Docker Swarm to Alibaba Cloud RDS, the ensuing CPU overload caused by missing indexes, and the step‑by‑step troubleshooting and optimization that restored system stability.
Event Origin
The author joined a fast‑growing blockchain finance company whose services run on Alibaba Cloud. All applications, including MySQL, were deployed in Docker Swarm on an ECS cluster. Anticipating user growth, the author realized the MySQL container would soon hit capacity limits, prompting a migration plan.
Migration Plan
Original architecture diagram:
The stack consisted of CDN → WAF → ECS cluster. Docker Swarm connected all ECS hosts, running services such as Nginx, PHP, Redis, and a single MySQL container whose data persisted via a mounted volume. PHP applications accessed MySQL inside the container.
Sample docker-compose YAML:
<code>version: "3"
services:
  ussbao:
    # replace username/repo:tag with your name and image details
    image: (image name hidden)
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
    environment:
      MYSQL_ROOT_PASSWORD: (hidden)
    volumes:
      - "/data/mysql/db1/:/var/lib/mysql/"
      - "/etc/localtime:/etc/localtime"
      - "/etc/timezone:/etc/timezone"

networks:
  default:
    external:
      name: (hidden)</code>Each database ran as a single MySQL container, with no master-slave replication or read-write splitting, and without any performance tuning.
Adjusted Architecture
After migration, MySQL was moved to an RDS instance; other services such as OSS and cloud Redis were also provisioned (not shown).
Migration Process
Steps: provision RDS → backup SQL → import to RDS → modify DB configuration → test.
Plan RDS instance size and create databases/users.
Add IP whitelist for RDS access.
Use mysqldump to back up the databases in the MySQL containers.
Import the .sql files into RDS.
Update PHP project DB configuration.
Clear PHP cache directories.
Perform validation tests.
Enable RDS automated backups.
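The dump-and-import steps above might look like the following sketch. The RDS endpoint, user name, and database list are placeholders, not the article's real values, and the function prints each command rather than executing it so the plan can be reviewed first.

```shell
#!/bin/bash
# Sketch of the dump-and-import step of the migration. RDS_HOST, DB_USER,
# and the database names are placeholders (assumptions for illustration).
set -u

SRC_HOST="127.0.0.1"                       # host running the MySQL container
RDS_HOST="example.mysql.rds.aliyuncs.com"  # hypothetical RDS endpoint
DB_USER="migrator"

# Build the dump and import commands for one database; printed, not run.
plan_migration() {
  local db="$1" stamp="$2"
  # --single-transaction takes a consistent InnoDB snapshot without locking
  echo "mysqldump -h ${SRC_HOST} -u ${DB_USER} -p --single-transaction --routines --triggers ${db} > /data/backup/${db}-${stamp}.sql"
  echo "mysql -h ${RDS_HOST} -u ${DB_USER} -p ${db} < /data/backup/${db}-${stamp}.sql"
}

for DB in db1 db2 db3; do
  plan_migration "$DB" "$(date +%Y%m%d%H%M%S)"
done
```

Dumping with `--single-transaction` keeps the source containers serving traffic during the backup, which matters even for a late-night window.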
The migration was executed late at night to avoid user traffic.
Snowball Effect
After migration, at the 9 am peak the application began returning request timeouts.
Emergency Handling
Investigation revealed RDS CPU pinned at 100 % with a large number of active connections. A large table, ub_user_calculate (over 2 million rows), had no indexes, so queries against it did full-table scans, burning CPU and exhausting connections.
Actions taken:
Added missing indexes to the problematic table.
Killed active sessions (some could not be terminated).
Temporarily disabled the domain to stop traffic.
Restarted the RDS instance to lower CPU usage.
Re‑enabled the domain and monitored that sessions and CPU returned to normal.
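The diagnostic and repair statements from the emergency steps above, collected in one place. The column name user_id and the session id are hypothetical; the article does not say which column of ub_user_calculate lacked its index.

```shell
#!/bin/bash
# SQL used during the incident, printed for review. user_id and the
# KILL target id are hypothetical placeholders.
emergency_sql() {
cat <<'SQL'
-- 1. Find the CPU burners: long-running, non-idle statements.
SELECT id, user, time, state, info
  FROM information_schema.processlist
 WHERE command <> 'Sleep' AND time > 60
 ORDER BY time DESC;

-- 2. Confirm the full table scan (EXPLAIN shows type = ALL).
EXPLAIN SELECT * FROM ub_user_calculate WHERE user_id = 12345;

-- 3. Add the missing index (user_id is a hypothetical column name).
ALTER TABLE ub_user_calculate ADD INDEX idx_user_id (user_id);

-- 4. Kill sessions still running the un-indexed plan (id from step 1).
KILL 12345;
SQL
}
emergency_sql
```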
Backup script used for nightly backups:
<code>#!/bin/bash
#数据库IP
dbserver='*******'
#数据库用户名
dbuser='ganbing'
#数据库密码
dbpasswd='************'
#备份数据库,多个库用空格隔开
dbname='db1 db2 db3'
#备份时间
backtime=`date +%Y%m%d%H%M`
out_time=`date +%Y%m%d%H%M%S`
#备份输出路径
backpath='/data/backup/mysql/'
logpath=''/data/backup/logs/
echo "################## ${backtime} #############################"
echo "开始备份"
#日志记录头部
echo "" >> ${logpath}/${dbname}_back.log
echo "-------------------------------------------------" >> ${logpath}/${dbname}_back.log
echo "备份时间为${backtime},备份数据库 ${dbname} 开始" >> ${logpath}/${dbname}_back.log
#正式备份数据库
for DB in $dbname; do
source=`/usr/bin/mysqldump -h ${dbserver} -u ${dbuser} -p${dbpasswd} ${DB} > ${backpath}/${DB}-${out_time}.sql` 2>> ${backpath}/mysqlback.log;
if [ "$?" == 0 ];then
cd $backpath
tar zcf ${DB}-${backtime}.tar.gz ${DB}-${backtime}.sql > /dev/null
rm -f ${DB}-${backtime}.sql
find $backpath -name "*.tar.gz" -type f -mtime +15 -exec rm -rf {} \; > /dev/null 2>&1
echo "数据库 ${dbname} 备份成功!!" >> ${logpath}/${dbname}_back.log
else
echo "数据库 ${dbname} 备份失败!!" >> ${logpath}/${dbname}_back.log
fi
done
echo "完成备份"
echo "################## ${backtime} #############################"</code>Index Usage Strategy and Optimization
Creating Indexes – Best Practices
Add indexes on fields that are frequently queried but rarely updated.
Index fields used in ORDER BY or GROUP BY clauses.
Limit the number of indexes per table to about six.
Keep indexed column length short.
Primary keys do not need a separate index.
Index columns with high selectivity.
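The guidelines above as concrete DDL; the table and column names are invented for illustration, not taken from the article's schema.

```shell
#!/bin/bash
# DDL illustrating the index-creation guidelines. Table and column
# names (orders, users, status, email, ...) are hypothetical.
index_examples() {
cat <<'SQL'
-- Frequently filtered, rarely updated column: a good index candidate.
ALTER TABLE orders ADD INDEX idx_status (status);

-- Column used in ORDER BY / GROUP BY.
ALTER TABLE orders ADD INDEX idx_created_at (created_at);

-- Keep indexed columns short: prefix index on a long VARCHAR.
ALTER TABLE users ADD INDEX idx_email (email(20));

-- No separate index on the primary key: it is already indexed.
-- Prefer high-selectivity columns (many distinct values) over
-- low-selectivity ones such as gender.
SQL
}
index_examples
```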
Using Indexes – Pitfalls
A leading % in a LIKE pattern disables the index.
Columns that hold many NULLs make poor index candidates; prefer NOT NULL with a default value for indexed columns.
If an OR condition includes an unindexed column, the index on the other column is ignored.
The != (<>) operator generally prevents index use and falls back to a full table scan.
Avoid applying functions or calculations on indexed columns.
Composite indexes require the leftmost prefix to be used in queries.
Match data types exactly to prevent implicit conversion that can bypass indexes.
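Each pitfall above paired with a rewrite that keeps the index usable. The table t and indexes idx_name(name) and idx_a_b(a, b) are hypothetical.

```shell
#!/bin/bash
# Pitfall queries and index-friendly rewrites. Table t, idx_name(name),
# and idx_a_b(a, b) are hypothetical examples.
pitfall_examples() {
cat <<'SQL'
-- Leading % defeats idx_name; a right-anchored pattern can use it.
SELECT * FROM t WHERE name LIKE '%abc';   -- full scan
SELECT * FROM t WHERE name LIKE 'abc%';   -- can use idx_name

-- OR with an unindexed column drops the index; UNION keeps it.
SELECT * FROM t WHERE name = 'x' OR unindexed_col = 1;   -- full scan
SELECT * FROM t WHERE name = 'x'
UNION
SELECT * FROM t WHERE unindexed_col = 1;                 -- idx_name usable

-- A function on the indexed column hides it; rewrite as a range.
SELECT * FROM t WHERE DATE(created_at) = '2018-01-01';   -- full scan
SELECT * FROM t WHERE created_at >= '2018-01-01'
                 AND created_at <  '2018-01-02';         -- range on index

-- Composite idx_a_b(a, b): the leftmost column must appear.
SELECT * FROM t WHERE a = 1 AND b = 2;    -- uses idx_a_b
SELECT * FROM t WHERE b = 2;              -- cannot use idx_a_b
SQL
}
pitfall_examples
```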
Conclusion
The outage was caused by a missing index on a high‑traffic table, highlighting the importance of thorough schema reviews and proactive performance monitoring before migration. Proper table design and early detection of bottlenecks can prevent similar crises.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.