How I Rescued a Production MySQL Database After a Fatal rm -rf Accident
After a junior engineer mistakenly ran an unguarded rm -rf command that wiped an entire production server—including MySQL and Tomcat—I documented the step‑by‑step recovery using ext3grep, extundelete, and MySQL binlog, highlighting the lessons learned for future operations.
Incident Background
A junior staff member was asked to install Oracle on a production server. While uninstalling, she executed <code>rm -rf $ORACLE_BASE/*</code> without the variable set. The command expanded to <code>rm -rf /*</code> and deleted the entire filesystem, including MySQL, Tomcat, and every other service on the box.
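This class of accident can be blocked in the shell itself. A minimal sketch using the POSIX `${VAR:?}` guard (the variable name mirrors the incident; everything else is illustrative):

```shell
#!/bin/sh
# ${ORACLE_BASE:?message} aborts expansion when the variable is unset or
# empty, so rm never receives a bare "/*" argument. The subshell keeps
# the abort from killing the surrounding script.
unset ORACLE_BASE

if ( rm -rf "${ORACLE_BASE:?ORACLE_BASE is not set}"/* ) 2>/dev/null; then
  echo "rm ran (unsafe)"
else
  echo "guard blocked the rm"
fi
```

With the variable unset, the guard fires and nothing is deleted; the same pattern protects any destructive command that interpolates a variable into a path.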
Initial Response and Backup Issues
The team discovered the loss, mounted the disk on another server, and found only a 1 KB offline backup containing a few mysqldump comments. No recent backups existed, and the production system had been running for months.
Rescue with ext3grep
Searching for a recovery tool led to ext3grep, which can restore files deleted from ext3 partitions. After unmounting the volume to prevent further writes, the following command listed all deleted files:
<code>ext3grep /dev/vgdata/LogVol00 --dump-names</code>
To restore everything in one pass, the team then ran:
<code>ext3grep /dev/vgdata/LogVol00 --restore-all</code>
Disk space ran out partway through, so only a subset of files was recovered. A script then filtered MySQL-related paths from the dumped name list and restored them one at a time:
<code>#!/bin/sh
# Restore each MySQL path listed in mysqltbname.txt, one file at a time.
while read -r LINE; do
  echo "begin to restore file $LINE"
  if ! ext3grep /dev/vgdata/LogVol00 --restore-file "$LINE"; then
    echo "restore of $LINE failed, continuing"
    # exit 1
  fi
done < ./mysqltbname.txt</code>
This recovered about 40 MySQL files, but many tables remained missing.
Attempt with extundelete
Another tool, extundelete, was tried:
<code>extundelete /dev/vgdata/LogVol00 --restore-directory var/lib/mysql/aqsh</code>
Unfortunately, the files were too damaged and the restore failed.
Binlog Recovery
Remembering that MySQL binlogs were enabled, the team located three binlog files (e.g., mysql-bin.000010). Replaying the last one succeeded:
<code>mysqlbinlog /usr/mysql-bin.000010 | mysql -uroot -p</code>
After the binlog was applied, the application came back online and most of the lost data was recovered.
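Had all three binlogs needed replaying, they would have to be applied oldest-first. A dry-run sketch (the temporary directory and file names are illustrative; the loop only prints the commands):

```shell
#!/bin/sh
# Dry run: print the replay commands in sequence-number order.
# Remove the leading `echo` to actually pipe each log into mysql.
tmp=$(mktemp -d)
touch "$tmp/mysql-bin.000008" "$tmp/mysql-bin.000009" "$tmp/mysql-bin.000010"

for f in "$tmp"/mysql-bin.*; do   # glob expansion is lexically sorted
  echo "mysqlbinlog $f | mysql -uroot -p"
done

rm -rf "$tmp"
```

Because the sequence numbers are zero-padded, lexical order matches chronological order, which is why a plain glob is enough here.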
Postmortem and Lessons Learned
Never perform production changes without clear communication and proper planning.
Ensure automated backups are verified regularly; a 1 KB backup is useless.
Implement monitoring and alerting for critical services to detect failures early.
Avoid using the root account for routine operations; employ least‑privilege users.
The incident highlighted the importance of disciplined change management, reliable backup strategies, and rapid incident response.
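The backup lesson in particular lends itself to a cheap automated check. A minimal sketch (the 1 MB threshold and path handling are illustrative); the demo runs it against a throwaway 1 KB file like the dump found in this incident:

```shell
#!/bin/sh
# Flag a backup file that is suspiciously small; a 1 KB dump of a
# months-old production database is almost certainly broken.
check_backup() {
  backup=$1
  min_bytes=$2
  size=$(wc -c < "$backup")
  if [ "$size" -lt "$min_bytes" ]; then
    echo "ALERT: $backup is only $size bytes (minimum $min_bytes)"
    return 1
  fi
  echo "OK: $backup is $size bytes"
}

# Demo: a 1 KB file trips the alert
tmp=$(mktemp)
head -c 1024 /dev/zero > "$tmp"
check_backup "$tmp" 1048576
rm -f "$tmp"
```

Wiring the non-zero return into cron mail or an alerting hook turns a silent backup failure into a page, instead of a surprise during the next incident.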
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes widely read original technical articles. We focus on operations transformation, accompanying you throughout your operations career as we grow together.