Databases 7 min read

Using pt-archiver for Batch Deletion of Large MySQL Tables

The article explains why massive MySQL tables need periodic cleanup, compares risky drop/truncate approaches, and provides a step‑by‑step guide to safely delete historical data in batches using pt‑archiver with proper parameters, session handling, and post‑deletion maintenance.

Aikesheng Open Source Community
Aikesheng Open Source Community
Aikesheng Open Source Community
Using pt-archiver for Batch Deletion of Large MySQL Tables

Introduction

As business volume grows, MySQL tables can quickly reach tens of millions or even billions of rows, causing slow joins and increased response times that degrade user experience. Frequently, only recent log or record data (2‑3 months) needs to be retained, so large tables must be cleaned without disrupting normal database operations.

Deletion Options and Risks

Deleting massive data must be done in small batches; otherwise the operation can block the table and generate oversized binlogs that may crash replicas. Simple approaches such as dumping needed rows with mysqldump then dropping or truncating the table are risky because they render the table unavailable for a period and can cause I/O spikes, especially if the table is still accessed frequently.

1. mysqldump the required data, then DROP TABLE and import. 2. mysqldump the required data, then TRUNCATE TABLE and import. Both methods cause temporary unavailability and high I/O; using them on hot tables is akin to “lighting a lamp in a cesspool”.

Using pt-archiver for Incremental Deletion

Parameter Overview

pt-archiver --help
--progress   Print progress every N rows
--limit       Limit rows returned by SELECT
--sleep       Sleep time between SELECT statements
--txn-size    Number of rows per transaction
--bulk-delete Use a single DELETE for each chunk (implies --commit-each)
--dry-run     Show queries without executing

Deletion Procedure (Three Steps)

1. Print the query to verify the selection criteria.

2. Enable session persistence with screen to avoid interruption if the SSH session drops.

3. Execute the deletion using pt-archiver with appropriate options.

# Print query
$ pt-archiver --source h=10.186.65.19,P=3306,u=root,p='123',D=sbtest,t=sbtest1 \
    --purge --charset=utf8mb4 --where "id <= 400000" \
    --progress=200 --limit=200 --sleep=1 --txn-size=200 \
    --statistics --dry-run

# Open screen session
screen -S archiver

# Run actual deletion
$ pt-archiver --source h=10.186.65.19,P=3306,u=root,p='123',D=sbtest,t=sbtest1 \
    --purge --charset=utf8mb4 --where "id <= 400000" \
    --progress=200 --limit=200 --sleep=1 --txn-size=200 --statistics

The tool prints progress and statistics, showing the number of rows deleted, time spent, and the proportion of time spent sleeping versus actual deletion.

Post‑Deletion Cleanup

After deleting rows, MySQL does not immediately free disk space. During low‑traffic periods you may run OPTIMIZE TABLE to reclaim space, though it is optional and may require a restart in extreme cases.

Reference: MySQL Optimize Table Documentation

Keywords: #delete# #pt-archiver# #data‑cleanup#

SQLmysqlbatch deletionData CleanupDatabase Maintenancept-archiver
Aikesheng Open Source Community
Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.