Databases 6 min read

Lessons Learned from Misusing INSERT INTO SELECT in MySQL Data Migration

A real‑world MySQL case study shows how using INSERT INTO SELECT for large‑scale data migration without proper indexing caused full‑table scans, lock contention, and data loss, highlighting the need for careful query planning and performance testing.

Java Captain
Java Captain
Java Captain
Lessons Learned from Misusing INSERT INTO SELECT in MySQL Data Migration

Background

The company processes millions of rows daily in a single MySQL table without sharding, so a data‑migration task was needed to keep performance acceptable.

Proposed Solutions

Programmatically fetch data, insert into a history table, then delete the original rows.

Use INSERT INTO ... SELECT to let the database handle the whole operation.

The first approach caused an OOM error when loading all data at once, while the second seemed to work in tests and was deployed.

Failure Analysis

First approach (pseudo‑code)

// 1、查询对应需要迁移的数据
List
list = selectData();

// 2、将数据插入历史表
insertData(list);

// 3、删除原表数据
deleteByIds(ids);

This loads the entire dataset into memory, leading to OOM.

Second approach

The team kept only the last 10 days of data (~10k rows) and executed a nightly INSERT INTO SELECT ... WHERE dateTime < (NOW() - INTERVAL 10 DAY) task at 8 PM, assuming the reduced load would avoid OOM.

During the night the task ran without issue, but the next morning the finance team discovered missing payment records. Investigation revealed that after 8 PM the insert‑select began failing intermittently, causing data loss.

Further debugging showed the migration task was the culprit. Stopping the task eliminated the failures.

Root Cause

The INSERT INTO SELECT statement performed a full‑table scan, which, under the default transaction isolation level, locked the target table (a) entirely while locking rows in the source table (b) one by one. This caused lock contention and timeouts, leading to partial failures.

Testing did not expose the issue because the test environment lacked the massive concurrent inserts that occur in production.

Solution

Avoid full‑table scans by adding appropriate indexes on the columns used in the WHERE clause so that the SELECT part can use an index.

Can INSERT INTO SELECT Still Be Used?

Yes, but only with proper indexing and awareness of its locking behavior.

Conclusion

When using INSERT INTO SELECT for large data migrations, ensure that the query is indexed to prevent full‑table scans and lock contention, and always validate performance under realistic production loads.

data migrationperformanceIndexingDatabaseMySQLINSERT INTO SELECT
Java Captain
Written by

Java Captain

Focused on Java technologies: SSM, the Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading; occasionally covers DevOps tools like Jenkins, Nexus, Docker, ELK; shares practical tech insights and is dedicated to full‑stack Java development.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.