Lessons Learned from Misusing INSERT INTO SELECT in MySQL Data Migration
A real‑world MySQL case study shows how using INSERT INTO SELECT for large‑scale data migration without proper indexing caused full‑table scans, lock contention, and data loss, highlighting the need for careful query planning and performance testing.
Background
The company processes millions of rows daily in a single MySQL table without sharding, so a data‑migration task was needed to keep performance acceptable.
Proposed Solutions
Programmatically fetch data, insert into a history table, then delete the original rows.
Use INSERT INTO ... SELECT to let the database handle the whole operation.
The first approach caused an OOM error when loading all data at once, while the second seemed to work in tests and was deployed.
Failure Analysis
First approach (pseudo‑code)
// 1、查询对应需要迁移的数据
List
list = selectData();
// 2、将数据插入历史表
insertData(list);
// 3、删除原表数据
deleteByIds(ids);This loads the entire dataset into memory, leading to OOM.
Second approach
The team kept only the last 10 days of data (~10k rows) and executed a nightly INSERT INTO SELECT ... WHERE dateTime < (NOW() - INTERVAL 10 DAY) task at 8 PM, assuming the reduced load would avoid OOM.
During the night the task ran without issue, but the next morning the finance team discovered missing payment records. Investigation revealed that after 8 PM the insert‑select began failing intermittently, causing data loss.
Further debugging showed the migration task was the culprit. Stopping the task eliminated the failures.
Root Cause
The INSERT INTO SELECT statement performed a full‑table scan, which, under the default transaction isolation level, locked the target table (a) entirely while locking rows in the source table (b) one by one. This caused lock contention and timeouts, leading to partial failures.
Testing did not expose the issue because the test environment lacked the massive concurrent inserts that occur in production.
Solution
Avoid full‑table scans by adding appropriate indexes on the columns used in the WHERE clause so that the SELECT part can use an index.
Can INSERT INTO SELECT Still Be Used?
Yes, but only with proper indexing and awareness of its locking behavior.
Conclusion
When using INSERT INTO SELECT for large data migrations, ensure that the query is indexed to prevent full‑table scans and lock contention, and always validate performance under realistic production loads.
Java Captain
Focused on Java technologies: SSM, the Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading; occasionally covers DevOps tools like Jenkins, Nexus, Docker, ELK; shares practical tech insights and is dedicated to full‑stack Java development.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.