Performance Comparison of Different Batch Insert Strategies in MySQL
This article evaluates single‑threaded bulk‑insert performance in MySQL by testing various batch sizes, analyzes memory usage and transaction merging, and provides practical recommendations for efficiently generating millions of records.
Background
In many projects we need to insert large amounts of data, for example when generating test data, performing simple data migrations, or building statistical reports. Inserting or updating such volumes naively, one row at a time, can take far too long to meet expectations.
This article explores how to quickly generate millions of rows by optimizing single-threaded batch inserts to achieve the greatest possible speedup.
Technology Stack
Java 1.8
MyBatis
MySQL 5.7
Preparation
A simple user table is created for testing:
create table zcy_user (
    id          int auto_increment primary key,
    user_name   varchar(32)  null,
    sex         int          null,
    email       varchar(50)  null,
    address     varchar(100) null,
    create_time timestamp    null
);

The MyBatis XML mapper is extended with a batch-insert statement:
<insert id="insertBatch">
    insert into zcy_user(`user_name`, `sex`, `email`, `address`) values
    <foreach collection="list" item="emp" separator=",">
        (#{emp.userName}, #{emp.sex}, #{emp.email}, #{emp.address})
    </foreach>
</insert>

Test Timing (Single-Threaded)
Varying only the batch size and holding every other factor constant, each test inserts 1,000,000 rows in a single thread and records the elapsed time.
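The measurement loop can be sketched as follows. The `sink` callback stands in for the MyBatis `insertBatch` mapper call; the method names and timing details here are illustrative assumptions, not the original benchmark code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchInsertDriver {
    // Split the row list into fixed-size batches and hand each batch to a sink;
    // in the real test the sink would be mapper.insertBatch(batch).
    public static <T> int insertInBatches(List<T> rows, int batchSize, Consumer<List<T>> sink) {
        int batches = 0;
        for (int i = 0; i < rows.size(); i += batchSize) {
            sink.accept(rows.subList(i, Math.min(i + batchSize, rows.size())));
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) rows.add(i);

        long start = System.currentTimeMillis();
        int batches = insertInBatches(rows, 10_000, batch -> { /* mapper.insertBatch(batch) */ });
        System.out.println(batches + " batches in " + (System.currentTimeMillis() - start) + " ms");
    }
}
```

Only the batch size changes between runs, which is what makes the timings below comparable.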
Insert 1 row per batch
Inserting one row at a time, 10,000 rows take 311.65 seconds, so 1,000,000 rows are extrapolated to roughly 31,165 seconds.
Insert 100 rows per batch
Inserting 1,000,000 rows in batches of 100 takes 373.58 seconds.
Insert 1,000 rows per batch
In batches of 1,000, the same 1,000,000 rows take 62.08 seconds.
Insert 10,000 rows per batch
Batches of 10,000 take 31.83 seconds.
Insert 50,000 rows per batch
Batches of 50,000 take 29.30 seconds.
Insert 100 000 rows per batch
The operation fails because the statement exceeds MySQL's max_allowed_packet limit (5,900,064 bytes > 4,194,304 bytes, the 4 MB default).
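The failure is simple arithmetic against max_allowed_packet (default 4 MB = 4,194,304 bytes). The per-row figure below (~59 bytes per VALUES tuple) is an estimate back-derived from the 5,900,064-byte error for 100,000 rows, not a measured constant:

```java
public class PacketSizeCheck {
    // MySQL 5.7's default max_allowed_packet is 4 MB.
    static final long MAX_ALLOWED_PACKET = 4_194_304L;
    // Approximate bytes per VALUES tuple for this table, back-derived from
    // the error message (5,900,064 bytes / 100,000 rows) -- an estimate.
    static final long BYTES_PER_ROW = 59L;

    public static boolean fits(int batchSize) {
        return batchSize * BYTES_PER_ROW <= MAX_ALLOWED_PACKET;
    }

    public static void main(String[] args) {
        for (int batch : new int[]{10_000, 50_000, 100_000}) {
            System.out.println(batch + " rows -> " + (batch * BYTES_PER_ROW)
                    + " bytes, fits: " + fits(batch));
        }
    }
}
```

This explains why 50,000-row batches still succeed (about 2.95 MB per statement) while 100,000-row batches do not.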
Comparison Table

| Batch Size | Time (s) | Memory     | Speed  | Speed-up Ratio |
|-----------:|---------:|------------|--------|----------------|
|          1 |    31165 | Idle       | Slow   | /              |
|        100 |      373 | Good       | Slow   | 83×            |
|       1000 |       62 | Good       | Medium | 502×           |
|      10000 |       31 | Medium     | Fast   | 1005×          |
|      50000 |       29 | Large      | Fast   | 1074×          |
|     100000 |        / | Very Large | /      | /              |
Extended Knowledge
MySQL Memory Structure
According to the MySQL documentation, the InnoDB storage engine consists of memory storage (Buffer Pool, Log Buffer) and disk storage. Since random disk I/O is slow, InnoDB first reads data into the Buffer Pool and writes back via I/O threads.
Buffer Pool
The Buffer Pool caches frequently accessed data and index pages to avoid disk I/O. It is managed with an LRU algorithm, but a plain LRU is easily polluted by full-table scans. InnoDB therefore splits the list into a hot region (about 5/8) and a cold region (about 3/8): newly read pages enter the head of the cold region, and a page is promoted to the hot region only if it is accessed again after innodb_old_blocks_time (default 1000 ms) has elapsed.
# View the hot/cold promotion window (default 1000 ms)
mysql> show variables like 'innodb_old_blocks_time';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| innodb_old_blocks_time | 1000  |
+------------------------+-------+
1 row in set (0.00 sec)

Change Buffer
When a secondary-index page to be modified is not in the Buffer Pool, InnoDB records the change in the Change Buffer instead of reading the page from disk (this applies only to non-unique secondary indexes). When the page is later loaded, the buffered changes are merged into it, saving random disk reads and improving write throughput. If the server crashes first, the redo log ensures the buffered changes are not lost.
Performance Improvement Analysis
Transaction Merging
Batch inserts combine many VALUES tuples into a single statement and a single transaction, so the redo log is flushed once at commit instead of once per row, greatly reducing disk I/O. The gain is bounded, however: a statement cannot exceed max_allowed_packet, and very large transactions put pressure on the redo log and Buffer Pool, so throughput stops improving beyond a certain batch size.
# Single transaction: one statement with many VALUES tuples
insert into zcy_user(`user_name`,`sex`,`email`,`address`) values
(1,1,1,1),(2,2,2,2),...,(n,n,n,n);

# Alternative: many single-row statements inside one explicit transaction
begin;
insert into zcy_user(`user_name`,`sex`,`email`,`address`) values (1,1,1,1);
insert into zcy_user(`user_name`,`sex`,`email`,`address`) values (2,2,2,2);
...
insert into zcy_user(`user_name`,`sex`,`email`,`address`) values (n,n,n,n);
commit;

Ensuring Sequential Data
InnoDB stores table data in a clustered B+-tree ordered by primary key. Inserting rows with monotonically increasing keys appends to the rightmost page, producing sequential I/O; random keys trigger page splits and scattered page writes, i.e. costly random I/O.
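When primary keys are generated by the application rather than by auto_increment, sorting each batch by key before inserting approximates the append-only pattern described above. A minimal sketch (the key values are illustrative):

```java
import java.util.Arrays;

public class KeyOrdering {
    public static void main(String[] args) {
        // Application-generated keys arrive in arbitrary order.
        long[] keys = {42L, 7L, 99L, 1L};

        // Sorting the batch by primary key lets InnoDB fill B+-tree pages
        // left to right instead of splitting pages at random positions.
        Arrays.sort(keys);

        System.out.println(Arrays.toString(keys)); // [1, 7, 42, 99]
    }
}
```

In practice this means sorting the row list by its key field before handing it to the batch-insert mapper.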
Use Cases
Data report import
Bulk data initialization
Massive data migration (millions of rows)
Log table recording
Conclusion
When a project needs to handle millions of rows, performance must be engineered deliberately. The tests show that merging many single-row inserts into larger batches cuts execution time by three orders of magnitude. Combining rows into a single transaction, choosing a batch size that stays under the packet limit, and keeping primary keys sequential minimize database load and latency, resulting in a more stable application.
About the ZCY Technology Team
ZCY Technology Team (Zero), based in Hangzhou, is a growth-oriented team passionate about technology and craftsmanship. With around 500 members, we are building comprehensive engineering, project management, and talent development systems. We are committed to innovation and creating a cloud service ecosystem for government and enterprise procurement. We look forward to your joining us.