What’s the Optimal Batch Size for MySQL Inserts? A Deep Performance Test
This article investigates how many rows should be inserted per batch in MySQL by measuring the impact of packet size limits, buffer pool usage, insert buffers, transaction handling and index structures, and it provides practical recommendations based on tests with millions of rows.
Introduction
When inserting large volumes of data—such as millions of rows—into MySQL, batch insertion is the preferred technique, but the optimal batch size is unclear. The article explores how batch size affects performance when inserting into a temporary table.
Preparation
The author initially used a loop that inserted 1000 rows per batch, mirroring other projects, and decided to experiment with different batch sizes.
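The batching loop described above can be sketched as follows. This is a minimal illustration, not the author's code: the helper names `chunked` and `build_insert` are hypothetical, and the `%s` placeholder style is assumed because common Python MySQL drivers use it. No database connection is opened; the sketch only builds the statement and its parameter list.

```python
from itertools import islice


def chunked(rows, batch_size):
    """Yield consecutive lists of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def build_insert(table, columns, batch):
    """Build one multi-row INSERT with %s placeholders and a flat parameter list."""
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = "INSERT INTO `{}` ({}) VALUES {}".format(
        table,
        ", ".join("`{}`".format(c) for c in columns),
        ", ".join([row_placeholder] * len(batch)),
    )
    params = [value for row in batch for value in row]
    return sql, params


# Hypothetical sample data: 2500 rows split into batches of 1000.
rows = [(i, i, i, "c{}".format(i)) for i in range(2500)]
batches = list(chunked(rows, 1000))
sql, params = build_insert(
    "insert_table", ["datetime", "uid", "type", "content"], batches[-1]
)
```

Each `(sql, params)` pair would then be passed to a driver's `cursor.execute`, so changing the batch size is a one-argument change.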
Checking MySQL version
mysql> select version();
+------------+
| version()  |
+------------+
| 5.6.34-log |
+------------+
1 row in set (0.00 sec)
Table schema
The temporary table contains four columns: three int(10) fields and one varchar(10) field. A single row is estimated at 4 + 4 + 4 + 40 = 52 bytes: 4 bytes per int, plus up to 4 bytes per character for the 10-character varchar under utf8mb4.
Time distribution of a single insert
Open connection (30%)
Send query (20%)
Parse query (20%)
Insert operation (10% * row count)
Insert index (10% * index count)
Close connection (10%)
The dominant cost is connecting, sending and parsing, not the actual INSERT execution. Those costs are paid once per statement, which is why larger batches help: they spread the fixed overhead over more rows and reduce the number of round trips.
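To see why the fixed share matters, the weights above can be turned into a toy cost model. This is only a sketch under loud assumptions: the weights are treated as arbitrary cost units, every statement is assumed to pay the full connect/send/parse/close overhead, and the function names are hypothetical. With a reused connection the fixed part would be smaller, so the absolute numbers mean nothing; only the trend is of interest.

```python
# Relative weights from the list above, in arbitrary units.
CONNECT, SEND, PARSE, PER_ROW, PER_INDEX, CLOSE = 30, 20, 20, 10, 10, 10


def statement_cost(rows_in_statement, index_count=1):
    """Cost of one INSERT statement carrying rows_in_statement rows."""
    fixed = CONNECT + SEND + PARSE + CLOSE
    return fixed + PER_ROW * rows_in_statement + PER_INDEX * index_count


def total_cost(total_rows, batch_size, index_count=1):
    """Cost of loading total_rows rows in statements of batch_size rows."""
    full_batches, remainder = divmod(total_rows, batch_size)
    cost = full_batches * statement_cost(batch_size, index_count)
    if remainder:
        cost += statement_cost(remainder, index_count)
    return cost


for size in (10, 1000, 20000):
    print(size, total_cost(110000, size))
```

In this model the per-row work is identical for every batch size, so the whole difference between runs comes from how many times the fixed overhead is paid.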
SQL size limits
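The max_allowed_packet variable caps the size of a single packet, and therefore of a single INSERT statement. The article works from a limit of 1 MiB for a client query and also cites 16 MiB for the client library and 4 MiB for the server in MySQL 5.7; defaults differ by version and by program, so check the value on the actual server. On the test server it is set to 33554432 bytes (32 MiB).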
Maximum rows per batch
Using the 1 MiB limit and the 52-byte row size, the theoretical maximum is (1024*1024)/52 ≈ 20164 rows per batch. To stay safely below the limit, the author rounds this down to 20000 rows per MiB, which scales to 20000 * 32 = 640000 rows for a 32 MiB packet.
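The arithmetic above can be checked with a few lines. This sketch follows the article's simplification of counting only raw column bytes; the actual statement text (keywords, quotes, commas, integers written as digits) is larger, so treat the result as an upper bound. The function name is hypothetical.

```python
# Three INT columns at 4 bytes each, plus VARCHAR(10) at up to 4 bytes per character.
ROW_BYTES = 4 + 4 + 4 + 10 * 4

MIB = 1024 * 1024


def max_rows_per_packet(packet_bytes, row_bytes=ROW_BYTES):
    """Largest whole number of rows whose raw data fits in packet_bytes."""
    return packet_bytes // row_bytes


print(max_rows_per_packet(MIB))       # 20164
print(max_rows_per_packet(32 * MIB))  # 645277
```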
Performance tests
110,000 rows
110000 rows, batch 10 → 2.361 s
110000 rows, batch 600 → 0.523 s
110000 rows, batch 1000 → 0.429 s
110000 rows, batch 20000 → 0.426 s
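110000 rows, batch 80000 → 0.352 s
Increasing the batch size reduces total time, but the returns diminish sharply: going from 10 to 1,000 rows per batch cuts the time from 2.361 s to 0.429 s, while going from 1,000 to 80,000 saves only a further 0.077 s.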
241,397 rows (≈240,000)
241397 rows, batch 10 → 4.445 s
241397 rows, batch 600 → 1.187 s
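241397 rows, batch 1000 → 1.130 s
241397 rows, batch 20000 → 0.933 s
241397 rows, batch 80000 → 0.753 s
Among the listed runs the largest batch (80,000) is again the fastest, and the author reports that inserting all ≈240,000 rows in a single batch performs best. Throughput therefore keeps improving well past the 20,000-row cap estimated earlier.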
418,859 rows (≈420,000)
418859 rows, batch 1000 → 2.216 s
418859 rows, batch 80000 → 1.777 s
418859 rows, batch 160000 → 1.523 s
418859 rows, batch 200000 → 1.432 s
418859 rows, batch 300000 → 1.362 s
418859 rows, batch 400000 → 1.764 s
Performance improves up to about 300,000 rows per batch (1.362 s) and then degrades at 400,000 (1.764 s), likely because the packet size approaches the max_allowed_packet limit and memory consumption rises.
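Measurements like the ones above come from a loop that loads the same rows at several batch sizes and times each run. The sketch below shows the shape of such a harness. It uses SQLite from the Python standard library purely as a stand-in so that it runs without a server, so its timings say nothing about MySQL. The table layout, row data and function names are assumptions, and the batch sizes are kept small to stay within SQLite's bound-parameter limit.

```python
import sqlite3
import time


def insert_in_batches(conn, rows, batch_size):
    """Insert rows using multi-row INSERT statements of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        placeholders = ", ".join(["(?, ?, ?, ?)"] * len(batch))
        params = [value for row in batch for value in row]
        conn.execute("INSERT INTO insert_table VALUES " + placeholders, params)
    conn.commit()


def time_batch_sizes(rows, batch_sizes):
    """Return {batch_size: seconds}, loading rows into a fresh table each time."""
    timings = {}
    for batch_size in batch_sizes:
        conn = sqlite3.connect(":memory:")
        conn.execute(
            "CREATE TABLE insert_table (a INTEGER, b INTEGER, c INTEGER, d TEXT)"
        )
        started = time.perf_counter()
        insert_in_batches(conn, rows, batch_size)
        timings[batch_size] = time.perf_counter() - started
        # Sanity check: every row must have arrived before the timing counts.
        count = conn.execute("SELECT COUNT(*) FROM insert_table").fetchone()[0]
        assert count == len(rows)
        conn.close()
    return timings


rows = [(i, i, i, "c%d" % (i % 1000)) for i in range(5000)]
timings = time_batch_sizes(rows, [1, 10, 100])
for size, seconds in timings.items():
    print("batch %d: %.3f s" % (size, seconds))
```

Against a real MySQL server the same structure applies, with the connection, placeholder style and batch sizes swapped for those of the chosen driver and the configured max_allowed_packet.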
Other factors affecting insert speed
Buffer pool
If the InnoDB buffer pool has less than 25 % free space, inserts can fail with DB_LOCK_TABLE_FULL. The test server shows innodb_buffer_pool_size = 128 MiB, allowing up to 64 MiB for the insert buffer.
Insert buffer (IBUF)
InnoDB uses an insert buffer to batch non‑clustered index updates, reducing random I/O. When the insert buffer consumes a large portion of the buffer pool, it can impact other operations.
Transactions
Wrapping many inserts in a single transaction reduces per‑statement overhead. Example:
START TRANSACTION;
INSERT INTO `insert_table` (`datetime`,`uid`,`content`,`type`) VALUES ('0','userid_0','content_0',0);
INSERT INTO `insert_table` ...
COMMIT;
Keep the data written by one transaction below innodb_log_buffer_size (≈64 MiB). A larger transaction fills the log buffer and forces flushes to disk before the commit, which slows the insert.
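From application code, the same pattern is an explicit begin, a series of inserts, and a commit, with a rollback if anything fails. The sketch below again uses SQLite from the Python standard library as a stand-in so it can run without a server (SQLite spells the opening statement BEGIN rather than START TRANSACTION). The column types and the function name are assumptions.

```python
import sqlite3

INSERT_SQL = (
    "INSERT INTO insert_table (datetime, uid, content, type) VALUES (?, ?, ?, ?)"
)


def insert_in_one_transaction(conn, rows):
    """Run every INSERT inside one explicit transaction; roll back on failure."""
    conn.execute("BEGIN")
    try:
        for row in rows:
            conn.execute(INSERT_SQL, row)
    except Exception:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")


# isolation_level=None puts the connection in autocommit mode,
# so the transaction boundaries are exactly the ones written above.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE insert_table (datetime TEXT, uid TEXT, content TEXT, type INTEGER)"
)
rows = [(str(i), "userid_%d" % i, "content_%d" % i, i) for i in range(3)]
insert_in_one_transaction(conn, rows)
```

Because the commit happens once, the per-statement commit cost is paid a single time, and a failure partway through leaves the table unchanged.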
Configuration tuning
Increasing innodb_buffer_pool_size can improve read/write throughput if memory is available.
Adjusting max_allowed_packet upward allows larger batches but does not guarantee optimal performance.
Indexes
Multiple indexes increase insert cost because each row must update the index structures. Inserting in primary‑key order (append‑only) is fastest; inserting into the middle of a B‑tree forces page splits and more disk I/O.
Conclusion
The author recommends sizing each batch at roughly half of max_allowed_packet (about 300,000–320,000 rows of 52 bytes for a 32 MiB packet) as a practical compromise between speed and memory usage. The true optimum still depends on server configuration, buffer pool size, transaction limits, and index design, so testing with real data is essential.
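The half-packet rule of thumb reduces to one line of arithmetic. In this sketch the function name and the `fraction` parameter are hypothetical, and the 52-byte row is the article's raw-data estimate rather than the true size of the statement text.

```python
def suggested_batch_rows(max_allowed_packet, row_bytes, fraction=0.5):
    """Rows per batch when targeting a fraction of max_allowed_packet."""
    return int(max_allowed_packet * fraction) // row_bytes


print(suggested_batch_rows(33554432, 52))  # 322638
```

For other tables, substitute the measured average row size and the server's own max_allowed_packet, then confirm the result with a timing run.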
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)
