Speed Up XML‑to‑MySQL Imports: Reduce 300 s to 4 s with JDBC Batch & Async
This article walks through optimizing a Java‑based XML‑to‑MySQL import, showing how to cut processing time from 300 seconds to just 4 seconds by enabling JDBC batch writes, using rewriteBatchedStatements, applying async writes with Disruptor, and tuning MySQL settings.
The author faced a common data‑import scenario: loading over 60,000 records from an XML file into MySQL, which originally took 300 seconds.
Execution Environment
Java code runs on a laptop (8‑core i9, 16 GB RAM) while MySQL runs in a VirtualBox VM (4 CPU, 4 GB RAM). The JDK version is 21 and MySQL version is 8.0.
Baseline Implementation Performance
A straightforward procedural import reads the XML, extracts a list of Product objects, and inserts each row individually.
<code>void importData() throws Exception {
    // DOM parsing loads the entire file into memory at once
    Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().parse(new File("products.xml"));
    List<Product> products = extractProducts(doc);
    for (Product p : products) {
        insertIntoMySQL(p); // one round trip to MySQL per row
    }
}
</code>
This baseline takes about 300 seconds and consumes around 656 MB of memory.
Where to Start Optimizing?
The dominant cost is MySQL write latency (≈298 seconds). Two main strategies are suggested: write aggregation (batching) and asynchronous writes.
Enable MySQL Batch Processing
JDBC provides a Batch API that can group many INSERT statements into a single request. The key calls are addBatch(), executeBatch(), and clearBatch(). Crucially, the connection must be created with rewriteBatchedStatements=true so the driver actually merges the statements.
<code>Connection connection = DriverManager.getConnection(url, user, password);
connection.setAutoCommit(false);
PreparedStatement statement = connection.prepareStatement(sql);
for (int i = 0; i < batchSize; i++) {
    statement.setString(...);  // bind this row's columns
    statement.addBatch();      // queue the row client-side
}
statement.executeBatch();      // send all queued rows in one request
statement.clearBatch();
connection.commit();
</code>
After enabling batch mode, the import time drops to 9 seconds; with a tuned batch size the overall result is about 12 seconds, while memory stays around 673 MB.
Why Enable rewriteBatchedStatements?
Without this flag the driver still sends each INSERT individually, negating the benefit of the Batch API. Setting the flag merges multiple INSERTs into a single multi‑value statement, dramatically improving throughput.
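As a concrete illustration (the host, schema, and credentials below are placeholders, not values from the article), the flag is simply a parameter on the JDBC URL used to open the connection:

```java
public class JdbcUrlExample {
    public static void main(String[] args) {
        // Placeholder host and schema; the query parameter is what matters.
        String url = "jdbc:mysql://localhost:3306/demo"
                + "?rewriteBatchedStatements=true";
        System.out.println(url);
        // Connection conn = DriverManager.getConnection(url, user, password);
    }
}
```

With this URL, Connector/J rewrites a batch of single-row INSERTs into one multi-value INSERT before sending it to the server.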
Handle Duplicate Imports
When re‑importing data, you can either truncate the target table first, or use upsert logic such as INSERT ... ON DUPLICATE KEY UPDATE or REPLACE INTO to make the operation idempotent.
<code>INSERT INTO t1(a,b,c) VALUES (1,2,3),(4,5,6)
ON DUPLICATE KEY UPDATE a=VALUES(a), b=VALUES(b), c=VALUES(c);
</code>
Enable Multi‑Threaded Async Writes
Beyond batch processing, the author uses the LMAX Disruptor library to decouple XML parsing from database writes. Parsed Product objects are published to a ring buffer, and four consumer threads drain the events and perform batched inserts.
<code>var disruptor = new Disruptor<>(ProductEvent::new, 16384,
        DaemonThreadFactory.INSTANCE, ProducerType.SINGLE,
        new BusySpinWaitStrategy());
// register the four batching consumers here, then:
disruptor.start();
RingBuffer<ProductEvent> ringBuffer = disruptor.getRingBuffer();
</code>
This asynchronous approach reduces total processing time to about 4.5 seconds (≈60 % faster than batch‑only) while increasing peak memory usage due to the extra threads.
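The same decoupling — one producer parsing, several consumers draining fixed-size batches toward the database — can be sketched with only the JDK standard library, using a BlockingQueue as a stand-in for the Disruptor ring buffer. Everything here (the Product record, the queue capacity, the stubbed flush) is illustrative, not the article's actual code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncBatchSketch {
    record Product(String name) {}

    static final Product POISON = new Product("__end__");      // shutdown marker
    static final AtomicInteger WRITTEN = new AtomicInteger();

    // Stand-in for addBatch()/executeBatch()/commit() against MySQL.
    static void flush(List<Product> batch) {
        WRITTEN.addAndGet(batch.size());
        batch.clear();
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Product> queue = new ArrayBlockingQueue<>(16_384);
        int consumers = 4;
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        CountDownLatch done = new CountDownLatch(consumers);

        for (int c = 0; c < consumers; c++) {
            pool.submit(() -> {
                List<Product> batch = new ArrayList<>(1_000);
                try {
                    while (true) {
                        Product p = queue.take();
                        if (p == POISON) break;
                        batch.add(p);
                        if (batch.size() == 1_000) flush(batch);
                    }
                    flush(batch); // final partial batch
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }

        // Producer: in the real import this loop is the XML parser.
        for (int i = 0; i < 60_000; i++) queue.put(new Product("p" + i));
        for (int c = 0; c < consumers; c++) queue.put(POISON); // one per consumer
        done.await();
        pool.shutdown();
        System.out.println("rows written: " + WRITTEN.get());
    }
}
```

Disruptor replaces the lock-based queue with a pre-allocated ring buffer and a busy-spin wait strategy, which is where its latency advantage over this plain sketch comes from.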
Rewindable Event Handling
If a batch fails, the consumer can throw a RewindableException to have Disruptor replay the same batch after the transaction is rolled back.
Further Optimization Directions
XML Parsing
For very large XML files, switch from DOM parsing to SAX (event‑driven) parsing to lower memory pressure.
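As a minimal sketch of the event-driven style, a SAX handler processes each element as it streams past and never builds a tree; the inline document and the "product" element name are placeholders for whatever the real file uses:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxStreamingSketch {
    public static void main(String[] args) throws Exception {
        // Tiny inline document standing in for the 60,000-record file.
        String xml = "<products><product/><product/><product/></products>";
        var handler = new DefaultHandler() {
            int seen;
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("product".equals(qName)) seen++; // handle the record, then discard it
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                handler);
        System.out.println("products seen: " + handler.seen);
    }
}
```

Because each record can be published to the write pipeline from startElement/endElement callbacks, memory stays flat regardless of file size, unlike the DOM baseline's 656 MB footprint.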
Disruptor Memory Usage
Increasing the batch size (e.g., to 16384) can further reduce latency, but the ring buffer size grows proportionally, so memory consumption must be monitored.
MySQL Write Performance
Additional tuning includes enlarging the redo log buffer (innodb_log_buffer_size), increasing the InnoDB buffer pool (innodb_buffer_pool_size), and adjusting innodb_flush_log_at_trx_commit to control how often the log is flushed to disk.
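For orientation, these server variables live in my.cnf. The sizes below are illustrative placeholders that would have to be fitted to the VM's 4 GB of RAM; they are not recommendations from the article:

```ini
[mysqld]
# a larger pool keeps more pages in memory during the bulk load
innodb_buffer_pool_size        = 2G
# a larger redo log buffer means fewer mid-transaction flushes
innodb_log_buffer_size         = 64M
# 1 = flush at every commit (default, safest); 0 or 2 trade durability for speed
innodb_flush_log_at_trx_commit = 2
```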
Source Code
https://github.com/ginnsx/xml-to-mysql
macrozheng
Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.