Optimizing XML‑to‑MySQL Bulk Import: Reducing Execution Time from 300 s to 4 s with JDBC Batch and Disruptor
This article details how a Java-based XML-to-MySQL import of more than 60,000 records was accelerated from 300 seconds to about 4 seconds by enabling JDBC batch processing with rewriteBatchedStatements and adding a multithreaded Disruptor pipeline for asynchronous writes. It closes with further tuning options.
The author faced a common data‑import scenario: loading more than 60,000 rows from XML (and similar XLSX) files into a MySQL database, which originally took about 300 seconds.
Test environment: Java 21 on an 8-core Intel i9 laptop (2.3 GHz, 16 GB RAM) connected to a MySQL 8.0 instance in a VirtualBox VM (4 vCPUs, 4 GB RAM). Baseline costs: 0.08 s to read a single row and 0.5 s to insert a single row.
The baseline implementation used a simple procedural loop: parse the XML, build a List&lt;Product&gt;, and insert each product one at a time. This yielded the 300-second runtime with a peak memory usage of 656 MB.
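The shape of that baseline can be sketched as follows; the repository stub and names here are illustrative, not from the source, but the structure (one INSERT call per parsed row) is what made the run take 300 seconds:

```java
import java.util.ArrayList;
import java.util.List;

public class BaselineImportSketch {
    record Product(String name) {}

    // Stand-in for the per-row INSERT; in the original this was one JDBC call per product.
    interface ProductRepository { void insert(Product p); }

    static int importAll(List<Product> products, ProductRepository repo) {
        int inserted = 0;
        for (Product p : products) { // one INSERT per row: N round trips, N implicit commits
            repo.insert(p);
            inserted++;
        }
        return inserted;
    }

    public static void main(String[] args) {
        List<Product> parsed = List.of(new Product("apple"), new Product("banana"));
        List<Product> saved = new ArrayList<>();
        System.out.println(importAll(parsed, saved::add));
    }
}
```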
Since the dominant cost was MySQL writes, the author first enabled MySQL batch processing via JDBC. The key is to disable auto‑commit and set rewriteBatchedStatements=true in the connection URL so that the driver can merge multiple INSERTs into a single statement.
```java
// rewriteBatchedStatements=true lets the driver merge the batch into one multi-row INSERT
String url = "jdbc:mysql://localhost:3306/demo?rewriteBatchedStatements=true";
try (Connection connection = DriverManager.getConnection(url, user, password);
     PreparedStatement statement = connection.prepareStatement(sql)) {
    connection.setAutoCommit(false);
    for (Product product : products) {
        statement.setString(1, product.getName()); // bind the remaining columns likewise
        statement.addBatch();
    }
    statement.executeBatch();
    connection.commit();
}
```
With batching turned on, write time dropped to 9 seconds and memory remained stable; after further tuning, the overall runtime settled at about 12 seconds with 673 MB of memory.
To push performance further, the author introduced asynchronous writes using the LMAX Disruptor library. XML parsing feeds a ring buffer; four consumer threads each handle a partition of the events and perform batched inserts.
```java
var disruptor = new Disruptor<>(
        ProductEvent::new,
        16384,                          // ring-buffer size, must be a power of two
        DaemonThreadFactory.INSTANCE,
        ProducerType.SINGLE,            // a single parsing thread publishes events
        new BusySpinWaitStrategy());
var consumers = new SaveDbHandler[4];
for (int i = 0; i < consumers.length; i++) {
    consumers[i] = new SaveDbHandler(i, consumers.length, shutdownLatch);
}
disruptor.handleEventsWith(new SimpleBatchRewindStrategy(), consumers)
         .then(new ClearingEventHandler());
var ringBuffer = disruptor.start();
// publish events …
```
The consumer's onEvent method adds statements to the current batch and executes it at the end of each batch window, rolling back on errors.
```java
@Override
public void onEvent(ProductEvent event, long sequence, boolean endOfBatch) throws Exception {
    try {
        // Each consumer only claims its own partition of the sequence space.
        if (sequence % numberOfConsumers == ordinal) {
            statement.setString(1, event.getName()); // bind columns from the event (field names illustrative)
            statement.addBatch();
        }
        // Flush on endOfBatch even when this event belongs to another partition,
        // otherwise a batch boundary could be missed.
        if (endOfBatch) {
            statement.executeBatch();
            statement.clearBatch();
            connection.commit();
        }
    } catch (SQLException e) {
        log.error("[{}] handler error, sequence: {}, endOfBatch: {}", ordinal, sequence, endOfBatch, e);
        if (connection != null) {
            connection.rollback();
        }
    }
}
```
After adding the asynchronous pipeline, total import time fell to roughly 4.5 seconds (about a 60 % reduction from the batch-only result) with peak memory around 1 GB; further tuning of the batch size and ring-buffer parameters achieved a stable 4-second runtime.
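The `sequence % numberOfConsumers == ordinal` check partitions the sequence space so that each of the four handlers claims every fourth event and no row is inserted twice. A small self-contained sketch of that assignment:

```java
public class PartitionSketch {
    // The consumer whose ordinal equals this remainder handles the event.
    static int ownerOf(long sequence, int numberOfConsumers) {
        return (int) (sequence % numberOfConsumers);
    }

    public static void main(String[] args) {
        int consumers = 4;
        for (long seq = 0; seq < 8; seq++) {
            System.out.println("sequence " + seq + " -> consumer " + ownerOf(seq, consumers));
        }
        // Each consumer handles every fourth sequence: 0 and 4 go to consumer 0,
        // 1 and 5 to consumer 1, and so on.
    }
}
```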
Additional optimization ideas include using a SAX parser for lower-memory XML processing, increasing the Disruptor batch size (e.g., to 16384) for higher throughput, and tuning MySQL settings such as innodb_log_buffer_size, innodb_buffer_pool_size, and innodb_flush_log_at_trx_commit. The complete source code is available on GitHub: https://github.com/xioshe/xml-to-mysql.
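As an illustration of those MySQL knobs, a hedged my.cnf sketch; the values are examples scaled to the 4 GB test VM, not recommendations, and innodb_flush_log_at_trx_commit=2 trades crash durability for write throughput:

```ini
[mysqld]
# Larger redo log buffer so big batched transactions don't flush mid-transaction
innodb_log_buffer_size = 64M
# Buffer pool sized to a large share of the VM's 4 GB RAM
innodb_buffer_pool_size = 2G
# Flush the redo log once per second instead of at every commit
# (faster bulk loads, but up to ~1 s of commits can be lost on an OS crash)
innodb_flush_log_at_trx_commit = 2
```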