Optimizing XML-to-MySQL Import: Reducing Execution Time from 300 s to 4 s with Batch Processing and Multithreading
This article details how to dramatically speed up importing over 60,000 XML records into MySQL by profiling the original 300‑second implementation, enabling JDBC batch writes, configuring MySQL rewriteBatchedStatements, and applying a Disruptor‑based multithreaded pipeline, ultimately achieving sub‑5‑second runtimes with modest memory usage.
Recently I faced a scenario where more than 60,000 rows needed to be imported from an XML file into MySQL; the straightforward procedural implementation took about 300 seconds and consumed 656 MB of memory.
Execution environment: Java 21 runs on a laptop (8‑core i9, 16 GB RAM) while MySQL 8.0 runs inside a VirtualBox VM with 4 CPU cores and 4 GB RAM. Each XML record read costs ~0.08 s, and each MySQL insert costs ~0.5 s.
Baseline implementation uses a simple loop that parses the XML, builds a Product list, and inserts each row individually. This approach is easy to understand but extremely slow.
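The baseline can be sketched as follows. This is an illustrative stand-in, not the article's actual code: Product and insertOneByOne are hypothetical names, and the real implementation parses dom4j elements and executes one INSERT per row via JDBC.

```java
import java.util.List;

// Illustrative sketch of the slow baseline: parse everything first,
// then perform one INSERT round trip per row.
public class BaselineImport {
    record Product(String id, String name) {}

    // In the real implementation each iteration would execute a single-row
    // INSERT via PreparedStatement.executeUpdate(), i.e. ~60,000 round trips.
    static int insertOneByOne(List<Product> products) {
        int roundTrips = 0;
        for (Product p : products) {
            roundTrips++; // one network round trip and one commit per row
        }
        return roundTrips;
    }

    public static void main(String[] args) {
        var products = List.of(new Product("1", "keyboard"), new Product("2", "mouse"));
        System.out.println(insertOneByOne(products)); // 2
    }
}
```

The per-row round trip is exactly what the batching and asynchronous techniques below eliminate.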
Optimization direction focuses on MySQL write performance, which accounts for the majority of the total time. Two main techniques are explored: write aggregation (batching) and asynchronous writing.
Enabling MySQL batch processing involves turning on the JDBC driver option rewriteBatchedStatements=true and using the Batch API. Example code:
Connection connection = ...;
// PreparedStatement caches the compiled SQL to avoid recompilation
PreparedStatement statement = connection.prepareStatement(sql);
connection.setAutoCommit(false); // commit once per batch instead of per row
for (int i = 0; i < batchSize; i++) {
    statement.setString(...);
    // ... set other parameters
    statement.addBatch();
}
statement.executeBatch();
statement.clearBatch();
connection.commit();

The three essential JDBC methods are addBatch(), executeBatch(), and clearBatch(). With batch processing enabled, the import time dropped to 9 seconds.
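The rewriteBatchedStatements flag is set on the JDBC connection URL. A minimal sketch, where the host, port, and schema name ("shop") are placeholders:

```java
// Sketch of a MySQL JDBC URL with batch rewriting enabled.
// Host, port, and database name are placeholders.
public class JdbcUrlExample {
    static final String MYSQL_URL =
            "jdbc:mysql://localhost:3306/shop"
            + "?rewriteBatchedStatements=true";

    public static void main(String[] args) {
        System.out.println(MYSQL_URL);
    }
}
```

With this flag, Connector/J rewrites the batched single-row INSERTs into multi-row INSERT statements, which is what makes executeBatch() dramatically faster against MySQL; without it, the driver still sends one statement per row.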
Multithreaded writing with Disruptor further reduces latency by decoupling XML parsing from database writes. The XML parser pushes Product objects into a Disruptor ring buffer, and multiple consumer threads consume and batch‑write them:
var disruptor = new Disruptor<>(ProductEvent::new, 16384,
        DaemonThreadFactory.INSTANCE, ProducerType.SINGLE,
        new BusySpinWaitStrategy());
var consumers = new SaveDbHandler[4];
for (int i = 0; i < consumers.length; i++) {
    consumers[i] = new SaveDbHandler(i, consumers.length, shutdownLatch);
}
disruptor.handleEventsWith(new SimpleBatchRewindStrategy(), consumers)
        .then(new ClearingEventHandler());
var ringBuffer = disruptor.start();
for (var it = document.getRootElement().elementIterator(); it.hasNext(); ) {
    var element = it.next();
    if (!StringUtils.hasText(element.elementTextTrim("id"))) continue;
    var product = ObjectMapper.buildProduct(element);
    ringBuffer.publishEvent((event, seq, buf) -> event.setProduct(product));
}

Each consumer processes a subset of events based on its ordinal, ensuring no duplicate work. This approach brings the total runtime down to about 4 seconds, with memory usage around 1 GB.
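The ordinal-based partitioning can be sketched as a pure function. The class and method names here are illustrative; in the real pipeline the equivalent check would live inside SaveDbHandler's event-handling method:

```java
// Sketch of ordinal-based work partitioning: consumer `ordinal` handles
// only the sequences where sequence % numConsumers == ordinal, so each
// event is processed by exactly one of the N consumers.
public class OrdinalPartition {
    static boolean shouldHandle(long sequence, int ordinal, int numConsumers) {
        return sequence % numConsumers == ordinal;
    }

    public static void main(String[] args) {
        int numConsumers = 4;
        for (long seq = 0; seq < 8; seq++) {
            for (int ord = 0; ord < numConsumers; ord++) {
                if (shouldHandle(seq, ord, numConsumers)) {
                    System.out.println("seq " + seq + " -> consumer " + ord);
                }
            }
        }
    }
}
```

Because the mapping from sequence to consumer is deterministic, no locking is needed between consumers: each one simply skips events that are not assigned to it.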
Further tuning includes increasing the MySQL innodb_log_buffer_size and innodb_buffer_pool_size, adjusting innodb_flush_log_at_trx_commit, and using INSERT … ON DUPLICATE KEY UPDATE or REPLACE INTO for idempotent imports.
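An idempotent insert can be expressed as below. The table and column names are hypothetical, chosen only to match the Product objects used earlier:

```java
// Hypothetical upsert statement for idempotent re-imports: a row whose
// primary/unique key already exists is updated in place instead of
// triggering a duplicate-key error.
public class UpsertSql {
    static final String UPSERT =
            "INSERT INTO product (id, name, price) VALUES (?, ?, ?) "
            + "ON DUPLICATE KEY UPDATE name = VALUES(name), price = VALUES(price)";

    public static void main(String[] args) {
        System.out.println(UPSERT);
    }
}
```

An upsert like this lets the import be re-run safely after a partial failure, which matters once writes are spread across multiple consumer threads.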
Result: After applying batch writes, enabling rewriteBatchedStatements, and adding a Disruptor‑based multithreaded pipeline, the import time fell from 300 seconds to roughly 4 seconds while keeping memory consumption under 1.5 GB, demonstrating a practical performance‑boosting workflow for large‑scale data ingestion.
Top Architect
Top Architect shares practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, as well as architecture evolution with internet technologies. Architects who enjoy exchanging ideas and sharing experience are welcome to learn together.