Optimizing XML‑to‑MySQL Data Import: Reducing Execution Time from 300 s to 4 s
This article describes how to dramatically speed up the import of 60,000+ XML records into MySQL by analyzing the original environment, measuring baseline performance, and applying a series of backend optimizations—including MySQL batch mode, rewriteBatchedStatements, multithreaded asynchronous writes with Disruptor, and XML parsing improvements—ultimately cutting the total runtime from five minutes to just a few seconds.
The author, a senior architect, encountered a common data‑import scenario: loading more than 60,000 rows from an XML file into MySQL. The initial implementation, a simple procedural Java routine, took about 300 seconds and consumed 656 MB of memory.
Environment: Java 21 on an 8-core i9 laptop (2.3 GHz, 16 GB RAM) communicating with MySQL 8.0 running inside a VirtualBox VM (4 CPUs, 4 GB RAM). Each XML record read took 0.08 s; each MySQL insert took 0.5 s.
Baseline code (simplified):
void importData() {
    Document doc = parseXmlFile();
    List<Product> products = extractProducts(doc);
    for (Product p : products) {
        // insert into MySQL one by one
    }
}

To achieve a dramatic speedup, the author explored several optimization directions.
1. MySQL Batch Processing
Enabling JDBC batch mode and the rewriteBatchedStatements=true connection parameter reduces the number of round‑trips to the server. The core batch code is:
Connection connection = ...;
PreparedStatement stmt = connection.prepareStatement(sql);
connection.setAutoCommit(false);
for (int i = 0; i < batchSize; i++) {
    stmt.setString(...);
    stmt.addBatch();
}
stmt.executeBatch();
connection.commit();

Key JDBC methods used are addBatch(), executeBatch(), and clearBatch(). After enabling batch processing, the import time dropped to about 9 seconds with similar memory usage.
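At full scale, the batch should be flushed periodically rather than accumulated across all 60,000 rows, both to bound memory and to stay under max_allowed_packet. Here is a minimal, JDBC-free sketch of that flush-every-N logic; the class and method names are ours, purely illustrative of the pattern:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper mirroring addBatch()/executeBatch()/clearBatch():
// rows accumulate and are flushed as a batch every flushSize additions.
class BatchFlusher<T> {
    private final int flushSize;
    private final Consumer<List<T>> flush;      // stands in for executeBatch() + commit()
    private final List<T> pending = new ArrayList<>();

    BatchFlusher(int flushSize, Consumer<List<T>> flush) {
        this.flushSize = flushSize;
        this.flush = flush;
    }

    void add(T row) {                           // stands in for stmt.addBatch()
        pending.add(row);
        if (pending.size() >= flushSize) flushNow();
    }

    void flushNow() {                           // executeBatch() then clearBatch()
        if (pending.isEmpty()) return;
        flush.accept(List.copyOf(pending));
        pending.clear();
    }
}

public class BatchDemo {
    public static void main(String[] args) {
        int[] flushes = {0};
        BatchFlusher<Integer> f = new BatchFlusher<>(1000, batch -> flushes[0]++);
        for (int i = 0; i < 2500; i++) f.add(i);
        f.flushNow();                           // final partial batch
        System.out.println(flushes[0]);         // prints 3 (two full batches + one partial)
    }
}
```

In the real import, the flush callback would invoke executeBatch() and commit(), and the connection URL would carry rewriteBatchedStatements=true so Connector/J rewrites the batch into multi-row INSERT statements instead of sending each row separately.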
2. Asynchronous Writing with Disruptor
To further reduce latency, the author introduced a producer‑consumer model using LMAX Disruptor. XML parsing produces ProductEvent objects that are placed into a ring buffer; multiple consumer handlers write batches to MySQL.
var disruptor = new Disruptor<>(
        ProductEvent::new,
        16384,
        DaemonThreadFactory.INSTANCE,
        ProducerType.SINGLE,
        new BusySpinWaitStrategy());
var consumers = new SaveDbHandler[4];
for (int i = 0; i < consumers.length; i++) {
    consumers[i] = new SaveDbHandler(i, consumers.length, shutdownLatch);
}
disruptor.handleEventsWith(new SimpleBatchRewindStrategy(), consumers)
         .then(new ClearingEventHandler());
RingBuffer<ProductEvent> ringBuffer = disruptor.start();
for (Iterator<Element> it = document.getRootElement().elementIterator(); it.hasNext(); ) {
Element element = it.next();
if (!StringUtils.hasText(element.elementTextTrim("id"))) continue;
Product product = ObjectMapper.buildProduct(element);
ringBuffer.publishEvent((event, seq, buf) -> event.setProduct(product));
}

When combined with batch mode, the total runtime fell to roughly 4.5 seconds, at the cost of higher peak memory (≈1 GB) due to the additional threads.
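The article does not show SaveDbHandler's internals, but its constructor arguments (ordinal, handler count, latch) suggest the standard Disruptor fan-out idiom: every handler sees every event, and each one only persists events whose sequence number modulo the handler count equals its own ordinal. A dependency-free sketch of that partitioning rule, with illustrative names of our own:

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionDemo {
    // True when the handler with this ordinal owns the given sequence number.
    static boolean owns(long sequence, int ordinal, int numHandlers) {
        return sequence % numHandlers == ordinal;
    }

    public static void main(String[] args) {
        int numHandlers = 4;                        // matches the 4 SaveDbHandler consumers
        List<List<Long>> buckets = new ArrayList<>();
        for (int i = 0; i < numHandlers; i++) buckets.add(new ArrayList<>());

        // Simulate 8 ring-buffer sequences fanned out across the handlers.
        for (long seq = 0; seq < 8; seq++)
            for (int h = 0; h < numHandlers; h++)
                if (owns(seq, h, numHandlers)) buckets.get(h).add(seq);

        System.out.println(buckets); // prints [[0, 4], [1, 5], [2, 6], [3, 7]]
    }
}
```

This striping keeps all four consumers busy without any locking between them, since ownership is decided purely by arithmetic on the sequence number.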
3. Further Optimizations
- Using SAX (event-driven) XML parsing to lower object allocation when handling very large files.
- Increasing MySQL buffers (innodb_log_buffer_size, innodb_buffer_pool_size) and adjusting innodb_flush_log_at_trx_commit for better write throughput.
- Adjusting max_allowed_packet to accommodate larger batch sizes.
- Exploring a larger Disruptor batchSize (e.g., 16384) to push runtime down to ~3.5 s, while monitoring the ring buffer's memory impact.
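The first point is worth a concrete illustration: unlike the DOM approach used above (parseXmlFile() returning a full Document), SAX never materializes the whole tree, so memory stays flat regardless of file size. A minimal sketch using the JDK's built-in SAX parser; the element names mirror the article's id field, and the rest is illustrative:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxImportSketch {
    public static void main(String[] args) throws Exception {
        // Tiny stand-in for the real 60,000-record file.
        String xml = "<products><product><id>1</id></product>"
                   + "<product><id>2</id></product></products>";
        List<String> ids = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private boolean inId;

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                inId = "id".equals(qName);      // only buffer text inside <id>
                text.setLength(0);
            }

            @Override
            public void characters(char[] ch, int start, int len) {
                if (inId) text.append(ch, start, len);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if ("id".equals(qName)) {       // one record's field is complete
                    ids.add(text.toString());   // in the real import: build Product, publish to ring buffer
                    inId = false;
                }
            }
        };

        SAXParserFactory.newInstance().newSAXParser()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        System.out.println(ids); // prints [1, 2]
    }
}
```

In the endElement callback for a full record, the handler would publish a Product straight to the Disruptor ring buffer, so parsing and persisting stream in lockstep with no intermediate list.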
All source code is available on GitHub: https://github.com/xioshe/xml-to-mysql .
In summary, by focusing on MySQL write‑path optimizations—batch statements, rewrite flag, and asynchronous multithreaded consumption—the import duration was reduced from 300 seconds to under 4 seconds, demonstrating a practical performance‑engineering workflow for backend data‑loading tasks.
Top Architect
Top Architect shares practical architecture knowledge, covering enterprise, system, website, large-scale distributed, and high-availability architectures, as well as architectural evolution using internet technologies. Architects who enjoy thinking and sharing are welcome to exchange ideas and learn together.