Improving Million‑Scale Data Insert Efficiency with Spring Boot ThreadPoolTaskExecutor
This article demonstrates how to boost the insertion speed of over two million records by configuring a Spring Boot ThreadPoolTaskExecutor, integrating it with MyBatis‑Plus and PostgreSQL, and measuring multi‑threaded versus single‑threaded performance to determine the optimal thread count.
Purpose: Increase the efficiency of inserting data at million-record scale.
Solution: Use ThreadPoolTaskExecutor to perform multi-threaded batch inserts.
Technology Stack: Spring Boot 2.1.1, MyBatis-Plus 3.0.6, Swagger 2.5.0, Lombok 1.18.4, PostgreSQL, and ThreadPoolTaskExecutor.
Thread Pool Configuration (application-dev.properties):
```properties
# Async thread pool configuration
# Core pool size
async.executor.thread.core_pool_size=30
# Maximum pool size
async.executor.thread.max_pool_size=30
# Queue capacity
async.executor.thread.queue_capacity=99988
# Name prefix for threads created by the pool
async.executor.thread.name.prefix=async-importDB-
```

Spring Bean Definition:
```java
import java.util.concurrent.Executor;
import java.util.concurrent.ThreadPoolExecutor;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

import lombok.extern.slf4j.Slf4j;

@Configuration
@EnableAsync
@Slf4j
public class ExecutorConfig {

    @Value("${async.executor.thread.core_pool_size}")
    private int corePoolSize;

    @Value("${async.executor.thread.max_pool_size}")
    private int maxPoolSize;

    @Value("${async.executor.thread.queue_capacity}")
    private int queueCapacity;

    @Value("${async.executor.thread.name.prefix}")
    private String namePrefix;

    @Bean(name = "asyncServiceExecutor")
    public Executor asyncServiceExecutor() {
        log.warn("start asyncServiceExecutor");
        // VisiableThreadPoolTaskExecutor is a custom subclass of ThreadPoolTaskExecutor
        // defined elsewhere in the project; a plain ThreadPoolTaskExecutor works here too.
        ThreadPoolTaskExecutor executor = new VisiableThreadPoolTaskExecutor();
        executor.setCorePoolSize(corePoolSize);
        executor.setMaxPoolSize(maxPoolSize);
        executor.setQueueCapacity(queueCapacity);
        executor.setThreadNamePrefix(namePrefix);
        // When the queue is full, run the task on the submitting thread instead of rejecting it.
        executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
        executor.initialize();
        return executor;
    }
}
```

Asynchronous Service Implementation:
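Before the service itself, one detail of the configuration above is worth demonstrating: with CallerRunsPolicy, a task that arrives when the queue is full is executed synchronously on the submitting thread, which throttles producers rather than discarding batches. A self-contained sketch of that behavior (plain java.util.concurrent, no Spring; class and method names are mine, not from the article):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CallerRunsDemo {

    // Returns the name of the thread that ended up running the third task.
    static String thirdTaskThread() throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.CallerRunsPolicy());
        CountDownLatch hold = new CountDownLatch(1);
        // Task 1 occupies the single worker thread until released.
        pool.execute(() -> {
            try { hold.await(); } catch (InterruptedException ignored) { }
        });
        // Task 2 fills the one-slot queue.
        pool.execute(() -> { });
        final String[] ranOn = new String[1];
        // Task 3 is rejected (worker busy, queue full) and, under CallerRunsPolicy,
        // executes synchronously on the submitting thread before execute() returns.
        pool.execute(() -> ranOn[0] = Thread.currentThread().getName());
        hold.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return ranOn[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("third task ran on: " + thirdTaskThread());
    }
}
```

This is why a full queue slows the producer loop down instead of losing insert batches.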
```java
import java.util.List;
import java.util.concurrent.CountDownLatch;

import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;

import lombok.extern.slf4j.Slf4j;

@Service
@Slf4j
public class AsyncServiceImpl implements AsyncService {

    @Override
    @Async("asyncServiceExecutor")
    public void executeAsync(List<LogOutputResult> logOutputResults,
                             LogOutputResultMapper logOutputResultMapper,
                             CountDownLatch countDownLatch) {
        try {
            log.warn("start executeAsync");
            // Batch-insert this sub-list on a pool thread.
            logOutputResultMapper.addLogOutputResultBatch(logOutputResults);
            log.warn("end executeAsync");
        } finally {
            // Always release the latch, even if the insert throws.
            countDownLatch.countDown();
        }
    }
}
```

Multi-Threaded Batch Insert Method:
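The method below relies on ConvertHandler.splitList to partition the full record list into batches; that helper is not shown in the article, but a minimal generic sketch (signature assumed from the call site) might look like this:

```java
import java.util.ArrayList;
import java.util.List;

public class ConvertHandler {

    // Partition `source` into consecutive sub-lists of at most `size` elements.
    public static <T> List<List<T>> splitList(List<T> source, int size) {
        List<List<T>> parts = new ArrayList<>();
        for (int i = 0; i < source.size(); i += size) {
            // Copy each subList view so the batches are independent of `source`.
            parts.add(new ArrayList<>(source.subList(i, Math.min(i + size, source.size()))));
        }
        return parts;
    }
}
```

For example, splitting 250 records with size 100 yields three batches of 100, 100, and 50.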
```java
@Override
public int testMultiThread() {
    List<LogOutputResult> logOutputResults = getTestData();
    // Split the records into sub-lists of 100 each.
    List<List<LogOutputResult>> lists = ConvertHandler.splitList(logOutputResults, 100);
    CountDownLatch countDownLatch = new CountDownLatch(lists.size());
    for (List<LogOutputResult> listSub : lists) {
        asyncService.executeAsync(listSub, logOutputResultMapper, countDownLatch);
    }
    try {
        countDownLatch.await(); // block until every batch has finished
    } catch (Exception e) {
        log.error("await interrupted: " + e.getMessage());
    }
    return logOutputResults.size();
}
```

Test Results:
Inserting 2,000,003 records with 30 threads took 1.67 minutes.
Inserting the same data with a single thread took 5.75 minutes.
Various thread counts were tested; throughput does not keep improving indefinitely as threads are added.
The practical rule of thumb observed was CPU cores × 2 + 2 threads for optimal throughput.
Conclusion : Multi‑threaded insertion using a properly configured ThreadPoolTaskExecutor dramatically reduces processing time for large data sets, but the optimal thread count depends on the hardware and should follow the “cores × 2 + 2” guideline.
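The "cores × 2 + 2" guideline can be computed from the running machine rather than hard-coding core_pool_size; a small sketch (the helper name is mine, not from the article):

```java
public class PoolSizing {

    // "CPU cores * 2 + 2" rule of thumb from the benchmark above.
    public static int suggestedPoolSize() {
        return Runtime.getRuntime().availableProcessors() * 2 + 2;
    }

    public static void main(String[] args) {
        System.out.println("suggested pool size: " + suggestedPoolSize());
    }
}
```

The result could feed `executor.setCorePoolSize(...)` in ExecutorConfig as a default when no property is set.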
Architecture Digest