Improving Million‑Scale Data Insert Efficiency with Spring Boot ThreadPoolTaskExecutor
This article demonstrates how to boost the insertion speed of over two million records by configuring a Spring Boot ThreadPoolTaskExecutor, integrating it with MyBatis‑Plus and PostgreSQL, and measuring multi‑threaded versus single‑threaded performance to determine the optimal thread count.
Purpose: Increase the efficiency of inserting data at million-record scale.
Solution: Use ThreadPoolTaskExecutor to perform multi-threaded batch inserts.
Technology Stack: Spring Boot 2.1.1, MyBatis-Plus 3.0.6, Swagger 2.5.0, Lombok 1.18.4, PostgreSQL, and ThreadPoolTaskExecutor.
Thread Pool Configuration (application-dev.properties):
```properties
# Async thread pool configuration
# Core pool size
async.executor.thread.core_pool_size=30
# Maximum pool size
async.executor.thread.max_pool_size=30
# Queue capacity
async.executor.thread.queue_capacity=99988
# Name prefix for threads created by the pool
async.executor.thread.name.prefix=async-importDB-
```

Spring Bean Definition:
```java
import java.util.concurrent.Executor;
import java.util.concurrent.ThreadPoolExecutor;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

import lombok.extern.slf4j.Slf4j;

@Configuration
@EnableAsync
@Slf4j
public class ExecutorConfig {

    @Value("${async.executor.thread.core_pool_size}")
    private int corePoolSize;

    @Value("${async.executor.thread.max_pool_size}")
    private int maxPoolSize;

    @Value("${async.executor.thread.queue_capacity}")
    private int queueCapacity;

    @Value("${async.executor.thread.name.prefix}")
    private String namePrefix;

    @Bean(name = "asyncServiceExecutor")
    public Executor asyncServiceExecutor() {
        log.warn("start asyncServiceExecutor");
        // VisiableThreadPoolTaskExecutor is a custom subclass of ThreadPoolTaskExecutor
        // defined elsewhere in the project; a plain ThreadPoolTaskExecutor works here too.
        ThreadPoolTaskExecutor executor = new VisiableThreadPoolTaskExecutor();
        executor.setCorePoolSize(corePoolSize);
        executor.setMaxPoolSize(maxPoolSize);
        executor.setQueueCapacity(queueCapacity);
        executor.setThreadNamePrefix(namePrefix);
        // When the queue is full, run the task on the submitting thread instead of rejecting it.
        executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
        executor.initialize();
        return executor;
    }
}
```

Asynchronous Service Implementation:
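Before the service itself, one detail of the configuration above is worth demonstrating: with CallerRunsPolicy, a task that arrives when the queue is full is executed synchronously on the submitting thread, which throttles producers rather than discarding batches. A self-contained sketch of that behavior (plain java.util.concurrent, no Spring; class and method names are mine, not from the article):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CallerRunsDemo {

    // Returns the name of the thread that ended up running the third task.
    static String thirdTaskThread() throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.CallerRunsPolicy());
        CountDownLatch hold = new CountDownLatch(1);
        // Task 1 occupies the single worker thread until released.
        pool.execute(() -> {
            try { hold.await(); } catch (InterruptedException ignored) { }
        });
        // Task 2 fills the one-slot queue.
        pool.execute(() -> { });
        final String[] ranOn = new String[1];
        // Task 3 is rejected (worker busy, queue full) and, under CallerRunsPolicy,
        // executes synchronously on the submitting thread before execute() returns.
        pool.execute(() -> ranOn[0] = Thread.currentThread().getName());
        hold.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return ranOn[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("third task ran on: " + thirdTaskThread());
    }
}
```

This is why a full queue slows the producer loop down instead of losing insert batches.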
```java
import java.util.List;
import java.util.concurrent.CountDownLatch;

import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;

import lombok.extern.slf4j.Slf4j;

@Service
@Slf4j
public class AsyncServiceImpl implements AsyncService {

    @Override
    @Async("asyncServiceExecutor")
    public void executeAsync(List<LogOutputResult> logOutputResults,
                             LogOutputResultMapper logOutputResultMapper,
                             CountDownLatch countDownLatch) {
        try {
            log.warn("start executeAsync");
            // Batch-insert this sub-list on a pool thread.
            logOutputResultMapper.addLogOutputResultBatch(logOutputResults);
            log.warn("end executeAsync");
        } finally {
            // Always release the latch, even if the insert throws.
            countDownLatch.countDown();
        }
    }
}
```

Multi-Threaded Batch Insert Method:
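The method below relies on ConvertHandler.splitList to partition the full record list into batches; that helper is not shown in the article, but a minimal generic sketch (signature assumed from the call site) might look like this:

```java
import java.util.ArrayList;
import java.util.List;

public class ConvertHandler {

    // Partition `source` into consecutive sub-lists of at most `size` elements.
    public static <T> List<List<T>> splitList(List<T> source, int size) {
        List<List<T>> parts = new ArrayList<>();
        for (int i = 0; i < source.size(); i += size) {
            // Copy each subList view so the batches are independent of `source`.
            parts.add(new ArrayList<>(source.subList(i, Math.min(i + size, source.size()))));
        }
        return parts;
    }
}
```

For example, splitting 250 records with size 100 yields three batches of 100, 100, and 50.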
```java
@Override
public int testMultiThread() {
    List<LogOutputResult> logOutputResults = getTestData();
    // Split the records into sub-lists of 100 each.
    List<List<LogOutputResult>> lists = ConvertHandler.splitList(logOutputResults, 100);
    CountDownLatch countDownLatch = new CountDownLatch(lists.size());
    for (List<LogOutputResult> listSub : lists) {
        asyncService.executeAsync(listSub, logOutputResultMapper, countDownLatch);
    }
    try {
        countDownLatch.await(); // block until every batch has finished
    } catch (Exception e) {
        log.error("await interrupted: " + e.getMessage());
    }
    return logOutputResults.size();
}
```

Test Results:
Inserting 2,000,003 records with 30 threads took 1.67 minutes.
Inserting the same data with a single thread took 5.75 minutes.
Various thread counts were tested; throughput does not keep improving indefinitely as threads are added.
The practical rule of thumb observed was CPU cores × 2 + 2 threads for optimal throughput.
Conclusion : Multi‑threaded insertion using a properly configured ThreadPoolTaskExecutor dramatically reduces processing time for large data sets, but the optimal thread count depends on the hardware and should follow the “cores × 2 + 2” guideline.
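The "cores × 2 + 2" guideline can be computed from the running machine rather than hard-coding core_pool_size; a small sketch (the helper name is mine, not from the article):

```java
public class PoolSizing {

    // "CPU cores * 2 + 2" rule of thumb from the benchmark above.
    public static int suggestedPoolSize() {
        return Runtime.getRuntime().availableProcessors() * 2 + 2;
    }

    public static void main(String[] args) {
        System.out.println("suggested pool size: " + suggestedPoolSize());
    }
}
```

The result could feed `executor.setCorePoolSize(...)` in ExecutorConfig as a default when no property is set.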
Architecture Digest