
Optimizing Million-Row Excel Export with EasyExcel: Design, Implementation, and Performance Tuning

This article presents a comprehensive guide to efficiently exporting millions of rows to Excel using EasyExcel, covering performance bottlenecks, resource consumption, asynchronous processing, thread‑pool design, batch querying, streaming writes, and practical code examples to achieve fast, memory‑friendly report generation.

Rare Earth Juejin Tech Community

The previous two articles introduced the core EasyExcel APIs and basic import/export scenarios, and showed that exporting massive data sets in one pass can exhaust memory and deliver poor performance.

During a large‑scale promotion, about two million orders needed to be exported for reconciliation, but the naive export caused the service to freeze and memory usage to skyrocket.

To avoid memory overflow and long‑lasting loading spinners, the solution must be addressed at both design and code levels.

1. Ordinary Export of Million‑Level Reports

Hardware used for testing:

CPU: i7-12700H (14 cores, 20 threads)

Memory: Dual‑channel DDR5 4800MHz

Disk: Samsung SSD RAID0

Environment: JDK1.8, MySQL8.0.13

A stored procedure inserts 1,000,000 rows into the panda table:

-- Stored procedure that batch-inserts 1,000,000 rows
DELIMITER //
DROP PROCEDURE IF EXISTS batch_insert_1m_panda;
CREATE PROCEDURE batch_insert_1m_panda()
BEGIN
    DECLARE i INT DEFAULT 1;
    START TRANSACTION;
    WHILE i <= 1000000 DO
        INSERT INTO panda(id, name, nickname, unique_code, sex, height, birthday, pic, level, motto, address, create_time) VALUES
        (i, CONCAT('Bamboo No.', i), CONCAT('Little Bamboo No.', i), CONCAT('P', i), 0, '178.88', '2018-08-08', NULL, 'Senior', CONCAT('Count-off: ', i), CONCAT('Global Village No.', i), now());
        SET i = i + 1;
    END WHILE;
    COMMIT;
END //
DELIMITER ;
CALL batch_insert_1m_panda();

Export code (simplified):

@Data
public class Panda1mExportVO implements Serializable {
    private static final long serialVersionUID = 1L;
    @ExcelProperty("ID")
    private Long id;
    @ExcelProperty("Name")
    private String name;
    @ExcelProperty("Nickname")
    private String nickname;
    @ExcelProperty("Code")
    private String uniqueCode;
    @ExcelProperty("Sex")
    private Integer sex;
    @ExcelProperty("Height")
    private BigDecimal height;
    @ExcelProperty("Birthday")
    @DateTimeFormat("yyyy-MM-dd")
    private Date birthday;
    @ExcelProperty("Level")
    private String level;
    @ExcelProperty("Motto")
    private String motto;
    @ExcelProperty("Address")
    private String address;
}

@Override
public void export1mPandaExcel(HttpServletResponse response) {
    List<Panda1mExportVO> pandas = baseMapper.select1mPandas();
    String fileName = "million-panda-data-" + System.currentTimeMillis();
    try {
        ExcelUtil.exportExcel(Panda1mExportVO.class, pandas, fileName, ExcelTypeEnum.XLSX, response);
    } catch (IOException e) {
        log.error("Million-row panda export failed: {}", e.getMessage(), e);
        throw new BusinessException("Data export failed, please try again later!");
    }
}

Running this on a local machine yields ~51 s for the first (cold) request and ~31 s for the second, producing a 49.28 MB file.

On a 2c4g cloud database, the same stored procedure took 2,781.26 s, and the query timed out when called from the service.

The main issues are:

Response time can reach dozens of minutes, causing UI spinners and gateway time‑outs.

Memory usage spikes to >2 GB, leading to OOM in production.

2. Big‑Data‑Level Processing Solutions

Key ideas:

Improve user experience by returning immediately and delivering the file asynchronously via callbacks or a polling mechanism.

Control resource consumption by batch querying, streaming writes, and limiting concurrent export tasks.

2.1 Asynchronous Callback Flow

When the user clicks export, the backend creates a task record, pushes the task to a thread pool or MQ, and returns the task ID. The front‑end shows a progress dialog and periodically polls the task status; once completed, the user can download the file from object storage.
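The flow above can be sketched with plain JDK types: a `ConcurrentHashMap` stands in for the `excel_task` table and an `ExecutorService` for the async pool. Class and method names here (`ExportTaskFlow`, `submitExport`, `poll`) are illustrative, not from the article's codebase.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch of the async export flow: create a task record,
// hand the heavy work to a pool, return the task ID immediately,
// and let the client poll for the status.
public class ExportTaskFlow {
    enum Status { WAIT_HANDLE, IN_PROGRESS, SUCCEED, FAILED }

    private final Map<Long, Status> tasks = new ConcurrentHashMap<>(); // stands in for excel_task
    private final AtomicLong idGen = new AtomicLong();
    private final ExecutorService asyncPool = Executors.newFixedThreadPool(3);

    // Analogue of export1mPandaExcelV2(): returns instantly with a task ID.
    public long submitExport(Runnable exportWork) {
        long taskId = idGen.incrementAndGet();
        tasks.put(taskId, Status.WAIT_HANDLE);
        asyncPool.submit(() -> {
            tasks.put(taskId, Status.IN_PROGRESS);
            try {
                exportWork.run();
                tasks.put(taskId, Status.SUCCEED);
            } catch (Exception e) {
                tasks.put(taskId, Status.FAILED);
            }
        });
        return taskId;
    }

    // Analogue of the front-end polling endpoint.
    public Status poll(long taskId) {
        return tasks.get(taskId);
    }

    public void shutdown() {
        asyncPool.shutdown();
        try {
            asyncPool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In a real service the status map would be the `excel_task` table and the final `SUCCEED` update would also store the object-storage URL.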

2.1.1 Excel Task Table

CREATE TABLE `excel_task` (
  `task_id` bigint NOT NULL AUTO_INCREMENT COMMENT 'Task ID',
  `task_type` int NOT NULL COMMENT 'Task type, 0: export, 1: import',
  `handle_status` tinyint NOT NULL DEFAULT '0' COMMENT 'Processing status, 0: pending, 1: in progress, 2: succeeded, 3: failed, 4: waiting to restore',
  `excel_url` varchar(255) DEFAULT NULL COMMENT 'Excel file URL',
  `trace_id` varchar(255) DEFAULT NULL COMMENT 'Trace ID',
  `request_params` varchar(2048) DEFAULT NULL COMMENT 'Request parameters',
  `exception_type` varchar(255) DEFAULT NULL COMMENT 'Exception type',
  `error_msg` varchar(2048) DEFAULT NULL COMMENT 'Error message',
  `create_time` datetime DEFAULT NULL COMMENT 'Creation time',
  `update_time` datetime DEFAULT NULL COMMENT 'Update time',
  PRIMARY KEY (`task_id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci ROW_FORMAT=DYNAMIC COMMENT='Report task table';

2.1.2 Enums

@Getter
@AllArgsConstructor
public enum ExcelTaskType {
    EXPORT(0, "Export"),
    IMPORT(1, "Import");
    private final Integer code;
    private final String desc;
}

@Getter
@AllArgsConstructor
public enum TaskHandleStatus {
    WAIT_HANDLE(0, "Pending"),
    IN_PROGRESS(1, "In progress"),
    SUCCEED(2, "Succeeded"),
    FAILED(3, "Failed"),
    WAIT_TO_RESTORE(4, "Waiting to restore");
    private final Integer code;
    private final String desc;
}

2.2 Optimizing Resource Usage

Instead of loading all rows at once, query in batches (e.g., 2,000 rows) and write them using EasyExcel's streaming API. This reduces peak heap to ~1.35 GB and can be lowered further by adjusting batch size and concurrency.
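The id-range arithmetic behind the batching can be sketched as follows; the class and method names are illustrative, and the sketch assumes ids are contiguous starting from 1 (as in the stored procedure above), mirroring the `selectPandaPage(startId, batchRows)` query pattern used later.

```java
// Sketch of the id-range batching used by the optimized export:
// zero-based batch i covers ids [i * batchRows + 1, (i + 1) * batchRows].
public class BatchPlanner {
    // Number of batches needed to cover totalRows rows (ceiling division).
    public static int batchCount(int totalRows, int batchRows) {
        return totalRows / batchRows + (totalRows % batchRows == 0 ? 0 : 1);
    }

    // First id of the given zero-based batch.
    public static long startId(int batchIndex, int batchRows) {
        return (long) batchIndex * batchRows + 1;
    }
}
```

With 1,000,000 rows and 2,000-row batches this yields 500 batches, the last one starting at id 998,001.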

2.3 Detailed Implementation

2.3.1 Custom Thread Pools

public class TaskThreadPool {
    public static final int concurrentRate = 3;
    private static final int ASYNC_CORE_THREADS = 3, CONCURRENT_CORE_THREADS = ASYNC_CORE_THREADS * concurrentRate;
    private static final int ASYNC_MAX_THREADS = ASYNC_CORE_THREADS + 1, CONCURRENT_MAX_THREADS = ASYNC_MAX_THREADS * concurrentRate;
    private static final int ASYNC_QUEUE_SIZE = 2000, CONCURRENT_QUEUE_SIZE = 20000;
    public static final String ASYNC_THREAD_PREFIX = "excel-async-pool-", CONCURRENT_THREAD_PREFIX = "excel-concurrent-pool-";
    private static final int KEEP_ALIVE_SECONDS = 60 * 3;
    public static final ThreadPoolExecutor.AbortPolicy ASYNC_REJECTED_HANDLER = new ThreadPoolExecutor.AbortPolicy();
    public static final ThreadPoolExecutor.CallerRunsPolicy CONCURRENT_REJECTED_HANDLER = new ThreadPoolExecutor.CallerRunsPolicy();
    private volatile static ThreadPoolTaskExecutor asyncThreadPool, concurrentThreadPool;
    // DCL getters omitted for brevity
}
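The omitted DCL (double-checked locking) getters might look like the sketch below. To keep the example self-contained it uses the JDK's `ThreadPoolExecutor` rather than Spring's `ThreadPoolTaskExecutor`, and `AsyncPoolHolder`/`NamedFactory` are hypothetical names.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a double-checked-locking getter for a lazily built pool,
// using the same sizing as the async pool above (core 3, max 4, queue 2000).
public class AsyncPoolHolder {
    private static final int CORE = 3, MAX = 4, QUEUE = 2000, KEEP_ALIVE_SECONDS = 180;
    private static volatile ThreadPoolExecutor asyncPool;

    public static ThreadPoolExecutor getAsyncPool() {
        if (asyncPool == null) {                       // first check, no lock
            synchronized (AsyncPoolHolder.class) {
                if (asyncPool == null) {               // second check, under lock
                    asyncPool = new ThreadPoolExecutor(
                            CORE, MAX, KEEP_ALIVE_SECONDS, TimeUnit.SECONDS,
                            new LinkedBlockingQueue<>(QUEUE),
                            new NamedFactory("excel-async-pool-"),
                            new ThreadPoolExecutor.AbortPolicy());
                }
            }
        }
        return asyncPool;
    }

    // Minimal named thread factory, mirroring ASYNC_THREAD_PREFIX above.
    static class NamedFactory implements ThreadFactory {
        private final String prefix;
        private final AtomicInteger seq = new AtomicInteger();
        NamedFactory(String prefix) { this.prefix = prefix; }
        public Thread newThread(Runnable r) {
            return new Thread(r, prefix + seq.incrementAndGet());
        }
    }
}
```

The `volatile` field is what makes DCL safe on the JVM: without it, a thread could observe a partially constructed pool.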

The async pool handles the overall export task, while the concurrent pool processes batch queries in parallel.

2.3.2 Export Service Logic

@PostMapping("/export/v6")
public ServerResponse<Long> exportExcelV6() {
    return ServerResponse.success(pandaService.export1mPandaExcelV2());
}

public Long export1mPandaExcelV2() {
    ExcelTask excelTask = new ExcelTask();
    excelTask.setTaskType(ExcelTaskType.EXPORT.getCode());
    excelTask.setHandleStatus(TaskHandleStatus.WAIT_HANDLE.getCode());
    excelTask.setCreateTime(new Date());
    excelTaskService.save(excelTask);
    Long taskId = excelTask.getTaskId();
    ThreadPoolTaskExecutor asyncPool = TaskThreadPool.getAsyncThreadPool();
    try {
        asyncPool.submit(() -> handleExportTask(taskId));
    } catch (RejectedExecutionException e) {
        // mark as waiting‑to‑restore
        ExcelTask edit = new ExcelTask();
        edit.setTaskId(taskId);
        edit.setHandleStatus(TaskHandleStatus.WAIT_TO_RESTORE.getCode());
        edit.setExceptionType("Async thread pool is full");
        edit.setErrorMsg("Waiting to be rescheduled into the thread pool!");
        edit.setUpdateTime(new Date());
        excelTaskService.updateById(edit);
    }
    return taskId;
}

private void handleExportTask(Long taskId) {
    long start = System.currentTimeMillis();
    excelTaskService.updateStatus(taskId, TaskHandleStatus.IN_PROGRESS);
    ExcelTask edit = new ExcelTask();
    edit.setTaskId(taskId);
    int totalRows = baseMapper.selectTotalRows();
    if (totalRows == 0) {
        edit.setHandleStatus(TaskHandleStatus.FAILED.getCode());
        edit.setExceptionType("No data");
        edit.setErrorMsg("The export task has no data to export!");
        edit.setUpdateTime(new Date());
        excelTaskService.updateById(edit);
        return;
    }
    int batchRows = 2000;
    int batchNum = totalRows / batchRows + (totalRows % batchRows == 0 ? 0 : 1);
    int concurrentRound = batchNum / TaskThreadPool.concurrentRate + (batchNum % TaskThreadPool.concurrentRate == 0 ? 0 : 1);
    String fileName = "million-panda-data-" + start + ".csv";
    ExcelWriter excelWriter = EasyExcelFactory.write(fileName, Panda1mExportVO.class)
            .excelType(ExcelTypeEnum.CSV).build();
    WriteSheet writeSheet = EasyExcelFactory.writerSheet(0, "Million Panda Data").build();
    AtomicInteger cursor = new AtomicInteger(0);
    ThreadPoolTaskExecutor concurrentPool = TaskThreadPool.getConcurrentThreadPool();
    for (int i = 1; i <= concurrentRound; i++) {
        CountDownLatch latch = new CountDownLatch(TaskThreadPool.concurrentRate);
        CopyOnWriteArrayList<Panda1mExportVO> data = new CopyOnWriteArrayList<>();
        for (int j = 0; j < TaskThreadPool.concurrentRate; j++) {
            final int startId = cursor.get() * batchRows + 1;
            concurrentPool.submit(() -> {
                try {
                    List<Panda1mExportVO> pandas = baseMapper.selectPandaPage((long) startId, batchRows);
                    if (pandas != null && !pandas.isEmpty()) {
                        data.addAll(pandas);
                    }
                } finally {
                    // Always count down, even if the query throws, so a failed batch cannot deadlock the round.
                    latch.countDown();
                }
            });
            cursor.incrementAndGet();
        }
        try { latch.await(); } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            excelWriter.finish(); // release the partially written file
            edit.setHandleStatus(TaskHandleStatus.FAILED.getCode());
            edit.setExceptionType("Export wait interrupted");
            edit.setErrorMsg(e.getMessage());
            edit.setUpdateTime(new Date());
            excelTaskService.updateById(edit);
            return;
        }
        excelWriter.write(data, writeSheet);
        data.clear();
    }
    excelWriter.finish();
    edit.setHandleStatus(TaskHandleStatus.SUCCEED.getCode());
    edit.setExcelUrl(fileName);
    edit.setUpdateTime(new Date());
    excelTaskService.updateById(edit);
    log.info("Report export task finished, task ID: {}, elapsed (ms): {}", taskId, System.currentTimeMillis() - start);
}

The workflow:

Create a task record and return its ID.

Async pool picks up the task, marks it IN_PROGRESS .

Calculate the number of batches (2,000 rows each) and group them into concurrency rounds (3 batches per round).

For each round, submit three parallel queries to the concurrent pool, collect results in a thread‑safe list, then write them to the CSV via EasyExcel streaming.

After all rounds finish, close the writer, update the task to SUCCEED and store the file URL.
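To make the round arithmetic concrete, here is the same ceiling-division logic as in `handleExportTask`, extracted into a small standalone helper (the class name is illustrative):

```java
// Worked example of the round arithmetic in handleExportTask:
// 1,000,000 rows at 2,000 rows per batch gives 500 batches;
// with 3 concurrent queries per round that is 167 rounds
// (166 full rounds plus one partial round of 2 batches).
public class RoundMath {
    public static int rounds(int totalRows, int batchRows, int concurrentRate) {
        int batchNum = totalRows / batchRows + (totalRows % batchRows == 0 ? 0 : 1);
        return batchNum / concurrentRate + (batchNum % concurrentRate == 0 ? 0 : 1);
    }
}
```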

2.4 Test Results

After optimization, the API returns instantly (≈16 ms) while the actual export completes in ~11.5 s, with peak heap around 390 MB for a single file and ~930 MB for two concurrent exports. The generated CSV is ~103 MB.

2.5 Crash‑Recovery Mechanism

Because tasks are submitted to a thread pool, a server restart would lose pending jobs. Two recovery strategies are recommended:

Scheduled scanner that re‑submits tasks in WAIT_HANDLE or WAIT_TO_RESTORE state.

Application startup hook (e.g., ApplicationRunner ) that reloads unfinished tasks.

Combining both ensures robustness against crashes and thread‑pool saturation.
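A minimal sketch of both strategies, assuming the task-status model above. In a Spring service these would typically be an `ApplicationRunner` plus an `@Scheduled` method; here plain JDK types keep the example self-contained, and `TaskRecovery`/`recoverOnce` are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Sketch of the two recovery strategies: a one-shot reload at startup
// plus a periodic scanner, both resubmitting unfinished tasks.
public class TaskRecovery {
    public static final int WAIT_HANDLE = 0, IN_PROGRESS = 1, WAIT_TO_RESTORE = 4;

    private final Map<Long, Integer> taskStatus; // stands in for excel_task rows
    private final ExecutorService asyncPool;

    public TaskRecovery(Map<Long, Integer> taskStatus, ExecutorService asyncPool) {
        this.taskStatus = taskStatus;
        this.asyncPool = asyncPool;
    }

    // One recovery pass: resubmit every pending or waiting-to-restore task.
    // Called once at startup, then periodically by the scanner below.
    public int recoverOnce() {
        int resubmitted = 0;
        for (Map.Entry<Long, Integer> e : taskStatus.entrySet()) {
            int s = e.getValue();
            if (s == WAIT_HANDLE || s == WAIT_TO_RESTORE) {
                long taskId = e.getKey();
                taskStatus.put(taskId, IN_PROGRESS);
                asyncPool.submit(() -> { /* re-run handleExportTask(taskId) */ });
                resubmitted++;
            }
        }
        return resubmitted;
    }

    // Periodic scanner, e.g. every 60 seconds.
    public ScheduledFuture<?> startScanner(ScheduledExecutorService scheduler) {
        return scheduler.scheduleAtFixedRate(this::recoverOnce, 60, 60, TimeUnit.SECONDS);
    }
}
```

Marking a task `IN_PROGRESS` before resubmitting keeps the next scanner pass from picking up the same row twice; a production version would do this with a conditional `UPDATE` so that multiple instances cannot race.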

Conclusion

The article walks through the entire lifecycle of a million‑row Excel export: from naive implementation, through performance and memory analysis, to a production‑ready asynchronous, batch‑processed, and resource‑controlled solution. The same principles apply to any heavy‑weight data‑processing scenario where latency and resource consumption must be balanced.

Tags: Java, performance optimization, ThreadPool, EasyExcel, Excel export
Written by Rare Earth Juejin Tech Community (Juejin, a tech community that helps developers grow).