Optimizing Large‑Scale Excel Import in Java: From POI to EasyExcel with Caching, Batch Inserts, and Parallel Streams
This article describes how to dramatically speed up the import of massive Excel files in a Java backend by replacing raw POI with EasyExcel, caching database lookups, using MySQL batch inserts, and employing parallel streams — cutting an import of over 100,000 rows (and 400,000+ resulting database records) down to just over a minute.
The project required importing payment records from an Excel template into the system, converting the data into arrears, bills, and bill details, and persisting them to a MySQL 5.7 database. The original implementation handled only a few hundred rows, but version 4.0 needed to process over 100,000 rows and insert more than 300,000 database records.
To meet this demand, the author iteratively refactored the import code, documenting each performance bottleneck and its solution.
First Version: POI + Row‑by‑Row Validation + Row‑by‑Row Insert
The initial version used raw Apache POI to read every row into a List<ArrayList<String>>, then performed per-row field validation, a database lookup per row for duplicate checks, and a single-row INSERT per row. Every row therefore cost at least two database round trips, causing massive network I/O and CPU overhead.
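The first version's code is not shown in the article; as a schematic stand-in (class and method names are hypothetical, and a counter replaces the real MySQL calls), this is the shape of the row-by-row flow and why it is slow — each row pays one duplicate-check query plus one single-row insert:

```java
import java.util.ArrayList;
import java.util.List;

// Schematic stand-in for the first version's row-by-row import loop.
// FakeDao only counts calls; in the real code each call was a network
// round trip to MySQL, which is what made this version so slow.
public class RowByRowImport {

    static class FakeDao {
        int roundTrips = 0;

        boolean existsByKey(String key) { // duplicate check: one query per row
            roundTrips++;
            return false;
        }

        void insertOne(List<String> row) { // one single-row INSERT per row
            roundTrips++;
        }
    }

    static int importRows(List<List<String>> rows, FakeDao dao) {
        for (List<String> row : rows) {
            if (row.isEmpty() || row.get(0).isBlank()) {
                continue; // per-row field validation (simplified)
            }
            if (!dao.existsByKey(row.get(0))) {
                dao.insertOne(row);
            }
        }
        return dao.roundTrips;
    }

    public static void main(String[] args) {
        List<List<String>> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            rows.add(List.of("key-" + i, "value-" + i));
        }
        // 1,000 rows already cost 2,000 round trips;
        // at 100,000+ rows this pattern dominates the runtime.
        System.out.println(importRows(rows, new FakeDao()));
    }
}
```

The later versions attack exactly these two per-row round trips: the lookup moves into an in-memory cache, and the inserts are batched.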
Second Version: EasyPOI + Cached Database Queries + Batch Insert
Three optimizations were applied:
Cache frequently queried data (e.g., house information) in a HashMap to replace per‑row database lookups.
Introduce a custom SessionMapper that uses MapResultHandler to load the cache in a single query.
Use MySQL's multi-row INSERT … VALUES syntax via a MyBatis <foreach> to insert many rows in one statement.

```java
@Repository
public class SessionMapper extends SqlSessionDaoSupport {

    @Resource
    public void setSqlSessionFactory(SqlSessionFactory sqlSessionFactory) {
        super.setSqlSessionFactory(sqlSessionFactory);
    }

    // area + building + unit + room number -> house ID
    @SuppressWarnings("unchecked")
    public Map getHouseMapByAreaId(Long areaId) {
        MapResultHandler handler = new MapResultHandler();
        this.getSqlSession().select(
                BaseUnitMapper.class.getName() + ".getHouseMapByAreaId", areaId, handler);
        return handler.getMappedResults();
    }
}

public class MapResultHandler implements ResultHandler {

    private final Map mappedResults = new HashMap();

    @Override
    public void handleResult(ResultContext context) {
        @SuppressWarnings("rawtypes")
        Map map = (Map) context.getResultObject();
        mappedResults.put(map.get("key"), map.get("value"));
    }

    public Map getMappedResults() {
        return mappedResults;
    }
}
```

Batch insert example:

```xml
<insert id="insertList">
    insert into table(col1, col2)
    values
    <foreach collection="list" item="item" separator=",">
        (#{item.col1}, #{item.col2})
    </foreach>
</insert>
```

Third Version: EasyExcel + Cached Queries + Batch Insert

Switching to Alibaba's EasyExcel reduced memory consumption and read time dramatically: 410,000 rows across 25 columns (a 45.5 MB file) were read in about 50 seconds. The same caching and batch-insert strategies were retained.

Fourth Version: Optimizing Insert Batch Size and Parallelism

Experiments showed that inserting 1,000 rows per batch yielded the best throughput, likely because larger batches exceeded the InnoDB buffer pool and caused disk swapping. Parallel streams were then used to run the batch inserts concurrently, bringing the whole import down to 72 seconds for 100,000 Excel rows and roughly 420,000 related database records.
```java
/**
 * Function: quickly insert data using parallel streams.
 *
 * @author Keats
 * @date 2020/7/1 9:25
 */
public class InsertConsumer {

    /** Batch size that performed best in the experiments */
    private static final int SIZE = 1000;

    static {
        // Cap the common ForkJoinPool; note this only takes effect if it
        // runs before the common pool is first used anywhere in the JVM.
        System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "4");
    }

    public static <T> void insertData(List<T> list, Consumer<List<T>> consumer) {
        if (list == null || list.size() < 1) {
            return;
        }
        // Split the full list into sub-lists of at most SIZE rows each
        List<List<T>> streamList = new ArrayList<>();
        for (int i = 0; i < list.size(); i += SIZE) {
            int j = Math.min(i + SIZE, list.size());
            streamList.add(list.subList(i, j));
        }
        // Insert the batches concurrently
        streamList.parallelStream().forEach(consumer);
    }
}
```

Usage example:

```java
InsertConsumer.insertData(feeList, arrearageMapper::insertList);
```

Other Performance Tips

Heavy logging inside loops (e.g., printing an INFO line for every 1,000 rows) can increase processing time by an order of magnitude. Disabling those logs reduced the validation phase to roughly 10% of its original duration.

Conclusion

Use a fast Excel library such as EasyExcel.
Cache the data needed for database validation, trading space for time.
Batch-insert rows with the multi-row INSERT … VALUES syntax.
Leverage multithreading (parallel streams) to hide I/O latency.
Avoid unnecessary logging inside tight loops.
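To make the batch-insert point concrete outside MyBatis, here is a minimal sketch of the multi-row statement that the <foreach> mapping above expands to (the table and column names are illustrative; real code should use a PreparedStatement with placeholders rather than inlined string values):

```java
import java.util.List;
import java.util.StringJoiner;

// Illustrative only: builds the same multi-row INSERT that the MyBatis
// <foreach> generates, so 1,000 rows travel in one statement instead of
// 1,000 separate round trips.
public class BatchSqlBuilder {

    // Each String[] is one row's (col1, col2) pair. Inlining values like
    // this is unsafe against SQL injection; it is done here only to show
    // the shape of the generated statement.
    static String buildInsert(String table, List<String[]> rows) {
        StringJoiner values = new StringJoiner(", ");
        for (String[] row : rows) {
            values.add("('" + row[0] + "', '" + row[1] + "')");
        }
        return "insert into " + table + "(col1, col2) values " + values;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
                new String[]{"a1", "b1"},
                new String[]{"a2", "b2"},
                new String[]{"a3", "b3"});
        System.out.println(buildInsert("fee", rows));
        // -> insert into fee(col1, col2) values ('a1', 'b1'), ('a2', 'b2'), ('a3', 'b3')
    }
}
```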
Code Ape Tech Column
Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn