Backend Development 18 min read

Optimizing Large-Scale Excel Import/Export with Apache POI to Avoid OOM and Reduce GC

This article explains how to prevent frequent full GC and out‑of‑memory errors when using Apache POI for massive Excel imports and exports by leveraging SXSSFWorkbook for XLSX, choosing appropriate processing models for XLS/XLSX, and provides performance test results and code samples.

Top Architect
Top Architect
Top Architect
Optimizing Large-Scale Excel Import/Export with Apache POI to Avoid OOM and Reduce GC

When exporting a huge amount of data to Excel using POI, the default XSSFWorkbook can cause frequent full garbage collections (FGC) and even OutOfMemoryError because all rows, cells, and styles stay in memory until the workbook is finished.

Excel files have two storage formats: the old binary BIFF8 format for XLS and the OOXML (ZIP‑packed XML) format for XLSX. XLSX files are essentially a collection of XML parts that can be inspected by renaming the file to .zip and extracting it.

Export optimization (XLSX)

Since POI 3.8, SXSSFWorkbook provides a streaming API that keeps only a configurable window of rows (default 100) in memory; older rows are flushed to temporary files, dramatically reducing memory usage.

Workbook workbook = new SXSSFWorkbook(new XSSFWorkbook(inputStream));

The implementation creates a temporary file for each flushed window and writes rows directly to the underlying XML, releasing memory as soon as the window is exceeded.

Import optimization

POI offers three reading models:

User Model – a high‑level DOM‑like API that loads the entire workbook into memory, suitable for small files but prone to OOM on large datasets.

Event Model – a SAX‑style parser that reads XML elements sequentially without creating POI objects, offering the lowest memory footprint.

User Event Model – a wrapper around the Event Model that provides convenient callbacks such as startRow , endRow , and cell , while still streaming the data.

Example of an Event Model handler for XLSX:

@Override
public void startElement(String uri, String localName, String name, Attributes attributes) throws SAXException {
    if (name.equals("c")) {
        System.out.print(attributes.getValue("r") + " - ");
        String cellType = attributes.getValue("t");
        nextIsString = cellType != null && cellType.equals("s");
        inlineStr = cellType != null && cellType.equals("inlineStr");
    }
    lastContents = "";
}

@Override
public void endElement(String uri, String localName, String name) throws SAXException {
    if (nextIsString) {
        Integer idx = Integer.valueOf(lastContents);
        lastContents = lruCache.get(idx);
        if (lastContents == null && !lruCache.containsKey(idx)) {
            lastContents = new XSSFRichTextString(sst.getEntryAt(idx)).toString();
            lruCache.put(idx, lastContents);
        }
        nextIsString = false;
    }
    if (name.equals("v") || (inlineStr && name.equals("c"))) {
        System.out.println(lastContents);
    }
}

Performance tests show that using SXSSFWorkbook for exporting 300,000 rows × 10 columns keeps memory usage stable, while the User Model for importing the same size data quickly exhausts heap space.

Conclusion

Apache POI provides streaming write support for XLSX and SAX‑based read support for both XLS and XLSX. Selecting the appropriate model based on data volume prevents OOM and reduces GC overhead. For even higher performance, consider alternatives such as Alibaba's EasyExcel.

References

https://www.jianshu.com/p/6d6772f339cb

https://poi.apache.org/components/spreadsheet/how-to.html

JavaperformancememoryExcellarge-dataApache POI
Top Architect
Written by

Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.