Optimizing Java File Compression: From Buffered Streams to NIO Channels, Memory‑Mapped Files and Pipes
This article demonstrates how to dramatically reduce the time required to compress multiple large images in Java by progressively applying buffered streams, NIO channels, direct buffers, memory‑mapped files and pipe techniques, measuring each optimization and explaining the underlying I/O mechanisms.
The requirement: receive ten photos from the front end, compress them into a zip archive on the back end, and stream the result back. The initial implementation copies each file with a plain FileInputStream, one byte at a time and without buffering, and takes about 30 seconds for 20 MB of input.
public static void zipFileNoBuffer() {
    File zipFile = new File(ZIP_FILE);
    try (ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream(zipFile))) {
        long beginTime = System.currentTimeMillis();
        for (int i = 0; i < 10; i++) {
            try (InputStream input = new FileInputStream(JPG_FILE)) {
                zipOut.putNextEntry(new ZipEntry(FILE_NAME + i));
                int temp = 0;
                // One read() call and one write() call per byte: this is the slow part.
                while ((temp = input.read()) != -1) {
                    zipOut.write(temp);
                }
            }
        }
        printInfo(beginTime);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Testing with a 2 MB image added to the archive ten times takes roughly 30 seconds:
fileSize:20M
consum time:29599
First Optimization – From 30 seconds to 2 seconds
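The main bottleneck is that FileInputStream.read() reads a single byte at a time and crosses into native code for every byte. Wrapping the stream in a BufferedInputStream lets most read() calls be served from an in-memory buffer (8 KB by default), so the underlying native read happens once per buffer refill instead of once per byte.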
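The effect of buffering can be made visible without touching the file system. The following is a minimal sketch, not the article's benchmark: CountingInputStream and underlyingCalls are hypothetical helpers introduced here, and an in-memory byte array stands in for the image file. It counts how many read requests actually reach the wrapped stream when the caller reads one byte at a time.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class BufferedReadDemo {

    /** Hypothetical helper: counts how often the wrapped stream is asked for data. */
    static final class CountingInputStream extends FilterInputStream {
        long calls;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Reads the source one byte at a time and returns the number of calls that reached it. */
    static long underlyingCalls(int size, boolean buffered) {
        CountingInputStream source = new CountingInputStream(new ByteArrayInputStream(new byte[size]));
        try (InputStream in = buffered ? new BufferedInputStream(source) : source) {
            while (in.read() != -1) {
                // Discard the byte; only the call count matters here.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return source.calls;
    }

    public static void main(String[] args) {
        int size = 1 << 20; // 1 MiB of zero bytes stands in for an image
        System.out.println("unbuffered calls: " + underlyingCalls(size, false));
        System.out.println("buffered calls:   " + underlyingCalls(size, true));
    }
}
```

With a real FileInputStream each of those underlying calls is a native call, which is why cutting their number has such a large effect on the timing.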
For reference, this is how the single-byte read is declared in the FileInputStream source quoted here:
/**
 * Reads a byte of data from this input stream. This method blocks
 * if no input is yet available.
 *
 * @return the next byte of data, or -1 if the end of the file is reached.
 * @exception IOException if an I/O error occurs.
 */
public native int read() throws IOException;
After applying buffering, the code becomes:
public static void zipFileBuffer() {
    File zipFile = new File(ZIP_FILE);
    try (ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream(zipFile));
         BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(zipOut)) {
        long beginTime = System.currentTimeMillis();
        for (int i = 0; i < 10; i++) {
            try (BufferedInputStream bufferedInputStream = new BufferedInputStream(new FileInputStream(JPG_FILE))) {
                zipOut.putNextEntry(new ZipEntry(FILE_NAME + i));
                int temp = 0;
                while ((temp = bufferedInputStream.read()) != -1) {
                    bufferedOutputStream.write(temp);
                }
                // Push any buffered bytes into the current entry before the next putNextEntry() closes it.
                bufferedOutputStream.flush();
            }
        }
        printInfo(beginTime);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Result:
------Buffer
fileSize:20M
consum time:1808
Second Optimization – From 2 seconds to 1 second
NIO channels improve on this further. A channel moves data in blocks through a ByteBuffer, or hands the whole transfer to the operating system, instead of pushing bytes through stream calls one at a time.
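As an illustration of the block-oriented style, here is a minimal sketch of a channel-to-channel copy through a ByteBuffer. The class and method names are hypothetical, and in-memory streams stand in for the files so the example runs anywhere.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

class ChannelCopyDemo {

    /** Copies everything from src to dst in blocks of bufferSize bytes; returns the byte count. */
    static long copy(ReadableByteChannel src, WritableByteChannel dst, int bufferSize) {
        ByteBuffer buffer = ByteBuffer.allocate(bufferSize);
        long total = 0;
        try {
            while (src.read(buffer) != -1) {
                buffer.flip();                  // switch from filling to draining
                while (buffer.hasRemaining()) {
                    total += dst.write(buffer); // a write may be partial, so loop
                }
                buffer.clear();                 // ready to be filled again
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] data = new byte[100_000];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(Channels.newChannel(new ByteArrayInputStream(data)),
                           Channels.newChannel(sink), 8192);
        System.out.println("copied " + copied + " bytes");
    }
}
```

The flip/clear pair is the part that usually trips people up: flip() prepares the buffer for reading what was just written into it, and clear() resets it for the next fill.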
Using Channel
The code wraps the ZipOutputStream in a WritableByteChannel and transfers each file into it with FileChannel.transferTo:
public static void zipFileChannel() {
    long beginTime = System.currentTimeMillis();
    File zipFile = new File(ZIP_FILE);
    try (ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream(zipFile));
         WritableByteChannel writableByteChannel = Channels.newChannel(zipOut)) {
        for (int i = 0; i < 10; i++) {
            try (FileChannel fileChannel = new FileInputStream(JPG_FILE).getChannel()) {
                zipOut.putNextEntry(new ZipEntry(i + SUFFIX_FILE));
                // transferTo returns the number of bytes moved, which may be less than requested.
                fileChannel.transferTo(0, FILE_SIZE, writableByteChannel);
            }
        }
        printInfo(beginTime);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Output shows a further reduction:
------Channel
fileSize:20M
consum time:1416
Kernel Space vs User Space
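Copying data between kernel space and user space is costly: an ordinary read copies bytes from the kernel's filesystem cache into a user-space buffer, and the following write copies them back into the kernel. On many operating systems transferTo can skip that round trip and move bytes directly from the filesystem cache to the target channel; whether it does depends on the platform and on the type of target channel.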
The copy phase is the step in which data moves from kernel space to user space.
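Because transferTo reports how many bytes it actually moved, a defensive caller repeats the call until the whole range has been sent. The following is a minimal sketch under stated assumptions: TransferToDemo, transferAll and roundTrip are hypothetical names, a temporary file stands in for the image, and the target is an in-memory channel, so this demonstrates the calling pattern rather than any zero-copy behavior.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

class TransferToDemo {

    /** Transfers the whole file to target, repeating transferTo until every byte is sent. */
    static long transferAll(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel source = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = source.size();
            long position = 0;
            while (position < size) {
                // Advance by the number of bytes this call actually moved.
                position += source.transferTo(position, size - position, target);
            }
            return position;
        }
    }

    /** Writes size bytes to a temporary file, transfers them back, and checks the copy. */
    static boolean roundTrip(int size) {
        byte[] data = new byte[size];
        for (int i = 0; i < size; i++) {
            data[i] = (byte) i;
        }
        try {
            Path tmp = Files.createTempFile("transfer-demo", ".bin");
            try {
                Files.write(tmp, data);
                ByteArrayOutputStream sink = new ByteArrayOutputStream();
                long sent = transferAll(tmp, Channels.newChannel(sink));
                return sent == size && Arrays.equals(data, sink.toByteArray());
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("round trip ok: " + roundTrip(3_000_000));
    }
}
```

Reading the size from the channel itself also avoids depending on a separately maintained constant for the file length.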
Direct Buffer vs Non‑Direct Buffer
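Direct buffers allocate memory outside the JVM heap, so the OS can read and write their contents without an extra copy into or out of a heap array. The trade-offs: they are typically costlier to allocate and release than heap buffers, they are harder to manage, and their native memory is reclaimed only when the owning buffer object is garbage collected.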
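The two kinds of buffer are created through different factory methods but share one API. A minimal sketch (BufferKindsDemo and describe are hypothetical names introduced for illustration) that inspects both:

```java
import java.nio.ByteBuffer;

class BufferKindsDemo {

    /** Describes where a buffer's storage lives, using only public ByteBuffer queries. */
    static String describe(ByteBuffer buffer) {
        return (buffer.isDirect() ? "direct" : "heap")
                + ", backing array accessible: " + buffer.hasArray()
                + ", capacity: " + buffer.capacity();
    }

    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(4096);         // storage is a byte[] on the JVM heap
        ByteBuffer direct = ByteBuffer.allocateDirect(4096); // storage is native memory outside the heap

        System.out.println(describe(heap));
        System.out.println(describe(direct));

        // Both kinds are written and read through the same methods.
        direct.putInt(42);
        direct.flip();
        System.out.println("read back: " + direct.getInt());
    }
}
```

Since a direct buffer is comparatively expensive to create, it usually pays off only when it is large, long-lived, or reused across many I/O operations.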
Using Memory‑Mapped Files
Memory‑mapped files create a direct buffer that maps a file region into memory, offering similar speed to the channel approach:
// Version 4: using a memory-mapped file
public static void zipFileMap() {
    long beginTime = System.currentTimeMillis();
    File zipFile = new File(ZIP_FILE);
    try (ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream(zipFile));
         WritableByteChannel writableByteChannel = Channels.newChannel(zipOut)) {
        for (int i = 0; i < 10; i++) {
            zipOut.putNextEntry(new ZipEntry(i + SUFFIX_FILE));
            // try-with-resources releases the file handle after each entry.
            try (RandomAccessFile file = new RandomAccessFile(JPG_FILE_PATH, "r");
                 FileChannel channel = file.getChannel()) {
                MappedByteBuffer mappedByteBuffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, FILE_SIZE);
                writableByteChannel.write(mappedByteBuffer);
            }
        }
        printInfo(beginTime);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Result:
---------Map
fileSize:20M
consum time:1305
Using Pipe
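Java NIO pipes provide a one-way data connection between two threads: one thread writes to the pipe's sink channel and another reads from its source channel. In version 5, an asynchronous task writes the zip stream into the sink while the main thread reads from the source and writes to the output file.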
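To isolate the mechanism from the zip logic, here is a minimal sketch of a pipe on its own. PipeDemo and roundTrip are hypothetical names; a short string stands in for the compressed data. The point to notice is that the reader only sees end-of-stream once the writer closes the sink.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;

class PipeDemo {

    /** Sends message through a Pipe from a background task and returns what the reader received. */
    static String roundTrip(String message) {
        try {
            Pipe pipe = Pipe.open();

            // Producer: writes into the sink, then closes it so the reader sees end-of-stream.
            CompletableFuture<Void> producer = CompletableFuture.runAsync(() -> {
                try (Pipe.SinkChannel sink = pipe.sink()) {
                    ByteBuffer out = ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8));
                    while (out.hasRemaining()) {
                        sink.write(out);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });

            // Consumer: reads from the source until read() reports -1.
            ByteArrayOutputStream received = new ByteArrayOutputStream();
            try (Pipe.SourceChannel source = pipe.source()) {
                ByteBuffer in = ByteBuffer.allocate(1024);
                while (source.read(in) != -1) {
                    in.flip();
                    received.write(in.array(), 0, in.limit());
                    in.clear();
                }
            }
            producer.join(); // surface any producer failure
            return new String(received.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello through the pipe"));
    }
}
```

The same shape carries over to the zip case: the producer wraps the sink in a ZipOutputStream, and closing that stream closes the sink.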
// Version 5: using a Pipe
public static void zipFilePip() {
    long beginTime = System.currentTimeMillis();
    try (WritableByteChannel out = Channels.newChannel(new FileOutputStream(ZIP_FILE))) {
        Pipe pipe = Pipe.open();
        // The producer task writes the zip stream into the pipe's sink.
        CompletableFuture.runAsync(() -> runTask(pipe));
        // The consumer reads from the pipe's source until the sink is closed.
        ReadableByteChannel readableByteChannel = pipe.source();
        ByteBuffer buffer = ByteBuffer.allocate((int) FILE_SIZE * 10);
        while (readableByteChannel.read(buffer) >= 0) {
            buffer.flip();
            out.write(buffer);
            buffer.clear();
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    printInfo(beginTime);
}
public static void runTask(Pipe pipe) {
    // Closing zos also closes the sink, which signals end-of-stream to the reader.
    try (ZipOutputStream zos = new ZipOutputStream(Channels.newOutputStream(pipe.sink()));
         WritableByteChannel out = Channels.newChannel(zos)) {
        System.out.println("Begin");
        for (int i = 0; i < 10; i++) {
            zos.putNextEntry(new ZipEntry(i + SUFFIX_FILE));
            try (FileChannel jpgChannel = new FileInputStream(new File(JPG_FILE_PATH)).getChannel()) {
                jpgChannel.transferTo(0, FILE_SIZE, out);
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Conclusion
Even a simple optimization task can lead deep into a range of concepts. Understanding why each technique works (buffered streams, NIO channels, direct buffers, memory-mapped files, and pipes) helps you apply the knowledge effectively and retain it.
Practice what you learn to achieve lasting mastery.
Java Captain
Focused on Java technologies: SSM, the Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading; occasionally covers DevOps tools like Jenkins, Nexus, Docker, ELK; shares practical tech insights and is dedicated to full‑stack Java development.