Why Kafka Is So Fast: Sequential Writes, Memory‑Mapped Files, and Zero‑Copy
This article explains how Kafka achieves high throughput by using sequential disk writes, memory‑mapped files, batch compression, and zero‑copy sendfile for reads, while also covering data retention policies and the role of offsets in consumer processing.
Kafka stores messages on disk; although disk I/O is slower than memory, Kafka achieves high throughput.
Even on ordinary servers, Kafka can handle millions of writes per second, surpassing most middleware, making it popular for log processing and massive data scenarios.
Benchmark reference: Apache Kafka benchmark – 2 million writes per second on three cheap machines.
The article analyzes why Kafka is fast, covering both data write and read paths.
Write Path
Kafka writes all received messages to disk, guaranteeing durability. It optimizes write speed using two techniques: sequential writes and memory‑mapped files (MMFile).
Sequential Write
Disk performance depends heavily on the access pattern: random I/O is slow because of mechanical seek time, while sequential I/O, helped by operating-system optimizations, can approach memory speed.
Note: details are omitted; see http://searene.me/2017/07/09/Why-is-Kafka-so-fast/
Hard disks dislike random I/O and favor sequential I/O, and Linux adds read-ahead, write-behind, and page-cache optimizations on top. Sequential disk I/O can even outperform random memory access. Writing through the OS page cache rather than buffering on the JVM heap also sidesteps garbage-collection overhead, and the page cache stays warm across a process restart, unlike an in-heap cache that must be rebuilt from scratch.
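To make the write pattern concrete, here is a minimal Python sketch (illustrative only; `append_records` and the file name are invented for this example, and Kafka itself is written in Java): every record lands at the end of the file, so the kernel's write-behind cache can coalesce the writes into large sequential disk operations.

```python
import os
import tempfile

def append_records(path, records):
    # Open in append mode: every write goes to the current end of file,
    # which is exactly the sequential pattern disks and the page cache like.
    with open(path, "ab") as f:
        for rec in records:
            f.write(rec + b"\n")

log_path = os.path.join(tempfile.mkdtemp(), "partition-0.log")
append_records(log_path, [b"msg-1", b"msg-2"])
append_records(log_path, [b"msg-3"])  # lands strictly after the first batch

with open(log_path, "rb") as f:
    print(f.read().splitlines())  # [b'msg-1', b'msg-2', b'msg-3']
```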
Each partition is stored as an append-only file on disk; new messages are simply appended at the end. Kafka does not delete a message once it has been consumed; instead, each consumer tracks its own position with an offset, which in the Kafka versions this article describes is stored in ZooKeeper.
Data retention is managed by two policies: time‑based and size‑based, configurable in Kafka's settings.
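The offset mechanism and size-based retention can be sketched as follows (a toy model; `PartitionLog` is a hypothetical class invented here, and real Kafka drops whole old segment files rather than single messages):

```python
class PartitionLog:
    """Toy append-only partition: consuming never deletes data;
    each consumer just advances its own offset."""

    def __init__(self, max_messages=1000):
        self.messages = []                # the log itself
        self.offsets = {}                 # consumer id -> next offset to read
        self.max_messages = max_messages  # size-based retention limit

    def append(self, msg):
        self.messages.append(msg)
        if len(self.messages) > self.max_messages:
            self.messages.pop(0)          # size-based retention (simplified)

    def poll(self, consumer_id):
        off = self.offsets.get(consumer_id, 0)
        batch = self.messages[off:]
        self.offsets[consumer_id] = off + len(batch)
        return batch

log = PartitionLog()
log.append("a"); log.append("b")
print(log.poll("c1"))  # ['a', 'b']
log.append("c")
print(log.poll("c1"))  # ['c'] -- c1 resumes from its own offset
print(log.poll("c2"))  # ['a', 'b', 'c'] -- c2 rereads everything
```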
Memory Mapped Files
Even with sequential writes, disk is slower than memory. Kafka uses mmap to map files into virtual memory, allowing the OS to flush data to disk asynchronously.
Memory‑mapped files let a process read/write as if it were memory, avoiding copies between user and kernel space and providing significant I/O gains.
However, data written to mmap is not guaranteed to be on disk until the OS flushes it.
Kafka's producer.type parameter controls this: in synchronous mode, Kafka flushes to disk immediately after writing to the memory-mapped region; in asynchronous mode, it leaves flushing to the OS, trading durability for throughput.
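A minimal demonstration of the idea using Python's mmap module (an OS-level sketch, not Kafka's Java code): writes go into the page cache through the mapping, and flush() is the explicit step that forces them to disk, analogous to the synchronous case above.

```python
import mmap
import os
import tempfile

seg_path = os.path.join(tempfile.mkdtemp(), "segment.log")

# mmap needs a non-empty file, so pre-size it first.
with open(seg_path, "wb") as f:
    f.truncate(4096)

with open(seg_path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 4096)
    mm[0:5] = b"hello"  # write as if it were plain memory: no syscall per write
    mm.flush()          # force dirty pages to disk (the "synchronous" case);
                        # skip it and the OS flushes lazily (the "asynchronous" case)
    mm.close()

with open(seg_path, "rb") as f:
    print(f.read(5))    # b'hello'
```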
Read Path
Kafka uses sendfile for zero‑copy transmission, eliminating multiple copies between kernel and user buffers.
Zero‑Copy with sendfile
Traditional read/write involves four copies: disk → kernel buffer → user buffer → socket buffer → protocol engine.
The sendfile system call copies data directly from the kernel file cache to the socket buffer, reducing copies and context switches.
Its simplified prototype is:

sendfile(socket, file, len);

Introduced in kernel 2.1, sendfile reduced the number of copies; later kernels further simplified the path.
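The effect is easy to observe from Python on a Unix-like system, where os.sendfile wraps the same system call (a sketch using a local socket pair; a real broker sends to TCP consumer connections):

```python
import os
import socket
import tempfile

payload = b"kafka-log-segment-bytes" * 10

# Stage a "log segment" on disk.
src = tempfile.NamedTemporaryFile(delete=False)
src.write(payload)
src.close()

# sendfile() pushes the file's bytes into the socket inside the kernel:
# no read() into a user-space buffer and no write() back out of one.
left, right = socket.socketpair()
with open(src.name, "rb") as f:
    sent = 0
    while sent < len(payload):
        sent += os.sendfile(left.fileno(), f.fileno(), sent, len(payload) - sent)
left.close()

received = b""
while len(received) < len(payload):
    received += right.recv(4096)
right.close()
print(received == payload)  # True
```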
Web servers such as Apache, Nginx, and Lighttpd use sendfile to boost file‑transfer performance.
Kafka combines mmap and sendfile to deliver messages efficiently to consumers.
Batch Compression
Network I/O is often the bottleneck, so Kafka compresses whole batches of messages rather than individual ones, supporting codecs such as Gzip and Snappy.
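A quick illustration of why batching matters for compression (using the standard-library gzip as a stand-in; Kafka's actual codec framing differs): compressing messages one by one pays per-message header and dictionary costs, while a single batch shares them.

```python
import gzip
import json

# 100 log-like messages with heavily repeated structure.
messages = [json.dumps({"user": i % 10, "event": "click", "page": "/home"}).encode()
            for i in range(100)]

# One gzip stream per message: the header and dictionary cost is paid 100 times.
individual = sum(len(gzip.compress(m)) for m in messages)

# One gzip stream for the whole batch: repeated field names compress away.
batch = len(gzip.compress(b"\n".join(messages)))

print(batch < individual)  # True: the batch is far smaller
```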
Conclusion
Kafka’s speed stems from treating all messages as a single append‑only log, using sequential writes, memory‑mapped files, batch compression, and zero‑copy sendfile for reads, while retaining data and managing offsets via Zookeeper.
Author: Binyue Original article: cnblogs.com/binyue/p/10308754.html
Architecture Digest