
Deep Dive into epoll: Principles, Blocking, and I/O Multiplexing

This article provides an in‑depth exploration of Linux’s epoll mechanism, covering its blocking behavior, kernel‑level processing, NAPI optimization, comparisons with select/poll, and practical insights into I/O multiplexing, helping backend engineers understand performance characteristics and design efficient network services.

Qunar Tech Salon

1 Introduction

epoll is an old but essential topic for backend engineers; it has been studied so widely that understandings vary and misconceptions abound.

This article revisits epoll, focusing on thread blocking principles, interrupt optimization, NIC data handling, and the underlying mechanisms, and also critiques popular viewpoints.

2 Motivation

Before diving in, the author poses several motivating questions: why is epoll fast, what really separates blocking from non-blocking and synchronous from asynchronous I/O, and why do select, poll, and epoll all exist?

3 Getting Started with epoll

epoll is the Linux kernel's scalable I/O event notification mechanism. A libevent benchmark comparing select, poll, epoll, and kqueue shows that epoll's response time stays nearly flat as the number of monitored sockets grows, while select and poll degrade.

Note that the benchmark caps active connections at 100: epoll shines when most monitored sockets are idle, but its advantage shrinks when a large fraction of the sockets are active.

4 The Principles Behind epoll

4.1 Blocking

4.1.1 Why Blocking

Using a NIC as example, the data‑receiving process consists of four steps: DMA write to memory, IRQ generation, kernel interrupt handling, and user‑space processing.

Waiting for data to arrive takes on the order of milliseconds, while the CPU executes instructions in nanoseconds; spinning through that wait would waste the CPU, so the process blocks and yields it to other work.
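As a hedged illustration (not from the original article), the effect is easy to observe in user space: a read on a socket with no pending data parks the calling thread until the peer writes, and the thread consumes no CPU in the meantime. A minimal Python sketch:

```python
import socket
import threading
import time

# A connected socket pair stands in for the "process <-> NIC data path".
a, b = socket.socketpair()

def peer():
    time.sleep(0.2)            # simulate the milliseconds spent waiting for data
    b.sendall(b"payload")

threading.Thread(target=peer).start()

start = time.monotonic()
data = a.recv(1024)            # blocks: the thread is off the CPU until data arrives
elapsed = time.monotonic() - start

print(data, elapsed >= 0.15)   # -> b'payload' True: recv() returned only after the peer wrote
```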

4.1.2 Blocking Does Not Consume CPU

Linux defines several process states; only processes in the runnable state compete for CPU. Runnable processes sit on the scheduler's run queue, while a blocked process is parked on the socket's wait queue and consumes no CPU at all.

4.1.3 Unblocking

When data arrives, the kernel uses the packet's destination port to locate the owning socket, moves the processes sleeping on that socket's wait queue back to the runnable state, and the scheduler later runs them.

4.1.4 Process Model

This per-connection model is essentially blocking I/O (BIO): one process (or thread) per socket, each blocking until its own data arrives.
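A hedged sketch of this per-connection blocking model, using one Python thread per connection (threads stand in for the article's processes; all names are illustrative):

```python
import socket
import threading

def handle(conn):
    # Each connection gets its own thread, which simply blocks on recv().
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:
                break
            conn.sendall(data)     # echo the data back

srv = socket.socket()
srv.bind(("127.0.0.1", 0))         # port 0: let the OS pick a free port
srv.listen()

def acceptor():
    while True:
        conn, _ = srv.accept()     # blocks until a client connects
        threading.Thread(target=handle, args=(conn,), daemon=True).start()

threading.Thread(target=acceptor, daemon=True).start()

# Exercise the server with one client.
cli = socket.create_connection(srv.getsockname())
cli.sendall(b"ping")
reply = cli.recv(1024)
print(reply)                       # -> b'ping'
```

The cost of this model is exactly what the next section addresses: every connection's wake-up is a separate context switch.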

4.2 Optimizing Context Switches

Two sources of frequent context switches are NIC IRQ handling and per‑socket process wake‑ups.

4.2.1 NIC NAPI Mechanism

NAPI splits packet handling into a minimal hard-IRQ half, which merely calls napi_schedule and defers the real work, and a soft-IRQ half (net_rx_action) that polls the NIC and processes packets in batches, so a burst of packets costs one wakeup rather than one interrupt each.

The simplified flow: DMA write → IRQ → napi_schedule → soft‑irq → batch packet processing → user‑space.
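NAPI itself is kernel code, but its shape can be mimicked in user space as a toy model (entirely illustrative, not the kernel API): the "IRQ" half only flags that work exists, and the "poll" half drains the queue in one batch, so ten packets cost one wakeup instead of ten.

```python
from collections import deque

rx_queue = deque()       # packets DMA'd into memory, awaiting processing
poll_scheduled = False   # stands in for napi_schedule()'s "already polling" flag
wakeups = 0

def hard_irq(packet):
    # Fast half: just enqueue and schedule the poll once; no per-packet work.
    global poll_scheduled, wakeups
    rx_queue.append(packet)
    if not poll_scheduled:
        poll_scheduled = True
        wakeups += 1     # one soft-irq wakeup covers the whole burst

def net_rx_action():
    # Slow half: drain the queue in one batch, then re-enable "interrupts".
    global poll_scheduled
    batch = []
    while rx_queue:
        batch.append(rx_queue.popleft())
    poll_scheduled = False
    return batch

for i in range(10):
    hard_irq(f"pkt{i}")

processed = net_rx_action()
print(len(processed), wakeups)   # -> 10 1
```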

4.2.2 Single‑Thread I/O Multiplexing

Kernel I/O multiplexing reduces per‑socket context switches by using a single thread to handle many sockets, similar to NAPI.

select's fd_set is capped at FD_SETSIZE (1024) descriptors and must be rebuilt and rescanned on every call; an epoll instance keeps registered descriptors in a red-black tree (so epoll_ctl is O(log n)) and collects ready descriptors on a ready list, so epoll_wait returns events without scanning all registered sockets.
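On Linux, Python exposes this instance-based API via select.epoll, whose methods map directly onto epoll_create, epoll_ctl, and epoll_wait. A minimal, Linux-only sketch:

```python
import select
import socket

a, b = socket.socketpair()

ep = select.epoll()                      # epoll_create: one kernel-side instance
ep.register(a.fileno(), select.EPOLLIN)  # epoll_ctl(ADD): insert fd into the instance

b.sendall(b"x")                          # make `a` readable

events = ep.poll(timeout=1)              # epoll_wait: returns only the ready fds
for fd, mask in events:
    print(fd == a.fileno(), bool(mask & select.EPOLLIN))  # -> True True

ep.unregister(a.fileno())                # epoll_ctl(DEL)
ep.close()
```

The key contrast with select is that the fd set lives in the kernel across calls: registration happens once, and each wait returns only ready descriptors.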

4.3 Evolution of I/O Multiplexing APIs

Comparison of select and epoll: each select call copies and scans the entire fd_set (O(n) in the number of descriptors), while epoll keeps the descriptor set in the kernel and hands back only the ready descriptors.

Code snippets (shown as images) illustrate typical select and epoll usage.
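Since the original snippets appear only as images, here is a hedged reconstruction of the typical select loop in Python (the kernel rescans every listed descriptor on each call, which is the O(n) cost discussed above):

```python
import select
import socket

a, b = socket.socketpair()
b.sendall(b"hello")

readable_data = None
while readable_data is None:
    # select rebuilds and scans the full descriptor sets on every iteration.
    rlist, wlist, xlist = select.select([a], [], [], 1.0)
    for sock in rlist:
        readable_data = sock.recv(1024)

print(readable_data)   # -> b'hello'
```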

4.4 Summary

Blocking improves CPU utilization: a process waiting for data yields the CPU to other work instead of spinning.

I/O multiplexing (and NAPI) reduces context switches.

Three API generations (select, poll, epoll) evolve to improve kernel‑process interaction.

5 Diss Section

The author presents personal interpretations, arguing that all Linux I/O models are fundamentally synchronous and that “blocking vs non‑blocking” or “sync vs async” classifications can be misleading.

5.1 Classification of I/O Models

Proposes two categories: the programmer‑oriented process model and the OS‑oriented I/O multiplexing model, with Reactor (Java NIO) and Proactor (Java AIO) as user‑space dispatch patterns.
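The Reactor pattern the author associates with Java NIO can be sketched in a few lines with Python's selectors module (a hedged, language-swapped illustration; the original refers to Java NIO, not this code): readiness events are dispatched to callbacks registered per descriptor.

```python
import selectors
import socket

sel = selectors.DefaultSelector()   # epoll-backed on Linux
a, b = socket.socketpair()
received = []

def on_readable(sock):
    # Callback invoked by the dispatch loop when `sock` is ready to read.
    received.append(sock.recv(1024))

sel.register(a, selectors.EVENT_READ, data=on_readable)
b.sendall(b"event")

# One turn of the reactor's dispatch loop.
for key, mask in sel.select(timeout=1):
    key.data(key.fileobj)           # dispatch to the registered callback

print(received)   # -> [b'event']
```

Proactor-style (completion-based) APIs differ in that the callback fires after the I/O has already been performed, not merely when the descriptor is ready.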

5.2 About mmap

A popular claim holds that epoll's speed comes from mmap-ing a buffer shared between kernel and user space. The author clarifies that epoll does not use mmap: stracing a small epoll demo shows no mmap call in the epoll path.


Written by Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
