
Understanding epoll: Linux I/O Multiplexing, Design, and Practical Usage

This article explains the limitations of traditional I/O models, introduces epoll as a high‑performance Linux I/O multiplexing mechanism, details its design principles, API usage, and kernel data structures, and offers practical coding examples and optimization tips for building scalable backend services.


In modern high‑concurrency scenarios such as e‑commerce flash sales or online gaming servers, efficient I/O processing is critical; traditional models like blocking I/O, non‑blocking I/O, and select/poll become bottlenecks due to thread waste, CPU overhead, and file‑descriptor limits.

epoll, introduced in Linux 2.5.44, is an enhanced version of poll designed to handle massive numbers of file descriptors with low CPU overhead: the kernel maintains a persistent interest list and a ready queue, so each epoll_wait call avoids re-registering and re-scanning every monitored descriptor.

Key differences from select/poll: select is limited to FD_SETSIZE (typically 1024) descriptors and copies the full descriptor set between user and kernel space on every call; poll removes the hard limit but still copies its array each call and scans it linearly. epoll has no fixed descriptor limit, stores the monitored descriptors in a red‑black tree for O(log N) registration operations, and maintains a ready list so that retrieving events costs time proportional to the number of ready descriptors, not the total monitored.

Design concepts: epoll separates queue maintenance (epoll_ctl) from blocking (epoll_wait), uses a ready list to avoid unnecessary scans, and employs a red‑black tree for fast insertion, deletion, and lookup of monitored sockets.

Operation modes: Level‑Triggered (LT, the default) notifies as long as a descriptor remains ready; Edge‑Triggered (ET, enabled with the EPOLLET flag) notifies only on state changes, requiring the application to drain the socket until EAGAIN/EWOULDBLOCK.
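The distinction can be sketched in code. The helper names below (set_nonblocking, register_et, drain_fd) are illustrative, not part of any API; only the EPOLLET flag and the EAGAIN convention come from the epoll interface itself:

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* ET mode requires non-blocking descriptors so draining can stop at EAGAIN. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Register fd for edge-triggered readability notifications.
 * Without EPOLLET, the identical registration is level-triggered. */
static int register_et(int epfd, int fd)
{
    struct epoll_event ev;
    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN | EPOLLET;   /* notify only when readiness changes */
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* In ET mode one notification may cover many bytes, so read until the
 * kernel reports EAGAIN/EWOULDBLOCK (descriptor fully drained). */
static ssize_t drain_fd(int fd, char *buf, size_t cap)
{
    ssize_t total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, cap);
        if (n > 0) { total += n; continue; }
        if (n == 0)
            return total;                         /* peer closed */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;                         /* drained */
        return -1;                                /* real error */
    }
}
```

With LT, forgetting to drain merely causes another wakeup on the next epoll_wait; with ET, unread data stays silent until new data arrives, which is why the drain loop is mandatory.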

epoll API:

#include <sys/epoll.h>
int epoll_create(int size);   /* size is ignored since Linux 2.6.8 but must be > 0 */
int epoll_create1(int flags); /* preferred; flags may include EPOLL_CLOEXEC */
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event); /* op: EPOLL_CTL_ADD/MOD/DEL */
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout); /* timeout in ms; -1 blocks */

The epoll_event structure contains an events mask (e.g., EPOLLIN, EPOLLOUT, EPOLLERR) and a data union for user‑defined identifiers.
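For example, filling in an epoll_event for a readable socket while stashing its descriptor in the data union might look like this (make_event is an illustrative helper, not part of the API):

```c
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>

/* Build an epoll_event: the events mask says what to watch for,
 * and the data union carries an identifier back from epoll_wait. */
static struct epoll_event make_event(int fd, uint32_t mask)
{
    struct epoll_event ev;
    memset(&ev, 0, sizeof(ev));
    ev.events = mask;      /* e.g. EPOLLIN | EPOLLERR */
    ev.data.fd = fd;       /* could instead be a pointer via ev.data.ptr */
    return ev;
}
```

Storing a pointer to a per-connection context in ev.data.ptr instead of the raw fd is a common pattern once each connection carries state.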

Typical usage steps:

Create an epoll instance with epoll_create1(0).

Set the listening socket to non‑blocking mode and add it to the epoll set with epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &event), listening for EPOLLIN.

Enter an event loop calling epoll_wait to retrieve ready events.

When the listening socket is ready, accept new connections, set them non‑blocking, and add them to epoll.

When a client socket is ready, read data in a loop until read returns -1 with errno EAGAIN/EWOULDBLOCK (especially in ET mode), then optionally write a response.

On read returning 0 or fatal errors, close the socket and remove it from epoll with epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL).

The kernel implements epoll using an eventpoll structure that contains a spinlock, mutex, wait queues, a red‑black tree (rbr) for the monitored set, and a doubly linked list (rdllist) for ready events. When a socket becomes ready, the kernel adds a reference to the ready list and wakes any processes blocked in epoll_wait.

Data structures:

Red‑black tree: provides O(log N) insert, delete, and lookup for registered file descriptors, ensuring scalability as the number of connections grows.

Doubly linked list: stores only the ready descriptors, allowing O(1) insertion/removal and easy iteration when epoll_wait copies events to user space.
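As a rough illustration of this layout, the sketch below mirrors the field names rbr and rdllist from the kernel's fs/eventpoll.c, but the types are simplified stand-ins; the real structure also carries locks, wait queues, and overflow bookkeeping omitted here:

```c
/* Illustrative stand-ins for the kernel's list/tree primitives. */
struct list_head { struct list_head *next, *prev; };
struct rb_root  { void *rb_node; };

/* Heavily simplified view of the kernel's struct eventpoll. */
struct eventpoll_sketch {
    struct rb_root   rbr;      /* red-black tree of all monitored fds */
    struct list_head rdllist;  /* doubly linked list of ready events */
    /* spinlock, mutex, wait queues omitted for brevity */
};
```

The split explains the performance profile: registration touches only the tree, while readiness callbacks touch only the list, so neither operation scales with the total number of connections.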

Practical example (full TCP echo server) demonstrates creating a listening socket, configuring non‑blocking mode, registering with epoll, handling new connections, reading/writing data, and cleaning up resources. The code uses setnonblocking, epoll_ctl, and a loop around epoll_wait to serve multiple clients concurrently with a single thread.
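A minimal single-threaded version of such an echo server can be sketched as follows. Error handling is abbreviated, and the helper names (make_listener, run_echo_server) are illustrative:

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_EVENTS 64

static int setnonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Create a non-blocking listening socket (port 0 picks any free port). */
static int make_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    int one = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, SOMAXCONN) < 0 || setnonblocking(fd) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Single-threaded event loop: accept on the listener, echo on clients. */
static int run_echo_server(uint16_t port)
{
    int listen_fd = make_listener(port);
    if (listen_fd < 0) return -1;
    int epfd = epoll_create1(0);
    struct epoll_event ev, events[MAX_EVENTS];
    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN;
    ev.data.fd = listen_fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

    for (;;) {
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
        if (n < 0 && errno == EINTR) continue;
        if (n < 0) return -1;
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            if (fd == listen_fd) {
                int conn;
                /* Accept every pending connection (listener is non-blocking). */
                while ((conn = accept(listen_fd, NULL, NULL)) >= 0) {
                    setnonblocking(conn);
                    ev.events = EPOLLIN | EPOLLET;   /* edge-triggered clients */
                    ev.data.fd = conn;
                    epoll_ctl(epfd, EPOLL_CTL_ADD, conn, &ev);
                }
            } else {
                char buf[4096];
                for (;;) {
                    ssize_t r = read(fd, buf, sizeof(buf));
                    if (r > 0) { write(fd, buf, (size_t)r); continue; }
                    if (r < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
                        break;                        /* drained; keep open */
                    epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);  /* EOF or error */
                    close(fd);
                    break;
                }
            }
        }
    }
}
```

A production version would additionally queue unwritable responses behind EPOLLOUT rather than assuming write succeeds in full.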

Common pitfalls and optimizations:

In ET mode, always read until EAGAIN/EWOULDBLOCK to avoid missing data.

Use EPOLLEXCLUSIVE (Linux 4.5+) or SO_REUSEPORT to prevent the “epoll thundering herd” problem when multiple processes listen on the same port.

Choose appropriate epoll_wait timeout values to balance latency and CPU usage.

Batch process multiple ready events per epoll_wait call to reduce system‑call overhead.

Consider EPOLLONESHOT for thread‑safe handling of a socket by a single worker at a time.
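For instance, EPOLLONESHOT disarms a descriptor after one notification, so the worker that handled the event must explicitly rearm it with EPOLL_CTL_MOD. A sketch, with illustrative helper names:

```c
#include <string.h>
#include <sys/epoll.h>

/* Register fd so only one worker at a time is woken for it: after an
 * event is delivered, the fd stays disarmed until rearmed below. */
static int add_oneshot(int epfd, int fd)
{
    struct epoll_event ev;
    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN | EPOLLET | EPOLLONESHOT;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Call once the worker has finished with fd to receive further events. */
static int rearm_oneshot(int epfd, int fd)
{
    struct epoll_event ev;
    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN | EPOLLET | EPOLLONESHOT;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}
```

Forgetting the rearm step is the classic EPOLLONESHOT bug: the connection simply goes silent.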

By leveraging epoll’s event‑driven design, backend services can efficiently manage thousands to millions of concurrent connections with minimal CPU and memory overhead.

Tags: backend development, Linux, high concurrency, I/O multiplexing, network programming, event-driven, epoll
Written by Deepin Linux

Research areas: Windows & Linux platforms, C/C++ backend development, embedded systems and Linux kernel, etc.
