Understanding Linux Network I/O: OSI Layers, MTU, Fragmentation, and TCP Flow Control
This article explains the structure of Linux network I/O by detailing the OSI seven‑layer model, the role of each layer, MTU/PMTU concepts, IP fragmentation and reassembly, and key TCP mechanisms such as MSS, flow control, and congestion control, providing a comprehensive foundation for studying zero‑copy networking.
In the previous article we discussed the structure of Linux network I/O; this piece clarifies why the network stack is so layered and explains terminology such as MSS and IFG.
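Linux network I/O is usually described in terms of the OSI seven‑layer model, although what the kernel actually implements is the TCP/IP stack. NIC hardware handles the physical layer, and the kernel (device drivers plus the protocol stack) covers the data link layer up to the transport layer. Understanding the layering is essential to understanding Linux networking.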
1. Physical Layer
Data at the physical layer is represented as signals on various media (copper wire, fiber, air, vacuum). These signals are abstracted to binary 0/1, forming the basis for the link layer.
2. Data Link Layer
2.1 Overview
Because signals are prone to interference, bits are grouped into frames, and a checksum on each frame lets the receiver detect corruption.
Using Ethernet as an example, each frame on the wire is preceded by a preamble (7 bytes) for clock synchronization and a start‑of‑frame delimiter (1 byte), and is followed by an inter‑frame gap (IFG, 12 bytes of idle time) before the next frame may start.
Data frames contain three parts: header, payload, and trailer.
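Header (14 bytes, or 18 bytes with the optional 802.1Q VLAN tag): destination MAC (6 B), source MAC (6 B), 802.1Q tag (4 B, optional), EtherType (2 B).
Payload (46–1500 bytes).
Trailer (4 bytes): the frame check sequence (FCS), a CRC‑32 computed over the frame.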
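The layout above can be checked with a short parser. This is a minimal sketch, not kernel code: the function names parse_ethernet_header and wire_bytes are made up for illustration, the input is assumed to start at the destination MAC (preamble and SFD already stripped, as a NIC normally does), and the payload is assumed to be padded to the minimum size already.

```python
import struct

PREAMBLE, SFD, FCS, IFG = 7, 1, 4, 12  # bytes, as listed above
TPID_8021Q = 0x8100                    # EtherType value that marks an 802.1Q tag


def parse_ethernet_header(frame: bytes) -> dict:
    """Parse the header of a frame that starts at the destination MAC."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    vlan_id = None
    header_len = 14
    if ethertype == TPID_8021Q:
        # Tagged frame: 2 bytes of tag control info, then the real EtherType.
        tci, ethertype = struct.unpack("!HH", frame[14:18])
        vlan_id = tci & 0x0FFF  # low 12 bits carry the VLAN ID
        header_len = 18
    return {
        "dst": dst.hex(":"),
        "src": src.hex(":"),
        "vlan_id": vlan_id,
        "ethertype": ethertype,
        "header_len": header_len,
    }


def wire_bytes(payload_len: int, tagged: bool = False) -> int:
    """Byte times one frame occupies on the wire, including preamble, SFD and IFG."""
    header = 18 if tagged else 14
    return PREAMBLE + SFD + header + payload_len + FCS + IFG
```

Under these assumptions, wire_bytes(1500) evaluates to 1538, so a full untagged frame spends 38 byte times on framing overhead for every 1500 bytes of payload.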
2.2 Abstraction
After the link layer abstracts the signal, the data becomes a frame. The portion of the frame that the network layer operates on is the payload, referred to as a datagram.
2.3 MTU/PMTU
MTU (Maximum Transmission Unit) is the largest network‑layer datagram a link can carry in a single frame. A larger IP packet must be fragmented, or is dropped if its DF (Don't Fragment) flag is set. The common Ethernet MTU is 1500 bytes.
PMTU (Path MTU) is the smallest MTU along a communication path and may differ in each direction.
2.4 Testing Your PMTU
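When ping is run with fragmentation prohibited (on Linux: ping -M do -s SIZE HOST), an ICMP payload larger than 1472 bytes fails on a path with a 1500‑byte MTU: 1472 + 8 (ICMP header) + 20 (IP header) = 1500, so anything larger exceeds the MTU. Without the DF flag, the oversized packet is fragmented instead and the ping still succeeds.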
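The arithmetic behind the ping test is small enough to write down. The sketch below assumes an IPv4 header without options (20 bytes) and an ICMP echo header of 8 bytes; the function names and the list of per-hop MTUs are illustrative only.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def path_mtu(link_mtus):
    """The path MTU is the smallest MTU of any link along the path."""
    return min(link_mtus)


def max_ping_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


# Hypothetical path: Ethernet, then a 1492-byte link, then Ethernet again.
hops = [1500, 1492, 1500]
print(path_mtu(hops), max_ping_payload(path_mtu(hops)))
```

For the hypothetical path above, the limiting link is the 1492-byte hop, so the largest ping payload that passes with DF set would be 1464 bytes rather than 1472.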
3. Network Layer (IPv4 Example)
3.1 Overview
The Internet Protocol (IP) routes packets based on source and destination addresses, providing an unreliable, best‑effort delivery service.
3.2 IPv4 Header
An IPv4 datagram has a variable‑length header: 20 bytes without options, and up to 60 bytes with them.
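To make the header layout concrete, here is a minimal sketch that decodes the fixed 20-byte part of an IPv4 header with Python's struct module. The function name parse_ipv4_header and the returned dictionary keys are illustrative; options, if present, are not decoded.

```python
import socket
import struct


def parse_ipv4_header(packet: bytes) -> dict:
    """Decode the fixed 20-byte portion of an IPv4 header."""
    (ver_ihl, _tos, total_len, ident, flags_frag,
     ttl, proto, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len": (ver_ihl & 0x0F) * 4,        # IHL counts 32-bit words
        "total_len": total_len,
        "id": ident,
        "df": bool(flags_frag & 0x4000),           # Don't Fragment
        "mf": bool(flags_frag & 0x2000),           # More Fragments
        "frag_offset": (flags_frag & 0x1FFF) * 8,  # field counts 8-byte units
        "ttl": ttl,
        "protocol": proto,                         # 6 = TCP, 17 = UDP, 1 = ICMP
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }
```

The header length comes from the IHL field rather than being assumed, which is why the header is described as variable-length.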
3.3 Fragmentation
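IP fragments packets that exceed the MTU of the underlying link, provided the DF flag is not set. Each fragment carries its own IP header, so the data in a fragment is at most MTU − IP‑header size, and in every fragment except the last it must be a multiple of 8 bytes because the fragment offset is counted in 8‑byte units. Fragments may be further fragmented on subsequent hops.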
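A worked example helps here. The sketch below computes how a payload would be split for a given MTU, assuming a 20-byte IP header on every fragment and rounding the per-fragment data down to a multiple of 8 bytes (the unit of the fragment offset field). The function name fragment is illustrative; it only computes sizes and does not build real packets.

```python
def fragment(payload_len, mtu, header_len=20):
    """Return a list of (offset_bytes, data_len, more_fragments) tuples."""
    # Data per fragment, rounded down to a multiple of 8 bytes.
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    offset = 0
    while offset < payload_len:
        length = min(max_data, payload_len - offset)
        more = offset + length < payload_len  # MF is set on all but the last
        fragments.append((offset, length, more))
        offset += length
    return fragments


print(fragment(4000, 1500))
# [(0, 1480, True), (1480, 1480, True), (2960, 1040, False)]
```

A 4000-byte payload over a 1500-byte MTU therefore becomes three fragments, and each one pays for its own 20-byte header.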
3.4 Reassembly
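All fragments of one datagram share the same Identification value. The destination host collects them, orders them by fragment offset, and uses the MF (More Fragments) flag to recognize the last one; once every byte is present, it passes the complete datagram to the upper layer. Intermediate routers do not reassemble.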
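Reassembly can be sketched as the inverse of fragmentation. The code below is a simplified illustration, not the kernel's algorithm: it assumes all fragments belong to the same datagram, takes them as hypothetical (offset, data, more_fragments) tuples in any order, and gives up on gaps or overlaps instead of handling them. Real stacks also apply a reassembly timeout, which is omitted here.

```python
def reassemble(fragments):
    """Return the reassembled payload, or None if it is incomplete."""
    pieces = {}
    total_len = None
    for offset, data, more in fragments:
        pieces[offset] = data
        if not more:
            # Only the last fragment (MF = 0) reveals the total length.
            total_len = offset + len(data)
    if total_len is None:
        return None  # last fragment has not arrived yet
    buf = bytearray()
    for offset in sorted(pieces):
        if offset != len(buf):
            return None  # gap or overlap; not handled in this sketch
        buf += pieces[offset]
    return bytes(buf) if len(buf) == total_len else None
```

Because every fragment must be buffered until the last gap is filled, the memory cost falls on the receiver.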
3.5 Problems Caused by IP Fragmentation
CPU and memory overhead on both ends.
Loss of a single fragment forces retransmission of the entire original packet.
Maliciously crafted fragments can exhaust receiver memory.
Firewalls cannot easily filter non‑first fragments because they lack transport‑layer headers.
3.6 Abstraction
At the IP layer, data is abstracted as a datagram, and each piece of a fragmented datagram is called a fragment. After reassembly, the transport layer receives the complete datagram payload (for TCP, a segment).
4. Transport Layer (TCP Example)
4.1 Overview
TCP is a connection‑oriented, reliable, byte‑stream protocol.
4.2 Transmission Process
Application sends a data stream to TCP.
TCP splits the stream into segments; IP carries them.
Each segment carries a sequence number; the receiver acknowledges received data with ACKs.
If an ACK does not arrive before the retransmission timeout (RTO, which is derived from the measured RTT), the sender retransmits.
Checksums detect corrupted segments; the receiver discards them, so they go unacknowledged and are retransmitted.
Because IP does not guarantee order, TCP reorders segments using sequence numbers.
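The receiver side of these steps can be sketched as a small reorder buffer. This is a toy model under stated assumptions: the class name ReorderBuffer is hypothetical, sequence numbers count bytes and never wrap, and segments are either exact duplicates or non-overlapping. It returns a cumulative ACK (the next byte expected), which is how TCP acknowledges data.

```python
class ReorderBuffer:
    def __init__(self, initial_seq):
        self.next_seq = initial_seq  # next in-order byte expected
        self.pending = {}            # seq -> data for out-of-order segments
        self.delivered = bytearray() # bytes handed to the application, in order

    def receive(self, seq, data):
        """Accept one segment and return the cumulative ACK number."""
        if seq >= self.next_seq:
            self.pending.setdefault(seq, data)  # older duplicates are ignored
        # Deliver every segment that is now contiguous with the stream.
        while self.next_seq in self.pending:
            chunk = self.pending.pop(self.next_seq)
            self.delivered += chunk
            self.next_seq += len(chunk)
        return self.next_seq


rb = ReorderBuffer(initial_seq=1000)
print(rb.receive(1005, b"world"))  # 1000: still waiting for the first segment
print(rb.receive(1000, b"hello"))  # 1010: both segments delivered in order
```

A repeated ACK for the same number tells the sender that something arrived out of order, which is the signal fast retransmit relies on.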
4.3 MSS
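An IP datagram can be as large as 65,535 bytes, far more than most links can carry, so TCP limits its segments with a Maximum Segment Size (MSS) derived from the MTU in order to avoid IP fragmentation. MSS is the largest amount of application data that can be carried in a TCP segment (excluding TCP header and options).
During the three‑way handshake, each side advertises its MSS in its SYN; a sender never exceeds the value its peer advertised, so in practice the smaller of the two bounds the connection. Typical Ethernet result: MSS = 1500 − 20 (IP) − 20 (TCP) = 1460 bytes.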
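The MSS arithmetic is simple enough to express directly. The sketch below assumes 20-byte IP and TCP headers with no options, and it models the outcome of the handshake as the smaller of the two advertised values; the function names are illustrative.

```python
def mss_for_mtu(mtu, ip_header=20, tcp_header=20):
    """MSS a host would advertise for a given interface MTU."""
    return mtu - ip_header - tcp_header


def effective_mss(local_mtu, peer_mtu):
    """Segment size both directions can safely use: the smaller advertised MSS."""
    return min(mss_for_mtu(local_mtu), mss_for_mtu(peer_mtu))


print(mss_for_mtu(1500))          # 1460
print(effective_mss(1500, 1492))  # 1452
```

The second call uses a hypothetical peer whose link MTU is 1492 bytes; the connection is then bounded by that side's smaller MSS.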
4.4 Flow Control
TCP uses a sliding‑window mechanism. The receiver advertises a receive window size, limiting how many bytes the sender may transmit without further ACKs.
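The sender's side of the sliding window reduces to one subtraction. In the sketch below, snd_una (oldest unacknowledged byte) and snd_nxt (next byte to send) follow the conventional TCP variable names, and rwnd is the window the receiver last advertised; the function name usable_window is illustrative, and sequence-number wraparound is ignored.

```python
def usable_window(snd_una, snd_nxt, rwnd):
    """Bytes the sender may still transmit before it must wait for an ACK."""
    in_flight = snd_nxt - snd_una  # sent but not yet acknowledged
    return max(0, rwnd - in_flight)


print(usable_window(snd_una=1000, snd_nxt=3000, rwnd=4096))  # 2096
print(usable_window(snd_una=1000, snd_nxt=3000, rwnd=1500))  # 0
```

In the second call the receiver has shrunk its window below what is already in flight, so the sender must stop until ACKs arrive or the window reopens.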
4.5 Congestion Control
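The earliest TCP had no congestion control. Modern TCP adds a congestion window (cwnd) maintained by the sender, and the amount of data in flight is limited by the smaller of cwnd and the receiver's advertised window. Algorithms such as slow start grow cwnd quickly (roughly doubling it every round trip) until packet loss or a threshold indicates the network's limit, after which growth becomes more cautious.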
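The growth pattern can be illustrated with a toy simulation. This is a deliberately simplified, Tahoe-style model chosen for clarity, not the algorithm a current Linux kernel uses by default (that is CUBIC): cwnd is counted in whole segments, it doubles per round trip during slow start, grows by one segment per round trip afterwards, and collapses to one segment on loss. The function name simulate_cwnd and its parameters are illustrative.

```python
def simulate_cwnd(rounds, ssthresh, loss_rounds=(), mss=1):
    """Return cwnd (in segments) at the start of each round trip."""
    cwnd = mss
    history = []
    for r in range(rounds):
        history.append(cwnd)
        if r in loss_rounds:
            # Loss: remember half the current window, then restart slow start.
            ssthresh = max(cwnd // 2, 2 * mss)
            cwnd = mss
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start: exponential growth
        else:
            cwnd += mss                     # congestion avoidance: linear growth
    return history


print(simulate_cwnd(8, ssthresh=16))
# [1, 2, 4, 8, 16, 17, 18, 19]
print(simulate_cwnd(8, ssthresh=16, loss_rounds={5}))
# [1, 2, 4, 8, 16, 17, 1, 2]
```

The first run shows the switch from exponential to linear growth at the threshold; the second shows how a single loss in round 5 resets the window in this simplified model.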
4.6 Abstraction
TCP abstracts data as segments. To the application, TCP provides a byte‑stream abstraction, which the OS exposes through the socket API.
End
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.