Mastering TCP Handshakes, Queues, and Linux Tuning for High‑Performance Servers
This article explains TCP’s connection establishment and termination processes, details the roles of half‑ and full‑connection queues in Linux, demonstrates how to simulate and detect queue overflows using tools like netstat, ss, hping3 and ab, and provides kernel tuning parameters to mitigate SYN flood and TIME_WAIT issues.
TCP Introduction
TCP is a connection‑oriented unicast protocol; before data transmission, both client and server must establish a connection, which stores peer information such as IP address and port.
TCP presents the application with a continuous byte stream and compensates for packet loss, duplication, and corruption occurring at the IP layer or below. Connection parameters are negotiated during the handshake and carried in TCP header fields.
TCP therefore provides a reliable, connection‑oriented, byte‑stream transport service, using a three‑way handshake to establish a connection and a four‑way termination to close it.
TCP Three‑Way Handshake
The handshake ensures both sides can confirm their send and receive capabilities.
First handshake: client sends a SYN packet; the server receives it, confirming the client’s sending ability and the server’s receiving ability.
Second handshake: server replies with SYN‑ACK; the client receives it, confirming the server’s send/receive ability and the client’s own capabilities.
Third handshake: client sends ACK; the server receives it, confirming the client’s receive ability and the server’s send ability.
After these steps, both sides know their communication capabilities and can exchange data normally.
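As a concrete illustration, a minimal loopback client/server sketch in Python (port chosen by the OS) shows that connect() returns only after the three‑way handshake completes, and accept() then hands the application an already‑established connection:

```python
import socket

# Server: bind an OS-assigned loopback port and listen with a backlog of 1.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]

# Client: connect() returns only once SYN / SYN-ACK / ACK have completed.
cli = socket.create_connection(("127.0.0.1", port))

# Server: accept() pops the already-established connection off the accept queue.
conn, addr = srv.accept()
cli.sendall(b"hello")
data = conn.recv(5)
print(data)  # b'hello'
```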
TCP Handshake State Queues
During the handshake, the Linux kernel maintains two queues:
SYN (half‑connection) queue
Accept (full‑connection) queue
When the server receives a SYN request, the connection is placed in the SYN queue; after the third ACK, it moves to the accept queue awaiting the application's accept() call.
Both queues have maximum lengths; exceeding them causes the kernel to drop packets or send RST.
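The accept queue's limit is requested from user space: the backlog argument of listen() sizes it, capped by net.core.somaxconn. A sketch showing completed handshakes waiting in the accept queue before the application ever calls accept():

```python
import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(2)  # requested accept-queue limit (the kernel caps it at somaxconn)
port = srv.getsockname()[1]

# Two handshakes complete and sit in the accept queue; accept() not called yet.
clients = [socket.create_connection(("127.0.0.1", port)) for _ in range(2)]

# Draining the queue: each accept() returns one fully established connection.
queued = [srv.accept()[0] for _ in range(2)]
print(len(queued))  # 2
```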
Half‑Connection Queue
Although there is no direct command to view the half‑connection queue size, connections in the SYN_RECV state belong to it. The following command counts them (‑n skips slow reverse‑DNS lookups):
<code>$ netstat -n | grep SYN_RECV | wc -l
1723</code>
When the SYN queue overflows, the kernel drops new SYN packets and those connections cannot be established.
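The kernel also exposes the connection table in /proc/net/tcp, where the state column holds a hex code and 03 means SYN_RECV. A Linux‑only Python sketch counting half‑open connections without netstat:

```python
# Count half-open connections by parsing /proc/net/tcp (Linux only).
# The fourth whitespace-separated field ("st") is the TCP state in hex;
# 03 is SYN_RECV.
def count_syn_recv():
    count = 0
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(path) as f:
                next(f)  # skip the header line
                for line in f:
                    if line.split()[3] == "03":
                        count += 1
        except FileNotFoundError:
            pass
    return count

print(count_syn_recv())
```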
Simulating SYN‑Queue Overflow
Continuously sending SYN packets without completing the third ACK creates many SYN_RECV connections, simulating a SYN flood (a common DDoS attack).
Experiment environment:
Client and server: CentOS 7.9, kernel 3.10.0‑1160.15.2.el7.x86_64
Server IP 172.16.0.20, client IP 172.16.0.157
Server runs Nginx on port 80
Use hping3 to generate the SYN flood:
<code>$ hping3 -S -p 80 --flood 172.16.0.20
HPING 172.16.0.20 (eth0 172.16.0.20): S set, 40 headers + 0 data bytes
hping in flood mode, no replies will be shown</code>
Drops caused by a full SYN queue can be observed with:
<code>$ netstat -s | grep "SYNs to LISTEN"
1541918 SYNs to LISTEN sockets dropped</code>
Increase the SYN backlog with:
<code>sysctl -w net.ipv4.tcp_max_syn_backlog=1024</code>
If the SYN queue is full, enabling syncookies allows the server to establish connections without using the SYN queue. Enable it with:
<code>sysctl -w net.ipv4.tcp_syncookies=1</code>
Full‑Connection Queue
Use ss to view the full‑connection queue. In LISTEN state, Recv‑Q shows the current accept‑queue occupancy, and Send‑Q shows the maximum backlog.
<code>$ ss -ltnp
LISTEN 0 1024 *:8081 users:(("java",pid=5686,fd=310))</code>
In non‑LISTEN states, Recv‑Q and Send‑Q instead show unread bytes and unacknowledged bytes, respectively.
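The unread‑bytes meaning of Recv‑Q on an established socket can be checked from user space with the FIONREAD ioctl, which reports how much data is queued in the receive buffer. A loopback sketch:

```python
import fcntl, socket, struct, termios, time

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
cli = socket.create_connection(srv.getsockname())
conn, _ = srv.accept()

cli.sendall(b"x" * 1000)  # sent, but the server never calls recv()
time.sleep(0.2)           # let loopback delivery finish

# FIONREAD: bytes waiting in conn's receive buffer
# (the same bytes ss would report as Recv-Q for this socket).
pending = struct.unpack("i", fcntl.ioctl(conn, termios.FIONREAD, b"\0" * 4))[0]
print(pending)  # 1000
```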
Simulating Full‑Connection Queue Overflow
Same environment as above. Use ApacheBench (ab) to generate 10 000 concurrent connections with 100 000 requests in total:
<code>$ ab -c 10000 -n 100000 http://172.16.0.20:80/
...
Failed requests: 167336 (Length: 84384, Exceptions: 82952)
...</code>
After the test, the full‑connection queue grew to 512, exceeding the default maximum of 511 (the Send‑Q value), which can be verified with:
<code>$ ss -tulnp | grep 80
tcp LISTEN 411 511 *:80 *:*
... (later) ...
tcp LISTEN 512 511 *:80 *:*</code>
Overflow counts are shown by:
<code>$ netstat -s | grep overflowed
1233972 times the listen queue of a socket overflowed</code>
Increasing the Full‑Connection Queue
The maximum size is min(somaxconn, backlog), where backlog is the value the application passes to listen(). Increase both parameters:
<code>sysctl -w net.core.somaxconn=65535
# In nginx.conf
listen 80 backlog=65535;</code>
After restarting Nginx, the maximum queue size (Send‑Q) becomes 65535:
<code>$ ss -tulnp | grep 80
tcp LISTEN 0 65535 *:80 *:*</code>
Further load testing shows no overflow, confirming the tuning.
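The effective limit min(somaxconn, backlog) can be computed directly from /proc; a small sketch (the requested value mirrors the Nginx backlog above):

```python
# listen(backlog) is silently capped by net.core.somaxconn (Linux only).
with open("/proc/sys/net/core/somaxconn") as f:
    somaxconn = int(f.read())

requested = 65535  # e.g. Nginx's "listen 80 backlog=65535"
effective = min(somaxconn, requested)
print(somaxconn, effective)
```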
TCP Four‑Way Termination
Step 1 (FIN): the active closer sends FIN, closing its sending direction.
Step 2 (ACK): the passive side acknowledges the FIN.
Step 3 (FIN): the passive side sends its own FIN once it has no more data.
Step 4 (ACK): the active side acknowledges this FIN, completing the termination.
Servers often initiate the close because HTTP is request‑response: once the response is sent and the connection is no longer needed (for example, when keep‑alive is disabled or times out), the server closes it to free resources.
Termination State Diagram
The process involves only FIN and ACK packets. FIN indicates no more data will be sent; ACK confirms receipt.
Active side sends FIN → state FIN_WAIT1.
Passive side receives FIN, replies ACK → state CLOSE_WAIT.
Active side receives ACK → state FIN_WAIT2.
Passive side calls close(), sends FIN → state LAST_ACK.
Active side receives FIN, replies ACK → state TIME_WAIT (2 MSL).
Passive side receives final ACK → connection closed.
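The TIME_WAIT state on the active closer can be observed on loopback: whichever side calls close() first ends up in TIME_WAIT (state code 06 in /proc/net/tcp) once the four‑way exchange completes. A Linux‑only sketch:

```python
import socket, time

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
cli = socket.create_connection(srv.getsockname())
conn, _ = srv.accept()
local_port = cli.getsockname()[1]  # record before closing

cli.close()    # active close: FIN -> FIN_WAIT1 / FIN_WAIT2
conn.close()   # passive close: FIN -> LAST_ACK; client moves to TIME_WAIT
time.sleep(0.2)

# Find the client's port in /proc/net/tcp; "st" column 06 means TIME_WAIT.
hex_port = format(local_port, "04X")
states = [line.split()[3]
          for line in open("/proc/net/tcp").readlines()[1:]
          if line.split()[1].endswith(":" + hex_port)]
print(states)  # the client's port should show state 06 (TIME_WAIT)
```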
Optimizing TIME_WAIT on the Active Side
Large numbers of TIME_WAIT sockets consume memory and ports. Kernel options to mitigate this include:
Increase net.ipv4.tcp_max_tw_buckets and net.netfilter.nf_conntrack_max.
<code>sysctl -w net.ipv4.tcp_max_tw_buckets=1048576
sysctl -w net.netfilter.nf_conntrack_max=1048576</code>
Reduce net.ipv4.tcp_fin_timeout and net.netfilter.nf_conntrack_tcp_timeout_time_wait to free resources faster.
<code>sysctl -w net.ipv4.tcp_fin_timeout=15
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait=30</code>
Enable port reuse with net.ipv4.tcp_tw_reuse=1 (this applies only to outgoing connections and requires net.ipv4.tcp_timestamps=1).
<code>sysctl -w net.ipv4.tcp_tw_reuse=1</code>
Expand the local port range via net.ipv4.ip_local_port_range.
<code>sysctl -w net.ipv4.ip_local_port_range="1024 65535"</code>
Raise the maximum number of file descriptors with fs.nr_open and fs.file-max, or set LimitNOFILE in systemd unit files.
<code>sysctl -w fs.nr_open=1048576
sysctl -w fs.file-max=1048576</code>
These adjustments help maintain high‑performance TCP services under heavy load.
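Note that sysctl -w changes are lost on reboot; persistent values belong in sysctl configuration files. The current values can also be read back programmatically via /proc/sys, where dots in a sysctl name become path separators (the nf_conntrack entries exist only when the conntrack module is loaded). A sketch reading a few of the tunables above:

```python
# /proc/sys mirrors sysctl: "net.ipv4.tcp_fin_timeout" lives at
# /proc/sys/net/ipv4/tcp_fin_timeout (Linux only).
def read_sysctl(name):
    with open("/proc/sys/" + name.replace(".", "/")) as f:
        return f.read().strip()

for name in ("net.ipv4.tcp_fin_timeout",
             "net.ipv4.tcp_tw_reuse",
             "net.ipv4.ip_local_port_range",
             "fs.file-max"):
    print(name, "=", read_sysctl(name))
```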
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.