Investigation of MySQL Connection Timeouts under High Concurrency: SYN Cookie and Accept Queue Analysis

The article analyzes why Sysbench‑driven MySQL connections time out when concurrency exceeds 5 000, tracing the issue to TCP SYN‑cookie processing and a full accept queue, and proposes configuration‑level mitigations.

Aikesheng Open Source Community

Phenomenon

When Sysbench is used to stress‑test MySQL, connections time out if the concurrency level is larger than 5 000.

Hypothesis 1

Initially it was assumed that each Sysbench connection consumes a thread, exhausting resources and causing the timeout. Increasing the Sysbench timeout in the source code did not eliminate the problem.

Environment Check

Standard checks were performed:

MySQL error log showed no anomalies.

System log showed no anomalies.

tcpdump captured normal three‑way handshakes; some SYN packets were retransmitted while others were not.

A custom concurrent connection generator reproduced the issue, ruling out Sysbench itself.
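A generator of this kind can be sketched roughly as follows. This is not the tool used in the investigation; it is a minimal illustration that assumes a thread per connection, and the names `wait_for_greeting` and `run_burst` are hypothetical. Each worker connects and then waits for the server to speak first, as a MySQL client would.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def wait_for_greeting(host, port, timeout=5.0):
    """Connect, then wait for the server to send the first bytes.

    Returns "ok", "closed", "timeout", or "error: ..." for one attempt.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            data = sock.recv(4)  # any first bytes from the server count as a greeting
            return "ok" if data else "closed"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"


def run_burst(host, port, concurrency, timeout=5.0):
    """Open `concurrency` connections at once and tally the outcomes."""
    tally = {}
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        attempts = pool.map(
            lambda _: wait_for_greeting(host, port, timeout), range(concurrency)
        )
        for outcome in attempts:
            tally[outcome] = tally.get(outcome, 0) + 1
    return tally
```

Calling `run_burst` with the MySQL host, port, and a concurrency above 5 000 would, under these assumptions, show a share of "timeout" results when the fault is present.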

Hypothesis 2

It was suspected that MySQL might not be receiving the handshake packet at the application layer.

No abnormality was found in the MySQL stack; it seemed the server did not see new connections.

strace on MySQL showed that the accept() call did not detect the new connection.

A reference article about “TCP ‘stuck’ connection mystery” was consulted.

Analysis

The referenced phenomenon matches the current situation. The normal TCP three‑way handshake is:

1. Client sends SYN.
2. Server reserves resources and replies with SYN‑ACK.
3. Client replies with ACK.
4. Server receives ACK and the connection is established.
5. Application data exchange begins.

When SYN‑cookies are used (e.g., under SYN‑flood protection), the handshake changes to:

1. Client sends SYN.
2. Server does not reserve resources, replies with SYN‑ACK containing a signed cookie.
3. Client sends ACK with the cookie response.
4. Server validates the cookie, allocates resources, and the connection is established.
5. Application data exchange begins.

If the ACK packet (step 3) is lost, the client believes the connection is established, while the server sees no connection at all. This leads to two possible outcomes:

If the first application packet should be sent from the client, it will be retransmitted or a connection error will be raised.

If the first packet should be sent from the server (as in MySQL), the server never sends it, resulting in the observed failure.
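The two outcomes can be written down as a toy model. It is only an illustration of the reasoning above, not a simulation of the kernel; the function name `handshake_outcome` and its return strings are hypothetical.

```python
def handshake_outcome(first_speaker, final_ack_accepted):
    """Toy model of a SYN-cookie handshake.

    first_speaker: "client" or "server" (MySQL is "server": it sends the greeting).
    final_ack_accepted: False when the client's ACK is lost or rejected.
    Returns (client_view, server_view, what_happens_next).
    """
    if first_speaker not in ("client", "server"):
        raise ValueError("first_speaker must be 'client' or 'server'")
    client_view = "ESTABLISHED"  # the client is done once it has sent its ACK
    if final_ack_accepted:
        return client_view, "ESTABLISHED", "data flows"
    # With SYN cookies the server kept no state, so it has nothing to retransmit.
    server_view = "no connection"
    if first_speaker == "client":
        return client_view, server_view, "client retransmits its request or gets an error"
    return client_view, server_view, "client waits for a greeting that is never sent"
```

For MySQL, `handshake_outcome("server", False)` is the interesting case: both sides are idle, and only the client's own timeout ends the wait.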

Why the Third‑Step ACK Packet Is Lost

The reference explains that packet loss can be caused by buffer overflows somewhere in the network stack.

Using SystemTap we investigated the kernel function responsible for SYN‑cookie verification:

probe kernel.function("cookie_v4_check").return {
    source_port = @cast($skb->head + $skb->transport_header, "struct tcphdr")->source
    printf("source=%d, return=%d\n", readable_port(source_port), $return)
}

# Convert a 16-bit port from network to host byte order by swapping its two bytes.
function readable_port(port) {
    return (port & ((1<<8)-1)) << 8 | (port >> 8)
}

The probe showed that cookie_v4_check returned NULL (0), meaning the SYN‑cookie verification failed.

Further inspection revealed that the failure was due to a full accept queue:

static inline bool sk_acceptq_is_full(const struct sock *sk)
{
    return sk->sk_ack_backlog > sk->sk_max_ack_backlog;
}
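To see what this comparison implies, the check can be mirrored in a few lines of Python. This is a sketch under two assumptions: that the queue limit is the `listen()` backlog capped by `net.core.somaxconn`, and that the comparison is the strict `>` quoted above. The helper names are hypothetical.

```python
def effective_backlog(listen_backlog, somaxconn):
    """The kernel caps the backlog passed to listen() at net.core.somaxconn."""
    return min(listen_backlog, somaxconn)


def acceptq_is_full(ack_backlog, max_ack_backlog):
    """Same comparison as the sk_acceptq_is_full() snippet quoted above."""
    return ack_backlog > max_ack_backlog


def queue_capacity(max_ack_backlog):
    """Count how many connections fit before the check reports full."""
    queued = 0
    while not acceptq_is_full(queued, max_ack_backlog):
        queued += 1
    return queued
```

Because the comparison is strict, a limit of N admits N + 1 queued connections. A large backlog requested by the application has no effect if `somaxconn` is smaller.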

Correlation Between the Fault and Logs

Initially the system log appeared normal. After the analysis, it was discovered that the kernel does emit a warning when the accept queue is full, but the message is suppressed after the first occurrence, breaking the direct correlation with the fault.

Examining the Linux source shows that each listening socket logs the warning only once, so to capture the log reliably the MySQL service must be restarted for each test.
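Since the log line is unreliable as a signal, one alternative that does not require a restart is to watch the kernel's listen-queue counters. The sketch below assumes the `ListenOverflows` and `ListenDrops` fields of the `TcpExt` group in `/proc/net/netstat`; it was not part of the original investigation, and the function names are hypothetical.

```python
def parse_netstat(text):
    """Parse /proc/net/netstat text.

    The file is made of line pairs: a header line of counter names followed
    by a line of values, both starting with the same "Group:" prefix.
    """
    counters = {}
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    for names, values in zip(lines[0::2], lines[1::2]):
        prefix = names[0].rstrip(":")
        for name, value in zip(names[1:], values[1:]):
            counters[f"{prefix}.{name}"] = int(value)
    return counters


def listen_queue_counters(text):
    """Return the two counters related to a full accept queue (0 if absent)."""
    counters = parse_netstat(text)
    return {
        key: counters.get(f"TcpExt.{key}", 0)
        for key in ("ListenOverflows", "ListenDrops")
    }
```

Reading the file before and after a test run, for example with `open("/proc/net/netstat").read()`, and comparing the two results would show whether the counters grew during the run.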

Solution

The fault is hard to detect once it occurs, because the kernel warning is logged only once per listening socket and does not appear again until MySQL is restarted. Potential mitigations are:

Modify the MySQL protocol so the client initiates the handshake first (impractical).

Disable SYN‑cookies (reduces security against SYN‑flood attacks).

Increase the SYN‑cookie trigger threshold (i.e., enlarge the SYN backlog) to tolerate normal traffic spikes.

Several kernel parameters influence the SYN backlog length; see this article for details.
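Before changing anything it helps to record the current values. The sketch below reads three parameters that are commonly involved. The selection is an assumption, not a list taken from the referenced article, and the helper name `read_sysctls` is hypothetical. MySQL's own `back_log` setting, which sets the backlog it passes to `listen()`, is also relevant but is not read here.

```python
from pathlib import Path

# Assumed set of relevant parameters; adjust to the system under test.
PARAMS = (
    "net.ipv4.tcp_syncookies",
    "net.ipv4.tcp_max_syn_backlog",
    "net.core.somaxconn",
)


def read_sysctls(names=PARAMS, root="/proc/sys"):
    """Read integer sysctl values from the /proc/sys tree.

    A dotted name such as net.core.somaxconn maps to root/net/core/somaxconn.
    Missing or unreadable entries are reported as None.
    """
    values = {}
    for name in names:
        path = Path(root, *name.split("."))
        try:
            values[name] = int(path.read_text().split()[0])
        except (OSError, ValueError, IndexError):
            values[name] = None
    return values
```

On a Linux host, `read_sysctls()` returns a dictionary keyed by parameter name; the `root` argument exists only so the function can be pointed at a copy of the tree.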
