
Percona Thread Pool Implementation and TXSQL Optimizations: Architecture, Dynamic Switching, Load Balancing, and Disconnection Handling

This article explains the background, principles, architecture, and implementation details of Percona's thread pool, then describes TXSQL's dynamic thread‑pool activation, load‑balancing strategies, and disconnection optimizations, along with the related configuration parameters and status variables.

Tencent Database Technology

Part 1: Background

Community editions of MySQL use a one‑thread‑per‑connection (Per_thread) model, creating a worker thread for each connection. As connections increase, resource contention grows and response time rises, as shown in the response‑time graph.

Overall database throughput rises with the connection count until resources saturate; beyond that threshold, throughput drops because of increased contention, as illustrated in the second graph.

To avoid throughput degradation when connections surge, MariaDB and Percona recommend a thread pool. The concept is often likened to limiting the number of cars on a bridge during rush hour to keep the bridge's throughput high. In a database, limiting the number of concurrently running threads reduces context switching and lock contention, which benefits OLTP workloads. With a thread pool, throughput stays high even as connections increase.

Part 2: Percona Thread‑Pool Implementation

The thread pool works by pre‑creating a set of worker threads. A listener thread monitors new connection requests and assigns a worker from the pool. After serving a request, a worker remains in the pool for the next request instead of being destroyed.

2.1 Thread‑Pool Architecture

The pool consists of multiple thread groups and a timer thread. The number of thread groups (the concurrency limit) is usually set to the number of CPU cores. A timer thread periodically checks each group for blockage and wakes or creates workers as needed.

Each thread group contains several worker threads, optionally one listener thread, high‑ and low‑priority event queues, a mutex, epoll file descriptor, and statistics.

2.2 New‑Connection Creation and Assignment

When a new connection arrives, the pool assigns it to a thread group based on thread_id() % group_count. This simple modulo mapping may cause load imbalance if many busy connections map to the same group, which is a point for optimization.
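As a minimal sketch of this assignment rule (the function name is ours, and this mirrors the idea rather than Percona's exact code):

```cpp
#include <cstdint>

// Hypothetical sketch: each new connection is mapped to the group at
// index (thread_id % group_count).
uint32_t assign_group(uint64_t thread_id, uint32_t group_count) {
  // Round-robin by connection id; this ignores each group's actual
  // load, so a run of busy connections can all land in the same group.
  return static_cast<uint32_t>(thread_id % group_count);
}
```

Because the mapping depends only on the connection id, two neighboring ids always land in different groups, but nothing prevents all the *busy* connections from sharing one group.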

The connection is placed into the low‑priority queue, waiting until workers have processed all high‑priority events.

2.3 Listener Thread

The listener thread (one per group) uses epoll to monitor connection requests. Events are placed into the high‑priority queue if any of the following conditions hold:

The pool operates in high‑priority‑statement mode (all events go to the high‑priority queue).

The pool operates in high‑priority‑transaction mode and the connection has not yet used up its thread_pool_high_prio_tickets allowance.

The connection holds a table lock.

The connection holds an MDL lock.

The connection holds a global read lock.

The connection holds a backup lock.

High‑priority events are processed first by workers. Low‑priority events are only processed when the high‑priority queue is empty and the group is not busy.
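The classification rules above can be sketched as a single predicate. This is a hedged illustration: the type and field names (HighPrioMode, ConnState) are ours, not Percona's, and the ticket bookkeeping is simplified.

```cpp
#include <cstdint>

// Illustrative event-classification sketch for the rules listed above.
enum class HighPrioMode { None, Transactions, Statements };

struct ConnState {
  bool in_transaction;        // connection has an open transaction
  uint32_t tickets_left;      // remaining high-priority tickets
  bool holds_table_lock;
  bool holds_mdl_lock;
  bool holds_global_read_lock;
  bool holds_backup_lock;
};

bool goes_to_high_prio_queue(HighPrioMode mode, ConnState& c) {
  if (mode == HighPrioMode::Statements) return true;  // everything is high prio
  if (mode == HighPrioMode::Transactions && c.in_transaction &&
      c.tickets_left > 0) {
    --c.tickets_left;  // each high-prio scheduling consumes one ticket
    return true;
  }
  // Lock holders are prioritized so they release their locks sooner.
  return c.holds_table_lock || c.holds_mdl_lock ||
         c.holds_global_read_lock || c.holds_backup_lock;
}
```

The lock-based conditions apply regardless of mode, which keeps lock holders from being starved behind other low-priority work.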

2.4 Worker Thread

Workers perform the actual work. When a worker blocks on I/O or locks, the pool may wake or create another worker if the group is not already saturated. To avoid rapid worker creation, the pool limits the rate of new worker creation based on the number of connections in the group.

If a group already has too many active workers (more than the oversubscribe allowance plus one), the pool either converts a worker into a listener (if the group has none) or puts the worker to sleep; before sleeping, the worker attempts to fetch a pending event via epoll_wait. If no event is available, the worker sleeps and is destroyed after thread_pool_idle_timeout seconds of inactivity.
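A minimal sketch of that saturation check, assuming a counter of waiting (blocked) workers is kept per group; the struct and function names are illustrative, not Percona's:

```cpp
#include <cstdint>

// Hedged sketch of the per-group saturation check described above.
struct GroupStats {
  uint32_t active_threads;  // workers currently executing queries
  uint32_t waiting_threads; // workers blocked on I/O or locks
  uint32_t oversubscribe;   // the thread_pool_oversubscribe setting
};

bool too_many_active_threads(const GroupStats& g) {
  // Saturated when runnable workers exceed the allowance; the "+1"
  // leaves room for the thread that may be serving as listener.
  return g.active_threads > 1 + g.oversubscribe + g.waiting_threads;
}
```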

2.5 Timer Thread

The timer thread runs every thread_pool_stall_limit milliseconds, scanning all thread groups. If a group has pending events but no events have been consumed since the last check, it is considered stalled, and the timer wakes or creates workers to relieve the stall. The timer also terminates connections idle longer than wait_timeout seconds.
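The stall condition itself reduces to a comparison of per-group counters between two timer ticks. A hedged sketch (names are ours, not Percona's):

```cpp
#include <cstdint>

// Illustrative stall check performed on each group every
// thread_pool_stall_limit milliseconds: a group is stalled when work
// is queued but the dequeue counter has not advanced since the
// previous tick, so a worker must be woken or created.
bool is_stalled(uint64_t queue_len, uint64_t dequeues_now,
                uint64_t dequeues_at_last_tick) {
  return queue_len > 0 && dequeues_now == dequeues_at_last_tick;
}
```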

Part 3: TXSQL Dynamic Thread‑Pool Optimization

Thread pools work well for OLTP workloads but can degrade performance for long‑running queries because workers become blocked. Switching between Per_thread and Thread_pool modes normally requires a server restart, which is disruptive during peak traffic. TXSQL introduces a dynamic switch that allows enabling or disabling the thread pool without a restart.

3.1 Implementation of Dynamic Thread‑Pool

The MySQL thread_handling variable determines the connection‑management method. Historically it was read‑only; TXSQL makes it mutable at runtime by initializing all connection‑handling implementations at startup, allowing both Per_thread and Thread_pool to coexist.

Key challenges include:

Switching active connections: after a command finishes (do_command), the system checks whether thread_handling changed and migrates the connection to the appropriate group.

Handling new connections: Connection_handler_manager::process_new_connection reads the current thread_handling and creates either a handle_connection thread (Per_thread) or assigns the connection to a thread group (Thread_pool).

Fast effect of switches: if a connection is idle waiting for a command, it may need to be forced to reconnect so the new handling method takes effect.
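The first two points can be sketched as a pair of small decisions. This is an illustration of the described behavior, not TXSQL's actual code; the enum and function names are ours.

```cpp
// Illustrative sketch of the dispatch and migration checks attributed
// above to Connection_handler_manager::process_new_connection and the
// post-do_command path.
enum class ThreadHandling { PerThread, ThreadPool };
enum class Action { SpawnHandlerThread, EnqueueToGroup };

// New connections read the *current* thread_handling value, so both
// modes can serve different connections at the same time.
Action dispatch_new_connection(ThreadHandling current) {
  return current == ThreadHandling::PerThread ? Action::SpawnHandlerThread
                                              : Action::EnqueueToGroup;
}

// After each command finishes, a connection compares the mode it was
// created under with the global setting and migrates if they differ.
bool needs_migration(ThreadHandling conn_mode, ThreadHandling global_mode) {
  return conn_mode != global_mode;
}
```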

Part 4: TXSQL Thread‑Pool Load‑Balancing Optimization

Simple modulo assignment does not consider actual load, leading to imbalance. TXSQL proposes load‑balancing based on measured group load.

4.1 Load Measurement

Possible metrics include:

queue_length: total length of the high‑ and low‑priority queues.

average_wait_usecs_in_queue: average waiting time of recent events.

group_efficiency: ratio of processed events to total events over a time window.
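Under the assumption that each group keeps simple counters, the three candidate metrics can be computed as below; the struct and field names are illustrative, not TXSQL's.

```cpp
#include <cstdint>

// Hedged sketch of the candidate load metrics listed above.
struct GroupWindowStats {
  uint64_t high_queue_len, low_queue_len;   // current queue lengths
  uint64_t processed_events, total_events;  // counts within the window
  uint64_t wait_usecs_sum, waited_events;   // queue-wait accumulators
};

uint64_t queue_length(const GroupWindowStats& s) {
  return s.high_queue_len + s.low_queue_len;
}

double average_wait_usecs_in_queue(const GroupWindowStats& s) {
  return s.waited_events ? double(s.wait_usecs_sum) / s.waited_events : 0.0;
}

// Closer to 1.0 means the group keeps up with incoming events.
double group_efficiency(const GroupWindowStats& s) {
  return s.total_events ? double(s.processed_events) / s.total_events : 1.0;
}
```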

4.2 Load‑Balancing Implementation

Balancing is triggered only when a group's load exceeds a threshold M and its load ratio to a neighboring group exceeds N. Rather than maintaining a global load ordering (which would require costly locking), the algorithm compares a busy group only with its immediate neighbors and migrates connections to less‑loaded groups.

Two migration methods are used:

Optimized assignment of new connections to avoid busy groups.

Transfer of existing connections after request processing (threadpool_process_request), before the socket is re‑attached to epoll.
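The neighbor comparison can be sketched as follows, treating the groups as a ring. This is a minimal illustration of the described policy under assumed integer loads; the function name and thresholds M and N are taken from the text, everything else is ours.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hedged sketch: returns the index of a less-loaded neighboring group
// to migrate connections to, or -1 if group i should not be rebalanced.
int pick_migration_target(const std::vector<uint64_t>& load, size_t i,
                          uint64_t M, uint64_t N) {
  if (load[i] <= M) return -1;  // not busy enough to bother
  // Only the immediate neighbors are inspected, avoiding a global
  // ordering of all groups (and the locking it would require).
  size_t left = (i + load.size() - 1) % load.size();
  size_t right = (i + 1) % load.size();
  size_t best = load[left] <= load[right] ? left : right;
  // Require an N-fold imbalance so connections do not ping-pong.
  if (load[best] == 0 || load[i] >= N * load[best])
    return static_cast<int>(best);
  return -1;
}
```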

Part 5: TXSQL Thread‑Pool Disconnection Optimization

When a client disconnects before the listener re‑registers the socket with epoll, the server may keep the connection alive, consuming resources and potentially exhausting the connection limit.

TXSQL's solution:

Immediately monitor disconnection events after a normal network event.

Make connection termination asynchronous: place exiting connections into a quit_connection_queue.

When a disconnection event is detected, set thd->killed to THD::KILL_CONNECTION and enqueue the connection for asynchronous cleanup.

The listener processes quit_connection_queue at fixed intervals (e.g., every 100 ms).
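The event-handling side of this flow can be sketched as below. The epoll mask and the queue are real concepts from the description, but the type names and flag constants here are stand-ins (in real code the peer-closed condition would be EPOLLRDHUP from epoll).

```cpp
#include <cstdint>
#include <deque>

// Illustrative sketch of the asynchronous disconnection handling
// described above; names are ours.
constexpr uint32_t EV_READABLE = 1u << 0;     // stands in for EPOLLIN
constexpr uint32_t EV_PEER_CLOSED = 1u << 1;  // stands in for EPOLLRDHUP

struct Conn { int fd; bool killed; };

struct QuitQueue { std::deque<Conn*> pending; };

// Called by the listener for each epoll event. Disconnects are not
// torn down inline: the connection is marked killed and queued, so the
// listener keeps draining events instead of blocking on cleanup.
void on_event(uint32_t mask, Conn* c, QuitQueue& q) {
  if (mask & EV_PEER_CLOSED) {
    c->killed = true;        // analogue of thd->killed = KILL_CONNECTION
    q.pending.push_back(c);  // drained later, e.g. every 100 ms
  }
}
```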

Part 6: Thread‑Pool Parameters and Status Variables

Key configuration parameters:

thread_pool_idle_timeout — Maximum idle time (seconds) before a worker is destroyed. Default: 60. Range: (1, UINT_MAX).

thread_pool_oversubscribe — Maximum number of workers allowed per group. Default: 3. Range: (1, 1000).

thread_pool_size — Number of thread groups (usually the CPU core count). Default: number of CPU cores. Range: (1, 1000).

thread_pool_stall_limit — Timer interval (ms) for stall checking. Default: 500. Range: (10, UINT_MAX).

thread_pool_max_threads — Total number of workers allowed. Default: 100000. Range: (1, 100000).

thread_pool_high_prio_mode — High‑priority mode. Default: transactions. Values: transactions / statement / none.

thread_pool_high_prio_tickets — Ticket count per connection in transaction mode. Default: UINT_MAX. Range: (0, UINT_MAX).

threadpool_workaround_epoll_bug — Whether to work around the Linux 2.x epoll bug. Default: no. Values: no / yes.

TXSQL adds a SHOW THREADPOOL STATUS command that reports per‑group statistics such as groupid , connection_count , thread_count , havelistener , active_thread_count , queue sizes, event counts, migration counts, and latency metrics.

Part 7: Summary

This article introduced the background, principles, architecture, implementation details, configuration parameters, and status variables of the Percona thread pool, and briefly covered TXSQL's dynamic activation, load‑balancing, and fast‑disconnection optimizations.

Part 8: References

1. https://www.percona.com/blog/2013/03/16/simcity-outages-traffic-control-and-thread-pool-for-mysql/

2. https://dbaplus.cn/news-11-1989-1.html

3. https://www.percona.com/doc/percona-server/5.7/performance/threadpool.html

4. https://mariadb.com/kb/en/thread-pool-in-mariadb/

Tencent Database Technology Team supports internal services such as QQ Space, WeChat Red Packets, Tencent Ads, Tencent Music, Tencent News, and external products on Tencent Cloud like TDSQL‑C, TencentDB for MySQL, CTSDB, MongoDB, CES, etc. The team focuses on continuous kernel and architecture optimization to provide reliable database services.