Understanding Database Connection Pool Sizing: Lessons from HikariCP and Real‑World Performance Tests
The article explains why a small database connection pool combined with a large queue of waiting threads yields better performance, illustrates this with Oracle and PostgreSQL benchmark data, and provides a practical formula for calculating the optimal pool size based on CPU cores and effective disks.
While researching HikariCP, the author discovered a GitHub Wiki article that clarified many doubts about database connection pool sizing and decided to share a translation.
Configuring a database connection pool is a common source of pitfalls; several counter‑intuitive principles must be understood.
10,000 Concurrent Users
For a site handling roughly 10 000 concurrent users (≈20 000 TPS), the correct question is "How small should the database connection pool be?"
The author references an Oracle Real World Performance Group video (http://www.dailymotion.com/video/x2s8uec) that tests Oracle with 9 600 concurrent threads, each sleeping 550 ms between DB calls, initially using a middleware pool of 2 048 connections.
After the load test started, the performance looked like this:
With 2 048 connections, each request waited 33 ms in the pool queue and spent 77 ms executing SQL. The database showed heavy buffer busy waits and CPU usage around 95%.
When the pool size was reduced to 1 024 (keeping the same concurrency), the performance changed:
The time each request waited in the pool queue remained similar, but SQL execution time dropped and wait events roughly halved, while overall throughput stayed constant.
Further reducing the pool to 96 connections (still 9 600 threads) yielded:
With 96 connections, the average queue wait was 1 ms and SQL execution averaged 2 ms, dramatically reducing wait events and increasing throughput; response time fell from ~100 ms to ~3 ms solely by shrinking the pool.
But why?
Even on a single‑core CPU, many threads appear to run simultaneously, but the OS merely time‑slices them; true parallelism is limited by core count, so exceeding core count with threads can degrade performance.
Limited Resources
Database bottlenecks fall into three categories: CPU, disk, and network. Memory bandwidth is usually not the limiting factor. On an 8‑core server, setting connections equal to cores often yields optimal performance; adding more causes context‑switch overhead.
Disk I/O involves seek time and rotational latency, causing threads to block; during these blocks, the CPU can serve other threads, so a higher thread‑to‑core ratio can be beneficial when I/O is the bottleneck.
SSD storage eliminates seek latency, reducing blocking, so fewer threads (closer to core count) are optimal. Network I/O behaves similarly, with higher bandwidth reducing wait times.
PostgreSQL benchmark data shows TPS growth slowing after about 50 connections, reinforcing that overly large pools are counter‑productive.
The Formula
The following formula, provided by PostgreSQL, can be used as a starting point for most databases:
Connections = (Cores * 2) + Effective_Disks
Cores should not include hyper‑threads. If the workload is fully cached, Effective_Disks is 0; otherwise it approaches the actual number of disks. The formula’s behavior on SSDs is still unverified.
For a 4‑core i7 database server, the formula suggests a pool size of ((4 * 2) + 1) = 9, rounded to 10. Testing shows that exceeding this size quickly degrades response time and TPS.
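The formula above is simple enough to capture in a few lines. The following is a minimal sketch; the class and method names are illustrative, not part of any library:

```java
public class PoolSizing {
    /**
     * connections = (cores * 2) + effective_disks
     *
     * physicalCores: physical cores only, excluding hyper-threads.
     * effectiveDisks: 0 if the working set is fully cached,
     * approaching the real disk count otherwise.
     */
    static int recommendedPoolSize(int physicalCores, int effectiveDisks) {
        return (physicalCores * 2) + effectiveDisks;
    }

    public static void main(String[] args) {
        // The 4-core i7 example from the text: (4 * 2) + 1 = 9
        System.out.println(recommendedPoolSize(4, 1)); // prints 9
    }
}
```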
Note: This formula also applies to many I/O‑bound programs; the author observed the same optimal thread count when testing a Netty‑based messaging service.
Axiom: You need a small connection pool and a queue full of threads waiting for connections
For 10 000 concurrent users, a 10 000‑size pool is absurd; even 1 000 is excessive. A small pool (around 10 connections) with a large queue of waiting business threads is optimal, matching the number of queries the DB can handle concurrently (usually ≤ 2 * CPU cores).
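The "small pool, large queue" idea can be simulated with plain `java.util.concurrent` primitives. This is a sketch of the queuing behavior, not HikariCP internals: a semaphore caps concurrent "connections" at the pool size, and the many business threads simply wait their turn. All names here are illustrative:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class SmallPoolDemo {
    // Runs many business threads against a pool of poolSize permits and
    // returns the highest concurrency ever observed at the "database".
    static int maxConcurrent(int poolSize, int businessThreads)
            throws InterruptedException {
        Semaphore connections = new Semaphore(poolSize, true);
        AtomicInteger inUse = new AtomicInteger();
        AtomicInteger maxObserved = new AtomicInteger();
        ExecutorService workers = Executors.newFixedThreadPool(businessThreads);
        CountDownLatch done = new CountDownLatch(businessThreads);

        for (int i = 0; i < businessThreads; i++) {
            workers.submit(() -> {
                try {
                    connections.acquire();   // threads queue here if the pool is busy
                    int now = inUse.incrementAndGet();
                    maxObserved.accumulateAndGet(now, Math::max);
                    Thread.sleep(2);         // stand-in for SQL execution
                    inUse.decrementAndGet();
                    connections.release();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        done.await();
        workers.shutdown();
        return maxObserved.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // 100 business threads, but the database never sees more than 10 at once.
        System.out.println("max concurrent = " + maxConcurrent(10, 100));
    }
}
```

The point of the sketch: raising the thread count increases queue length, not database-side concurrency, which stays bounded by the pool size.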
Many small web apps with only a dozen concurrent users still use 100‑connection pools, unnecessarily burdening the database.
A Word of Caution
The optimal pool size depends on system characteristics. Mixed long‑ and short‑transaction workloads may require separate pools. Task queues that limit concurrent execution should align their concurrency with the pool size, not the other way around.
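One way to align a task queue with the pool, sketched with the standard `ThreadPoolExecutor` (the helper name is hypothetical): fix the executor's worker count to the pool size, so excess tasks wait in the executor's queue instead of piling up contending for connections.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class AlignedExecutor {
    // Build an executor whose concurrency matches the connection pool,
    // so at most poolSize tasks ever compete for a connection.
    static ThreadPoolExecutor forPoolSize(int poolSize) {
        return new ThreadPoolExecutor(
                poolSize, poolSize,            // fixed size matching the pool
                0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>());  // excess tasks queue here
    }
}
```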