Root Cause Analysis and Optimization of Slow SQL in Meituan‑Dianping MySQL Thread‑Pool
This article presents a detailed investigation of intermittent slow SQL queries observed in Meituan‑Dianping’s services, identifies the MySQL 5.6 thread‑pool as the primary cause, and demonstrates how adjusting thread‑pool parameters dramatically reduced slow‑query counts and improved overall service availability.
Author Tian Dongxue, a database architect at Meituan‑Dianping, shares a real‑world case where occasional slow SQL queries caused timeouts and degraded QoS for a large internet‑plus life‑service platform.
Problem description : Developers reported high SQL latency (99th‑percentile exceeding 100 ms) despite the queries being simple primary‑key lookups on small tables, with no entries in the MySQL slow‑log.
Monitoring data from the CAT system showed many queries with response times over 100 ms, some reaching 929 ms, and a total of 3,788 slow queries on a single day.
Analysis : By capturing timestamps at four points (APP server send, MySQL receive, MySQL send, APP receive) using TcpDump, the majority of latency was traced to the MySQL server itself (≈606 ms). Further investigation revealed that the slow queries were not logged, indicating the delay occurred before the storage engine phase, likely in the thread‑handling layer.
MySQL 5.6 introduced a thread‑pool mechanism. Examination of thread‑related parameters (thread_cache_size, thread_pool_* etc.) suggested the thread‑pool was the culprit.
Solution iterations :
Disable the thread‑pool (thread_handling=one‑connection‑per‑thread) – reduced slow queries from 3,788 to 818.
Fine‑tune thread‑pool parameters, especially thread_pool_stall_limit , lowering it to 10 ms – further reduced slow queries to 63.
These adjustments eliminated over 98 % of the slow SQL incidents and restored service latency to sub‑10 ms levels.
Results :
State
Slow SQL count
Remarks
Before optimization
3788
After disabling thread‑pool
818
After setting thread_pool_stall_limit=10
63
The findings demonstrate a systematic approach to diagnosing end‑to‑end database performance issues: isolate the latency source, use exclusion methods, and adjust low‑level MySQL parameters.
Takeaways : Properly configuring MySQL thread‑pool settings is critical for high‑concurrency environments; monitoring tools and packet captures are essential for pinpointing bottlenecks.
References :
https://my.oschina.net/andylucc/blog/820624
https://yq.aliyun.com/articles/41078
http://blog.chinaunix.net/uid-28364803-id-3427833.html
http://blog.chinaunix.net/uid-28364803-id-3431242.html
https://www.percona.com/doc/percona-server/5.6/performance/threadpool.html
https://mariadb.com/kb/en/mariadb/thread-pool-in-mariadb/
https://www.safaribooksonline.com/library/view/high-performance-mysql/9781449332471/ch01.html#mysqlas_logical_architecture
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.