
Performance Optimization of a High‑Concurrency Python Web Service

This article documents a Python web service performance optimization case, detailing the initial bottlenecks, architectural redesign with caching and message queues, load‑testing methodology, Linux TCP time‑wait tuning, and the final results achieving 50k QPS with sub‑70 ms latency.

Architecture Digest

This article records the performance optimization of a Python web service, describing the problems encountered and the solutions applied, while emphasizing that the approach shown is one workable path, not the only possible one.

How to Optimize – Optimization must be driven by clear requirements, measurable goals, and precise identification of performance bottlenecks; arbitrary concurrency claims without context are meaningless.

Requirement Description – The module, originally part of a main site, was split out to avoid affecting the core service under heavy load. Targets: QPS ≥ 30,000, database usage ≤ 50%, server CPU ≤ 70%, request latency ≤ 70 ms, error rate ≤ 5%.

Environment – Server: 4‑core, 8 GB RAM, CentOS 7, SSD. Database: MySQL 5.7, max connections 800. Cache: Redis 1 GB. Load‑testing tool: Locust with distributed elastic scaling.

Key Analysis – Three main tasks: (1) fetch appropriate popup configuration for each user, (2) record the next time the configuration should be returned, and (3) log user actions on the returned configuration.
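The three tasks above can be sketched as a single request handler. This is an illustrative sketch only: the names and data shapes are assumptions, and plain in-process dicts stand in for the MySQL/Redis storage the article describes.

```python
import time

# Stand-ins for persistent storage (the real service used MySQL and Redis).
configs = {1: {"popup": "promo"}}   # task 1 source: per-user popup configuration
next_show = {}                      # task 2 store: when to show the popup again
action_log = []                     # task 3 store: user actions on the popup

def handle_request(user_id: int, action: str) -> dict:
    cfg = configs.get(user_id, {})            # (1) fetch the popup configuration
    next_show[user_id] = time.time() + 3600   # (2) record the next display time
    action_log.append((user_id, action))      # (3) log the user's action
    return cfg
```

In the real service each of these three steps hits storage, which is exactly why the later sections move the writes to a queue and the reads behind a cache.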

Initial Optimization – All three tasks involve database reads and writes, which would saturate connections without caching. The first architectural diagram (shown below) illustrates the original design.

Write operations were moved to a FIFO message queue implemented with a Redis list to reduce direct DB writes.
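A minimal sketch of that FIFO queue, assuming JSON-serialized write jobs. An in-memory `deque` stands in for the Redis list here so the example is self-contained; with redis-py the same pattern would use `r.lpush(...)` on the web side and `r.rpop(...)`/`r.brpop(...)` in a background worker.

```python
import json
from collections import deque
from typing import Optional

# In-memory stand-in for the Redis list used as a FIFO queue.
queue: deque = deque()

def enqueue_write(event: dict) -> None:
    """Producer (web worker): push a DB write job onto the list head (LPUSH)."""
    queue.appendleft(json.dumps(event))

def dequeue_write() -> Optional[dict]:
    """Consumer (background worker): pop the oldest job from the tail (RPOP)."""
    if not queue:
        return None
    return json.loads(queue.pop())
```

Pushing on one end and popping on the other preserves arrival order, so the background worker replays writes against MySQL in the sequence requests produced them.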

First load test results: QPS ≈ 6,000, 502 errors rose to 30%, CPU 60‑70%, DB connections saturated (~6,000 TCP connections). The bottleneck was identified as frequent DB reads for configuration lookup.

After redesign (second diagram below), all configurations were cached; the database is queried only on cache miss.

Second test: QPS reached ~20,000, CPU 60‑80%, DB connections ~300, and roughly 15,000 new TCP connections per second. The limiting factor was no longer CPU or the database but TCP socket handling.

Investigation showed the system’s ulimit -n was 65,535, so file descriptor limits were not the cause. Increasing the limit to 100,001 gave a slight improvement (QPS ≈ 22,000) but did not solve the issue.

The root cause was that TCP connections remained in TIME_WAIT state after the four‑way connection teardown, preventing those sockets from being reused immediately. TIME_WAIT sockets linger for twice the maximum segment lifetime (2×MSL) so that the final ACK is delivered and any stray segments from the old connection expire before the port is reused.

Linux kernel parameters were tuned to reduce TIME_WAIT impact:

# Maximum number of TIME_WAIT sockets to keep; the default is 180000.
net.ipv4.tcp_max_tw_buckets = 6000

# Enable fast recycling of TIME_WAIT sockets.
net.ipv4.tcp_tw_recycle = 1

# Enable reuse: allow TIME_WAIT sockets to be reused for new TCP connections.
net.ipv4.tcp_tw_reuse = 1

(Note that tcp_tw_recycle is known to break clients behind NAT and was removed entirely in Linux 4.12; it existed on the CentOS 7 kernel used here, but tcp_tw_reuse is the safer option on modern systems.)
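To apply the settings and observe their effect, the standard workflow is to persist them in /etc/sysctl.conf and reload, then count sockets by state with `ss` (exact counts will vary with load):

```shell
# Reload kernel parameters from /etc/sysctl.conf.
sysctl -p

# Count sockets currently stuck in TIME_WAIT.
ss -tan state time-wait | wc -l

# Per-state socket summary (established, time-wait, etc.).
ss -s
```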

After applying these settings, the final load test achieved 50,000 QPS, CPU around 70%, normal DB and TCP connection counts, average response time 60 ms, and 0% error rate.

Conclusion – The end‑to‑end development, tuning, and testing process highlighted that web development is an interdisciplinary engineering practice involving networking, databases, programming languages, and operating systems, requiring solid fundamentals to diagnose and resolve performance issues.

Tags: Performance Optimization, Python, Database, Backend Development, Caching, Load Testing, Linux Tuning
Written by Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.