Practical Guide to Performance Testing and Troubleshooting in Linux Environments
This article outlines a comprehensive, step‑by‑step approach to performance testing and root‑cause analysis for backend services, covering environment validation, tool selection, Linux system limits, dependency checks, empty‑endpoint verification, throughput calculation, log monitoring, and essential Linux commands such as netstat, vmstat, mpstat, iostat, top and free.
Background
The author shares experiences from two rounds of performance testing, emphasizing the need for regular load testing before major promotions or new system releases to ensure system capacity and stability.
Automated performance testing environments should include CI/CD pipelines, isolated test clusters, automated tools, alerting, reporting, and a clear troubleshooting workflow to avoid drift between test and production configurations.
Environment Checks
Before any load test, verify hardware (CPU, memory, network bandwidth, SSD), network topology (cross‑segment connectivity, bandwidth limits), and middleware (deployment, configuration, clustering, benchmark vs. business‑scenario tests).
When middleware appears functional but cannot handle load, perform simple benchmarks such as single‑table DB inserts, concurrent cache reads, or MQ writes to understand internal bottlenecks.
Load‑Generator and Tool Inspection
Common load generators include locust, jmeter, and ab. ab is primarily used for baseline testing, while jmeter is used for broader acceptance testing.
Ensure the load generator resides on the same network segment as the target and that no bandwidth throttling exists. For jmeter, verify the Java runtime settings.
During long jmeter runs, disable the "Listener → Graph Results" panel to avoid UI freezes that can be mistaken for memory issues.
Linux Open Files Limit Configuration
Check the current file‑descriptor limits with ulimit -a or ulimit -n. If adjustments are needed, edit /etc/security/limits.conf.
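A minimal sketch of the check and the persistent change; the nofile values below are illustrative, not a recommendation:

```shell
# Show the current per-process open-file limits
ulimit -n        # soft limit
ulimit -Hn       # hard limit

# To raise them persistently, append entries like these (values are
# illustrative) to /etc/security/limits.conf, then open a new login session:
#   *    soft    nofile    65535
#   *    hard    nofile    65535
```

Note that changes to limits.conf only take effect for new login sessions; an already-running service must be restarted to pick them up.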
Dependency Inspection
Map all service dependencies (databases, caches, message queues) and determine whether they need to be included in the test or can be stubbed out. Create a git branch for the performance test and adjust code accordingly.
Empty‑Endpoint Load Test
Validate network connectivity and basic server health by load‑testing a simple endpoint (e.g., /health_check ) that has no downstream dependencies.
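A sketch of that smoke test, using python3's built-in HTTP server as a stand-in for a dependency-free endpoint; the host, port, and path here are illustrative (in a real run you would hit your service's own /health_check route):

```shell
# Stand up a throwaway local server acting as the dependency-free endpoint
python3 -m http.server 8099 --bind 127.0.0.1 >/dev/null 2>&1 &
srv_pid=$!
sleep 1

# Verify plain connectivity and the status code before applying any load
code=$(curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1:8099/)
echo "status=$code"

# A short ab baseline against the same URL would then look like:
#   ab -n 1000 -c 50 http://127.0.0.1:8099/
kill "$srv_pid"
```

If even this trivial endpoint cannot sustain load, the bottleneck is in the network path or the load generator itself, not the application.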
Throughput Calculation in Aggregated Reports
Throughput is calculated as samples / test duration. For write‑type APIs it represents TPS; for read‑type APIs it represents QPS. A declining throughput curve is often a symptom of underlying service failures, so investigate drops rather than dismissing them as load‑generator noise.
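The arithmetic as a one-liner; the sample count and duration below are illustrative figures, not from the article:

```shell
# Throughput = samples / test duration
# e.g. 30000 samples completed over a 60-second window
awk 'BEGIN { samples = 30000; duration = 60; printf "%.1f req/s\n", samples / duration }'
# prints: 500.0 req/s
```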
Performance Investigation Methods
Adopt a top‑down or bottom‑up approach: start from system‑level metrics (CPU, memory, network) and drill down to application logs, thread dumps, and JVM statistics.
Log Monitoring Across Dimensions
Collect request latency logs, external‑call logs, and middleware‑specific logs (e.g., mq.log, cache.log, search.log). For Java applications, pay special attention to GC logs and understand the impact of different GC phases.
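One way to get those GC logs is to enable them at JVM startup. A sketch using JDK 8‑era flags (JDK 9+ replaced these with the unified -Xlog:gc* syntax); the heap sizes, log path, and app.jar name are illustrative:

```shell
# Enable GC logging at startup (JDK 8 flag syntax; paths/sizes illustrative)
java -Xms2g -Xmx2g \
     -XX:+PrintGCDetails \
     -XX:+PrintGCDateStamps \
     -Xloggc:/var/log/app/gc.log \
     -jar app.jar
```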
Common Linux Commands for Monitoring
netstat
Inspect network sockets and connection counts.
netstat -tnlp | grep &lt;ip&gt; | wc -l
vmstat
Monitor virtual CPU queues and memory swapping.
vmstat 1
mpstat
Gather per‑CPU statistics on multi‑core systems.
mpstat -P ALL 1
iostat
Observe I/O throughput and device utilization.
iostat 1
top
Real‑time view of CPU, memory, load average, and task states.
top
free
Display total, used, and free memory, including buffers/cache.
free -m
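All of the tools above ultimately read /proc; when they are unavailable on a stripped-down host, the same numbers can be pulled directly. A minimal sketch (Linux only):

```shell
# Memory totals in kB, as free -m would report them (before unit conversion)
awk '/^MemTotal:/ {t=$2} /^MemAvailable:/ {a=$2} \
     END {printf "mem_total_kb=%d mem_avail_kb=%d\n", t, a}' /proc/meminfo

# Load averages over 1, 5, and 15 minutes, as shown in top's header
cut -d" " -f1-3 /proc/loadavg
```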
Conclusion
The article provides a practical checklist for typical performance testing scenarios, emphasizing systematic environment validation, tool configuration, metric collection, and iterative troubleshooting, while acknowledging that deep performance issues often require extensive analysis and no single silver bullet exists.
Hujiang Technology
We focus on the real-world challenges developers face, delivering authentic, practical content and a direct platform for technical networking among developers.