
Redis Service Latency Diagnosis and Optimization – A Systematic Approach

The article outlines a systematic three‑step workflow—general service diagnostics, Redis‑specific checks, and reproducible load testing—to pinpoint a hot‑key‑driven CPU bottleneck, then evaluates mitigation options such as read‑write separation, pipelining, and an application‑level cache, ultimately showing the cache’s effectiveness in cutting latency and CPU usage.

Tencent Cloud Developer

This article presents a systematic method for diagnosing and optimizing Redis latency issues encountered in a production module. The author first outlines three key focus areas: general service troubleshooting, Redis‑specific troubleshooting, and methods for reproducing and testing the problem.

1. General Service Troubleshooting – The author recommends a two‑step approach: first use business‑level metrics (instrumentation) to narrow down the problematic component, then employ performance‑analysis tools (e.g., pprof) for precise pinpointing. Emphasis is placed on checking basic resource metrics such as CPU, memory, network I/O, and disk I/O, as well as confirming that the issue is not caused by recent deployments.

2. Redis‑Specific Troubleshooting – The investigation follows a layered checklist:

Network latency between the client and the Redis node (redis-cli --latency and --latency-history).

Intrinsic latency of the Redis host itself, i.e., delays introduced by the machine rather than the network (redis-cli --intrinsic-latency).

Throughput and command statistics (INFO stats), plus the memory, CPU, and replication sections (INFO memory, INFO cpu, INFO replication).

Memory usage and fragmentation (used_memory_rss_human, used_memory_peak_human, mem_fragmentation_ratio).

Persistence and eviction settings (maxmemory, maxmemory-policy, evicted_keys).

Key‑space size (INFO KEYSPACE) and the presence of big keys (redis-cli --bigkeys).

Hot-key detection (redis-cli --hotkeys, available since Redis 4.0 and requiring an LFU maxmemory-policy).

Key observations from the monitoring data include high CPU usage (~90 %) while OPS remain modest, no significant network or disk bottlenecks, and the absence of big keys, memory fragmentation, or eviction spikes.

3. Reproducing and Testing the Issue – The author creates a local demo to verify that pipeline or Lua scripting can reduce network round‑trips, then builds a full‑stack load test using Kafka (kaf) to simulate the production traffic. By increasing the pressure on the service, the hot‑key effect becomes evident: a single hot key drives CPU consumption and blocks other requests.
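To make the round-trip argument concrete, here is a hedged stand-in: a toy client that only counts network round trips, showing why batching N commands into one pipeline flush turns N round trips into one. The StubClient class is invented for illustration and is not the redis-py API.

```python
class StubClient:
    """Toy client that counts round trips instead of talking to Redis."""
    def __init__(self):
        self.round_trips = 0
        self.store = {}

    def set(self, key, value):
        self.round_trips += 1          # one command, one round trip
        self.store[key] = value

    def pipeline_set(self, items):
        self.round_trips += 1          # the whole batch rides one round trip
        self.store.update(items)

client = StubClient()
for i in range(100):
    client.set(f"k{i}", i)             # 100 individual round trips
naive = client.round_trips

client.pipeline_set({f"p{i}": i for i in range(100)})  # 1 round trip
print(naive, client.round_trips - naive)  # 100 vs 1
```

With per-command latency dominated by the network, this is exactly the saving a real pipeline (or a Lua script executed server-side) provides.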

After confirming the hot‑key root cause, three mitigation strategies are discussed:

Read‑write separation across multiple Redis instances (if a multi‑instance setup is possible).

Batch writes using pipelines to reduce per‑command overhead.

Introduce an additional caching layer in the application to offload hot‑key traffic.
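The article does not show the cache implementation, so as a minimal sketch, assuming a simple in-process TTL cache placed in front of the Redis read path, the third option might look like this (class name, TTL, and the fetch stub are illustrative):

```python
import time

class TTLCache:
    """Minimal in-process cache with per-entry expiry."""
    def __init__(self, ttl_seconds=1.0):
        self.ttl = ttl_seconds
        self._data = {}  # key -> (value, expiry_timestamp)

    def get_or_fetch(self, key, fetch):
        """Return a cached value, calling fetch(key) only on miss or expiry."""
        entry = self._data.get(key)
        now = time.monotonic()
        if entry is not None and entry[1] > now:
            return entry[0]
        value = fetch(key)
        self._data[key] = (value, now + self.ttl)
        return value

calls = 0
def fetch_from_redis(key):
    # Stand-in for the real Redis read; counts backend hits.
    global calls
    calls += 1
    return f"value-of-{key}"

cache = TTLCache(ttl_seconds=60)
results = [cache.get_or_fetch("hot:key", fetch_from_redis) for _ in range(1000)]
print(calls)  # only 1 backend read for 1000 requests
```

Even a short TTL collapses thousands of reads of the same hot key into a single Redis call per interval, which is why this option relieves both latency and CPU pressure; the trade-off is that readers may see data up to one TTL stale.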

The chosen solution – adding an application‑level cache – successfully reduces both latency and CPU usage, as shown by the post‑mitigation monitoring graphs.

Overall, the article demonstrates a practical, data‑driven workflow for diagnosing Redis performance problems, combining resource metrics, Redis‑specific commands, and realistic load testing.

# Sample Redis INFO snippet used in the analysis
# Total number of commands processed since the Redis server last started
total_commands_processed:2255
instantaneous_ops_per_sec:12
total_net_input_bytes:34312
total_net_output_bytes:78215
instantaneous_input_kbps:1.20
instantaneous_output_kbps:2.62
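A small helper like the one below (illustrative, not from the article) can turn such key:value INFO lines into a dict for monitoring scripts; comment lines starting with # are skipped.

```python
def parse_info(text):
    """Parse Redis INFO output ('key:value' per line) into a dict."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip section headers and comments
        key, _, value = line.partition(":")
        stats[key] = value
    return stats

sample = """# Stats
total_commands_processed:2255
instantaneous_ops_per_sec:12
"""
info = parse_info(sample)
print(info["total_commands_processed"])  # '2255'
```

Values are kept as strings here; a real monitoring script would convert the numeric fields as needed.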