
Redis Service Latency Diagnosis and Optimization – A Systematic Approach

The article outlines a systematic three‑step workflow—general service diagnostics, Redis‑specific checks, and reproducible load testing—to pinpoint a hot‑key‑driven CPU bottleneck, then evaluates mitigation options such as read‑write separation, pipelining, and an application‑level cache, ultimately showing the cache’s effectiveness in cutting latency and CPU usage.

Tencent Cloud Developer

This article presents a systematic method for diagnosing and optimizing Redis latency issues encountered in a production module. The author first outlines three key focus areas: general service troubleshooting, Redis‑specific troubleshooting, and methods for reproducing and testing the problem.

1. General Service Troubleshooting – The author recommends a two‑step approach: first use business‑level metrics (instrumentation) to narrow down the problematic component, then employ performance‑analysis tools (e.g., pprof) for precise pinpointing. Emphasis is placed on checking basic resource metrics such as CPU, memory, network I/O, and disk I/O, as well as confirming that the issue is not caused by recent deployments.

2. Redis‑Specific Troubleshooting – The investigation follows a layered checklist:

Network latency between the client and the Redis node (redis-cli --latency and --latency-history).

Intrinsic latency of the Redis host itself, i.e., delays introduced by the machine rather than the network (redis-cli --intrinsic-latency).

Throughput and command statistics (INFO stats), plus the memory, CPU, and replication sections (INFO memory, INFO cpu, INFO replication).

Memory usage and fragmentation (used_memory_rss_human, used_memory_peak_human, mem_fragmentation_ratio).

Persistence and eviction settings (maxmemory, maxmemory-policy, evicted_keys).

Key‑space size (INFO KEYSPACE) and the presence of big keys (redis-cli --bigkeys).

Hot-key detection (redis-cli --hotkeys, available since Redis 4.0 and requiring an LFU maxmemory-policy).

Key observations from the monitoring data include high CPU usage (~90 %) while OPS remain modest, no significant network or disk bottlenecks, and the absence of big keys, memory fragmentation, or eviction spikes.

3. Reproducing and Testing the Issue – The author creates a local demo to verify that pipeline or Lua scripting can reduce network round‑trips, then builds a full‑stack load test using Kafka (kaf) to simulate the production traffic. By increasing the pressure on the service, the hot‑key effect becomes evident: a single hot key drives CPU consumption and blocks other requests.
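To make the round-trip argument concrete, here is a hedged stand-in: a toy client that only counts network round trips, showing why batching N commands into one pipeline flush turns N round trips into one. The StubClient class is invented for illustration and is not the redis-py API.

```python
class StubClient:
    """Toy client that counts round trips instead of talking to Redis."""
    def __init__(self):
        self.round_trips = 0
        self.store = {}

    def set(self, key, value):
        self.round_trips += 1          # one command, one round trip
        self.store[key] = value

    def pipeline_set(self, items):
        self.round_trips += 1          # the whole batch rides one round trip
        self.store.update(items)

client = StubClient()
for i in range(100):
    client.set(f"k{i}", i)             # 100 individual round trips
naive = client.round_trips

client.pipeline_set({f"p{i}": i for i in range(100)})  # 1 round trip
print(naive, client.round_trips - naive)  # 100 vs 1
```

With per-command latency dominated by the network, this is exactly the saving a real pipeline (or a Lua script executed server-side) provides.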

After confirming the hot‑key root cause, three mitigation strategies are discussed:

Read‑write separation across multiple Redis instances (if a multi‑instance setup is possible).

Batch writes using pipelines to reduce per‑command overhead.

Introduce an additional caching layer in the application to offload hot‑key traffic.
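The article does not show the cache implementation, so as a minimal sketch, assuming a simple in-process TTL cache placed in front of the Redis read path, the third option might look like this (class name, TTL, and the fetch stub are illustrative):

```python
import time

class TTLCache:
    """Minimal in-process cache with per-entry expiry."""
    def __init__(self, ttl_seconds=1.0):
        self.ttl = ttl_seconds
        self._data = {}  # key -> (value, expiry_timestamp)

    def get_or_fetch(self, key, fetch):
        """Return a cached value, calling fetch(key) only on miss or expiry."""
        entry = self._data.get(key)
        now = time.monotonic()
        if entry is not None and entry[1] > now:
            return entry[0]
        value = fetch(key)
        self._data[key] = (value, now + self.ttl)
        return value

calls = 0
def fetch_from_redis(key):
    # Stand-in for the real Redis read; counts backend hits.
    global calls
    calls += 1
    return f"value-of-{key}"

cache = TTLCache(ttl_seconds=60)
results = [cache.get_or_fetch("hot:key", fetch_from_redis) for _ in range(1000)]
print(calls)  # only 1 backend read for 1000 requests
```

Even a short TTL collapses thousands of reads of the same hot key into a single Redis call per interval, which is why this option relieves both latency and CPU pressure; the trade-off is that readers may see data up to one TTL stale.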

The chosen solution – adding an application‑level cache – successfully reduces both latency and CPU usage, as shown by the post‑mitigation monitoring graphs.

Overall, the article demonstrates a practical, data‑driven workflow for diagnosing Redis performance problems, combining resource metrics, Redis‑specific commands, and realistic load testing.

# Sample Redis INFO snippet used in the analysis
# Total number of commands processed since the Redis server last started
total_commands_processed:2255
instantaneous_ops_per_sec:12
total_net_input_bytes:34312
total_net_output_bytes:78215
instantaneous_input_kbps:1.20
instantaneous_output_kbps:2.62
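A small helper like the one below (illustrative, not from the article) can turn such key:value INFO lines into a dict for monitoring scripts; comment lines starting with # are skipped.

```python
def parse_info(text):
    """Parse Redis INFO output ('key:value' per line) into a dict."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip section headers and comments
        key, _, value = line.partition(":")
        stats[key] = value
    return stats

sample = """# Stats
total_commands_processed:2255
instantaneous_ops_per_sec:12
"""
info = parse_info(sample)
print(info["total_commands_processed"])  # '2255'
```

Values are kept as strings here; a real monitoring script would convert the numeric fields as needed.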