Case Study: Scaling Redis with Twemproxy and Optimizing Connection Pools
The team rescued a crashing Redis cache handling 100K–120K QPS by shortening Nginx Lua timeouts, trimming connection pools, adding four Redis nodes behind Twemproxy, and splitting hot keys to raise cardinality. These changes eliminated connection spikes, balanced load across shards, and restored stable performance.
Background
Project A collects data generated by other projects and stores it for a limited time without persistence, using a single Redis instance as a cache. During peak periods Redis QPS reached 100K, at times 120K, and eventually the instance crashed, a textbook case of Murphy's Law. The incident highlighted the need for proactive operations monitoring.
Analysis and Solution
2.1 Preliminary Analysis
The crash manifested as inability to establish new connections, severe timeouts, and data read failures. System logs showed many "kernel: Possible SYN flooding on port xxxx. Sending cookies" messages, indicating the Redis instance was consuming connections. At the time, Redis had about 7K connections and QPS of 100K. The team investigated whether Redis pipelining could help.
The application uses Nginx Lua with the lua-resty-redis client. The relevant configuration is:

```lua
red:set_keepalive(5000, 20)
```

The first argument is the maximum idle timeout in milliseconds, the second is the pool size. The total connection count can be estimated as:
connectionNum = machineNum × nginxWorkerProcess × pool_size
With four web servers, each running 18 Nginx workers, and a pool_size of 20, the theoretical connection count is 4 × 18 × 20 = 1,440, far below the observed 7K, indicating other factors were at play.
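The estimate is simple enough to check directly (function and parameter names below are illustrative, following the formula in the text):

```python
def estimate_connections(machine_num, nginx_worker_process, pool_size):
    # theoretical upper bound on pooled client connections:
    # connectionNum = machineNum x nginxWorkerProcess x pool_size
    return machine_num * nginx_worker_process * pool_size

# 4 web servers, 18 Nginx workers each, pool_size of 20
print(estimate_connections(4, 18, 20))  # 1440
```

The gap between this upper bound and the observed 7K connections is what pointed the team away from pool sizing as the root cause.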
The team considered using Redis Pipeline, but most commands are INCR (+1), so pipeline gains were limited.
2.2 Horizontal Scaling with Twemproxy
To scale out, four additional Redis instances were added and a Twemproxy (nutcracker) proxy was deployed on each web server. Twemproxy provides sharding at the proxy layer, simplifying client logic.
However, QPS remained high and the local Twemproxy degraded web server performance. Data skew was observed: some Redis nodes handled up to 80K QPS while others only ~5K.
The root cause was identified as a 60‑second timeout in Nginx Lua, leading to many lingering connections.
2.3 Problem Resolution
The team reduced the timeout to 2 seconds and adjusted the connection pool. Connections dropped to around 1K and socket usage decreased markedly.
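A sketch of what the tuned lua-resty-redis settings might look like; only the 2-second timeout comes from the text, while the host, port, idle timeout, and pool size are illustrative assumptions:

```lua
local redis = require "resty.redis"
local red = redis:new()

red:set_timeout(2000)  -- cut from 60s to 2s so stale connections die quickly

local ok, err = red:connect("127.0.0.1", 6379)  -- address illustrative
if not ok then
    ngx.log(ngx.ERR, "failed to connect to redis: ", err)
    return
end

-- ... issue commands ...

-- smaller pool: max idle timeout 5000 ms, pool size 10 (values illustrative)
red:set_keepalive(5000, 10)
```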
To address data skew, they increased key cardinality. Instead of a single key per minute, they generated 10+ keys per minute, distributing load across shards. After key splitting, QPS was balanced and web performance improved.
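One way to implement the key split is to append a random suffix on write and fan out on read. The split factor and key format below are assumptions for illustration, not necessarily the team's exact scheme:

```python
import random

SPLIT_FACTOR = 10  # assumed number of sub-keys per logical per-minute key

def write_key(base, minute):
    # pick one of SPLIT_FACTOR sub-keys at random,
    # so INCRs for the same minute spread across shards
    return "%s:%s:%d" % (base, minute, random.randrange(SPLIT_FACTOR))

def read_keys(base, minute):
    # reading the total means summing the counters of every sub-key
    return ["%s:%s:%d" % (base, minute, i) for i in range(SPLIT_FACTOR)]
```

Because Twemproxy routes by key name, the ten distinct sub-key names hash to different points on the ring, so a single hot minute no longer concentrates its load on one shard.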
Principle Discussion
The author briefly analyzes Twemproxy’s consistent‑hash function. Twemproxy supports several hash algorithms (fnv1a_64, etc.) and distribution modes (ketama, modula, random). In the production setup, fnv1a_64 with ketama is used.
A Python sketch of the fnv1a_64 hash (the constants and function wrapper are added here for completeness; the modulo by UINT32_MAX is as in the original snippet):

```python
FNV_64_INIT = 0xcbf29ce484222325  # standard 64-bit FNV offset basis
FNV_64_PRIME = 0x100000001b3      # standard 64-bit FNV prime
UINT32_MAX = 2 ** 32 - 1

def fnv1a_64(s):
    hval = FNV_64_INIT
    for c in s:
        hval = hval ^ ord(c)
        hval = (hval * FNV_64_PRIME) % UINT32_MAX
    return hval
```

The more often a key name repeats, the more traffic lands on the single shard its hash maps to, causing skew. Ketama lookup has O(log N) complexity, while modula is O(1).
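To make the complexity comparison concrete, a consistent-hash ring can be sketched as a sorted list of points searched with binary search. This is an illustration of the ketama idea, not Twemproxy's exact implementation:

```python
import bisect
import hashlib

class Ring:
    def __init__(self, nodes, vnodes=40):
        # place vnodes points per node on a 32-bit circle
        self.ring = sorted(
            (self._point("%s-%d" % (node, i)), node)
            for node in nodes
            for i in range(vnodes)
        )
        self.points = [p for p, _ in self.ring]

    @staticmethod
    def _point(s):
        # first 8 hex digits of md5 -> a 32-bit point on the circle
        return int(hashlib.md5(s.encode()).hexdigest()[:8], 16)

    def get(self, key):
        # binary search over the sorted points: O(log N) per lookup,
        # wrapping around the circle when the key hashes past the last point
        idx = bisect.bisect(self.points, self._point(key)) % len(self.points)
        return self.ring[idx][1]
```

A modula distribution, by contrast, is just `hash(key) % num_nodes`: O(1) per lookup, but it remaps almost every key when the node count changes, whereas ketama only moves the keys adjacent to the added or removed node.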
Case Summary
This case illustrates three key take‑aways: systematic problem‑identification, Redis bottleneck mitigation, and scaling‑out analysis. When encountering performance limits, consider code and server optimizations, product selection, data flow redesign, and appropriate scaling strategy (scale‑out vs. scale‑up).
37 Interactive Technology Team
37 Interactive Technology Center