Local Cache Optimization for Outbox Redis in a High‑Traffic Feed Stream Service
To protect the outbox Redis cluster from extreme read amplification during hot events, the service adds a resident local cache of hot creators’ latest posts, built on a threshold‑based hot‑creator list, change‑broadcast updates, and checksum verification. After deployment, the cache served over 55% of reads, cutting peak Redis load by roughly 44% and peak per‑instance CPU usage by 37.2%.
The feed‑stream service experiences severe load spikes during hot events, causing the outbox Redis cluster to reach CPU utilizations of 95%–100% and leading to timeouts for many users.
Analysis reveals that the system uses a "push‑pull" model: each user has an inbox and an outbox. The outbox stores the user’s own posts and is read heavily for large‑follower ("big‑fan") creators, resulting in massive read amplification (hundreds to thousands of times) on the outbox Redis cluster.
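The read amplification follows directly from the pull side of this model: pushed posts are fanned out to followers’ inboxes at publish time, but posts from big‑fan creators are pulled from the creator’s outbox at read time, so every follower refresh reads that outbox once. A minimal sketch (all names here are hypothetical, not from the article):

```python
from dataclasses import dataclass, field

PUSH_THRESHOLD = 10_000  # hypothetical fan-out cutoff; the article does not give the value


@dataclass
class User:
    id: int
    # Creators this user follows whose follower count exceeds the threshold,
    # i.e. whose posts are pulled rather than pushed.
    followed_big_fans: list = field(default_factory=list)


def build_feed(user, inbox_store, outbox_store):
    """Merge pushed inbox entries with pulled outbox entries.

    Each big-fan creator's outbox is read once per requesting follower,
    which is the source of the hundreds-to-thousands-fold amplification.
    """
    posts = list(inbox_store.get(user.id, []))          # already fanned out
    for creator_id in user.followed_big_fans:           # pull path
        posts.extend(outbox_store.get(creator_id, []))  # one outbox read each
    return sorted(posts, key=lambda p: p["ts"], reverse=True)
```

With a million followers refreshing during a hot event, the same outbox key is read a million times, which is exactly the load the local cache absorbs.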
Two mitigation ideas were evaluated. Increasing the threshold for "push" processing would reduce outbox reads but would dramatically increase inbox write pressure and storage costs. The chosen solution is to introduce a local cache for hot creators’ latest posts, which adds no hardware cost and can be deployed quickly.
The design defines a follower‑count threshold to identify hot creators (UPs). Daily offline statistics (T+1) compute the list of creators that meet or exceed the threshold; the list is pushed to Redis via Kafka. When an outbox service instance starts, it loads the full hot‑creator list and pulls their latest post lists from the outbox Redis into a resident local cache. Subsequent updates to the hot‑creator list trigger incremental cache updates. Feed requests first consult the local cache; only cache misses fall back to the outbox Redis, greatly reducing read amplification.
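The warm‑at‑startup and miss‑fallback behavior can be sketched as follows (a simplified illustration, assuming a `redis_get` callable standing in for the real outbox Redis read):

```python
class OutboxLocalCache:
    """Resident local cache of hot creators' latest post lists (sketch)."""

    def __init__(self, redis_get, hot_creator_ids):
        self._redis_get = redis_get  # fallback read against outbox Redis
        self._cache = {}
        # Warm the cache at instance startup with the full hot-creator list.
        for cid in hot_creator_ids:
            self._cache[cid] = redis_get(cid)

    def get_posts(self, creator_id):
        # Hot creators are served from memory; only misses reach Redis.
        if creator_id in self._cache:
            return self._cache[creator_id]
        return self._redis_get(creator_id)
```

Because the hot‑creator set is small and computed offline, the whole working set fits in each instance’s memory, so hot‑event traffic never touches Redis on the read path.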
To avoid a cache‑stampede when rebuilding data, the solution uses a "change broadcast + asynchronous reconstruction" pattern. When a cached creator publishes or deletes a post, a change event is broadcast to all outbox instances, which then asynchronously fetch the updated list from Redis and refresh their local cache. Because hot‑creator changes are rare, this approach imposes minimal load on Redis.
Consistency between the local cache and Redis is ensured through two mechanisms: (1) after each reconstruction, the checksum of the creator’s post list is stored in Redis; (2) a periodic inspection compares cached checksums with Redis checksums, and any mismatch triggers a rebroadcast of the change event for automatic repair.
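The inspection loop can be sketched as below. The article does not specify the checksum scheme, so SHA‑256 over sorted post ids is an assumed stand‑in, and `rebroadcast` is a hypothetical hook for re‑emitting the change event:

```python
import hashlib
import json


def post_list_checksum(posts):
    """Deterministic checksum of a creator's post list.

    Assumed scheme: SHA-256 over the sorted post ids; the article does not
    specify what is actually hashed.
    """
    payload = json.dumps(sorted(p["id"] for p in posts)).encode()
    return hashlib.sha256(payload).hexdigest()


def inspect(cache, redis_checksums, rebroadcast):
    """Periodic inspection: for any creator whose locally cached list no
    longer matches the checksum stored in Redis, rebroadcast the change
    event so the normal rebuild path repairs the entry."""
    repaired = []
    for creator_id, posts in cache.items():
        if post_list_checksum(posts) != redis_checksums.get(creator_id):
            rebroadcast(creator_id)
            repaired.append(creator_id)
    return repaired
```

Reusing the broadcast path for repair keeps the consistency mechanism simple: a mismatch is treated exactly like a fresh publish/delete, with no special rebuild code.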
After deployment, the local cache achieved a hit rate exceeding 55%. Compared with the pre‑deployment baseline, the weekend peak load on outbox Redis dropped by roughly 44%, and the CPU usage peak per Redis instance decreased by 37.2%.
Future work includes extending the cache to hot creators of other content types (e.g., anime, paid videos) and adding real‑time hot‑key detection for creators that become hot suddenly, further improving cache coverage and protecting Redis from overload.
Bilibili Tech
Provides introductions and tutorials on Bilibili-related technologies.