Cloud Native 10 min read

How Netflix’s EVCache Enables Global Low‑Latency Caching at Massive Scale

The article explains Netflix’s EVCache—a cloud‑native, memcached‑based distributed cache that provides low‑latency, high‑reliability data access across multiple regions using asynchronous Kafka‑driven replication, detailing its architecture, performance optimizations, and remaining challenges.

Art of Distributed System Architecture Design
Art of Distributed System Architecture Design
Art of Distributed System Architecture Design
How Netflix’s EVCache Enables Global Low‑Latency Caching at Massive Scale

Overview

Netflix runs hundreds of micro‑services, each stateless and focused on a single responsibility, which makes the system loosely coupled and easy to scale. State is kept in caches or persistent storage, and EVCache is the caching layer designed specifically for this environment.

EVCache Design and Cloud Optimization

EVCache is an open‑source, memcached‑based in‑memory store optimized for cloud deployments. It targets scenarios that do not require strong consistency, handling up to 30 million requests per second, storing billions of objects across thousands of servers, and processing roughly 2 trillion requests daily.

Cross‑Region Replication Mechanism

Replication is triggered after a set() call on the EVCache client library. The client writes the key (metadata only) to a Kafka replication queue, and a local replication broadcaster reads the message, fetches the value from the local cache, and forwards a SET request to a remote replication agent. The remote agent writes the value to its local cache, making the update visible to subsequent GET calls.

This asynchronous, key‑only replication keeps Kafka traffic small and avoids storing full payloads, reducing latency and resource consumption.

Handling Cold Cache and Consistency

If a user switches regions, the new region may have a “cold cache” for that key, requiring a fallback to the database, which adds latency and load. EVCache mitigates this by replicating frequently accessed data globally and tolerating eventual consistency for non‑critical data, allowing short windows of divergence without impacting user experience.

Performance Optimizations

Targeted 99th‑percentile end‑to‑end replication latency under 1 second, typically around 400 ms after buffering and flushing.

Use of persistent TCP connections between broadcaster and agent eliminates three‑way handshakes and TLS session setup, improving latency and stability.

Batching multiple messages into a single TCP window maximizes throughput while keeping latency acceptable.

These measures enable the cross‑region EVCache system to sustain up to 1 million RPS and handle peak replication rates of 1.5 million messages per second.

Current Challenges

Kafka does not scale vertically easily; adding partitions and configuring consumers manually can cause duplicate or lost messages, affecting eventual consistency.

Failure of a remote EVCache instance increases latency because the proxy must retry writes; detection and coordination mechanisms are being added to the client.

Monitoring Kafka for lost or delayed messages requires comparing broker counts with broadcaster counts and setting alert thresholds, as occasional broker slowdowns can cause spikes in latency.

Conclusion

EVCache demonstrates a pragmatic, cloud‑native approach to global caching for micro‑service architectures, prioritizing low latency and high availability while accepting eventual consistency where appropriate. Ongoing work focuses on improving Kafka scalability, fault tolerance, and observability.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Cloud NativeMicroservicesdistributed cachingNetflixEVCacheKafka replication
Art of Distributed System Architecture Design
Written by

Art of Distributed System Architecture Design

Introductions to large-scale distributed system architectures; insights and knowledge sharing on large-scale internet system architecture; front-end web architecture overviews; practical tips and experiences with PHP, JavaScript, Erlang, C/C++ and other languages in large-scale internet system development.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.