Understanding Distributed Caching: Principles and Implementation of Memcached
This article explains the fundamentals of caching, the role of distributed caches like memcached in high‑concurrency environments, and details the algorithms—including remainder hashing and consistent hashing with virtual nodes—that enable memcached to distribute data across multiple servers efficiently.
Author Introduction
Lu Chen (Float Lu) is a Java engineer at Meituan-Dianping, an open‑source enthusiast who enjoys exploring foundational technologies and middleware in his spare time. Blog: http://my.oschina.net/andylucc.
Article
In high‑concurrency scenarios, massive read/write requests overwhelm databases, making disk I/O a bottleneck and causing high response latency; caching therefore becomes essential. Whether single‑node or distributed, caches have distinct use cases and trade‑offs, with Redis and Memcached being the most common distributed cache products. This article focuses on the distributed implementation principles of the cache service Memcached.
1. The Essence of Cache
Computer Architecture Cache
Based on the von Neumann model, a computer consists of an arithmetic/logic unit, a control unit, memory, and input/output devices. Modern CPUs add registers and multiple cache levels (L1, L2, L3) between the core and main memory. For example, a typical PC may have a 356 GB disk, 4 GB of RAM, 3 MB of L3 cache, and 256 KB of L2 cache. Data is sought first in the nearest (fastest) layer, which is also the most expensive per byte.
Storage pyramid
All storage levels above the slowest (disk) are considered caches, accelerating data access.
Cache in Application Systems
Application data access follows the path cache → database: the application queries the fast cache first; if the data is present and valid, it is returned immediately. Otherwise the application falls back to the slower database, returns the result, and writes it into the cache so subsequent reads are served from the fast tier.
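This cache-aside read path can be sketched as follows. The `cache` and `database` maps here are hypothetical in-memory stand-ins for a real cache client and database, used only to show the lookup order:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal cache-aside sketch: check the fast cache first, fall back to the
// slow database on a miss, then populate the cache for subsequent reads.
class CacheAside {
    private final Map<String, String> cache = new HashMap<>();    // fast tier
    private final Map<String, String> database = new HashMap<>(); // slow tier

    CacheAside() {
        database.put("user:42", "Alice"); // sample record for illustration
    }

    public String get(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;                 // cache hit: return immediately
        }
        String fromDb = database.get(key); // cache miss: query the database
        if (fromDb != null) {
            cache.put(key, fromDb);        // warm the cache for next time
        }
        return fromDb;
    }
}
```

The first read of a key pays the database cost; every later read of the same key is served from memory.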
Typical cache‑augmented storage access model
2. Introduction to Memcached
What is Memcached
Memcached, originally developed by Brad Fitzpatrick at Danga Interactive (LiveJournal), is a high‑performance distributed memory cache server used by many large services (e.g., Facebook, Mixi). It reduces database load, improves response speed, and enhances scalability.
Key Features of Memcached
Simple protocol
Event‑driven architecture based on libevent
In‑memory storage
Servers operate independently without inter‑communication
3. Distributed Principles of Memcached
Because Memcached servers do not communicate with each other, distribution is handled entirely by the client library.
Memcached distribution diagram
When a client stores data, it uses the key to select a Memcached server via a hashing algorithm; retrieval applies the same algorithm to the same key, so data is always fetched from the node it was stored on.
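A minimal sketch of that client-side selection, assuming a fixed server list (the addresses below are made up for illustration):

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Client-side node selection: hash the key and map it deterministically to
// one of the configured memcached servers. Because the mapping depends only
// on the key and the server list, set and get always reach the same node.
class NodeSelector {
    private final String[] servers;

    NodeSelector(String[] servers) {
        this.servers = servers;
    }

    public String serverFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        int index = (int) (crc.getValue() % servers.length); // CRC value is non-negative
        return servers[index];
    }
}
```

Repeated calls with the same key return the same server, which is the property the whole scheme rests on.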
Remainder (Modulo) Hashing
The standard method is:
server_index = CRC(key) % N, where N is the number of servers. This is simple and spreads keys evenly, but it is fragile: when a server fails or the pool is resized, most keys remap to different servers (roughly N/(N+1) of them when one server is added), causing a flood of cache misses until the cache rewarms.
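The reshuffling cost is easy to measure: count how many keys map to a different server index when the pool size changes. The key names below are synthetic, chosen only to sample the hash space:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Estimate the reshuffling cost of remainder hashing: how many of a set of
// keys land on a different server when the pool grows from `before` to
// `after` nodes. With 3 -> 4 servers, roughly three quarters of keys move.
class RemainderReshuffle {
    static long crc(String key) {
        CRC32 c = new CRC32();
        c.update(key.getBytes(StandardCharsets.UTF_8));
        return c.getValue();
    }

    static int movedKeys(int totalKeys, int before, int after) {
        int moved = 0;
        for (int i = 0; i < totalKeys; i++) {
            long h = crc("key-" + i);
            if (h % before != h % after) {
                moved++; // this key would now be looked up on a different server
            }
        }
        return moved;
    }
}
```

Running this for 10,000 keys and a 3-to-4 resize shows around 75% of keys moving, which is why remainder hashing scales poorly for caches.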
Consistent Hashing
Each server is hashed to a point on a ring spanning 0 to 2³²−1; keys are hashed onto the same ring. A key is stored on the first server encountered moving clockwise from the key's position. When a server is added or removed, only the keys on the affected arc of the ring remap; the rest stay where they are.
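A sorted map makes the clockwise lookup straightforward. This is an illustrative sketch, not a production client; server addresses are invented:

```java
import java.nio.charset.StandardCharsets;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Minimal consistent-hashing ring: servers and keys share one hash space,
// and a key is owned by the first server at or clockwise from its position.
class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    static long hash(String s) {
        CRC32 c = new CRC32();
        c.update(s.getBytes(StandardCharsets.UTF_8));
        return c.getValue(); // value in [0, 2^32 - 1], the ring's range
    }

    public void addServer(String server) {
        ring.put(hash(server), server);
    }

    public void removeServer(String server) {
        ring.remove(hash(server));
    }

    public String serverFor(String key) {
        // First ring position >= the key's hash; wrap to the start if none.
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        Long point = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
        return ring.get(point);
    }
}
```

Note the locality property: removing a server that a key does not live on cannot change that key's owner, since the owner is by definition the first node clockwise from the key.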
Consistent hashing principle
Optimized Consistent Hashing with Virtual Nodes
To avoid uneven key distribution, each physical server is represented by multiple virtual nodes, each placed at a different point on the ring. Keys map to virtual nodes, which then map to the underlying physical server, achieving a more balanced load even with few physical nodes.
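The virtual-node variant only changes how servers are placed on the ring: each physical server is hashed at many points. The replica count and the `server#i` naming scheme below are illustrative choices, not a fixed standard:

```java
import java.nio.charset.StandardCharsets;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Consistent hashing with virtual nodes: each physical server appears on the
// ring at REPLICAS points ("server#0", "server#1", ...), which evens out the
// key distribution even when there are only a few physical servers.
class VirtualNodeRing {
    private static final int REPLICAS = 160; // a common order of magnitude in real clients
    private final TreeMap<Long, String> ring = new TreeMap<>();

    static long hash(String s) {
        CRC32 c = new CRC32();
        c.update(s.getBytes(StandardCharsets.UTF_8));
        return c.getValue();
    }

    public void addServer(String server) {
        for (int i = 0; i < REPLICAS; i++) {
            ring.put(hash(server + "#" + i), server); // virtual node -> physical server
        }
    }

    public String serverFor(String key) {
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        Long point = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
        return ring.get(point); // key -> virtual node -> physical server
    }
}
```

With 160 virtual nodes per server, a few thousand keys spread across three servers in roughly equal thirds, where a single-point-per-server ring could easily leave one server with most of the keyspace.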
4. Summary
After covering basic cache concepts, this article introduced Memcached’s distributed algorithms, showing that distribution is achieved by client‑side hashing techniques such as remainder hashing, consistent hashing, and its virtual‑node‑enhanced variant.
References: *Large-Scale Distributed Web Architecture Design and Practice*; *Comprehensive Analysis of Memcached*.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.