
Mooncake: Open-Source KVCache-Centric Large Model Inference Architecture Co-Developed by Alibaba Cloud and Tsinghua University

In June 2024, Alibaba Cloud and Tsinghua University's MADSys Lab announced the open-source Mooncake architecture, a KVCache-centric large-model inference framework that boosts throughput, lowers cost, and standardizes resource-pooling techniques for high-performance AI inference across industry and academia.


In June 2024, Alibaba Cloud and Tsinghua University's MADSys Lab announced Mooncake, a KVCache‑centric large‑model inference architecture that significantly improves inference throughput while reducing costs, attracting broad industry attention.

As part of the Alibaba Innovative Research (AIR) program, the collaboration explores industrial applications of resource-pooling technology for large models: it standardizes a cache-pooling layer and implements an efficient, distributed, resource-decoupled architecture that optimizes long-context inference.
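The core idea behind a KVCache-centric pool is that requests sharing a token prefix can reuse the KV blocks already computed for that prefix, so only the new suffix needs fresh prefill. A minimal sketch of this prefix-matching lookup is below; the class name, block size, and dict-backed storage are illustrative assumptions, not Mooncake's actual implementation:

```python
import hashlib

BLOCK = 4  # tokens per KV block (illustrative; real systems use larger blocks)

class KVCachePool:
    """Toy shared pool keyed by prefix hash (hypothetical, for illustration)."""

    def __init__(self):
        self.blocks = {}  # prefix hash -> cached KV payload for that prefix

    @staticmethod
    def _key(tokens):
        # Hash the whole prefix so equal prefixes map to the same entry.
        return hashlib.sha256(str(tokens).encode("utf-8")).hexdigest()

    def put(self, tokens, kv):
        # Store KV for each block-aligned prefix so other requests (or
        # other inference instances) can reuse any leading portion.
        for end in range(BLOCK, len(tokens) + 1, BLOCK):
            self.blocks.setdefault(self._key(tokens[:end]), kv[:end])

    def longest_prefix(self, tokens):
        # Find the longest block-aligned cached prefix; the caller only
        # needs to run prefill for the remaining tail.
        matched = 0
        for end in range(BLOCK, len(tokens) + 1, BLOCK):
            if self._key(tokens[:end]) in self.blocks:
                matched = end
            else:
                break
        return matched

pool = KVCachePool()
prompt_a = list(range(12))
pool.put(prompt_a, ["kv%d" % t for t in prompt_a])
prompt_b = list(range(8)) + [99, 98, 97, 96]  # shares an 8-token prefix
hit = pool.longest_prefix(prompt_b)
print(hit)  # 8 tokens served from the pool; only 4 need fresh prefill
```

The longer the shared prefix (system prompts, few-shot examples, long documents), the larger the fraction of prefill compute this lookup avoids, which is why the approach pays off most for long-context workloads.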

Alibaba Cloud contributed code to key components such as the Transfer Engine, the peer-to-peer store (P2P Store), and the high-performance in-memory store, and integrated Mooncake with the widely used vLLM inference framework, achieving substantial performance gains and providing reference implementations for other frameworks. The Transfer Engine leverages Alibaba's self-developed eRDMA network, with CXL support planned, enabling rapid, scalable deployment in the cloud.
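The Transfer Engine's role, moving KV blocks between instances over RDMA-class transports such as eRDMA, can be sketched with a hypothetical interface. The names `register_segment` and `read_remote` below are illustrative assumptions, not Mooncake's actual API:

```python
class TransferEngine:
    """Toy stand-in for an RDMA-style transfer engine: instances register
    memory segments, and peers read them with one-sided operations that do
    not involve the remote CPU. Purely illustrative; a real engine uses
    eRDMA/verbs and pinned memory, not a Python dict."""

    _segments = {}  # (node_id, name) -> bytearray, simulating registered memory

    def __init__(self, node_id):
        self.node_id = node_id

    def register_segment(self, name, buf):
        # In a real engine this would pin the buffer and expose an RDMA key.
        TransferEngine._segments[(self.node_id, name)] = buf

    def read_remote(self, node_id, name, offset, length):
        # One-sided read: fetch bytes from a peer's registered segment.
        buf = TransferEngine._segments[(node_id, name)]
        return bytes(buf[offset:offset + length])

# A prefill instance publishes KV blocks; a decode instance pulls them.
prefill = TransferEngine("prefill-0")
prefill.register_segment("kv", bytearray(b"KVBLOCK0KVBLOCK1"))
decode = TransferEngine("decode-0")
print(decode.read_remote("prefill-0", "kv", 8, 8))  # b'KVBLOCK1'
```

One-sided reads are what make a decoupled prefill/decode architecture cheap: the decode side can fetch exactly the KV blocks it needs without scheduling work on the prefill node.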

Professor Zhang Mingxing of the MADSys Lab emphasized that Mooncake makes full use of the CPU, memory, and SSD resources in AI infrastructure and enables cache sharing across inference instances, reducing resource waste. Open-sourcing the project, he added, aims to foster collaboration between academia and industry and accelerate the development of large-model inference systems.
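Using CPU memory and SSD alongside GPU memory amounts to a tiered cache: hot KV blocks stay in the fast tier, and colder ones spill to a slower, larger tier instead of being discarded. A minimal two-tier sketch under that assumption (Mooncake's actual tiering and eviction policies are more sophisticated):

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy two-tier KV cache: a small LRU-ordered DRAM tier that spills
    evicted entries to a larger SSD tier. Illustrative only."""

    def __init__(self, dram_capacity):
        self.dram = OrderedDict()   # hot tier, LRU-ordered
        self.ssd = {}               # cold tier: larger, slower
        self.cap = dram_capacity

    def put(self, key, kv):
        self.dram[key] = kv
        self.dram.move_to_end(key)
        while len(self.dram) > self.cap:
            # Spill the least recently used entry instead of discarding it,
            # so a later hit avoids recomputing the prefill.
            cold_key, cold_kv = self.dram.popitem(last=False)
            self.ssd[cold_key] = cold_kv

    def get(self, key):
        if key in self.dram:
            self.dram.move_to_end(key)
            return self.dram[key]
        if key in self.ssd:
            kv = self.ssd.pop(key)
            self.put(key, kv)       # promote back to the hot tier on a hit
            return kv
        return None                 # miss: caller must recompute the prefill

cache = TieredKVCache(dram_capacity=2)
cache.put("a", "kv_a"); cache.put("b", "kv_b"); cache.put("c", "kv_c")
print(cache.get("a"))  # 'kv_a': spilled to SSD, then promoted back to DRAM
```

Even a slow-tier hit is typically far cheaper than recomputing prefill for a long prompt, which is the economic argument for pooling otherwise idle CPU memory and SSDs.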

Alibaba Cloud will deepen its participation in Mooncake, partnering with more enterprises, institutions, and universities to explore more efficient and advanced model-inference system architectures. The open-source repository is available at https://github.com/kvcache-ai/mooncake.

Tags: large-model inference, open source, Alibaba Cloud, AI infrastructure, Tsinghua University, KVCache
Written by

Alibaba Cloud Infrastructure

For uninterrupted computing services
