
Project Eru: Scaling a Custom Docker Orchestration Platform to 10k Nodes

Project Eru, a homegrown Docker‑based orchestration system developed at Mango TV, replaces the team's earlier PaaS attempts with a stateless core‑and‑agent architecture. It leverages Redis clusters, MacVLAN networking, and fine‑grained CPU allocation to scale rapidly and automatically across thousands of containers.


Background

The discussion originated from a weekly "Operations Talk" group where experts share experiences about large‑scale infrastructure. The speaker, formerly of Douban App Engine, describes how difficulties with Python runtime isolation and dependency conflicts led to an interest in Docker.

Early Docker Experiments

Initial attempts involved modifying CPython and sys.path, which proved costly. The team instead split runtime dependencies into separate packages to minimize contamination. A diagram (shown below) illustrates the early dependency‑splitting approach.

Early dependency split diagram

From NBE to Project Eru

After a first‑generation PaaS called NBE (Nebulium Engine) that used Docker for isolation, the team recognized limitations in resource control and scaling. In late 2014 they revisited concepts from Borg and Omega, launching the second‑generation platform—Project Eru—designed as a service‑orchestration and scheduling system rather than a traditional PaaS.

Eru can run both offline and online services, allocate CPU in fine‑grained increments (e.g., 0.1, 0.01 cores), and use Redis as a message bus to monitor container states.
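Using Redis as a message bus for container state typically means each agent publishing status events on a well-known channel. The sketch below shows the general shape of such an event; the channel naming scheme (`eru:status:<host>`) and field names are illustrative assumptions, not Eru's actual wire format.

```python
import json

def make_status_event(host, container_id, state, cpu_frac):
    """Build the payload an agent might publish for one container.
    Channel and field names are hypothetical, for illustration only."""
    return {
        "channel": f"eru:status:{host}",
        "payload": json.dumps({
            "container": container_id,
            "state": state,       # e.g. "running", "exited"
            "cpu": cpu_frac,      # fine-grained share, e.g. 0.1
        }),
    }

# An agent would call redis_client.publish(event["channel"], event["payload"]);
# the core subscribes to these channels and reacts to state changes.
event = make_status_event("host-17", "c0ffee", "running", 0.1)
```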

Eru architecture overview

Core and Agent Architecture

Eru consists of two loosely coupled components:

Agent: runs on each host, reports container status, and performs low‑level operations (e.g., veth management) via a private Redis Cluster.

Core: a stateless control layer that drives the Docker daemons on each host and coordinates with the Agents.

Networking Choice: MacVLAN

After evaluating tunnel‑based solutions (Weave, OVS) and routing‑based solutions (Calico, MacVLAN), the team selected MacVLAN for its performance, simplicity, and ability to apply layer‑2 QoS and security policies.
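The basic MacVLAN technique is to carve a sub-interface off the host NIC and move it into the container's network namespace. The sketch below builds the standard `ip`/`nsenter` command lines for this; it shows the general mechanism, not Eru's actual agent code, and the interface names are placeholders.

```python
def macvlan_cmds(parent, ifname, container_pid, ip_cidr):
    """Return the shell commands (as argument lists) that attach a MacVLAN
    sub-interface to a container's network namespace. `parent` is the host
    NIC (e.g. "eth0"); names and addresses here are illustrative."""
    pid = str(container_pid)
    return [
        # create a macvlan sub-interface in bridge mode on the host NIC
        ["ip", "link", "add", ifname, "link", parent,
         "type", "macvlan", "mode", "bridge"],
        # move it into the container's network namespace
        ["ip", "link", "set", ifname, "netns", pid],
        # configure the address and bring the link up inside that namespace
        ["nsenter", "-t", pid, "-n", "ip", "addr", "add", ip_cidr, "dev", ifname],
        ["nsenter", "-t", pid, "-n", "ip", "link", "set", ifname, "up"],
    ]

cmds = macvlan_cmds("eth0", "mv0", 1234, "10.0.1.5/24")
```

Because the container gets a real layer‑2 presence on the physical network, ordinary switch-level QoS and security policies apply to it directly, which is the property the team valued.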

MacVLAN comparison diagram

Storage Strategy

The platform primarily uses devicemapper for container storage, with a smaller portion on OverlayFS. Tests showed OverlayFS offers better performance for small files, though its atomicity guarantees differ from devicemapper's.

OverlayFS vs devicemapper performance

Resource Allocation and Scaling

CPU is the primary scheduling dimension. Each container receives a “fragment” core (e.g., 0.1 CPU) and a share of a full core, allowing elastic usage. Memory is allocated proportionally to host capacity (e.g., 0.5 CPU and ~1 GB per Redis container). Scaling decisions are delegated to business teams via monitoring data stored in InfluxDB (later migrated to Open‑Falcon) and custom APIs.
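The capacity arithmetic implied by those figures is simple: a host fits as many containers as the tighter of its CPU and memory budgets allows. The sketch below uses the article's example sizing (0.5 CPU and ~1 GB per Redis container); the formula itself is an illustration, not Eru's scheduler.

```python
def max_containers(host_cores, host_mem_gb, cpu_per=0.5, mem_per_gb=1.0):
    """How many containers of a given size fit on one host: the minimum
    of the CPU-bound and memory-bound counts. Defaults follow the
    article's Redis example (0.5 CPU, ~1 GB each)."""
    by_cpu = int(host_cores / cpu_per)
    by_mem = int(host_mem_gb / mem_per_gb)
    return min(by_cpu, by_mem)

# e.g. a 24-core, 64 GB host is CPU-bound: min(48, 64) = 48 containers
```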

"Who cares, does it" – the platform follows a "who monitors, who decides" principle, exposing APIs for dynamic scaling without imposing rigid policies.

Service Discovery and Security

Containers within the same logical subnet are reachable via an internal DNS built on dnscache and SkyDNS. Firewall rules are applied at layer 2, so only containers in the same subnet can communicate, which keeps the security model simple.
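The reachability rule reduces to a subnet-membership check: two containers may talk only if their addresses fall in the same network. A minimal sketch with the standard `ipaddress` module, assuming a /24 prefix purely for illustration:

```python
import ipaddress

def same_subnet(ip_a, ip_b, prefix=24):
    """True if both addresses fall in the same subnet. This mirrors the
    policy that only same-subnet containers may communicate; the /24
    prefix length is an assumption, not stated in the article."""
    net_a = ipaddress.ip_network(f"{ip_a}/{prefix}", strict=False)
    return ipaddress.ip_address(ip_b) in net_a
```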

Redis clusters are exposed through Eru’s broadcasting mechanism; scaling actions trigger API calls that automatically add or remove instances, achieving near‑millisecond response times.

Performance Highlights

In tests with 10,000 hosts, a full scheduling decision completes in about one second. The system also supports a "Public Server" mode that monitors macro‑level host resources without binding specific CPU or memory, useful for CI pipelines and image builds.
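A sub-second decision over ~10,000 hosts is plausible because placement can be a single linear scan over per-host free capacity. The greedy spread strategy below (pick the host with the most free CPU fragments) is one such O(n) pass; it is an illustration of the scale argument, not Eru's published algorithm.

```python
def pick_host(free_fragments, need):
    """Pick the host with the most free CPU fragments that can still fit
    `need`. One pass over a dict of {host: free_fragments}; returns None
    if no host has enough capacity. Illustrative, not Eru's scheduler."""
    best = None
    for host, free in free_fragments.items():
        if free >= need and (best is None or free > free_fragments[best]):
            best = host
    return best
```

Spreading onto the emptiest host trades packing density for headroom; a bin-packing variant would instead pick the *smallest* sufficient host.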

Conclusion

Project Eru demonstrates how a custom, stateless, Docker‑centric orchestration platform can achieve large‑scale, fine‑grained resource management while remaining flexible enough for diverse business needs. All source code is publicly available on GitHub.

Tags: Cloud Native, Docker, Redis, Resource Scheduling, Container Orchestration, MacVLAN
Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends and regularly publishes original technical articles. We focus on operations transformation and aim to accompany you throughout your operations career.
