
Google’s Andromeda Network Architecture: Design Goals, Control Plane, and Data Plane

The article reviews Google’s Andromeda network system presented at NSDI 2018, detailing its large‑scale design goals, a hybrid control‑plane model, high‑performance software data‑plane optimizations, hot‑cold flow separation, and rapid weekly updates that enable pure‑software SDN at cloud scale.

Cloud Native Technology Community

Recently the author read a 2018 NSDI paper describing the overall network design of Google Cloud Platform (GCP), known as Andromeda. The paper explains the massive and rapidly changing network challenges GCP faces and how its control plane and data plane were redesigned to support up to 100,000 VMs per network, achieve 30 Gb/s throughput, and process three million packets per second per core, while allowing online migration and weekly software upgrades.

Unlike AWS, which offloads all network functions to dedicated hardware cards, and Azure, which uses FPGA acceleration, GCP relies mainly on generic Intel offload capabilities and implements flow‑table lookup, packet forwarding, ACL, load‑balancing, and NAT entirely in software, representing one of the largest pure‑software SDN deployments.

Andromeda’s design goals include:

Control‑plane scalability and isolation: supporting 100 k VMs per network with a 186 ms change propagation latency, and ensuring multi‑tenant isolation.

Data‑plane high performance and isolation: delivering high throughput and low latency while preventing one tenant from monopolizing bandwidth.

Rapid iteration: enabling weekly online upgrades of the data plane, a capability rarely seen in traditional network infrastructures.

The control plane distributes virtual network topology to each host, effectively providing an address book for VM‑to‑VM routing. Three traditional models—Preprogrammed (full‑mesh), On‑Demand, and Gateway—are discussed, each with trade‑offs in scalability, latency, and resource usage.

GCP adopts a hybrid “Hoverboard” model that combines Gateway and On‑Demand approaches. By analyzing traffic patterns (e.g., 83 % of VM pairs never communicate, 98 % of flows stay below 20 kbps, and 2 % of VM pairs consume 99.5 % of bandwidth), the system keeps hot flows on the host for direct forwarding and offloads cold flows to a special gateway, dramatically reducing flow‑table size and improving scalability.
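The Hoverboard idea above can be sketched in a few lines of Python. This is an illustrative model, not Andromeda's actual code: the class name `HostFlowTable`, the `resolve_host` callback, and the promotion threshold are all assumptions made for the example, with the ~20 kbps figure borrowed from the traffic statistics quoted above.

```python
# Hypothetical sketch of Hoverboard-style routing: a host forwards flows
# it has direct rules for, and defaults everything else to a Hoverboard
# gateway. The control plane "promotes" a flow to a direct host route
# once its observed bandwidth crosses a threshold.

HOVERBOARD_GATEWAY = "hoverboard-gw"
PROMOTION_THRESHOLD_KBPS = 20  # assumed cutoff, loosely based on the "98% of flows < 20 kbps" statistic

class HostFlowTable:
    def __init__(self):
        self.direct_routes = {}   # dst VM -> physical host (hot flows only)
        self.usage_kbps = {}      # dst VM -> observed bandwidth

    def next_hop(self, dst_vm):
        # Hot flows are forwarded host-to-host; cold flows take the
        # default route through the gateway, keeping the table small.
        return self.direct_routes.get(dst_vm, HOVERBOARD_GATEWAY)

    def observe(self, dst_vm, kbps, resolve_host):
        # The control plane notices a flow exceeding the threshold and
        # installs a direct route, offloading it from the gateway.
        self.usage_kbps[dst_vm] = kbps
        if kbps > PROMOTION_THRESHOLD_KBPS and dst_vm not in self.direct_routes:
            self.direct_routes[dst_vm] = resolve_host(dst_vm)

table = HostFlowTable()
cold = table.next_hop("vm-b")                      # cold flow -> gateway
table.observe("vm-b", 500, lambda vm: "host-17")   # flow turns hot
hot = table.next_hop("vm-b")                       # now routed directly
```

The key property mirrored here is that the per-host table only grows with the number of hot peers, not with the size of the virtual network.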

On the data‑plane side, GCP splits packet processing into a Fast Path for pure forwarding, which uses techniques similar to OVS‑DPDK (a userspace datapath, busy polling, lock‑free queues, hugepages, and batch processing) to achieve roughly 300 ns per packet, and a Coprocessor Path for more complex functions. The two paths interact via flow‑table rules, allowing complex processing without hurting the Fast Path's performance.
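The Fast Path / Coprocessor split can be illustrated with a toy polling loop. This is a hedged sketch, not Andromeda's implementation: the queue layout, rule format, and `fast_path` function are invented for the example, and real fast paths use lock-free ring buffers rather than Python deques.

```python
# Illustrative sketch of the two-path datapath: a busy-polling loop
# handles plain forwarding in batches, and any packet whose flow rule
# marks it as needing heavyweight processing is handed off to a
# coprocessor queue so the hot loop stays cheap.

from collections import deque

BATCH_SIZE = 16  # assumed batch size; batching amortizes per-packet overhead

def fast_path(rx_queue, flow_table, tx_queue, coproc_queue):
    """One polling iteration: drain up to BATCH_SIZE packets."""
    for _ in range(min(BATCH_SIZE, len(rx_queue))):
        pkt = rx_queue.popleft()
        rule = flow_table.get(pkt["flow"], {"action": "drop"})
        if rule["action"] == "forward":
            tx_queue.append((rule["port"], pkt))  # cheap work: stay on the fast path
        elif rule["action"] == "coprocessor":
            coproc_queue.append(pkt)              # expensive work: punt to coprocessor thread
        # any other action: drop the packet

rx, tx, coproc = deque(), deque(), deque()
flows = {"a->b": {"action": "forward", "port": 3},
         "a->c": {"action": "coprocessor"}}
rx.extend([{"flow": "a->b"}, {"flow": "a->c"}])
fast_path(rx, flows, tx, coproc)
```

The design point this captures is that the decision of which path a packet takes is itself just a flow-table lookup, so adding a complex feature never adds per-packet cost to flows that do not use it.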

The paper also describes online migration of the data plane, where the new and old datapaths run concurrently during an upgrade, limiting the service pause to about 270 ms and enabling weekly updates.
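The upgrade flow can be sketched as a brownout/blackout sequence. Everything below is an illustrative model under assumed names (`Datapath`, `upgrade`, `snapshot_delta`): bulk state is copied while the old datapath keeps forwarding (brownout), then traffic pauses only briefly while the final deltas transfer and the new datapath takes over (blackout).

```python
# Hedged sketch of a side-by-side datapath upgrade. A real system would
# migrate live flow state between processes; here Datapath is a minimal
# in-memory stand-in so the control flow of the upgrade is visible.

import time

class Datapath:
    """Minimal stand-in for a userspace datapath process."""
    def __init__(self):
        self.state = {}
        self.active = False
        self.forwarding = True

    def start(self):
        pass  # placeholder: launch the new datapath process

    def snapshot_state(self):
        return dict(self.state)   # bulk state copy (taken during brownout)

    def snapshot_delta(self):
        return {}                 # state changed since the bulk copy

    def load_state(self, s):
        self.state.update(s)

    def pause(self):
        self.forwarding = False   # blackout begins: packets briefly queue

    def activate(self):
        self.active = True
        self.forwarding = True    # blackout ends: new datapath forwards

    def stop(self):
        self.forwarding = False

def upgrade(old, new, max_blackout_s=0.270):
    new.start()
    # Brownout: copy bulk state while the old datapath still forwards.
    new.load_state(old.snapshot_state())
    # Blackout: pause, transfer the last deltas, and switch over.
    t0 = time.monotonic()
    old.pause()
    new.load_state(old.snapshot_delta())
    new.activate()
    old.stop()
    return time.monotonic() - t0 <= max_blackout_s

old_dp, new_dp = Datapath(), Datapath()
old_dp.state = {"flow-1": "host-9"}
within_budget = upgrade(old_dp, new_dp)
```

Because only the delta transfer happens inside the pause, the blackout window stays small regardless of how much total state the datapath holds.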

Overall, Andromeda’s hot‑cold separation, control‑plane scalability, and rapid software iteration provide valuable lessons for large‑scale cloud networking, while contrasting with the more monolithic approach of OVN and highlighting emerging alternatives such as FPGA acceleration, smart NICs, and eBPF‑based kernel bypass.

Tags: network architecture, SDN, Cloud Networking, Control Plane, Data Plane, Andromeda, GCP
Written by

Cloud Native Technology Community

The Cloud Native Technology Community, part of the CNBPA Cloud Native Technology Practice Alliance, focuses on evangelizing cutting‑edge cloud‑native technologies and practical implementations. It shares in‑depth content, case studies, and event/meetup information on containers, Kubernetes, DevOps, Service Mesh, and other cloud‑native tech, along with updates from the CNBPA alliance.
