How ZCube Redefines 20‑Year‑Old Networking Logic to Boost GPU Throughput by 15%
ZCube, a new flat networking architecture deployed by Zhipu in its GLM‑5.1 inference cluster, eliminates structural congestion. Without adding GPUs, it delivers a 15% throughput gain, a 40.6% latency reduction, and one‑third lower hardware cost, signaling a shift in AI infrastructure from raw compute to system efficiency.
