From Computer Use to Datacenter Use: Enabling AI Agents to Drive Data Centers Like Function Calls

The article analyzes how AI agents require datacenter‑scale compute beyond a single virtual machine, explains why existing cloud‑native stacks cannot meet this demand, and details Ant Group's AKernel and openYuanrong solution—including three technical pillars, performance benchmarks, a tiny development team, and a streamlined deployment workflow that turns any developer into a "Build Your Own Cluster" operator.

AntTech

Agent Era Core Trends

Two trends define the next stage of AI agents. Multi‑Agent splits a single agent into specialized roles (product manager, developer, tester) that collaborate. Datacenter Use, by contrast, gives a sufficiently capable model direct, instant access to massive compute resources instead of relying on coordination among many roles.

An example of a stock‑trading agent shows that even a powerful model cannot process years of historical data without immediate, high‑throughput compute; Datacenter Use aims to provide that.
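The "data center as a function call" idea can be illustrated with a small fan-out sketch. This is not AKernel or openYuanrong code: a local thread pool stands in for datacenter workers, and the function names and toy data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def yearly_mean_return(year, daily_returns):
    """Summarise one year of daily returns; stands in for a heavy per-shard job."""
    return year, mean(daily_returns)


def analyse_history(history, max_workers=8):
    """Fan out one task per year and gather the results.

    The local thread pool is an assumption for illustration; in a
    Datacenter Use setting each task would run on remote compute.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(yearly_mean_return, year, returns)
            for year, returns in history.items()
        ]
        return dict(future.result() for future in futures)


# Hypothetical toy data: year -> daily returns.
history = {
    2021: [0.01, -0.02, 0.03],
    2022: [0.00, 0.01, 0.02],
    2023: [-0.01, 0.01, 0.00],
}
summary = analyse_history(history)
```

From the agent's point of view, `analyse_history` is a single call; how many workers serve it is an infrastructure detail.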

Building on this, the goal extends to "Build Your Own Cluster," allowing each agent or end‑user to dynamically locate low‑cost nodes across clouds and schedule clusters on demand.
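As a rough sketch of what "locating low‑cost nodes across clouds" could mean in code, the example below greedily picks the cheapest offers per CPU until a requested capacity is met. The `NodeOffer` type, the cloud names, and the prices are all hypothetical; the article does not describe AKernel's actual selection policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeOffer:
    cloud: str
    node_id: str
    cpus: int
    price_per_hour: float


def pick_cheapest_nodes(offers, cpus_needed):
    """Greedy choice: take offers by ascending price per CPU until demand is met."""
    chosen, total = [], 0
    for offer in sorted(offers, key=lambda o: o.price_per_hour / o.cpus):
        if total >= cpus_needed:
            break
        chosen.append(offer)
        total += offer.cpus
    if total < cpus_needed:
        raise ValueError("not enough capacity across the offered nodes")
    return chosen


# Hypothetical offers from three clouds.
offers = [
    NodeOffer("cloud-a", "a1", 16, 0.80),  # 0.05 per CPU-hour
    NodeOffer("cloud-b", "b1", 32, 1.28),  # 0.04 per CPU-hour
    NodeOffer("cloud-c", "c1", 8, 0.56),   # 0.07 per CPU-hour
]
cluster = pick_cheapest_nodes(offers, cpus_needed=40)
```

A greedy pass is the simplest possible policy; a real scheduler would also weigh latency, data locality, and availability.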

Why Existing Infrastructure Fails

Traditional cloud‑native stacks grew out of the container and Kubernetes adoption that began around 2015. They are mature for web traffic spikes and disaster recovery, but they suffer from three deep pain points:

Time cost: Large‑scale events like Double‑11 require weeks of preparation; agents need minute‑ or second‑level instant compute.

Architectural complexity: Vertical layering (IaaS → PaaS → SaaS) and horizontal domain separation (storage, compute, network, scheduling) cause cross‑layer coordination overhead, duplicated tech stacks, and heavy manual operations.

Human resource cost: Maintaining a full stack (Kubernetes, OS kernel, distributed storage, networking) typically needs 30‑50 engineers, making it infeasible to provision such infrastructure per agent.

Thus, conventional cloud‑native systems are mature but not designed for AI agents.

Core Solution: AKernel and "Build Your Own Cluster"

Ant Group built AKernel, a lightweight infrastructure layer that provides instant, on‑demand compute to the users and workloads running on top of it. Its design goal is to let every developer "Build Your Own Cluster" across global compute resources, achieving true Datacenter Use.

Remarkably, the entire AKernel cluster is operated by fewer than three people, thanks to core designs such as the following:

Monorepo + AI‑assisted development: All open‑source components (including openYuanrong) reside in a single repository, and developers use AI assistants (e.g., Claude Code) to diagnose failures within minutes.

Architecture Walkthrough: A Full Request Journey

AKernel integrates three layers of components, color‑coded in the diagram (orange: openYuanrong core, green: community open‑source, blue: AKernel proprietary). The request flow for creating an Agent Sandbox is:

The user issues a request via the SDK or CLI; traffic enters through a public IP and the Traefik gateway.

The gateway routes the request to the openYuanrong function system, which forwards it to a central scheduler that selects an idle node.

On that node, a Proxy (similar to the Kubernetes kubelet) manages resources and invokes the Sandbox Daemon to create the sandbox.

The user’s code, whether a custom agent or FaaS user code, runs inside this sandbox.
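The scheduler and node‑proxy steps above can be sketched as a toy model. Everything here is an assumption for illustration: the class names, the "most free slots" placement rule, and the `sb-N` identifiers are hypothetical and do not reflect openYuanrong's real scheduler or APIs.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    name: str
    capacity: int
    sandboxes: dict = field(default_factory=dict)

    def free_slots(self):
        return self.capacity - len(self.sandboxes)

    def create_sandbox(self, sandbox_id, code):
        # Node-side proxy role: record the resource use, then create the sandbox.
        self.sandboxes[sandbox_id] = code
        return sandbox_id


class Scheduler:
    def __init__(self, nodes):
        self.nodes = nodes
        self._ids = count(1)

    def handle_request(self, code):
        # Central scheduler role: pick the node with the most free slots
        # (ties go to the node listed first).
        node = max(self.nodes, key=Node.free_slots)
        if node.free_slots() <= 0:
            raise RuntimeError("no idle node available")
        sandbox_id = f"sb-{next(self._ids)}"
        node.create_sandbox(sandbox_id, code)
        return node.name, sandbox_id


scheduler = Scheduler([Node("node-1", capacity=1), Node("node-2", capacity=2)])
first = scheduler.handle_request("print('agent A')")
second = scheduler.handle_request("print('agent B')")
third = scheduler.handle_request("print('agent C')")
```

With these capacities the three requests land on node-2, node-1, and node-2 in turn; a fourth request would raise because no node has a free slot.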

When sandboxes need to share data, the request is routed to openYuanrong’s Data Worker, forming a distributed Data System that supports high‑speed memory‑based data exchange. External network traffic is handled by a custom eBPF‑based NAT component.

Three Technical Pillars

openYuanrong Distributed Scheduling & Data System: Provides a unified execution engine supporting C++, Python, and Go, with a high‑performance distributed cache that enables fast data movement for RL training, multi‑agent state sharing, and Spark workloads.

AFaaS Extreme Cold‑Start Sandbox (OSDI'25): Uses a clone‑based approach with NanoVisor (gVisor‑derived), distill‑fs (Rust FUSE lazy‑load), and Dragonfly P2P image acceleration, achieving millisecond‑level sandbox launch, two orders of magnitude faster than traditional containers.

Checkpoint/Restore Full‑Chain State Persistence: Captures complete sandbox state (memory, registers, file handles) into openYuanrong’s Data System, allowing automatic or manual snapshots, rapid restore, and state migration across agents. This turns agents from short‑lived stateless tasks into long‑lived work units.
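The checkpoint/restore pillar can be illustrated at a much smaller scale. The sketch below only serialises a Python dictionary with `pickle` into an in‑memory store, whereas the real mechanism captures process‑level state such as memory and registers. The `DataSystem` and `AgentSandbox` classes are hypothetical stand‑ins, not openYuanrong APIs.

```python
import pickle


class DataSystem:
    """In-memory stand-in for a distributed data store (hypothetical)."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, blob):
        self._blobs[key] = blob

    def get(self, key):
        return self._blobs[key]


class AgentSandbox:
    def __init__(self, agent_id):
        self.agent_id = agent_id
        self.state = {"step": 0, "notes": []}

    def work(self, note):
        self.state["step"] += 1
        self.state["notes"].append(note)

    def checkpoint(self, store, tag):
        # Serialise the current state and hand it to the data store.
        key = f"{self.agent_id}/{tag}"
        store.put(key, pickle.dumps(self.state))
        return key

    @classmethod
    def restore(cls, store, key, agent_id):
        # Rebuild a sandbox, possibly under a different agent, from a snapshot.
        sandbox = cls(agent_id)
        sandbox.state = pickle.loads(store.get(key))
        return sandbox


store = DataSystem()
agent_a = AgentSandbox("agent-a")
agent_a.work("load data")
snapshot_key = agent_a.checkpoint(store, "v1")
agent_a.work("train model")  # progress made after the snapshot

agent_b = AgentSandbox.restore(store, snapshot_key, "agent-b")
```

Here `agent_b` resumes from the snapshot taken after the first step, independent of what `agent_a` did afterwards, which is the migration‑across‑agents behaviour in miniature.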

Developer Experience: Conversational Deployment in 10 Minutes

Deploying AKernel requires cloning the repository and invoking an AI assistant with a command such as “Help me deploy an AKernel cluster.” Providing cloud provider credentials triggers a fully automated workflow:

Terraform creates VPC, security groups, ACK/CCE clusters, and node pools.

Helm installs core components (openYuanrong, Dragonfly, monitoring, etc.).

Kubernetes schedules the final application deployment.
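The Terraform and Helm stages above can be sketched as an ordered command plan. The directory name, release names, chart paths, and namespace are hypothetical; the article does not show the actual commands the AI assistant runs. The sketch only builds and prints the commands, so nothing is provisioned.

```python
import shlex


def build_deploy_plan(tf_dir, releases, namespace):
    """Return the ordered commands for an infrastructure-then-charts rollout."""
    plan = [
        ["terraform", f"-chdir={tf_dir}", "init"],
        ["terraform", f"-chdir={tf_dir}", "apply", "-auto-approve"],
    ]
    for release, chart in releases:
        plan.append([
            "helm", "upgrade", "--install", release, chart,
            "--namespace", namespace, "--create-namespace",
        ])
    return plan


# Hypothetical layout: Terraform code in ./infra, charts under ./charts.
plan = build_deploy_plan(
    tf_dir="infra",
    releases=[
        ("openyuanrong", "charts/openyuanrong"),
        ("dragonfly", "charts/dragonfly"),
    ],
    namespace="akernel",
)

for command in plan:
    print(shlex.join(command))
```

Keeping the plan as data makes it easy to review before execution; each entry could then be passed to `subprocess.run(command, check=True)` once credentials are configured.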

In the reported run, the end‑to‑end deployment took 8 minutes 39 seconds, covering VPC creation, cluster provisioning, node pool initialization, and Helm deployment, after which Grafana and tracing URLs were returned automatically.

Practice Verification and Future Evolution

AKernel’s full capabilities (Agentic RL, serverless functions, Spark big‑data processing) are already deployed in multiple internal scenarios. Future plans include adding GPU/NPU support (e.g., Alibaba Cloud Hanguang NPU) and extending sandboxes to Windows, macOS, and Android.

Three concise takeaways summarize AKernel’s value:

Small teams can deliver massive infrastructure—few people accomplish what large teams used to.

In the Agent era, infrastructure must not limit AI agents; extreme optimizations unlock compute potential.

Developers can obtain code and instantly deploy clusters, truly achieving Datacenter Use.

When every AI agent can seamlessly harness datacenter‑scale compute and infrastructure barriers drop from dozens of engineers to a three‑person squad, the AI era will witness a paradigm shift in infrastructure.
