Building an AI‑Native Multi‑Agent Digital Human Architecture on Cloud Native
The article details how a cloud‑native platform called AgentTeams enables AI‑Native multi‑agent digital‑human teams to replace manual incident response, automate end‑to‑end development workflows, and securely integrate LLMs and internal services through declarative orchestration and fine‑grained permission models.
From RPA to AI‑Native Multi‑Agent Teams
RPA scripts automate fixed UI actions but cannot understand business context; any UI change breaks the workflow. Large language models (LLMs) add natural‑language understanding, enabling single agents that can diagnose alerts. A single agent is limited by its context window and tool‑calling capacity, so complex multi‑role scenarios require multiple agents that cooperate under a structured organization, communication policies, shared state, and secure integration of LLMs and tool services.
AI‑Native Concept
AI‑Native extends the cloud‑native principle “applications are born for the cloud” to “systems are born for AI agents”. Instead of retrofitting AI, the architecture treats agents as first‑class citizens with dedicated resources, declarative APIs, and a governance plane.
AgentTeams Overview
AgentTeams (the commercial version of the open‑source project HiClaw) provides a collaboration‑orchestration plane that creates “digital employee squads”. It does not replace the agent runtime (OpenClaw/CoPaw) or the underlying LLM/MCP services; it focuses on coordinating multiple agents as a cohesive organization.
Core Design
Manager: platform‑level controller that creates Teams, Humans, and platform resources. In the commercial product, Manager actions are performed through the UI, and the Manager never talks directly to Workers.
Team: represents a business department. It contains a TeamAdmin (real human owner), a TeamLeader (special Worker that orchestrates tasks), and a set of Workers (individual agents).
Human: a CRD that models real users with three permission levels (L1 Admin, L2 Team, L3 Worker) and maps them to Matrix rooms and Agent permissions.
Manager (platform control) → TeamAdmin (business owner) → TeamLeader (task splitter) → Workers (executors)
Permission Model
Permissions are expressed via spec.admin, peerMentions, and channelPolicy fields. They define which Humans can mention which Workers and which Workers can communicate across Teams, making every communication a policy decision rather than a hard‑coded rule.
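To make "communication as a policy decision" concrete, here is a minimal Python sketch of a default-deny mention check. The `ChannelPolicy` shape, the `allow_same_team` flag, and the Worker names are assumptions made for illustration; they are not the actual HiClaw schema for peerMentions or channelPolicy.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Worker:
    name: str
    team: str


@dataclass
class ChannelPolicy:
    # Hypothetical shape: sender Worker name -> peer Worker names it may mention.
    peer_mentions: dict[str, set[str]] = field(default_factory=dict)
    # Hypothetical flag: Workers in the same Team may always talk to each other.
    allow_same_team: bool = True


def may_mention(sender: Worker, target: Worker, policy: ChannelPolicy) -> bool:
    """Default deny: allow same-team traffic or an explicit peer entry only."""
    if policy.allow_same_team and sender.team == target.team:
        return True
    return target.name in policy.peer_mentions.get(sender.name, set())


# Illustrative Workers and policy (names are made up for this sketch).
alert = Worker("alert-agent", "oncall-team")
diag = Worker("diagnosis-agent", "oncall-team")
coder = Worker("coder-agent", "dev-team")
policy = ChannelPolicy(peer_mentions={"diagnosis-agent": {"coder-agent"}})
```

With this policy, the diagnosis agent may reach the coder agent in another Team, but the reverse direction is denied until someone declares it. Changing who can talk to whom is then a data change, not a code change.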
Deployment Options
Open‑source (HiClaw): Worker backend runs in Docker containers or Kubernetes Pods; operations are self‑managed; security isolation relies on the enterprise perimeter; direct internal service access.
Cloud product (AgentTeams): Worker backend runs on SAE/ECI instances or secure sandboxes; operations are fully managed (zero‑ops); dedicated VPC with zero‑trust gateway; secure network tunnel to internal services; product‑level extensibility.
Both variants share the same controller semantics (hiclaw‑controller reconcile) under the API version hiclaw.io/v1beta1 with four core Kinds: Manager, Team, Worker, and Human.
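The reconcile semantics mentioned above follow the general controller pattern: compare declared state with observed state and emit the actions that close the gap. The sketch below illustrates that pattern with in-memory dictionaries. It is not the hiclaw-controller implementation, and the resource names and spec fields are hypothetical.

```python
def reconcile(desired: dict[str, dict], actual: dict[str, dict]) -> list[tuple[str, str]]:
    """Return the (verb, name) actions needed to move `actual` toward `desired`."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


def apply_actions(actions, desired, actual):
    """Apply actions to an in-memory `actual` state (stand-in for a real backend)."""
    for verb, name in actions:
        if verb == "delete":
            actual.pop(name)
        else:
            actual[name] = dict(desired[name])
```

Running `reconcile` again after `apply_actions` returns an empty list, which is the idempotence property that lets a controller re-run the loop safely on every change or on a timer.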
Key CRD Definitions
apiVersion: hiclaw.io/v1beta1
kind: Human
metadata:
  name: zhangsan
spec:
  displayName: 张三
  email: [email protected]
  permissionLevel: 2 # 1=Admin, 2=Team, 3=Worker
  accessibleTeams: [oncall-team]
  accessibleWorkers: []
The spec.admin field of a Team identifies the Human who acts as the TeamAdmin (business owner). The peerMentions and channelPolicy fields control cross‑Team worker communication and Matrix room invitations, turning “who can talk to whom” into a configurable policy.
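One possible reading of the three permission levels is sketched below in Python. The access rules are an assumption for illustration (Admin reaches everything, Team level reaches Workers in listed Teams, Worker level reaches only listed Workers); the source does not spell out the exact semantics, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Level(IntEnum):
    ADMIN = 1   # L1
    TEAM = 2    # L2
    WORKER = 3  # L3


@dataclass
class Human:
    name: str
    permission_level: Level
    accessible_teams: list[str] = field(default_factory=list)
    accessible_workers: list[str] = field(default_factory=list)


def can_access(human: Human, worker: str, team: str) -> bool:
    """Assumed rule set: broader levels see more; unlisted targets are denied."""
    if human.permission_level == Level.ADMIN:
        return True
    if human.permission_level == Level.TEAM:
        return team in human.accessible_teams or worker in human.accessible_workers
    return worker in human.accessible_workers


# Mirrors the manifest above: level 2 with access to oncall-team.
zhangsan = Human("zhangsan", Level(2), accessible_teams=["oncall-team"])
```

Under these assumed rules, `zhangsan` can reach any Worker in `oncall-team` but none in other Teams unless a Worker is listed explicitly.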
AI‑Native Scenarios Implemented
A – Full‑stack product development: From requirement to code, review, test, and release, the agent squad automates the entire pipeline; humans intervene only at key decision points.
B – 24/7 intelligent on‑call center: Alerts are automatically diagnosed and routed, and fix suggestions are generated; humans handle only cases the agents cannot resolve.
C – Open‑source CI pipeline (ChaosBlade): An issue triggers an agent workflow that analyses the problem, generates a patch, creates a PR, and iterates with reviewers, achieving a fully automated code‑to‑merge loop.
D – Business & community cockpit: Natural‑language queries retrieve operational metrics; agents perform data extraction and answer synthesis.
Each scenario demonstrates automatic routing, DAG‑style orchestration, deep root‑cause analysis, full auditability, human‑in‑the‑loop decision points, and production‑grade engineering outputs.
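DAG-style orchestration can be illustrated with Python's standard `graphlib` module. The task graph below is a hypothetical rendering of scenario A's pipeline; how AgentTeams actually schedules tasks is not described in the source.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each key maps to the set of tasks that must
# finish before it can start.
pipeline = {
    "code": {"requirement"},
    "review": {"code"},
    "test": {"code"},
    "release": {"review", "test"},
}


def run_in_waves(graph):
    """Group tasks into waves; tasks within one wave have no mutual dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave finished to unlock successors
    return waves


waves = run_in_waves(pipeline)
# waves == [["requirement"], ["code"], ["review", "test"], ["release"]]
```

Review and test land in the same wave, so a TeamLeader-like coordinator could hand them to different Workers in parallel and release only when both are done.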
Example: Alert Handling Workflow
When an alert arrives, the taishan-alert-agent posts an initial diagnosis in the group chat. Within 30 seconds the taishan-diagnosis-agent refines the diagnosis, and after 90 seconds a root‑cause analysis and an executable remediation script are posted. Human operators only need to decide whether to apply the script in production, so roughly 80% of the incident handling is automated.
Deployment and Security Architecture
AgentTeams can be deployed via two paths:
Self‑hosted HiClaw: Docker or Kubernetes Pods for Workers, enterprise‑managed control plane, internal network access.
Managed cloud product: SAE/ECI instances for Workers, fully managed control plane, dedicated VPC, zero‑trust gateway, secure network tunnel to internal services.
Security is enforced by an AI gateway (Higress or Alibaba Cloud APIG) that holds the real LLM/MCP credentials. Each Worker receives only a revocable Consumer Token; the gateway applies per‑route allowedConsumers policies to decide which LLM or internal service a Worker may invoke.
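The credential model can be sketched as a toy gateway in Python: upstream keys never leave the gateway, Workers hold only revocable tokens, and each route carries an allow-list. This illustrates the idea only. It is not the Higress or APIG configuration format, and the class, routes, and consumer names are hypothetical.

```python
import secrets


class Gateway:
    """Toy model: maps Consumer Tokens to consumers and checks per-route allow-lists."""

    def __init__(self, allowed_consumers: dict[str, set[str]]):
        self._allowed = allowed_consumers   # route -> consumer names allowed on it
        self._tokens: dict[str, str] = {}   # token -> consumer name

    def issue_token(self, consumer: str) -> str:
        token = secrets.token_urlsafe(16)
        self._tokens[token] = consumer
        return token

    def revoke(self, token: str) -> None:
        self._tokens.pop(token, None)

    def authorize(self, token: str, route: str) -> bool:
        consumer = self._tokens.get(token)
        return consumer is not None and consumer in self._allowed.get(route, set())


gateway = Gateway({
    "/llm/chat": {"alert-agent", "diagnosis-agent"},
    "/mcp/cmdb": {"diagnosis-agent"},
})
alert_token = gateway.issue_token("alert-agent")
```

Here `alert_token` is accepted on `/llm/chat` and rejected on `/mcp/cmdb`. After `gateway.revoke(alert_token)` it is rejected everywhere, and no upstream key has to be rotated.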
Implementation Steps for a Cloud Deployment
Purchase the AgentTeams service (cloud account, VPC, ASI cluster).
In the console, create an instance, bind a model provider, add Humans, define a Team (specifying the TeamLeader model, skills, and Soul), and create Workers under the Team.
Establish a secure network tunnel from the cloud VPC to the internal data center, enabling the AI gateway to reach internal services.
Configure the AI gateway to store upstream API keys (e.g., OpenAI, Claude, MCP tokens) and issue Consumer Tokens to Workers. Define allowedConsumers policies for LLM/MCP access and internal service routes.
Illustrative CI Pipeline (ChaosBlade)
Workflow:
Submit an issue on GitHub.
AgentTeam analyses the issue, proposes a fix, and generates a patch.
Agent creates a PR on a feature branch, runs automated tests, and posts results.
Human reviewer approves the PR; the agent handles DCO signing, amend, and force‑push if needed.
After approval, the PR is merged automatically.
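The DCO step in this workflow comes down to every commit message carrying a `Signed-off-by:` trailer. The Python sketch below shows a check and a fix on the message text alone; the function names and sample identity are hypothetical. An agent working in a real repository would more likely run `git commit --amend --signoff` and then force-push.

```python
import re

# A DCO trailer looks like: Signed-off-by: Full Name <user@example.com>
SIGNOFF = re.compile(r"^Signed-off-by: .+ <[^<>\s]+@[^<>\s]+>$", re.MULTILINE)


def has_signoff(commit_message: str) -> bool:
    """True if the message contains at least one well-formed sign-off line."""
    return SIGNOFF.search(commit_message) is not None


def add_signoff(commit_message: str, name: str, email: str) -> str:
    """Append a sign-off trailer unless one is already present (idempotent)."""
    if has_signoff(commit_message):
        return commit_message
    return f"{commit_message.rstrip()}\n\nSigned-off-by: {name} <{email}>\n"
```

Because `add_signoff` is idempotent, an agent can apply it on every retry after a failed DCO check without stacking duplicate trailers.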
Metrics observed in the demonstration:
13 source‑code locations inspected.
Patch size of +60 lines / –1 line.
Four generated test cases.
Automated handling of DCO failures and force‑push.
Benefits of AI‑Native Multi‑Agent Teams
Scalability: Declarative CRDs and continuous reconciliation follow the same control‑loop model as Kubernetes orchestration, so drift between declared and actual state is corrected automatically.
Governance: Policy‑driven communication and credential management provide fine‑grained control over who can invoke which LLM or internal service.
Reusability: Teams can be declared once and reused across business units; permission levels allow a Human to be a TeamAdmin in one Team and a regular member in another.
Human‑in‑the‑Loop: Humans interact via Matrix rooms, can intervene at decision points, and retain auditability of all actions.
References
HiClaw GitHub repository: https://github.com/agentscope-ai/HiClaw
AgentTeams product documentation: https://help.aliyun.com/zh/document_detail/3040377.html
ChaosBlade PR example: https://github.com/chaosblade-io/chaosblade/pull/1302
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
