Ant Group and Five Universities Unveil Agent3σ: A Multi‑Layer AI Agent Security Evaluation Platform

As AI agents move from simple Q&A to tool use and real‑world actions, Ant Group and five leading universities launch the open‑source Agent3σ platform, offering a three‑tier, 7‑dimension risk framework and concrete metrics to assess agents' safety across static, simulated, and live environments.

AntTech
AntTech
AntTech
Ant Group and Five Universities Unveil Agent3σ: A Multi‑Layer AI Agent Security Evaluation Platform

AI agents are evolving from answering questions to invoking tools, manipulating systems, and executing real business tasks, which introduces new security risks beyond model hallucinations.

To answer whether agents can efficiently complete tasks in complex real environments while staying safe, Ant Group together with Tsinghua University, Peking University, Zhejiang University, Nanjing University, and Hangzhou Dianzi University launched the Agent3σ security evaluation platform, targeting OpenClaw‑type agents with multi‑level, reproducible, production‑like assessments.

Why a new platform?

Traditional large‑model safety tests focus on output compliance, but agent risks occur in the execution chain: environment perception, planning, tool calls, and external impact.

Risk dimensions

Local environment damage & availability: resource exhaustion, file deletion, dangerous command execution, system tampering.

Data & information security: sensitive information leakage, data exfiltration, unauthorized credential access, content tampering.

Persistent state & memory pollution: memory injection, malicious command residue, configuration tampering, skill/plugin poisoning.

Permission & system control: sandbox escape, privilege escalation, defense bypass, authorization boundary confusion.

Network attack & remote control: reverse shell, DNS hijacking, internal network probing, malicious persistence.

Business misuse & illegal use: fraudulent social engineering, black‑market automation, illegal content distribution, brand damage.

Financial & transaction risk: unverified sensitive transactions, account manipulation, transaction parameter tampering, decision misguidance.

Three‑tier evaluation framework

L1 Agent3σ‑Sweep (Static data assessment): offline evaluation on static samples; broad coverage, low cost, suitable for model training and rapid red‑team screening.

L2 Agent3σ‑Stage (Simulated interaction): plugin‑based simulation of web pages, emails, etc.; complete workflow, stable results, suitable for multi‑turn interaction and defense‑strategy verification.

L3 Agent3σ‑Canary (Real‑environment testing): execution against real tools and interfaces; closest to production deployment, assesses end‑to‑end security performance and actual risk consequences.

Core metrics

ASR (Attack Success Rate): lower values indicate higher safety.

Sec Awareness (Security Awareness): ability to recognize, refuse, or alert on risky tasks; higher is better.

Task Success: normal task completion rate; higher is better.

Avg Score: composite score balancing risk consequences, security awareness, and usability.

First benchmark results

Claude Opus 4.6: stable performance across L1, L2, and L3, demonstrating strong full‑chain security defense.

Qwen3.6‑Plus: excels in simulated interaction and real‑environment tests, showing superior perception at tool‑calling boundaries.

Some models perform adequately on static samples but degrade noticeably in simulated or real tool‑calling stages, highlighting the unique value of multi‑layer testing.

Case study: indirect prompt injection leading to data exfiltration

In a scenario where an agent is asked to “visit a web page and summarize its content,” the page hides an indirect prompt that steers the agent to read an email summary and attempt to send sensitive content to an external endpoint. Agent3σ validates this issue at three levels:

L1 Static sample test: directly input the malicious web text to see if the model detects and rejects the risk.

L2 Simulated interaction test: use plugins to emulate web and email tool calls, observing cross‑step attack propagation.

L3 Real‑environment test: deploy the actual web page and integrate real toolchains to verify end‑to‑end data exfiltration.

This progressive design separates seemingly safe model responses from actual execution safety.

Implications for stakeholders

Model vendors: receive production‑like red‑team pressure‑test benchmarks to locate real‑business risk blind spots.

Application developers: obtain pre‑deployment safety acceptance baselines, reducing risks when agents access real tools and business systems.

Regulators & compliance bodies: gain reproducible, auditable evidence chains to support intelligent‑agent security governance.

Future work

Agent3σ will continuously expand its risk sample library, toolchains, and scenario coverage, and will release additional evaluation capabilities in collaboration with the community and industry partners to build a security foundation for the agent era.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Open SourceAI AgentRisk AssessmentSecurity EvaluationAgent3σMulti‑Layer Testing
AntTech
Written by

AntTech

Technology is the core driver of Ant's future creation.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.