Tagged articles
2 articles
Shi's AI Notebook
Apr 23, 2026 · Artificial Intelligence

Decoding Anthropic’s Agent Evaluation Methodology: Challenges, Graders, and Best Practices

Anthropic’s engineering blog outlines a systematic approach to evaluating AI agents, highlighting why agents are harder to test than traditional software, defining key concepts like tasks, trials, transcripts, and outcomes, and detailing the three grader types, evaluation timing, and practical decisions for building robust eval pipelines.

AI agents · LLM-as-judge · capability eval
23 min read
AI Info Trend
Oct 28, 2025 · Industry Insights

2025: The AI Agent Year and a New Standard to End the Evaluation Black Box

In 2025, China’s AI strategy targets over 90% adoption of AI agents, yet enterprises struggle to select, accept, and optimize them because unified performance metrics are lacking. This gap prompted the first national group standard, the ‘Enterprise‑level AI Agent Application Performance Evaluation Specification’, which provides a multi‑dimensional assessment framework for developers, users, and third‑party evaluators.

AI agents · Artificial Intelligence · Industry standards
7 min read