Tagged articles
AgentGuide
May 3, 2026 · Artificial Intelligence

How to Evaluate an AI Agent Beyond Just Accuracy

Evaluating an AI agent takes more than accuracy. You must also measure task completion, execution traces, tool usage, latency, cost, error rates, and both explicit and implicit user feedback, then combine observability, offline smoke‑test and regression suites, and continuous online monitoring into a closed‑loop improvement process.

AI Agent · Metrics · Observability
14 min read
DataFunTalk
Sep 30, 2023 · Fundamentals

Different Types of Experiments in Search Scenarios

In this presentation, Tencent PCG data product manager Wang Dongxing introduces A/B testing fundamentals and shares practical experience with several search experiment methods (regular A/B, vocabulary, diffAB, and interleaving), highlighting common pitfalls and offering actionable insights for practitioners.

A/B testing · Data Product · Online Testing
2 min read
Baidu Intelligent Testing
Apr 16, 2018 · Operations

Online Load‑Testing Practices for Baidu Nuomi Marketing Activities

This case study describes Baidu Nuomi's online load‑testing methodology for high‑traffic marketing events. It covers capacity estimation, test planning, execution, anti‑attack measures, platform architecture, and lessons learned for keeping the system reliable and performant under peak load.

Online Testing · capacity planning · load-testing
16 min read