Tagged articles
7 articles
AgentGuide
May 3, 2026 · Artificial Intelligence

How to Evaluate an AI Agent Beyond Just Accuracy

Evaluating AI agents requires more than accuracy. You must also measure task completion, execution traces, tool usage, latency, cost, error rates, and both explicit and implicit user feedback. Observability, offline smoke‑test and regression suites, and continuous online monitoring then combine into a closed‑loop improvement process.

AI agent · Metrics · Observability
14 min read
PMTalk Product Manager Community
Dec 9, 2025 · Product Management

Why AI Product Managers Struggle with Planning: Insights from Real Interviews

The article reveals that many AI product managers can talk about AIGC and agents but stumble when asked to design a rigorous evaluation system, illustrating the problem with a chatbot case study and presenting a detailed 1+3 multi‑dimensional framework to guide product definition, development, and iteration.

AI product evaluation · Human-in-the-Loop · Offline Testing
18 min read
Ziru Technology
Dec 16, 2022 · Big Data

How to Effectively Test Offline Data Metrics and Data Warehouse Pipelines

This article explains what data metrics are, compares offline metric testing with traditional testing, and provides a comprehensive step‑by‑step guide for testing data collection, ETL, warehouse models, metric calculations, scheduling, security, and API outputs in a Hive‑based data warehouse.

Data Warehouse · ETL · Hive
9 min read
DevOps Cloud Academy
Jun 5, 2022 · Operations

Storing Build Dependencies in an Artifact Repository for Stable Builds

The article explains why and how to store external libraries and tools in a private artifact repository, isolate builds from the internet, and verify build stability by testing offline pipelines, emphasizing the importance of managing dependencies for reliable DevOps processes.

DevOps · Offline Testing · artifact repository
3 min read
Amap Tech
May 8, 2020 · Product Management

How Amap’s Evaluation Team Measures Product Success: Roles, Methods, and Tools

This article explains the evolution, responsibilities, and evaluation methods of the product‑effect assessment team at Amap, covering offline testing, AB experiments, metric analysis, automated scoring models, and the tools that support a comprehensive product‑performance framework.

AB testing · Industry Insights · Metrics
13 min read
360 Quality & Efficiency
Feb 5, 2018 · Artificial Intelligence

Fundamentals of Recommendation Engines: User Profiling, Data Classification, and Testing Methods

The article explains the core concepts of recommendation engines—user profiling and data classification—describes how large‑scale data processing tools are used to build models, and outlines common offline and A/B testing approaches for evaluating recommendation performance.

AB testing · Machine Learning · Offline Testing
4 min read