Tagged articles

7 articles


May 3, 2026 · Artificial Intelligence

How to Evaluate an AI Agent Beyond Just Accuracy

Evaluating AI agents requires more than accuracy: you must also measure task completion, execution traces, tool usage, latency, cost, error rates, and both explicit and implicit user feedback. Observability, offline smoke‑test and regression suites, and continuous online monitoring then combine into a closed‑loop improvement process.

AI agent · Metrics · Observability

0 likes · 14 min read

PMTalk Product Manager Community

Dec 9, 2025 · Product Management

Why AI Product Managers Struggle with Planning: Insights from Real Interviews

The article reveals that many AI product managers can talk about AIGC and agents but stumble when asked to design a rigorous evaluation system, illustrating the problem with a chatbot case study and presenting a detailed 1+3 multi‑dimensional framework to guide product definition, development, and iteration.

AI product evaluation · Human-in-the-Loop · Offline Testing

0 likes · 18 min read

Ziru Technology

Dec 16, 2022 · Big Data

How to Effectively Test Offline Data Metrics and Data Warehouse Pipelines

This article explains what data metrics are, compares offline metric testing with traditional testing, and provides a comprehensive step‑by‑step guide for testing data collection, ETL, warehouse models, metric calculations, scheduling, security, and API outputs in a Hive‑based data warehouse.

Data Warehouse · ETL · Hive

0 likes · 9 min read

DataFunTalk

Oct 22, 2022 · Big Data

Design and Practice of a Risk Control Experiment Platform at Du Xiaoman

This article explains the background, architecture, challenges, and step‑by‑step design of a big‑data‑driven risk control experiment platform used for online and offline strategy testing in internet finance.

Big Data · Experiment Platform · FinTech

0 likes · 12 min read

DevOps Cloud Academy

Jun 5, 2022 · Operations

Storing Build Dependencies in an Artifact Repository for Stable Builds

The article explains why and how to store external libraries and tools in a private artifact repository, isolate builds from the internet, and verify build stability by testing offline pipelines, emphasizing the importance of managing dependencies for reliable DevOps processes.

DevOps · Offline Testing · artifact repository

0 likes · 3 min read

Amap Tech

May 8, 2020 · Product Management

How Amap’s Evaluation Team Measures Product Success: Roles, Methods, and Tools

This article explains the evolution, responsibilities, and evaluation methods of the product‑effect assessment team at Amap, covering offline testing, A/B experiments, metric analysis, automated scoring models, and the tools that support a comprehensive product‑performance framework.

AB testing · Industry Insights · Metrics

0 likes · 13 min read

360 Quality & Efficiency

Feb 5, 2018 · Artificial Intelligence

Fundamentals of Recommendation Engines: User Profiling, Data Classification, and Testing Methods

The article explains the core concepts of recommendation engines—user profiling and data classification—describes how large‑scale data processing tools are used to build models, and outlines common offline and A/B testing approaches for evaluating recommendation performance.

AB testing · Machine Learning · Offline Testing

0 likes · 4 min read