
Full‑Link Consistency Testing for Click‑Through Rate Models in Large‑Scale Machine Learning

The article describes a comprehensive full-link consistency testing framework for click-through-rate models. It defines consistency issues, outlines data and logic consistency goals, and presents a multi-stage technical solution (online data capture, offline data stitching, q-value comparison, and reporting) to ensure model stability and performance.

Baidu Intelligent Testing

This document presents a detailed approach to full‑link consistency testing for click‑through‑rate (CTR) models used in advertising retrieval, emphasizing consistency as the foundation for model stability.

Background and Overview: CTR models estimate the probability that an ad will be clicked, which influences ranking and truncation. The offline pipeline covers feature extraction, model training, evaluation, and online deployment; inconsistencies can arise in data, latency, policies, and performance, causing metric fluctuations.

Definition and Goals of Consistency: Consistency issues are defined as mismatches in data (sample and model) and processing logic between offline training and online inference. Goals include detecting inconsistencies, locating their sources, and assessing their impact on system performance.

Technical Solution – Full-Link Consistency Scheme: The solution breaks the CTR prediction pipeline into multiple stages (parameter parsing, feature extraction, embedding lookup, DNN computation, etc.) and replaces each stage with a controlled offline counterpart, comparing the resulting q-values (predicted click probabilities). Five parallel flows (q1–q5) are generated so that discrepancies in feature extraction, model conversion, and DNN computation can be isolated.
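The stage-replacement idea can be sketched as follows. This is an illustrative toy, not Baidu's actual code: the stage functions, the two-feature "model", and which stage each flow swaps are all assumptions made for demonstration; the point is that adjacent flows differ in exactly one stage.

```python
import math

def online_features(request):
    # Features as logged by the online service.
    return [request["x"] * 2.0, request["x"] + 1.0]

def offline_features(request):
    # Offline re-extraction; should reproduce the online features exactly.
    return [request["x"] * 2.0, request["x"] + 1.0]

def dnn_forward(features, weights):
    # Tiny stand-in for the DNN: weighted sum squashed to (0, 1).
    s = sum(f * w for f, w in zip(features, weights))
    return 1.0 / (1.0 + math.exp(-s))

ONLINE_WEIGHTS = [0.3, -0.1]       # weights served online
CONVERTED_WEIGHTS = [0.3, -0.1]    # weights after offline model-table conversion

def run_flows(request):
    """Build parallel q-value flows; each swaps one stage for its offline twin."""
    return {
        "q1": dnn_forward(online_features(request), ONLINE_WEIGHTS),
        # q2 swaps in offline feature extraction:
        "q2": dnn_forward(offline_features(request), ONLINE_WEIGHTS),
        # q3 additionally swaps in the converted model table:
        "q3": dnn_forward(offline_features(request), CONVERTED_WEIGHTS),
    }

flows = run_flows({"x": 0.5})
# If q1 == q2, offline feature extraction is consistent with online.
assert abs(flows["q1"] - flows["q2"]) < 1e-9
```

Because any two adjacent flows differ in a single stage, a q-value gap between them attributes the inconsistency to that stage alone.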

Verification Flow: By comparing q-values across flows (e.g., q1 vs q2, q3 vs q4), the framework pinpoints whether inconsistencies stem from offline feature extraction, model table conversion, or DNN calculation.
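The pairwise comparison amounts to a small diagnosis table. The mapping of flow pairs to stages below is an assumed illustration of the idea, not the article's exact assignment, and the tolerance is arbitrary:

```python
TOL = 1e-6  # maximum q-value gap still considered "consistent" (assumed)

def diagnose(q):
    """Given a dict of q-values q1..q5, return the stages whose
    isolating flow pair disagrees beyond the tolerance."""
    pairs = {
        ("q1", "q2"): "offline feature extraction",
        ("q2", "q3"): "model table conversion",
        ("q3", "q4"): "DNN calculation",
        ("q4", "q5"): "parameter parsing",
    }
    return [stage for (a, b), stage in pairs.items() if abs(q[a] - q[b]) > TOL]

print(diagnose({"q1": 0.41, "q2": 0.41, "q3": 0.47, "q4": 0.47, "q5": 0.47}))
# -> ['model table conversion']
```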

Online Data Acquisition and Debug Logging: Online logs are captured with debug flags, parsed by thread, and formatted to align with offline samples. The logs contain multi-threaded entries, complete sample information, and may span multiple files.
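Because log lines from different request threads interleave, parsing groups lines by thread id before reassembling a sample. A minimal sketch, assuming a hypothetical `tid|key=value` line format and that `q` is the last field a thread logs per sample:

```python
from collections import defaultdict

def parse_debug_log(lines):
    """Group interleaved debug-log lines by thread id into per-sample dicts."""
    in_progress = defaultdict(dict)   # thread id -> fields accumulated so far
    finished = []
    for line in lines:
        tid, _, payload = line.partition("|")
        key, _, value = payload.partition("=")
        in_progress[tid][key] = value
        if key == "q":                # assumed sentinel: sample is complete
            finished.append(in_progress.pop(tid))
    return finished

log = [
    "t1|pk=1001", "t2|pk=1002",       # two threads interleaved
    "t1|feat=a:1,b:2", "t2|feat=a:3",
    "t1|q=0.41", "t2|q=0.37",
]
print(parse_debug_log(log))
```

The same grouping works across multiple log files as long as lines are fed in order per thread.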

Offline Data Stitching: Offline samples are matched to online logs using a primary key that appears in both datasets, ensuring a one-to-one correspondence for accurate q-value comparison.
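The stitching step is essentially an inner join on the primary key. A minimal sketch, with illustrative field names (`pk`, `q`) not taken from the article:

```python
def stitch(online_records, offline_samples):
    """Join online and offline records on the primary key.
    Returns (pk, q_online, q_offline) triples for keys present in both."""
    offline_by_pk = {s["pk"]: s for s in offline_samples}
    stitched = []
    for rec in online_records:
        sample = offline_by_pk.get(rec["pk"])
        if sample is not None:        # drop records without an offline match
            stitched.append((rec["pk"], rec["q"], sample["q"]))
    return stitched

online = [{"pk": 1001, "q": 0.41}, {"pk": 1002, "q": 0.37}]
offline = [{"pk": 1001, "q": 0.41}, {"pk": 1003, "q": 0.55}]
print(stitch(online, offline))        # only pk 1001 appears on both sides
```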

Q-Value Computation and Reporting: The framework produces statistical reports (distribution, diff ranges) and detailed per-sample diff information, including feature signatures and primary keys for manual investigation.
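A report of this shape can be sketched by bucketing per-sample absolute q-value diffs into ranges and keeping the worst offenders with their primary keys. The bucket boundaries below are an assumption, not the article's:

```python
def diff_report(pairs, buckets=(0.0, 1e-6, 1e-4, 1e-2, 1.0)):
    """pairs: iterable of (pk, q_online, q_offline).
    Returns a histogram over diff ranges plus the largest per-sample diffs."""
    counts = [0] * (len(buckets) - 1)
    diffs = []
    for pk, q_on, q_off in pairs:
        d = abs(q_on - q_off)
        for i in range(len(buckets) - 1):
            if buckets[i] <= d < buckets[i + 1]:
                counts[i] += 1
                break
        diffs.append((d, pk))
    diffs.sort(reverse=True)          # largest diffs first, for manual triage
    return {"histogram": counts, "top_diffs": diffs[:3]}

pairs = [(1001, 0.41, 0.41), (1002, 0.37, 0.372), (1003, 0.55, 0.61)]
print(diff_report(pairs))
```

The `top_diffs` list carries the primary keys, so an engineer can pull the full sample (feature signatures included) for the worst cases.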

End-to-End Execution: The process is divided into six stages (traffic capture, log formatting, log stitching, online parsing, q-value replacement/computation, and report generation), allowing selective execution based on troubleshooting needs.
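Selective execution can be sketched as an ordered stage list where a troubleshooting run starts or stops at any stage, provided earlier artifacts already exist. The stage names mirror the article; the bodies are placeholders:

```python
STAGES = [
    "traffic_capture", "log_formatting", "log_stitching",
    "online_parsing", "q_replacement", "report_generation",
]

def run_pipeline(start="traffic_capture", stop="report_generation"):
    """Execute the contiguous slice of stages from `start` to `stop`."""
    i, j = STAGES.index(start), STAGES.index(stop)
    executed = []
    for stage in STAGES[i:j + 1]:
        executed.append(stage)        # real stage logic would run here
    return executed

# Re-run only from stitching onward when captured logs are already on disk:
print(run_pipeline(start="log_stitching"))
```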

Task Submission and Monitoring: Users can submit tasks via a platform, providing parameters for data processing and q-value calculation. The system supports incremental execution, email notifications on failures, and visual dashboards for task status.

Results and Future Work: The solution supports various CTR models, improves debugging efficiency, and has already identified inconsistencies such as feature mismatches and misaligned network structures, guiding subsequent fixes and enhancements.

Tags: Data Pipeline · Machine Learning · click-through rate · DNN · model consistency · online-offline testing