Design and Implementation of a Real-Time Advertising Feature Platform for CTR Prediction at Bilibili
To eliminate data fragmentation, feature inconsistencies, and multi‑language implementation challenges, Bilibili built a unified real‑time advertising feature platform that aligns offline, hourly, and online pipelines via a shared C++ library and JNI, boosting CTR prediction accuracy, cutting training costs, and increasing ad revenue by over 1%.
Bilibili, a video platform with a massive young user base and diverse content, provides advertisers with rich placement scenarios. To achieve more precise ad delivery, the commercial technology team extracts multi‑dimensional data features (user, material, scene, etc.) to build fine‑grained audience portraits. After feature calculation, these data become training samples for deep models that estimate click‑through rate (CTR) and conversion rate (CVR) of ad creatives. During online serving, the CTR estimation service uses these deep models to score each candidate ad, and the scores guide the selection of the highest‑value ads.
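The original post does not spell out how the CTR scores drive ad selection; a common scheme is eCPM-style ranking, where each candidate's predicted CTR is multiplied by its bid and the highest expected value wins. A minimal sketch under that assumption (class and field names are illustrative):

```java
import java.util.*;

public class AdRanker {
    record Candidate(String adId, double pCtr, double bid) {}

    // Rank candidates by expected value (eCPM-style: predicted CTR * bid)
    // and return the highest-value ad.
    static Candidate pickBest(List<Candidate> ads) {
        return ads.stream()
                  .max(Comparator.comparingDouble((Candidate a) -> a.pCtr() * a.bid()))
                  .orElseThrow();
    }

    public static void main(String[] args) {
        List<Candidate> ads = List.of(
            new Candidate("ad_a", 0.021, 3.0),   // expected value 0.063
            new Candidate("ad_b", 0.015, 5.0),   // expected value 0.075
            new Candidate("ad_c", 0.030, 2.0));  // expected value 0.060
        System.out.println(pickBest(ads).adId()); // ad_b
    }
}
```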
The existing system faced several issues:
Data sources were scattered, leading to high stitching costs.
Training and inference data were generated by different pipelines, causing inconsistencies.
Feature calculation was implemented in three languages (Python, Java, C++), making it hard to keep the logic identical.
Debugging training/inference consistency was costly because inference‑stage features were not fully available offline.
To address these problems, a unified feature platform was designed, covering both model training and inference. The platform simplifies data stitching by reporting feature data from the online service side, ensuring that the same feature set is used for both training and inference.
Model Training
Training is divided into offline (daily) and real‑time (hourly) pipelines.
Offline training processes the previous day's logs, performing three stages: data stitching (merging ad‑engine logs, user‑behavior logs, Hive tables, and Redis data into an offline Hive view stored on HDFS), sample generation (MapReduce reads HDFS files, Python scripts compute features, and samples are saved back to HDFS), and model training (the training framework reads the samples from HDFS).
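The data‑stitching stage above is essentially a keyed join of log streams. A minimal sketch of that idea, joining ad‑engine log rows with user‑behavior rows on a shared request id (the record shapes and field names are illustrative, not the actual schema):

```java
import java.util.*;
import java.util.stream.*;

// Sketch of offline stitching: left-join engine logs with behavior labels.
public class OfflineStitcher {
    record EngineLog(String requestId, String adId, String userId) {}
    record BehaviorLog(String requestId, boolean clicked) {}
    record StitchedRow(String requestId, String adId, String userId, boolean clicked) {}

    static List<StitchedRow> stitch(List<EngineLog> engine, List<BehaviorLog> behavior) {
        // Index behavior logs by request id; a request with any click counts as clicked.
        Map<String, Boolean> clicks = behavior.stream()
            .collect(Collectors.toMap(BehaviorLog::requestId, BehaviorLog::clicked,
                                      (a, b) -> a || b));
        // Each engine-log row becomes one labeled row; missing behavior means no click.
        return engine.stream()
            .map(e -> new StitchedRow(e.requestId(), e.adId(), e.userId(),
                                      clicks.getOrDefault(e.requestId(), false)))
            .collect(Collectors.toList());
    }
}
```

In production this join runs over Hive/HDFS data at scale; the sketch only shows the per-key merge logic.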
Real‑time training processes the last hour’s logs with a smaller data volume but higher freshness. It also follows three stages: data stitching (logs are merged with offline data and written to a new message queue as a real‑time view), sample generation (Flink reads the queue, Java UDFs compute features, and samples are stored on HDFS), and model training (same as offline).
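The Java UDFs in the hourly pipeline compute per-row features. A toy example of what one such operator might look like, written as a plain class so it stands alone (in Flink it would extend `org.apache.flink.table.functions.ScalarFunction`; the bucketing scheme is purely illustrative):

```java
// Sketch of an hourly feature operator, shaped like a Flink scalar UDF's eval().
public class AgeBucketFeature {
    // Map a raw age to a coarse bucket id used as a categorical feature.
    public int eval(int age) {
        if (age < 0)  return 0; // unknown or invalid
        if (age < 18) return 1;
        if (age < 25) return 2;
        if (age < 35) return 3;
        return 4;
    }
}
```

Keeping operators this small is what later makes it feasible to route them through a single shared library instead of maintaining one version per language.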
Model Inference
The online CTR estimation service scores candidate ads using deep models. Its feature processing pipeline mirrors the training pipelines: data stitching (combining request‑side data with ad‑side data from the retrieval engine and online Redis), sample generation (C++ feature operators produce inference samples), and model inference (deep model outputs CTR/CVR estimates).
The new architecture introduces several upgrades:
A feature reporting service on the online side sends both request‑side and winning‑ad data to a unified feature log, ensuring identical feature sets for training and inference.
An enhanced C++ feature library adds serialization/deserialization support and trims third‑party dependencies, lowering JNI overhead.
Unified feature calculation via JNI: Flink UDFs call the same C++ library, guaranteeing consistent logic across the offline, real‑time, and online stages.
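On the Java side, the JNI bridge pattern described above amounts to one coarse-grained native call per row: serialize the input once, cross into the C++ library once, get serialized features back. The sketch below shows that shape; the native method signature is hypothetical, and a pure-Java stand-in replaces the C++ library so the example runs on its own:

```java
import java.nio.charset.StandardCharsets;

public class NativeFeatureLib {
    // In production this would load the shared C++ feature library and declare
    // a real native method, e.g.:
    //   static { System.loadLibrary("feature_ops"); }
    //   private static native byte[] computeFeatures(byte[] serializedInput);

    // Stand-in with the same shape: bytes in, bytes out, one call per row.
    // A single coarse call keeps JNI boundary-crossing overhead low compared
    // to invoking the native side once per feature.
    static byte[] computeFeatures(byte[] serializedInput) {
        String out = "features_for:" + new String(serializedInput, StandardCharsets.UTF_8);
        return out.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] result = computeFeatures("user=42,ad=7".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(result, StandardCharsets.UTF_8));
    }
}
```

Because the Flink UDF and the online C++ service both funnel through the same native entry point, a feature computed offline is byte-for-byte the feature computed at serving time.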
Practice and Benefits
Since launch, three deep models have been retrained using the new platform, achieving notable gains:
72% of known feature inconsistency issues were resolved, including the severe cases (feature value differences above 10%) that accounted for 9% of them, improving model accuracy.
Overall ad revenue increased by 1.30%; click‑through rates rose by 4.61% (information flow), 1.36% (Story), and 2.42% (play page).
Data stitching workload was dramatically reduced, cutting Flink task concurrency by 79% and simplifying the training pipeline.
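Consistency numbers like those above imply some way of diffing the value a feature received offline against the value logged online for the same request. The original post does not describe the tooling; a minimal sketch of such a checker, where the 10% relative-difference cutoff mirrors the "severe inconsistency" threshold and everything else is an assumption:

```java
import java.util.*;

// Hypothetical feature-consistency check: compare offline vs. online feature
// values per feature name and report the fraction that differ beyond a
// relative tolerance (or are missing online).
public class FeatureDiffChecker {
    static double mismatchRate(Map<String, Double> offline,
                               Map<String, Double> online,
                               double relTol) {
        if (offline.isEmpty()) return 0.0;
        int mismatches = 0;
        for (Map.Entry<String, Double> e : offline.entrySet()) {
            double off = e.getValue();
            double on = online.getOrDefault(e.getKey(), Double.NaN);
            double denom = Math.max(Math.abs(off), 1e-9); // avoid divide-by-zero
            if (Double.isNaN(on) || Math.abs(off - on) / denom > relTol) {
                mismatches++;
            }
        }
        return (double) mismatches / offline.size();
    }
}
```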
Conclusion and Outlook
Through close collaboration among engineering, data, and algorithm teams, a real‑time advertising feature platform was built, overcoming data fragmentation, inconsistency, and debugging challenges. The platform provides a unified data processing and feature calculation solution for both offline and real‑time training, enhancing model prediction accuracy while lowering training costs. Future work will focus on further performance improvements, continued optimization of training and inference workflows, and supporting sustained growth of Bilibili’s commercial business.
Bilibili Tech
Provides introductions and tutorials on Bilibili-related technologies.