Artificial Intelligence · 24 min read

Technical Practices and Productization of Intelligent Advertising Title Generation for Bilibili

We built an LLM‑powered system for Bilibili that automatically creates ad titles from a few user‑provided keywords. Combining fluency, style, and quality classifiers, mixed‑domain data cleaning, and alignment methods such as SFT, DPO, and KTO, the product now generates about 10% of newly created daily ad titles and drives daily ad spend in the tens of thousands of yuan.

Bilibili Tech

1 Background

The rapid development of large language models (LLMs) is reshaping the workflow of advertisers and agencies in creating ad copy. To improve the efficiency of advertisers and the quality of ad titles on Bilibili, we leverage LLM technology and Bilibili commercial data to generate virtually unlimited creative titles from a few user‑provided keywords. The generated titles match Bilibili’s style, enhancing both efficiency and effectiveness.

2 Technical Practice

2.1 Evaluation Metric Design

To quantify model iteration direction and quality, we built a comprehensive evaluation system consisting of three dimensions: fluency, style score, and quality score. Fluency measures linguistic smoothness; style score assesses similarity to native Bilibili ad titles; quality score reflects whether a title meets general good‑title criteria (keyword relevance, click‑attractiveness, etc.). Each metric is modeled by a separate binary classifier trained on Bilibili data.
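To make the three dimensions concrete, here is a minimal sketch of how per‑title classifier scores could be combined into an iteration‑level report. The class name, thresholds, and gating rule are illustrative assumptions, not the production setup:

```python
# Hypothetical sketch: combine fluency, style, and quality classifier
# probabilities into one evaluation record per generated title.
from dataclasses import dataclass

@dataclass
class TitleEval:
    fluency: float   # P(title is linguistically smooth)
    style: float     # P(title matches native Bilibili ad style)
    quality: float   # P(title meets general good-title criteria)

    def passes(self, thresholds=(0.9, 0.7, 0.5)) -> bool:
        # Illustrative gate: a title must clear all three thresholds.
        f, s, q = thresholds
        return self.fluency >= f and self.style >= s and self.quality >= q

def evaluate_batch(evals: list) -> dict:
    # Report pass rate and per-dimension means for one model iteration.
    n = len(evals)
    return {
        "pass_rate": sum(e.passes() for e in evals) / n,
        "mean_fluency": sum(e.fluency for e in evals) / n,
        "mean_style": sum(e.style for e in evals) / n,
        "mean_quality": sum(e.quality for e in evals) / n,
    }
```

Tracking these aggregates per model version gives a quantitative iteration direction without manual review of every title.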

Fluency

Goal: Ensure generated titles are linguistically smooth.

Training data: 50k real Bilibili titles (balanced positive/negative). Negative samples are created by corrupting fluent sentences (character replacement, deletion, swapping).

Result: AUC > 0.98 on a 2k‑sample test set.
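The corruption step can be sketched as follows; the exact production rules (operation mix, replacement vocabulary) are assumptions:

```python
# Hypothetical sketch of negative-sample construction for the fluency
# classifier: corrupt a fluent title by character replacement, deletion,
# or adjacent swapping.
import random

def corrupt(title: str, rng: random.Random) -> str:
    chars = list(title)
    op = rng.choice(["replace", "delete", "swap"])
    i = rng.randrange(len(chars))
    if op == "replace":
        # Overwrite one character with another drawn from the title itself.
        chars[i] = chars[rng.randrange(len(chars))]
    elif op == "delete":
        del chars[i]
    else:
        # Swap two adjacent characters.
        j = min(i + 1, len(chars) - 1)
        chars[i], chars[j] = chars[j], chars[i]
    return "".join(chars)

rng = random.Random(0)
negatives = [corrupt("双十一限时好物清单", rng) for _ in range(3)]
```

Because positives are real titles and negatives are mechanical corruptions of the same distribution, the classifier learns fluency rather than topical differences.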

Style Score

Goal: Capture the subtle style differences between Qwen zero‑shot titles and authentic Bilibili ad titles.

Training data: 25k real Bilibili titles (positive) vs. 25k Qwen‑72B generated titles (negative).

Result: AUC > 0.95.

Quality Score

Goal: Identify high‑click‑rate titles using a pair‑wise labeling approach.

Positive samples: High‑CTR titles, with GPT‑4 annotating the reasons they work (curiosity, emotional resonance, brevity, demand focus).

Negative samples: Titles judged poor by both GPT‑4 and Qwen‑72B.

Training data: ~10k positive/negative pairs.

Result: AUC > 0.88.

2.2 Dataset Construction and Cleaning

We construct a mixed dataset with a ratio of proprietary task data : commercial domain data : open‑domain data = 1 : 1 : N (5 ≤ N ≤ 10). Open‑domain data preserve zero‑shot capability; commercial data (Bilibili titles, video ASR, search queries) strengthen domain generalization. Proprietary task data consist of keyword‑to‑generated‑title and keyword‑to‑original‑title pairs, with strict keyword limits (≤2, ≤3, or unlimited). Cleaning follows the MoDS framework (quality, diversity, necessity filtering) and includes specialized steps such as quantitative digit replacement using Qwen‑72B and manual post‑processing to guarantee fluency.
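The 1 : 1 : N mixing can be sketched as below; the field names and sampling logic are assumptions about one reasonable implementation:

```python
# Illustrative construction of the mixed training set described above:
# proprietary task data : commercial domain data : open-domain data = 1 : 1 : N.
import random

def build_mixture(task_data, commercial_data, open_domain_data, n=5, seed=0):
    """Keep task : commercial at 1 : 1 and sample open-domain data at
    N times the task-data size (5 <= N <= 10 per the text)."""
    assert 5 <= n <= 10
    rng = random.Random(seed)
    k = len(task_data)
    mix = (
        list(task_data)
        + rng.sample(commercial_data, k)
        + rng.sample(open_domain_data, n * k)
    )
    rng.shuffle(mix)
    return mix
```

Keeping the open‑domain share large preserves the base model's zero‑shot ability while the smaller task and commercial slices inject Bilibili‑specific style.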

2.3 Alignment Algorithm Exploration and Optimization

2.3.1 Supervised Fine‑Tuning (SFT)

Initial SFT improves business logic learning. Iterative prompt diversification (multiple prompts for the same task) further enhances robustness and generalization.

2.3.2 Direct Preference Optimization (DPO)

DPO reparameterizes the reward in terms of the LLM's own token probabilities, eliminating the need for a separate reward model. It maximizes the likelihood of preferred completions relative to dispreferred ones, reducing alignment to a simple supervised‑learning‑style objective.
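Concretely, the loss from the cited DPO paper (arXiv:2305.18290) is:

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}})
= -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
\left[\log \sigma\!\left(
\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
- \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
\right)\right]
```

where \(y_w\) and \(y_l\) are the preferred and dispreferred titles for prompt \(x\), \(\pi_{\mathrm{ref}}\) is the frozen SFT reference model, and \(\beta\) controls how far the policy may drift from the reference.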

2.3.3 DPO Variants and Optimizations

Prompt‑density optimization: In multi‑instruction scenarios, increasing information density mitigates rapid convergence and improves performance.

IPO (Identity Preference Optimization): Replaces DPO's log‑sigmoid objective with a squared‑error (L2‑style) regression on the preference margin, yielding modest gains.

KTO (Kahneman–Tversky Optimization): Removes the need for paired preference samples, allowing weighted negative‑sample training and further quality improvements.

KTON: An enhanced KTO variant that strictly constrains negative‑sample behavior, improving quality scores while preserving fluency.
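KTO's practical advantage is its data format: each example needs only a binary desirability label, not a matched pair. A sketch of that format, with an assumed negative‑sample weight (the specific value and field names are illustrative, not the production configuration):

```python
# Hypothetical sketch of the unpaired data KTO enables: each record carries
# a binary desirability label and a per-class weight, so negatives need no
# matched positive. The weight value is an assumption for illustration.
def make_kto_examples(prompt_title_pairs, labels, neg_weight=1.33):
    """Pack (prompt, completion, desirable, weight) records; reweighting
    compensates for imbalance between positives and negatives."""
    examples = []
    for (prompt, title), good in zip(prompt_title_pairs, labels):
        examples.append({
            "prompt": prompt,
            "completion": title,
            "desirable": good,
            "weight": 1.0 if good else neg_weight,
        })
    return examples
```

Because negatives can be mined independently (e.g. low‑scoring titles from the classifiers above), the training pool grows much faster than with strictly paired DPO data.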

3 Productization

3.1 Title Generation Entry

Users enter multiple keywords to generate titles; the UI supports repeated generation until the user is satisfied.

3.2 Title Association Feature

We added a title‑association feature that, using dual recall (ANN vector search plus Elasticsearch), suggests completions for short queries and semantically similar titles for long queries, substantially improving user efficiency.
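The routing and merge logic can be sketched as follows; the function names, length cutoff, and merge order are assumptions rather than the production API:

```python
# Illustrative dual-recall merge for title association: short queries go
# through prefix completion (Elasticsearch-style), long queries through
# semantic ANN retrieval; results from both channels are deduplicated.
def associate(query: str, prefix_search, ann_search, short_len=4, k=10):
    if len(query) <= short_len:
        # Short query: prioritize completions, backfill semantically.
        hits = prefix_search(query, k) + ann_search(query, k)
    else:
        # Long query: prioritize semantically similar titles.
        hits = ann_search(query, k) + prefix_search(query, k)
    seen, merged = set(), []
    for title in hits:
        if title not in seen:
            seen.add(title)
            merged.append(title)
    return merged[:k]
```

The two channels complement each other: prefix search is precise for fragments, while ANN retrieval surfaces reworded but semantically close titles.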

4 Business‑Side Iterations

4.1 First‑Screen Recommendation

After launch, we refined the first‑screen recommendations by injecting product entities linked to the advertiser's account and by leveraging user‑entered queries, raising first‑screen adoption from roughly 20% to 45%.

4.2 Enhancing Title Novelty

Incorporate community‑generated titles to capture hot trends.

Weekly generation of new titles from recent creations.

RAG‑based meme extraction: Retrieve high‑frequency memes from comments and titles, then use few‑shot prompting to embed them into generated titles.
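The few‑shot prompting step can be sketched as below; the template wording and function name are assumptions for illustration:

```python
# Hypothetical few-shot prompt assembly for the RAG-based meme step:
# memes retrieved from comments and titles are embedded into the
# instruction alongside example keyword-to-title pairs.
def build_meme_prompt(keywords, memes, examples):
    shots = "\n".join(f"Keywords: {k} -> Title: {t}" for k, t in examples)
    return (
        "Generate a Bilibili-style ad title from the keywords, "
        f"naturally working in one of these trending memes: {', '.join(memes)}.\n"
        f"{shots}\n"
        f"Keywords: {', '.join(keywords)} -> Title:"
    )
```

Refreshing the retrieved meme pool weekly keeps generated titles aligned with current community trends without retraining the model.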

5 Online Results

The system now accounts for roughly 10% of the ad titles newly created on Bilibili each day, with associated daily ad spend reaching tens of thousands of yuan, which we believe places the solution among the leading deployments in the industry.

6 Future Plans

Align offline evaluation metrics more closely with online CTR.

Increase generation diversity via temperature tuning and multi‑prompt strategies; explore personalized "thousand‑faces" title generation (different titles for different users).

Integrate Retrieval‑Augmented Generation (RAG) and Chain‑of‑Thought (CoT) to mitigate hallucinations and improve timeliness.

Build a commercial‑domain continued pre‑training foundation model and a Bilibili‑specific benchmark covering factual, reasoning, and marketing tasks.

Investigate agent‑driven data engineering to automate task‑specific data pipeline construction.

References

MoDS: Model‑oriented Data Selection for Instruction Tuning (arXiv:2311.15653).

Direct Preference Optimization: Your Language Model is Secretly a Reward Model (arXiv:2305.18290).

A General Theoretical Paradigm to Understand Learning from Human Preferences (IPO; arXiv:2310.12036).

KTO: Model Alignment as Prospect Theoretic Optimization (arXiv:2402.01306).

Value‑Incentivized Preference Optimization (arXiv:2405.19320).

GitHub: netease‑youdao/QAnything.

Written by

Bilibili Tech

Provides introductions and tutorials on Bilibili-related technologies.
