
Choosing Mainstream CTR Models: LightGBM, FFM, and Deep Learning Approaches

The author, a graduate student and weekly champion of the Tencent advertising algorithm contest, shares practical guidance on selecting mainstream CTR models—including LightGBM, field‑aware factorization machines, and deep learning approaches—while offering tips on feature handling, hyper‑parameter settings, and resource‑efficient implementation.

Tencent Advertising Technology

Greetings, I am Ge Yunpeng, a graduate student from Harbin Institute of Technology Shenzhen, the weekly champion of the Tencent advertising algorithm competition, and I would like to share my insights on choosing mainstream CTR models for this contest.

1. LightGBM (statistical-feature model) – Because the look-alike audience-expansion task is framed as a CTR problem, conversion-rate (statistical) features are highly informative, and LightGBM handles such numeric features robustly. Take care to mitigate data leakage when building these features (e.g., by applying Bayesian smoothing to historical rates) and to deal with the many categorical features. Open-source features shared by the ByRAN community can add roughly 0.01 AUC.
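The Bayesian smoothing mentioned above can be sketched as follows. The prior parameters alpha and beta here are illustrative placeholders (in practice they are often fit to the global click/impression distribution, e.g., by moment matching), and the function name is my own:

```python
def bayesian_smooth_ctr(clicks, impressions, alpha=5.0, beta=250.0):
    """Smooth a raw click-through rate with a Beta(alpha, beta) prior.

    IDs with few impressions are pulled toward the prior mean
    alpha / (alpha + beta) instead of producing extreme 0 or 1 rates,
    which also reduces leakage from rare, nearly-unique IDs.
    """
    return (clicks + alpha) / (impressions + alpha + beta)

# An ad with 1 click in 2 impressions is smoothed toward the prior
# instead of reporting an implausible 50% CTR.
rate = bayesian_smooth_ctr(1, 2)
```

An unseen ID (0 clicks, 0 impressions) simply falls back to the prior mean, which is the main reason this is safer than filling missing rates with 0.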

2. Field-aware Factorization Machines (FFM) – A classic CTR model, trainable from the command line, e.g.: ../../libffm/ffm-train -l 0.00002 -k 8 -r 0.05 -s 30 -t 25 -p valid.ffm train1.ffm v1_model. The key step is converting categorical features into libffm's field:feature_index:value format. Simple cross-encoding of ID features and occasional discretization of numeric features can help. Set a slightly larger learning rate, train for 50-100 rounds, keep a held-out validation set, and disable auto-stop to avoid over-fitting.
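The field:feature_index:value conversion might look like the following minimal sketch. The helper name to_ffm_line and the alphabetical field ordering are my own illustrative choices, not part of libffm itself; the only hard requirement is that field and feature numbering stay consistent across train, validation, and test files:

```python
def to_ffm_line(label, categorical, feature_index):
    """Encode one sample as a libffm line: "label field:index:1 ...".

    categorical: {field_name: category_value}. Fields are numbered by
    their (sorted) position; each distinct (field, category) pair gets
    a global feature index assigned on first sight and stored in the
    shared feature_index dict so later samples reuse the same index.
    """
    parts = [str(label)]
    for field_id, (field, value) in enumerate(sorted(categorical.items())):
        key = (field, value)
        if key not in feature_index:
            feature_index[key] = len(feature_index)
        parts.append(f"{field_id}:{feature_index[key]}:1")
    return " ".join(parts)

index = {}  # shared across the whole dataset
line = to_ffm_line(1, {"aid": "177", "age": "3"}, index)
```

The value is fixed at 1 because one-hot categorical features are binary; discretized numeric features are handled the same way, with the bin ID acting as the category.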

3. Deep learning models – Given the strong sequential information in the feature set, deep models (Wide & Deep, DeepFM, PNN, etc.) work well. Recent community contributions such as DeepFFM are worth trying, and an NN model can also be built from an NLP perspective.
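The deep models named above (DeepFM, PNN, and relatives) all build on the factorization-machine pairwise-interaction term at their core. A minimal NumPy sketch of that term, using the standard O(k·n) reformulation instead of the naive O(n²) pairwise loop (function name and shapes are my own for illustration):

```python
import numpy as np

def fm_pairwise(x, V):
    """Second-order FM interaction term, computed via the identity
        sum_{i<j} <v_i, v_j> x_i x_j
          = 0.5 * sum_f [ (sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2 ]
    x: (n,) feature vector, V: (n, k) latent factor matrix.
    """
    linear = x @ V                   # (k,): sum_i V[i,f] * x_i
    squares = (x ** 2) @ (V ** 2)    # (k,): sum_i V[i,f]^2 * x_i^2
    return 0.5 * float(np.sum(linear ** 2 - squares))
```

In DeepFM this same term feeds the "FM part" of the output, while the shared embeddings V also feed the deep MLP part.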

Finally, I hope everyone gains more machine‑learning knowledge from the competition, enjoys the process, and competes honestly without cheating.

Small suggestion: Because the provided dataset is large, try to limit the use of memory‑heavy libraries like NumPy and pandas; using native Python lists can greatly reduce memory consumption.
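As one illustration of the suggestion above, a large CSV can be reduced one row at a time with the standard-library csv module instead of materializing the whole table in pandas; stream_column_sum is a hypothetical helper written for this sketch:

```python
import csv

def stream_column_sum(path, column):
    """Stream a large CSV row by row, accumulating a single column.

    Peak memory stays at roughly one row, versus loading the entire
    file into a DataFrame before touching any of it.
    """
    total = 0.0
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        for row in reader:
            total += float(row[column])
    return total
```

The same streaming pattern works for counting categories or writing out converted libffm lines, so the full dataset never needs to be resident in memory at once.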

Tags: machine learning, feature engineering, CTR, deep learning, LightGBM, FFM
Written by

Tencent Advertising Technology

Official hub of Tencent Advertising Technology, sharing the team's latest cutting-edge achievements and advertising technology applications.
