
Experience Sharing of the 2019 Tencent Advertising Algorithm Competition – Week 1 Champion’s Insights

The week‑1 champion of the 2019 Tencent Advertising Algorithm Competition shares practical experience with data cleaning, feature engineering, model selection (including LightGBM and deep learning), validation strategies, and handling massive ad exposure logs, all aimed at a low SMAPE and a high monotonicity score.

Tencent Advertising Technology

May 8, 2019


After a fiercely contested first week, the 2019 Tencent Advertising Algorithm Competition announced its week‑1 champion: Chu Can, a modest and diligent young competitor. Here he shares his insights and practical experience from the contest.

The task is to predict an ad's exposure on a given future day from historical ad information. This is a typical time‑series regression problem. The provided data include historical exposure logs, user feature files, ad data, and ad operation data. Some of the data is dirty and must be cleaned. There are also no explicit target labels, so participants must construct their own. Submissions are evaluated on SMAPE and a monotonicity score.
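Because SMAPE drives part of the score, it helps to compute it locally. The sketch below uses one common definition: the mean over samples of |F − A| divided by (|A| + |F|) / 2. The official competition formula may differ in details such as scaling by 100 or handling zero denominators, so treat this as an assumption.

```python
def smape(actual, predicted):
    """Symmetric mean absolute percentage error (one common definition).

    Assumption: a pair where both actual and predicted are 0 counts as a
    perfect prediction (error 0); the official scorer may handle this differently.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, predicted):
        denom = (abs(a) + abs(f)) / 2
        if denom == 0:
            continue  # both zero: contributes no error
        total += abs(f - a) / denom
    return total / len(actual)
```

For example, `smape([100], [50])` returns 2/3. Lower values are better, and the result is bounded in [0, 2].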

Data processing: The historical exposure file has about 100 million rows. Reading it in chunks and converting columns to compact data types brings it down to around 2 GB. Given this volume, joins across multiple tables should be used sparingly. Early in the competition, focus on simple features that are easy to extract, and add more complex ones later. Basic features such as ad size and bid are crucial and already yield a solid baseline. The demographic targeting features are more cumbersome to process and were not used.
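The chunked‑reading idea can be sketched with the standard library alone. This version reads a tab‑separated log a fixed number of rows at a time and stores each column in a typed `array` rather than a list of Python objects. The three‑column layout (`ad_id`, `timestamp`, `bid`) is hypothetical, and the real exposure log has a different set of fields. In practice, pandas `read_csv` with `chunksize` and explicit `dtype` arguments is the usual tool for the same job.

```python
import csv
from array import array
from itertools import islice


def read_exposure_log(stream, chunk_size=100_000):
    """Read a tab-separated log chunk by chunk into compact typed arrays.

    Hypothetical columns: ad_id, timestamp, bid (the real log layout differs).
    """
    ad_ids = array("I")      # unsigned C ints (typically 4 bytes) instead of Python int objects
    timestamps = array("I")  # Unix seconds fit in an unsigned 32-bit value
    bids = array("f")        # 32-bit floats instead of 64-bit Python floats
    reader = csv.reader(stream, delimiter="\t")
    while True:
        # Pull at most chunk_size rows so that only one chunk of raw strings is in memory.
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        for ad_id, ts, bid in chunk:
            ad_ids.append(int(ad_id))
            timestamps.append(int(ts))
            bids.append(float(bid))
    return ad_ids, timestamps, bids
```

Usage: `read_exposure_log(open("exposure.log", newline=""))`. The file name here is illustrative.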

Label construction: An ad ID may appear with several bids in one day. The author used the average bid as that day's bid and the number of times the ad ID appears as the exposure target, which produced roughly 1.5 million training samples. The official FAQ later clarified that only CPC‑type ads need to be predicted. Many records could then be discarded, which shrank the dataset to a few tens of thousands of samples and improved the online score.
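Here is a minimal sketch of this label construction, under stated assumptions. It groups exposure records by `(ad_id, day)`, averages the bids, and uses the record count as the exposure label. It then keeps only CPC ads by looking each ad up in a small attribute dictionary. The record layout and the `size` and `billing` fields are hypothetical. Joining the small attribute table only after aggregation is one way to keep multi‑table joins cheap, but it is an illustrative choice and not necessarily the author's exact pipeline.

```python
from collections import defaultdict


def build_samples(exposures, ad_attrs):
    """Aggregate exposure records into one sample per (ad_id, day).

    exposures: iterable of (ad_id, day, bid) tuples (hypothetical layout).
    ad_attrs: dict ad_id -> {"size": int, "billing": str} (hypothetical fields).
    """
    bid_sum = defaultdict(float)
    count = defaultdict(int)
    for ad_id, day, bid in exposures:
        bid_sum[(ad_id, day)] += bid
        count[(ad_id, day)] += 1

    samples = []
    for (ad_id, day), n in count.items():
        attrs = ad_attrs.get(ad_id)
        # Keep only CPC ads (per the FAQ); skip ads with no attribute record.
        if attrs is None or attrs["billing"] != "CPC":
            continue
        samples.append({
            "ad_id": ad_id,
            "day": day,
            "avg_bid": bid_sum[(ad_id, day)] / n,  # average of the day's bids
            "size": attrs["size"],                 # basic static feature via dict lookup
            "label": n,                            # exposure count used as the target
        })
    return samples
```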

Model selection and validation: Competitors mainly used three families of models: traditional tree models, deep learning models, and rule‑based models. Rule‑based models require strong business understanding, so they are not recommended for beginners. Tree models, especially LightGBM, train quickly and handle categorical features directly, which avoids heavy one‑hot encoding. For local validation, train on data from before the 19th and validate on the 19th. Cross‑validation over the full dataset mixes later days into training and may lead to over‑fitting. Because monotonicity accounts for 60% of the score, post‑processing the predictions to enforce monotonicity is essential.
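The validation split and the monotonicity post‑processing can both be sketched in a few lines. `time_split` assumes each sample carries a hypothetical `day` field holding the day of the month. `enforce_monotonic` assumes the monotonicity score rewards predicted exposure that rises with the bid for the same ad. It sorts one ad's bid variants and lifts any prediction that falls below the previous one. Equal bids share a prediction, and a strictly higher bid gets at least `eps` more. The function names and `eps` are illustrative assumptions, not the author's code.

```python
def time_split(rows, valid_day=19):
    """Train on days before valid_day and validate on valid_day itself."""
    train = [r for r in rows if r["day"] < valid_day]
    valid = [r for r in rows if r["day"] == valid_day]
    return train, valid


def enforce_monotonic(bids, preds, eps=1e-4):
    """Adjust one ad's predictions so that exposure never decreases with bid.

    Assumption: the monotonicity score checks predictions across bid
    variants of the same ad; eps is a hypothetical tie-breaking step.
    """
    order = sorted(range(len(bids)), key=lambda i: bids[i])
    fixed = [0.0] * len(preds)
    prev_bid, prev_pred = None, None
    for i in order:
        p = preds[i]
        if prev_pred is not None:
            if bids[i] > prev_bid:
                p = max(p, prev_pred + eps)  # a higher bid must get strictly more exposure
            else:
                p = prev_pred                # equal bids share the same prediction
        fixed[i] = p
        prev_bid, prev_pred = bids[i], p
    return fixed
```

This cumulative‑maximum approach only raises predictions that violate the ordering and leaves the others untouched. That keeps the SMAPE cost of the adjustment small when the model is already mostly monotonic.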

Overall advice: Start with a familiar model and basic features to build a solid framework, then iteratively explore feature importance and more advanced techniques.

Note: The competition registration closes at 12:00 PM Beijing time on May 16, and the preliminary result submission deadline is 12:00 PM on May 23. Teams ranking in the top 20 % will advance to the semifinals, with a maximum of 200 teams.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

