Handling Large-Scale Data in the Tencent Advertising Algorithm Competition: Model Choices, Data Splitting, and Feature Engineering
The article shares practical strategies for processing massive advertising data in the Tencent algorithm competition, covering model selection between GBDT and neural networks, efficient data partitioning methods for low‑resource environments, and the importance of feature engineering to achieve top rankings.
Ahead of the 2019 Tencent Advertising Algorithm Competition, contestant Guo Dayi ("Guo Da") shares his experience of handling large-scale data effectively, even on low-configuration hardware.
Model Selection: The competition mainly uses Gradient Boosting Decision Trees (GBDT) and neural networks. GBDT requires loading all data into memory, while neural networks support streaming training, allowing models to be trained with only a few gigabytes of RAM if a GPU is available. Neural networks excel in time and space efficiency, whereas GBDT offers stable performance and is beginner-friendly; the best results often come from combining both.
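The simplest way to combine the two model families is to blend their predicted probabilities with a weighted average, tuning the weight on a validation split. A minimal sketch follows; the prediction arrays and the equal 0.5/0.5 weighting are hypothetical placeholders, not values from the article:

```python
# Hedged sketch: blending GBDT and neural-network predictions by
# weighted averaging. The toy probabilities and the default 0.5/0.5
# weights are illustrative; in practice the weight is tuned on a
# held-out validation set.

def blend(gbdt_preds, nn_preds, w_gbdt=0.5):
    """Weighted average of two aligned lists of predicted probabilities."""
    w_nn = 1.0 - w_gbdt
    return [w_gbdt * g + w_nn * n for g, n in zip(gbdt_preds, nn_preds)]

# Toy predictions from the two (hypothetical) models.
gbdt_preds = [0.9, 0.2, 0.6]
nn_preds = [0.7, 0.4, 0.8]
blended = blend(gbdt_preds, nn_preds)
print([round(p, 6) for p in blended])  # [0.8, 0.3, 0.7] with equal weights
```

More elaborate ensembling (stacking, rank averaging) builds on the same idea, but even a plain weighted average often captures most of the gain from model diversity.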
Useful resources mentioned include the open‑source libraries DeepCTR (https://github.com/shenweichen/DeepCTR) and ctrNet-tool (https://github.com/guoday/ctrNet-tool).
Data Partitioning: For GBDT models (e.g., XGBoost, LightGBM, CatBoost), if only half of the data fits into memory, split the dataset into three parts, train a separate model on each, and then ensemble them. For neural networks (e.g., NFFM, xDeepFM, DIN), divide the data into dozens of chunks stored as pickle files, loading and discarding each chunk in turn; this approach also scales to parallel training across multiple GPUs.
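The chunked-pickle approach for neural networks can be sketched as follows: write the dataset to disk in fixed-size chunks, then train by loading one chunk at a time so memory only ever holds a single chunk. The file naming and the `train_on_chunk` callback below are illustrative assumptions, not details from the article:

```python
# Hedged sketch of streaming training from pickle chunks. Only one
# chunk is resident in memory at a time; the training step is a
# placeholder for e.g. one epoch of mini-batch gradient updates.
import os
import pickle
import tempfile

def write_chunks(rows, chunk_size, out_dir):
    """Split `rows` into pickle files of at most `chunk_size` rows each."""
    paths = []
    for i in range(0, len(rows), chunk_size):
        path = os.path.join(out_dir, f"chunk_{i // chunk_size}.pkl")
        with open(path, "wb") as f:
            pickle.dump(rows[i:i + chunk_size], f)
        paths.append(path)
    return paths

def stream_train(paths, train_on_chunk):
    """Load each pickle chunk, hand it to the training step, discard it."""
    for path in paths:
        with open(path, "rb") as f:
            chunk = pickle.load(f)
        train_on_chunk(chunk)  # e.g. model.train_on_batch per mini-batch
        del chunk              # free the chunk before loading the next one

# Toy usage: count rows seen to confirm every chunk is visited exactly once.
rows = [{"ad_id": i, "clicked": i % 2} for i in range(100)]
seen = []
with tempfile.TemporaryDirectory() as d:
    paths = write_chunks(rows, chunk_size=32, out_dir=d)
    stream_train(paths, lambda chunk: seen.extend(chunk))
print(len(paths), len(seen))  # 4 chunks, 100 rows processed
```

With multiple GPUs, each worker can consume a disjoint subset of the chunk files in parallel, which matches the multi-GPU parallel training mentioned above.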
In 2018, using only neural networks, Guo Da placed 7th; GBDT models achieved higher accuracy but required far longer training times and massive memory (e.g., 256 GB of RAM to train GBDT on the full data). Thanks to streaming, neural networks can handle thousands of feature dimensions, outperforming GBDT when feature richness is crucial.
Conclusion: Success in data-driven competitions ultimately depends on strong feature engineering and a balanced team—one member proficient in neural networks and another in GBDT—to leverage the strengths of both model families.
Participants are encouraged to apply these techniques, experiment with novel features, and register for the upcoming competition.
Tencent Advertising Technology
Official hub of Tencent Advertising Technology, sharing the team's latest cutting-edge achievements and advertising technology applications.