
Winning Strategies for the Tencent Advertising Algorithm Competition: Text Classification with Word2Vec and BiLSTM

This article covers the Tencent Advertising Algorithm competition final and explains the chizhu team's approach: converting ad IDs into word sequences and treating the task as text classification with large‑scale word2vec embeddings and a dual‑BiLSTM architecture. It presents the custom loss function and training tricks, shares the full Python model code, and reports an overall rank of 11.

Tencent Advertising Technology

The 2020 Tencent Advertising Algorithm competition final was held on August 3rd, and the chizhu team, which ranked first on the internal leaderboard, shared their solution to inspire other participants.

The competition task is a user profiling problem: given a sequence of ad clicks, predict the user's age (10 classes) and gender (2 classes). The team treated each ID (creative_id, ad_id, adv_id, pro_id) as a token, concatenated them into a sentence, and framed the problem as a text‑classification task.
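The ID-to-sentence framing can be sketched as follows. The field names (creative_id, ad_id, adv_id, pro_id) follow the article; the click-log structure below is a hypothetical example, not the competition's actual data format:

```python
# Sketch: turning a user's ad-click history into "sentences" of ID tokens,
# one token sequence per ID field, ordered by click time.
clicks = [
    {"creative_id": 101, "ad_id": 7, "adv_id": 55, "pro_id": 3},
    {"creative_id": 208, "ad_id": 9, "adv_id": 55, "pro_id": 4},
]

def clicks_to_sentences(clicks):
    """Build one parallel token sequence per ID field."""
    fields = ("creative_id", "ad_id", "adv_id", "pro_id")
    return {f: [str(c[f]) for c in clicks] for f in fields}

sentences = clicks_to_sentences(clicks)
# sentences["creative_id"] == ["101", "208"]
```

Each user thus yields four aligned "sentences", one per ID type, which are embedded separately and concatenated by the model.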

For text representation they used word2vec embeddings. Experiments showed that a larger window size (60 vs. 10) and higher embedding dimension (300 vs. 128) significantly improved performance, especially because the dataset is large.

The core model consists of two parallel BiLSTM layers with max‑pooling, whose pooled outputs are concatenated and fed to separate classifiers for age and gender. The architecture can be represented as:

creative_id   ad_id   adv_id   pro_id
     |          |        |        |
   emb1       emb2     emb3     emb4
     \__________|________|_______/
               concat
                  |
            BiLSTM(256) --> maxpool1 --+
                  |                    |
            BiLSTM(256) --> maxpool2 --+
                                       |
                                    concat
                                   /      \
                    age(softmax(10))    gender(softmax(2))

Why not use BERT? The vocabulary for creative_id exceeds 3 million tokens, making pre‑training from scratch prohibitively expensive, and preliminary tests did not yield better results.

Model details: the four ID sequences are embedded (embeddings are frozen), concatenated, passed through two stacked bidirectional LSTM layers, each followed by max‑pooling, and finally classified. The custom loss combines cross‑entropy losses for age and gender equally:

import torch.nn as nn

def custom_loss(data1, targets1, data2, targets2):
    '''Equally weighted cross-entropy over the two task heads.'''
    loss1 = nn.CrossEntropyLoss()(data1, targets1)  # e.g. age head
    loss2 = nn.CrossEntropyLoss()(data2, targets2)  # e.g. gender head
    return loss1 * 0.5 + loss2 * 0.5

Training tricks that added points included: using larger word2vec windows and dimensions, increasing LSTM hidden size from 128 to 256, random shuffling of sequences each epoch, and ensemble predictions from original and shuffled order.
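The shuffle trick above can be sketched as follows. The key detail is that the parallel ID sequences must be shuffled with the same permutation so tokens from one click stay aligned; the function name and data are hypothetical:

```python
import random

# Sketch: shuffle a user's click sequence once per epoch, applying one
# permutation across all parallel ID sequences so they stay aligned.
def shuffle_aligned(seqs, seed=None):
    """Apply a single random permutation to parallel token sequences."""
    rng = random.Random(seed)
    order = list(range(len(seqs[0])))
    rng.shuffle(order)
    return [[s[i] for i in order] for s in seqs]

creative = ["101", "208", "305"]
ad = ["7", "9", "11"]
shuffled = shuffle_aligned([creative, ad], seed=0)
# shuffled[0][k] and shuffled[1][k] still come from the same click
```

At inference time, the same idea yields the ensemble trick: predict once on the original order and once on a shuffled order, then average the two predictions.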

Using eight models with weighted ensembling, the team achieved a final score of 1.4801, ranking 11th overall. The full PyTorch model code is provided below:

import torch
import torch.nn as nn

class TXModel(nn.Module):
    def __init__(self, embed_matrix1, embed_matrix2, embed_matrix4,
                 embed_matrix5, embed_matrix6, lstm_hidden_size=256):
        super(TXModel, self).__init__()
        # Frozen pretrained word2vec embeddings for the four ID sequences.
        # embed_matrix6 is accepted for interface compatibility but unused here.
        self.embedding1 = nn.Embedding(*embed_matrix1.shape)
        self.embedding1.weight = nn.Parameter(torch.tensor(embed_matrix1, dtype=torch.float32))
        self.embedding1.weight.requires_grad = False
        self.embedding2 = nn.Embedding(*embed_matrix2.shape)
        self.embedding2.weight = nn.Parameter(torch.tensor(embed_matrix2, dtype=torch.float32))
        self.embedding2.weight.requires_grad = False
        self.embedding4 = nn.Embedding(*embed_matrix4.shape)
        self.embedding4.weight = nn.Parameter(torch.tensor(embed_matrix4, dtype=torch.float32))
        self.embedding4.weight.requires_grad = False
        self.embedding5 = nn.Embedding(*embed_matrix5.shape)
        self.embedding5.weight = nn.Parameter(torch.tensor(embed_matrix5, dtype=torch.float32))
        self.embedding5.weight.requires_grad = False
        # Project the 300-d embeddings 2 and 5 down to 128-d
        self.emb_dense2 = nn.Linear(300, 128)
        self.embedding_dropout = nn.Dropout2d(0.2)  # spatial dropout over embedding channels
        # LSTM input size: 300 (emb1) + 300 (emb4) + 2 * 128 (projected emb2, emb5) = 856
        self.lstm = nn.LSTM(300 + 128 * 2 + 300, lstm_hidden_size, bidirectional=True, batch_first=True)
        self.lstm2 = nn.LSTM(lstm_hidden_size * 2, lstm_hidden_size, bidirectional=True, batch_first=True)
        self.dropout = nn.Dropout(0.2)
        # Each head sees the concatenated max-pooled outputs of both BiLSTMs
        self.classifier1 = nn.Linear(lstm_hidden_size * 2 * 2, 2)   # gender head
        self.classifier2 = nn.Linear(lstm_hidden_size * 2 * 2, 10)  # age head

    def apply_spatial_dropout(self, h_embedding):
        # Dropout2d zeroes whole embedding channels rather than individual units
        h_embedding = h_embedding.transpose(1, 2).unsqueeze(2)
        h_embedding = self.embedding_dropout(h_embedding).squeeze(2).transpose(1, 2)
        return h_embedding

    def forward(self, x1, x2=None, x3=None, x4=None, x5=None, x6=None,
                attention_mask=None, head_mask=None):
        # x3, x6, attention_mask and head_mask are unused in this variant
        h1 = self.apply_spatial_dropout(self.embedding1(x1))
        h2 = self.apply_spatial_dropout(self.embedding2(x2))
        h4 = self.apply_spatial_dropout(self.embedding4(x4))
        h5 = self.apply_spatial_dropout(self.embedding5(x5))
        h2 = self.emb_dense2(h2)
        h5 = self.emb_dense2(h5)
        h = torch.cat([h1, h4, h2, h5], -1)
        h_lstm1, _ = self.lstm(h)
        h_lstm2, _ = self.lstm2(h_lstm1)
        max_pool1, _ = torch.max(h_lstm1, 1)  # max-pool over the time dimension
        max_pool2, _ = torch.max(h_lstm2, 1)
        out = torch.cat((max_pool1, max_pool2), 1)
        out = self.dropout(out)
        out1 = self.classifier1(out)  # gender logits
        out2 = self.classifier2(out)  # age logits
        return out1, out2
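The weighted ensembling of the eight models can be sketched as a weighted average of per-model class probabilities; the function name and weights below are illustrative, not the team's actual values:

```python
import numpy as np

# Sketch: blend each model's softmax probabilities with per-model weights,
# then take the argmax class for each sample.
def weighted_ensemble(prob_list, weights):
    """prob_list: list of (n_samples, n_classes) arrays, one per model."""
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()          # normalize weights to sum to 1
    stacked = np.stack(prob_list)              # (n_models, n_samples, n_classes)
    blended = np.tensordot(weights, stacked, axes=1)
    return blended.argmax(axis=1)

p1 = np.array([[0.7, 0.3], [0.2, 0.8]])
p2 = np.array([[0.4, 0.6], [0.1, 0.9]])
preds = weighted_ensemble([p1, p2], weights=[2.0, 1.0])
# preds == [0, 1]
```

Averaging probabilities rather than hard labels lets a confident minority model outvote an uncertain majority, which is typically why weighted soft voting is preferred for this kind of ensemble.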

The article concludes with thanks to the chizhu team and an invitation for participants to keep learning from shared experiences like this one.

Tags: Advertising, User Profiling, Competition, Text Classification, BiLSTM, Word2Vec
Written by Tencent Advertising Technology, the official hub of the team, sharing its latest cutting-edge achievements and advertising technology applications.
