Artificial Intelligence 5 min read

Building an Advertising Recommendation Model with Python and PyTorch

This article walks through the development of a simple advertising recommendation system using Python, covering data collection, preprocessing with label encoding, embedding categorical features with PyTorch's Embedding layer, constructing an MLP model, and launching training, while reflecting on the challenges Python developers face in the big-data era.


Being a Python developer can feel contradictory: you are immersed in big-data environments where everyone's data is laid bare ("naked swimming"), yet you are also the one expected to analyze and block intrusive ads.

The example starts by gathering a large set of ad‑placement data, including app information, ad slot IDs, media IDs, material details, titles, descriptions, and other vector features.
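The article does not show the collection step itself, so here is a tiny synthetic sample standing in for the collected ad-placement data; in a real pipeline these frames would be loaded from logs or CSV exports (paths and schema beyond the named fields are assumptions):

```python
import pandas as pd

# Synthetic stand-in for the collected ad-placement data.
# Column names match the fields used later; values are illustrative.
train_df = pd.DataFrame({
    "pkgname":  ["com.app.a", "com.app.b", "com.app.a"],
    "ver":      ["1.0", "2.3", "1.0"],
    "slotid":   ["s1", "s2", "s1"],
    "mediaid":  ["m7", "m8", "m7"],
    "material": ["img", "video", "img"],
    "label":    [0, 1, 0],   # hypothetical click label
})
test_df = pd.DataFrame({
    "pkgname":  ["com.app.b"],
    "ver":      ["2.3"],
    "slotid":   ["s2"],
    "mediaid":  ["m8"],
    "material": ["video"],
})
```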

To handle categorical fields such as pkgname, ver, slotid, mediaid, and material, label encoding is applied to both training and test sets:

```python
from sklearn.preprocessing import LabelEncoder

for col in ["pkgname", "ver", "slotid", "mediaid", "material"]:
    lbl = LabelEncoder()
    # Fit on train and test together so every category gets an ID
    lbl.fit(train_df[col].tolist() + test_df[col].tolist())
    train_df[col] = lbl.transform(train_df[col])
    test_df[col] = lbl.transform(test_df[col])
```

After encoding, textual features are transformed into vectors using Torch's Embedding layer, converting each categorical value into a dense representation based on the logarithm of its cardinality.
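This dimension rule, taking roughly log2 of a field's cardinality as the embedding width, can be sketched in isolation (the cardinality of 1000 is illustrative):

```python
import numpy as np
import torch

cardinality = 1000  # e.g. number of distinct slot IDs (illustrative)
embed_dim = max(1, int(np.log2(cardinality)))  # log2(1000) ≈ 9.97 → 9

# +1 row leaves room for an out-of-vocabulary / padding index
embedding = torch.nn.Embedding(cardinality + 1, embed_dim)

ids = torch.tensor([3, 17, 999])     # three encoded category IDs
vectors = embedding(ids)             # each ID becomes a 9-dim dense vector
print(vectors.shape)                 # torch.Size([3, 9])
```

The logarithmic rule keeps high-cardinality fields from dominating the parameter count while still giving each category a learnable dense representation.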

The core model is a multilayer perceptron (MLP) defined with PyTorch. It creates embedding dictionaries for each categorical field, concatenates them with other feature vectors, passes the result through configurable fully‑connected layers, applies ReLU and dropout, and finally outputs a logit:

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    def __init__(self, category_dict, layers=[45 + 240, 32], dropout=0.2):
        super().__init__()
        self.category_dict = category_dict
        self.dropout_p = dropout
        # nn.ModuleDict (rather than a plain dict) registers the embeddings
        # as sub-modules so their weights are actually trained.
        self.embedding_dict = nn.ModuleDict({
            key: nn.Embedding(category_dict[key] + 1,
                              max(1, int(np.log2(category_dict[key]))))
            for key in category_dict
        })
        self.fc_layers = nn.ModuleList(
            nn.Linear(in_size, out_size)
            for in_size, out_size in zip(layers[:-1], layers[1:])
        )
        self.output_layer = nn.Linear(layers[-1], 1)
        # Move the whole model once with model.to(device) after construction,
        # rather than calling .to(device) on each layer.

    def forward(self, feed_dict, embed_dict):
        # Look up the dense embedding for each categorical field
        embedding_feat = {key: self.embedding_dict[key](feed_dict[key])
                          for key in self.category_dict}
        x = torch.cat(list(embedding_feat.values()), 1)
        # Append the remaining (pre-computed) feature vectors
        x = torch.cat([x, embed_dict], 1)
        for fc in self.fc_layers:
            x = F.relu(fc(x))
            x = F.dropout(x, p=self.dropout_p, training=self.training)
        logit = self.output_layer(x)
        return logit
```

Training is then launched ("Training starts~"), demonstrating a typical workflow for an ad recommendation pipeline, while acknowledging that more sophisticated techniques may be needed for production‑grade systems.
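The article stops at kicking off training, so the loop below is a minimal sketch of that step. To keep it self-contained it uses a simple stand-in model and synthetic tensors in place of the MLP and the encoded ad data; the loss (BCE-with-logits, since the model outputs a raw logit), the Adam optimizer, and the learning rate are assumptions:

```python
import torch
import torch.nn as nn

# Stand-in for the MLP above so this sketch runs on its own;
# in the real pipeline `model` would be MLP(category_dict).
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))

# Synthetic features and binary click labels in place of the encoded ad data
X = torch.randn(64, 10)
y = torch.randint(0, 2, (64, 1)).float()

criterion = nn.BCEWithLogitsLoss()  # pairs with the raw-logit output
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

print("Training starts~")
for epoch in range(3):
    optimizer.zero_grad()
    logits = model(X)
    loss = criterion(logits, y)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss={loss.item():.4f}")
```

A production loop would additionally batch the data with a DataLoader, track validation metrics such as AUC, and checkpoint the model.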

machine learning · Python · recommendation system · embedding · PyTorch · MLP
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
