
Deep Learning for Multi‑field Categorical Data: Click‑Through Rate Prediction and Model Comparisons

This article presents a deep‑learning‑based approach to multi‑field categorical data, explains FM and FNN embeddings, compares several click‑through‑rate prediction models on Criteo and iPinYou datasets, and demonstrates that factorisation‑machine‑supported neural networks significantly outperform logistic regression and other baselines.

Ctrip Technology

Editor’s note: The material originates from a keynote speech by Dr. Zhang Weinan (UCL) at Ctrip Technology Center’s Deep Learning Meetup, introducing deep learning applications on Multi‑field Categorical datasets.

The talk focuses on applying deep learning to Multi‑field Categorical data, which consist of multiple fields represented by ID values. Such data appear in information‑retrieval tasks like web search, recommendation systems, and ad display. While deep learning excels on continuous and sequential data (images, speech, text), its performance on high‑dimensional sparse categorical data is less explored.

To illustrate, the speaker uses an online advertising click‑through‑rate (CTR) prediction scenario, where user behavior is described by many categorical fields (date, hour, weekday, IP, region, city, browser, ad size, etc.).

The presentation details the advantages of Factorisation Machines (FM) and Factorisation‑machine‑supported Neural Networks (FNN) for handling multi‑value categorical data, and compares a range of models: Logistic Regression (LR), FM, FNN, the Convolutional Click Prediction Model (CCPM), and Product‑based Neural Networks (PNN‑I, PNN‑II, PNN‑III).

Current deep‑learning applications are mature in machine vision, speech recognition, and natural language processing, where data are continuous or sequential. These domains benefit from hierarchical feature learning, which is harder for traditional machine‑learning algorithms.

Multi‑field Categorical data differ because each field can take many discrete values (e.g., Weekday=Wednesday, Gender=Male, City=London). Predicting whether a user will click a Disney ad on a news site exemplifies the challenge.

Typical feature representation uses one‑hot binary encoding, producing extremely high‑dimensional sparse vectors (e.g., millions of dimensions), which makes direct neural‑network training infeasible due to the massive number of parameters required.
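As a minimal sketch of the encoding step (with deliberately tiny, hypothetical field sizes; real ad data spans millions of total dimensions), each field contributes a one‑hot segment to a single sparse binary vector:

```python
# Hypothetical fields with tiny vocabularies; real ones (IP, city, ...)
# can have millions of values each.
fields = {
    "Weekday": ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"],
    "Gender": ["Male", "Female"],
    "City": ["London", "Shanghai", "NewYork"],
}

def one_hot(sample):
    """Concatenate per-field one-hot segments into one binary vector."""
    vec = []
    for field, values in fields.items():
        segment = [0] * len(values)
        segment[values.index(sample[field])] = 1  # exactly one 1 per field
        vec.extend(segment)
    return vec

x = one_hot({"Weekday": "Wed", "Gender": "Male", "City": "London"})
```

Each sample activates exactly one position per field, so the vector is extremely sparse, which is precisely what makes direct dense training impractical at real scale.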

Embedding these high‑dimensional vectors into a low‑dimensional space reduces model complexity. FM is widely regarded as an effective embedding model.

FM combines a logistic‑regression term with pairwise inner‑product interactions between feature vectors, capturing relationships such as the positive correlation between "Student" and "Shanghai" for the Disney ad click.
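A sketch of the FM score under the standard formulation (w0 + linear term + pairwise inner products of latent factors), using the well-known O(kn) rewriting of the pairwise sum:

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """FM score: w0 + sum_i w_i x_i + sum_{i<j} <V_i, V_j> x_i x_j.
    V has one k-dimensional latent vector per feature (row).
    Uses the identity: pairwise = 0.5 * sum_f [(Vx)_f^2 - (V^2 x^2)_f]."""
    linear = w0 + w @ x
    vx = V.T @ x                                  # shape (k,)
    pairwise = 0.5 * float(np.sum(vx ** 2 - (V ** 2).T @ (x ** 2)))
    return float(linear + pairwise)
```

For one-hot inputs only the active features contribute, so the pairwise term reduces to inner products between the latent vectors of co-occurring categories, such as "Student" and "Shanghai".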

The FM‑based neural network can be viewed as a three‑layer architecture:

In the second layer, field embeddings are multiplied (no learnable parameters), drastically reducing dimensionality before feeding into a standard neural network.

Using FM to embed one‑hot encoded inputs yields dense real‑valued vectors, which serve as inputs to the FNN, avoiding the computational burden of sparse binary vectors.
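The lookup itself can be sketched as follows (field sizes and embedding dimension are hypothetical): each field's active category selects one row from the FM factor matrix, and the concatenated rows form the FNN's dense input.

```python
import numpy as np

field_sizes = [7, 2, 3]   # hypothetical categories per field
k = 4                     # embedding dimension from FM
rng = np.random.default_rng(42)
# Stand-ins for FM-pretrained latent factors, one matrix per field.
embeddings = [rng.normal(size=(n, k)) for n in field_sizes]

def fnn_input(active_ids):
    """Concatenate the embedding row of each field's active category."""
    return np.concatenate([emb[i] for emb, i in zip(embeddings, active_ids)])

z = fnn_input([2, 0, 1])  # e.g. Weekday=Wed, Gender=Male, City=Shanghai
```

The dense input has (number of fields) x k dimensions regardless of vocabulary size, so a field with millions of categories still contributes only k values.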

Applying these models to the iPinYou dataset shows that FNN outperforms both LR and FM.

Unlike most neural networks that combine feature vectors additively, FM uses multiplicative (inner‑product) interactions, which correspond to logical AND and capture stricter relationships between features.

Both inner‑product and outer‑product operations can be employed; the outer product reduces to the inner product when only diagonal elements are non‑zero.
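A small numeric illustration of that relationship: the outer product of two embedding vectors holds the inner product along its diagonal, so keeping only diagonal elements recovers the inner product.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

outer = np.outer(a, b)          # full k x k interaction matrix
inner = float(np.trace(outer))  # diagonal sum equals a . b
```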

The resulting neural network architecture incorporates these product operations as dedicated product nodes (shown in blue in the talk's network diagram).

When the number of fields grows (e.g., 60 fields), the pairwise product matrix becomes very large. By factorising the symmetric weight matrix into a product of two smaller matrices, the number of trainable parameters is dramatically reduced.
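The parameter saving can be made concrete with a small sketch (rank r is a hypothetical choice): a full symmetric matrix over F fields needs O(F^2) weights, while the factorisation W = theta theta^T needs only O(F r).

```python
import numpy as np

F, r = 60, 5  # 60 fields, hypothetical factorisation rank 5
theta = np.random.default_rng(0).normal(size=(F, r))
W = theta @ theta.T                 # symmetric rank-r weight matrix

full_params = F * (F + 1) // 2      # distinct entries of a full symmetric W
factored_params = F * r             # entries of theta
```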

Model evaluation was performed on two datasets: the Criteo Terabyte dataset (13 numeric, 26 categorical variables, ~300 GB of data) and the iPinYou dataset (24 categorical variables). The compared algorithms included LR, FM, FNN, CCPM, PNN‑I, PNN‑II, and PNN‑III.

Evaluation metrics were AUC, log-loss, RMSE, and Relative Information Gain (RIG).
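Sketch implementations of the four metrics (RIG is taken here as 1 minus log-loss normalised by the entropy of the base click rate, a common definition; the AUC here assumes untied scores for brevity):

```python
import numpy as np

def log_loss(y, p):
    p = np.clip(p, 1e-12, 1 - 1e-12)  # avoid log(0)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def rmse(y, p):
    return float(np.sqrt(np.mean((y - p) ** 2)))

def auc(y, p):
    """Probability a random positive outranks a random negative
    (rank-sum formulation; assumes no tied scores)."""
    order = np.argsort(p)
    ranks = np.empty(len(p))
    ranks[order] = np.arange(1, len(p) + 1)
    pos = y == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return float((ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))

def rig(y, p):
    """Relative Information Gain vs. predicting the base rate."""
    base = y.mean()
    entropy = -(base * np.log(base) + (1 - base) * np.log(1 - base))
    return 1 - log_loss(y, p) / entropy
```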

The results show that PNN models improve AUC by nearly 5 percentage points over LR, confirming the superiority of FM/FNN/PNN over traditional linear models.

Additional experiments on dropout (optimal value ≈ 0.5) and hidden‑layer depth indicated that three hidden layers yield the best trade‑off between capacity and over‑fitting.

On the smaller iPinYou dataset, PNN‑I and PNN‑II consistently outperformed other models, demonstrating robustness.

Node‑distribution experiments revealed that constant and diamond‑shaped hidden‑layer sizes performed best, while increasing size from bottom to top degraded performance.

Activation‑function tests showed tanh and ReLU significantly outperform sigmoid:

sigmoid(x) = 1 / (1 + e^(-x))
tanh(x) = (1 - e^(-2x)) / (1 + e^(-2x))
relu(x) = max(0, x)
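The three activations, written exactly as defined above:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return (1.0 - np.exp(-2 * x)) / (1.0 + np.exp(-2 * x))

def relu(x):
    return np.maximum(0.0, x)
```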

Conclusion

1. Deep learning can achieve significant performance gains on multi‑field categorical datasets.
2. Inner‑product and outer‑product operations effectively capture feature interactions.
3. In ad‑click prediction, PNN models outperform other baselines.

For reuse, please contact [email protected] (replace # with @). The presentation PPT is available via the original article link.

Tags: advertising, deep learning, neural networks, click-through rate prediction, factorisation machine, multi-field categorical
Written by Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.