HetMatch: Heterogeneous Graph Neural Network for Keyword Recommendation in Search Advertising
HetMatch is a heterogeneous graph neural network for keyword recommendation in search advertising. It tackles cold-start and large-scale challenges by hierarchically fusing node- and subgraph-level features, denoising graph convolutions, fusing metapath semantics with self-attention, matching ads and keywords as twin meta-nodes, and learning across multiple relation views. The model delivers notable recall gains offline and measurable online improvements for Alibaba's advertising tools.
Abstract
Recent years have seen a surge of interest in online advertising optimization. In search advertising, keyword recommendation is a core service for advertisers. This article introduces HetMatch, a heterogeneous-graph-learning based keyword recall model developed by Alibaba's Customer Growth team, presented at CIKM 2021. The model addresses challenges in large-scale keyword recommendation and has been deployed across multiple Alibaba advertising tools.
Background
Search ads rely on advertisers bidding on keywords to obtain exposure. Millions of advertisers manually add tens of millions of keywords daily, yet many lack the expertise to select effective keywords, leading to low exposure rates: fewer than 10% of self-selected keywords receive impressions the next day. Existing recall methods based on text matching, collaborative filtering, or topic clustering ignore rich heterogeneous behaviors and suffer from cold-start issues.
Problem Definition
A heterogeneous information network (HIN) is constructed from ads (ad), items (item), and queries (query), with multiple relation types between nodes (click, co-click, etc.). The goal is to maximize the overall top-K recall rate for each ad while restricting recalled keywords to a candidate set that shares the ad's predicted category.
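To make the evaluation target concrete, here is a minimal sketch of per-ad recall@K; the helper name and the example keywords are hypothetical, and the category-restricted candidate set is represented simply as an ordered recommendation list:

```python
def recall_at_k(recommended, relevant, k):
    """Fraction of an ad's relevant keywords found in its top-K recommendations."""
    if not relevant:
        return 0.0
    top_k = set(recommended[:k])
    return len(top_k & set(relevant)) / len(relevant)

# Hypothetical example: 2 of this ad's 3 relevant keywords appear in the top 4.
recs = ["running shoes", "sneakers", "sandals", "trail shoes", "boots"]
rel = ["sneakers", "trail shoes", "hiking boots"]
score = recall_at_k(recs, rel, k=4)  # 2 of 3 relevant keywords recovered
```

The overall metric in the paper averages this quantity over all ads for a fixed K.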
Method
HetMatch follows a hierarchical information-fusion pipeline:
1. Node-level feature fusion: Encode the discrete and continuous features of each node into fixed-dimensional vectors, converting continuous features via quantile binning and applying type-specific neural networks to obtain node embeddings.
2. Subgraph-level feature fusion: Define two groups of metapaths, one based on purchase relationships and another on item-bridge relationships, to capture high-order semantic neighbors. Metapaths model competition among ads for the same keyword and co-click patterns between ads and items.
3. Denoising graph convolution: Extend GraphSAGE with an autoencoder-based aggregation that compresses neighbor information, reducing noise from random clicks and under-trained node representations. Top-K sampling based on actual click behavior further mitigates noisy edges.
4. Semantic fusion layer: Aggregate the embeddings produced under different metapaths with a self-attention layer (as in HAN) to yield a unified representation.
5. Twin matching: Recast ad-keyword matching as meta-node matching, aligning each ad's embedding with the mean embedding of its top-K related keywords (and vice versa) so that both sides live in homogeneous representation spaces.
6. Multi-view learning: Model multiple ad-keyword relation views (click, purchase, item-bridge) with separate view-specific networks on the ad side while sharing a single keyword embedding. Optimization uses a sampled softmax loss to maximize the scores of positive pairs and suppress irrelevant ones.
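Step 1 above can be sketched in NumPy. The bin count, embedding sizes, and the single-layer ReLU "type-specific network" are all illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def quantile_bin(values, n_bins=4):
    """Map a continuous feature to discrete bin ids via empirical quantiles."""
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(values, edges)  # ids in 0..n_bins-1

rng = np.random.default_rng(0)
ctr = rng.random(1000)                      # e.g. a continuous CTR feature
bins = quantile_bin(ctr, n_bins=4)

# Each bin id indexes an embedding table; a type-specific network then fuses
# the (concatenated) feature embeddings into one node vector.
emb_table = rng.normal(size=(4, 8))         # 4 bins -> 8-dim embeddings
node_feat = emb_table[bins]                 # (1000, 8)
W = rng.normal(size=(8, 16)) / np.sqrt(8)   # hypothetical per-type projection
node_emb = np.maximum(node_feat @ W, 0.0)   # ReLU layer for this node type
```

Quantile binning makes the discretization robust to skewed feature distributions, since each bin receives roughly the same number of nodes.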
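Step 3 (denoising graph convolution) can be illustrated as follows: keep only the K neighbors with the most observed clicks, then pass their vectors through an autoencoder bottleneck before pooling. This is a simplified sketch under assumed shapes, not HetMatch's exact aggregator; in training, the reconstruction would feed an auxiliary loss:

```python
import numpy as np

rng = np.random.default_rng(1)

def topk_neighbors(neigh_embs, click_counts, k):
    """Keep only the K neighbors with the most click interactions (edge denoising)."""
    idx = np.argsort(click_counts)[::-1][:k]
    return neigh_embs[idx]

def denoising_aggregate(neigh_embs, W_enc, W_dec):
    """Compress neighbor vectors through a low-dimensional bottleneck, then
    mean-pool; the decoder reconstruction is used for a training-time loss."""
    z = np.tanh(neigh_embs @ W_enc)   # encode to bottleneck
    recon = z @ W_dec                 # decode (for the reconstruction objective)
    return z.mean(axis=0), recon

d, bottleneck = 16, 4
neigh = rng.normal(size=(20, d))            # 20 candidate neighbors
clicks = rng.integers(0, 50, size=20)       # click counts per edge
W_enc = rng.normal(size=(d, bottleneck)) / np.sqrt(d)
W_dec = rng.normal(size=(bottleneck, d)) / np.sqrt(bottleneck)

kept = topk_neighbors(neigh, clicks, k=5)
agg, recon = denoising_aggregate(kept, W_enc, W_dec)
```

The bottleneck forces the aggregator to retain only the dominant neighbor signal, which is the intuition behind discarding random-click noise.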
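Step 4's HAN-style semantic fusion reduces, per node, to a softmax-weighted sum over metapath-specific embeddings. A minimal sketch, with a single hypothetical query vector standing in for the learned attention parameters:

```python
import numpy as np

def semantic_attention(metapath_embs, q):
    """Score each metapath-specific embedding against a learnable query vector,
    softmax the scores over metapaths, and return the weighted sum."""
    scores = metapath_embs @ q                        # (num_metapaths,)
    scores = scores - scores.max()                    # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()   # softmax over metapaths
    fused = weights @ metapath_embs                   # (dim,)
    return fused, weights

rng = np.random.default_rng(2)
embs = rng.normal(size=(3, 16))   # embeddings of one ad under 3 metapaths
q = rng.normal(size=16)           # hypothetical semantic attention query
fused, w = semantic_attention(embs, q)
```

The attention weights are shared across nodes in HAN, letting the model learn which metapath (e.g. purchase vs. item-bridge) matters most globally.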
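Steps 5 and 6 combine at training time: a view-specific ad embedding is scored against the mean embedding of its top-K related keywords (the "twin" meta-node) under a sampled softmax. The sketch below uses invented shapes, ids, and a plain dot-product scorer to show the shape of the objective, not the production loss:

```python
import numpy as np

def sampled_softmax_loss(ad_emb, pos_kw, neg_kws):
    """Negative log-likelihood of the positive pair against sampled negatives."""
    logits = np.concatenate([[ad_emb @ pos_kw], neg_kws @ ad_emb])
    logits = logits - logits.max()
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[0]

rng = np.random.default_rng(3)
dim = 16
kw_emb = rng.normal(size=(100, dim))   # shared keyword embedding table
ad_click_view = rng.normal(size=dim)   # view-specific ad embedding (click view)

# Twin matching: align the ad with the MEAN of its top-K related keywords,
# so the two sides of the match live in a homogeneous space.
topk_ids = [3, 17, 42]                 # hypothetical related-keyword ids
meta_kw = kw_emb[topk_ids].mean(axis=0)

negs = kw_emb[rng.choice(100, size=10, replace=False)]
loss = sampled_softmax_loss(ad_click_view, meta_kw, negs)
```

Sharing one keyword table across views keeps keyword representations consistent while each view-specific ad network captures a different relation (click, purchase, item-bridge).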
Offline Experiments
HetMatch was evaluated on a large-scale search-ad production dataset against baselines including term-match, DSSM, HAN, and IntentGC. It consistently improved recall across recall depths and showed notable gains in cold-start scenarios. Ablation studies confirmed the contribution of each module.
Online Experiments
A/B tests on Alibaba's 直通车 (Zhitongche) keyword recommendation tools demonstrated a 4.19% increase in adoption rate for the keyword-suggestion tool, a 5.35% rise in click volume for adopted keywords, and a 10.89% boost in spend for the smart-buy tool compared with the previous GraphSAGE-based model.
Conclusion
HetMatch leverages a massive heterogeneous graph of ads, items, and queries to enhance keyword recall. Future work includes scaling to even larger graphs with richer node types, and integrating transformer-based language models with GNNs to improve textual modeling.
Alimama Tech
Official Alimama tech channel, showcasing all of Alimama's technical innovations.