Intelligent Generation of Search Engine Advertising Keywords: Methods, Frameworks, and Future Directions

This article surveys automated techniques for generating high‑quality search engine advertising keywords. It covers the business background, the traditional manual methods, and an intelligent keyword expansion pipeline built on NLP components (word segmentation, POS tagging, and BiLSTM‑CRF). It then describes BERT‑based query classification, semantic matching with DSSM, and further approaches such as query suggestion and synonym rewriting.

Ctrip Technology

The rapid international expansion of Ctrip has led to extensive overseas search‑engine advertising, where effective keyword selection, pricing, and creative design are crucial for improving ROI.

Traditional keyword generation relies on manual brainstorming (expansion) and mining existing search queries (catching), both of which are labor‑intensive and lack fine‑grained targeting.

To address these issues, an intelligent keyword generation pipeline is proposed, consisting of three core modules:

1. Product Information Supply Module – Stores product data (e.g., hotel, flight, city accommodation) and performs cleaning, tokenization, and part‑of‑speech (POS) tagging. Ambiguous geographic entities are resolved with a Geohash‑based structured dictionary, while gaps in dictionary coverage are mitigated with data augmentation and a BiLSTM‑CRF sequence‑labeling model.

2. Search Habit Summarization Module – Analyzes user search queries to extract common search patterns. It employs named‑entity recognition, tokenization, and POS tagging to map queries such as “Shanghai hotel accommodation” or “Hongqiao Airport inn discount” to structured entities (city, hotel name, demand terms).

3. Keyword Generation Module – Generates candidate keywords by combining product data with the search patterns derived in module 2, then filters out ambiguous terms using three strategies: (a) string‑match overlap, (b) the distribution of clicks across multiple products, and (c) semantic similarity scores computed by a DSSM (Deep Structured Semantic Model).
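The three modules can be illustrated with a minimal sketch. It is not the article's implementation: the dictionary `ENTITY_DICT`, the slot names, and the `min_top_share` threshold are hypothetical, and greedy longest‑match lookup over whitespace tokens stands in for the tokenizer, POS tagger, and BiLSTM‑CRF model described above. Only filtering strategy (b), the click distribution, is shown.

```python
from collections import Counter

# Hypothetical toy dictionary: surface form -> entity type.
ENTITY_DICT = {
    "shanghai": "CITY",
    "hongqiao airport": "LANDMARK",
    "hotel": "DEMAND",
    "hotel accommodation": "DEMAND",
    "inn": "DEMAND",
    "discount": "MODIFIER",
}
MAX_PHRASE_LEN = 2  # longest dictionary entry, in tokens


def tag_query(query):
    """Greedy longest-match tagging of a whitespace-tokenized query."""
    tokens = query.lower().split()
    tags, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in ENTITY_DICT:
                tags.append((phrase, ENTITY_DICT[phrase]))
                i += n
                break
        else:
            tags.append((tokens[i], "O"))  # token not in the dictionary
            i += 1
    return tags


def mine_patterns(queries, min_count=1):
    """Count slot patterns such as ('CITY', 'DEMAND') over fully tagged queries."""
    counts = Counter()
    for query in queries:
        pattern = tuple(tag for _, tag in tag_query(query))
        if "O" not in pattern:
            counts[pattern] += 1
    return [p for p, c in counts.most_common() if c >= min_count]


def generate_keywords(product, patterns):
    """Fill every pattern whose slots the product can supply."""
    keywords = set()
    for pattern in patterns:
        if not all(slot in product for slot in pattern):
            continue
        combos = [""]
        for slot in pattern:
            combos = [f"{c} {v}".strip() for c in combos for v in product[slot]]
        keywords.update(combos)
    return sorted(keywords)


def is_ambiguous(clicks_by_product, min_top_share=0.6):
    """Flag a keyword whose clicks are spread across many products."""
    total = sum(clicks_by_product.values())
    if total == 0:
        return True  # no click evidence; treated as ambiguous in this sketch
    return max(clicks_by_product.values()) / total < min_top_share
```

With the two example queries above, `mine_patterns` yields the patterns `('CITY', 'DEMAND')` and `('LANDMARK', 'DEMAND', 'MODIFIER')`. Filling them for a product record such as `{"CITY": ["tokyo"], "DEMAND": ["hotel", "inn"]}` gives `['tokyo hotel', 'tokyo inn']`, because that record has no landmark to fill the second pattern.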

For the “catching” (keyword mining) scenario, a binary classifier fine‑tuned from BERT first decides whether a query is accommodation‑related. Intent recognition is then framed as semantic matching between the query and candidate products, solved in two stages: DSSM‑based recall computed offline, followed by BERT‑based re‑ranking.
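The two‑stage shape of this matching step can be sketched as follows. The sketch uses only the standard library, so both models are replaced by stand‑ins: cosine similarity over character‑trigram counts takes the place of DSSM embeddings, and a word‑overlap score takes the place of the BERT re‑ranker. All function names are hypothetical; the point is only the structure, a cheap recall over a precomputed index followed by a more expensive scorer applied to the few surviving candidates.

```python
import math
from collections import Counter


def trigrams(text):
    """Stand-in for a learned embedding: character-trigram counts."""
    padded = f"#{text.lower()}#"
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(products):
    """Offline step: precompute one vector per product name."""
    return {name: trigrams(name) for name in products}


def recall(query, index, top_k=2):
    """Stage 1: keep the top_k products by vector similarity."""
    q = trigrams(query)
    ranked = sorted(index, key=lambda name: cosine(q, index[name]), reverse=True)
    return ranked[:top_k]


def token_overlap(query, product):
    """Placeholder for a BERT re-ranker: Jaccard overlap of word sets."""
    q, p = set(query.lower().split()), set(product.lower().split())
    return len(q & p) / len(q | p) if q | p else 0.0


def rerank(query, candidates, scorer):
    """Stage 2: reorder the recalled candidates with a costlier scorer."""
    return sorted(candidates, key=lambda c: scorer(query, c), reverse=True)
```

In a real system the `scorer` argument would wrap the fine‑tuned model; passing it in as a parameter keeps the recall stage independent of whichever re‑ranker is used.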

Additional explored methods include:

• Query‑suggestion based keyword generation, leveraging popularity, relevance, and diversity of suggested queries.

• Synonym rewriting techniques such as query rewriting, click‑graph based rewriting, and grammatical substitution to expand high‑performing keywords.
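Both ideas in the list above can be sketched in a few lines. For query suggestions, the sketch uses a greedy selection in the style of maximal marginal relevance, which is an assumption made here and not necessarily the article's method: each round picks the suggestion with the best mix of popularity and relevance to the seed, minus a penalty for resembling suggestions already chosen. For synonym rewriting, it shows plain token substitution from a hypothetical synonym table. The weights, the word‑set Jaccard similarity, and all names are illustrative.

```python
from itertools import product


def jaccard(a, b):
    """Word-set overlap between two queries."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def select_suggestions(seed, suggestions, k=3, w_pop=0.3, w_rel=0.3, w_div=0.4):
    """Greedy pick of k suggestions.

    suggestions maps each suggested query to a popularity score in [0, 1].
    """
    selected = []
    remaining = dict(suggestions)
    while remaining and len(selected) < k:
        def score(s):
            # Penalize similarity to anything already selected.
            redundancy = max((jaccard(s, t) for t in selected), default=0.0)
            return (w_pop * remaining[s]
                    + w_rel * jaccard(seed, s)
                    - w_div * redundancy)
        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected


def synonym_rewrites(keyword, synonyms):
    """Expand a keyword by substituting each token with its listed synonyms."""
    options = [[tok] + synonyms.get(tok, []) for tok in keyword.split()]
    rewrites = (" ".join(combo) for combo in product(*options))
    return [r for r in rewrites if r != keyword]
```

For example, `synonym_rewrites("cheap hotel", {"cheap": ["budget"], "hotel": ["inn"]})` returns `['cheap inn', 'budget hotel', 'budget inn']`. In practice each rewrite would still need to pass the ambiguity filters before being used as a keyword.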

The article concludes with two directions for future work: extending the system to languages beyond Chinese, English, Japanese, and Korean, and investigating fully machine‑driven keyword generation, which may yield keywords that perform well even though humans cannot readily interpret them.

machine learning, NLP, BERT, semantic matching, Search Advertising, BILSTM-CRF, keyword generation