User Behavior Clustering in Tencent Kankan: From Traditional Unsupervised Methods to N‑gram and action2vec
This article introduces Tencent Kankan's product landscape and explores various user clustering techniques—including classic unsupervised algorithms, N‑gram based sequence clustering, and deep‑learning driven action2vec—detailing their implementation steps, advantages, limitations, and practical insights for product optimization.
The session begins with an overview of the Tencent Kankan product, highlighting its four core functions—Search, Feed, Tools, and Novel reading—delivered through the QQ Browser platform, and explains why the product’s comprehensive feature set creates challenges for user understanding.
Three typical clustering scenarios are then described: KPI fluctuation analysis, fine‑grained growth operations, and product‑market fit studies, each requiring precise user segmentation.
Common clustering methods such as K‑means, DBSCAN, and hierarchical clustering are introduced, along with the statistical features they typically operate on (demographics, activity metrics, consumption preferences) and their respective strengths and weaknesses.
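As a rough illustration of clustering on statistical features, the sketch below standardizes a small table of made-up user features and runs plain K‑means on it. The feature columns, the sample values, and the deterministic farthest‑point seeding are assumptions chosen for reproducibility; they are not taken from the talk.

```python
import math

# Hypothetical per-user features: [active days per week, minutes per day,
# share of sessions spent reading novels]. Values are illustrative only.
users = [
    [1, 5, 0.0],
    [2, 8, 0.1],
    [1, 6, 0.0],
    [7, 60, 0.8],
    [6, 55, 0.9],
    [7, 70, 0.7],
]


def standardize(rows):
    """Z-score every column so that no single feature dominates the distance."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / len(col)) or 1.0
        for col, m in zip(columns, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start at the first
    point, then repeatedly add the point farthest from the chosen centers."""
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(farthest))
    return centers


def kmeans(points, k, max_iters=100):
    """Lloyd iterations: assign each point to its nearest center, then
    recompute every center as the mean of its members."""
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iters):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster went empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


labels, centers = kmeans(standardize(users), k=2)
print(labels)
```

The resulting labels separate light users from heavy users, but they say nothing about the order in which actions happen, which is the gap the sequence‑based methods are meant to fill.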
The limitations of these traditional approaches are discussed, emphasizing their coarse granularity, their insensitivity to behavioral nuances, and the difficulty of translating clusters into actionable strategies.
To address these issues, the article presents a user‑path based clustering framework. Each user action is described by five elements (action type, scene, content type, content category, and consumption feature) and encoded as one symbol, so that a user's ordered actions form a symbolic sequence s = (s1, s2, …, sn). N‑gram (e.g., 3‑gram) representations are generated from these sequences, and a similarity matrix is built using normalized polar distance before a clustering algorithm is applied.
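The path‑based pipeline can be sketched as follows: encode each action as a five‑element symbol, count the 3‑grams in each user's sequence, and compare users pairwise. The summary does not define "normalized polar distance", so this sketch substitutes cosine distance between 3‑gram count vectors as a stand‑in. The `Action` type, the four example actions, and the three user sequences are hypothetical.

```python
import math
from collections import Counter
from typing import NamedTuple


class Action(NamedTuple):
    """One symbol of the sequence, built from the five elements in the text."""
    action_type: str
    scene: str
    content_type: str
    content_category: str
    consumption: str


# Hypothetical symbols, for illustration only.
SEARCH = Action("query", "search", "web", "general", "quick_exit")
FEED_VIDEO = Action("click", "feed", "video", "sports", "long_play")
FEED_ARTICLE = Action("click", "feed", "article", "news", "short_read")
NOVEL_READ = Action("read", "novel", "novel", "fantasy", "long_read")


def ngram_profile(seq, n=3):
    """Count every contiguous n-gram in a symbol sequence."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def cosine_distance(a, b):
    """1 - cosine similarity of two sparse count vectors (stand-in metric)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # a sequence shorter than n has no n-grams to compare
    return 1.0 - dot / (norm_a * norm_b)


def distance_matrix(sequences, n=3):
    """Pairwise distances between users, ready for a clustering algorithm."""
    profiles = [ngram_profile(seq, n) for seq in sequences]
    return [[cosine_distance(p, q) for q in profiles] for p in profiles]


sequences = [
    [SEARCH, FEED_VIDEO, FEED_VIDEO, FEED_ARTICLE, FEED_VIDEO, FEED_VIDEO],
    [FEED_VIDEO, FEED_VIDEO, FEED_ARTICLE, FEED_VIDEO, SEARCH],
    [SEARCH, NOVEL_READ, NOVEL_READ, NOVEL_READ, NOVEL_READ],
]
D = distance_matrix(sequences)
for row in D:
    print([round(x, 3) for x in row])
```

In this toy data the two feed‑oriented users share 3‑grams and end up closer to each other than to the novel reader. A matrix like `D` can then be passed to any algorithm that accepts precomputed distances, such as hierarchical clustering or DBSCAN.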
Next, the deep‑learning approach action2vec is explained. User actions are treated as tokens, and a Skip‑gram model is trained to predict the actions that surround each action in a sequence, so that every action is embedded into a low‑dimensional vector; a weighted combination of these vectors then yields a user‑level representation that can be clustered at scale.
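A minimal sketch of the idea, assuming Skip‑gram with negative sampling and an exponential recency weighting for the user vector. The summary does not specify the training details or the weighting scheme, so the function names, hyperparameters, decay factor, and action tokens below are all hypothetical. The pure‑Python trainer is for illustration only; at production scale a library implementation such as gensim's `Word2Vec` with `sg=1` would be the more realistic choice.

```python
import math
import random


def skipgram_pairs(seq, window=2):
    """(center, context) pairs for each action and its neighbors in the window."""
    pairs = []
    for i, center in enumerate(seq):
        for j in range(max(0, i - window), min(len(seq), i + window + 1)):
            if j != i:
                pairs.append((center, seq[j]))
    return pairs


def sigmoid(x):
    x = max(-20.0, min(20.0, x))  # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-x))


def train_action2vec(sequences, dim=8, window=2, negatives=3,
                     epochs=30, lr=0.05, seed=0):
    """Skip-gram with negative sampling, trained by plain SGD."""
    rng = random.Random(seed)
    vocab = sorted({a for seq in sequences for a in seq})
    w_in = {a: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for a in vocab}
    w_out = {a: [0.0] * dim for a in vocab}
    pairs = [p for seq in sequences for p in skipgram_pairs(seq, window)]
    for epoch in range(epochs):
        rng.shuffle(pairs)
        for center, context in pairs:
            # One positive target plus uniformly sampled negatives.
            targets = [(context, 1.0)]
            for _ in range(negatives):
                neg = rng.choice(vocab)
                if neg != context:
                    targets.append((neg, 0.0))
            v = w_in[center]
            grad_v = [0.0] * dim
            for target, label in targets:
                u = w_out[target]
                g = lr * (label - sigmoid(sum(a * b for a, b in zip(v, u))))
                for d in range(dim):
                    grad_v[d] += g * u[d]  # accumulate before touching v
                    u[d] += g * v[d]
            for d in range(dim):
                v[d] += grad_v[d]
    return w_in


def user_vector(seq, vectors, decay=0.9):
    """Weighted mean of action vectors; recent actions weigh more (assumption)."""
    if not seq:
        raise ValueError("cannot embed an empty action sequence")
    dim = len(vectors[seq[0]])
    total, weight_sum = [0.0] * dim, 0.0
    for age, action in enumerate(reversed(seq)):
        w = decay ** age  # age 0 is the most recent action
        weight_sum += w
        for d in range(dim):
            total[d] += w * vectors[action][d]
    return [x / weight_sum for x in total]


sequences = [
    ["feed_video", "feed_video", "feed_article", "feed_video"],
    ["search", "novel_read", "novel_read", "novel_read"],
    ["feed_article", "feed_video", "search", "feed_video"],
]
vectors = train_action2vec(sequences)
user_vectors = [user_vector(seq, vectors) for seq in sequences]
print(len(user_vectors), len(user_vectors[0]))
```

The rows of `user_vectors` are fixed‑length and dense, so they can be fed to K‑means or another standard algorithm regardless of how long each user's raw sequence is.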
Practical case studies illustrate how N‑gram clustering helps identify users who primarily browse the feed or read novels, while action2vec clustering uncovers finer‑grained patterns such as time‑of‑day content preferences, enabling targeted product adjustments.
The Q&A segment addresses common concerns: handling sparse or delayed user actions, incorporating frequency or freshness weighting, updating clusters periodically, and feeding cluster labels into downstream models for causal analysis and strategy formulation.
Overall, the talk demonstrates how moving from simple statistical clustering to sequence‑aware and embedding‑based methods can improve user segmentation accuracy and drive more effective product decisions.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.