
User Clustering Techniques in Tencent KanDian: From Traditional Algorithms to N‑gram and action2vec

This article explains how Tencent KanDian analyzes user behavior by introducing the product, describing common clustering scenarios, reviewing traditional unsupervised methods, and detailing advanced path‑based approaches such as N‑gram and action2vec, while discussing their advantages, limitations, and practical applications.

DataFunSummit

The talk begins with an overview of the Tencent KanDian product, highlighting its four main functions—Search, Feed (information flow), Tools, and Novel reading—delivered through the QQ Browser platform. Because the platform offers a rich set of services, understanding user behavior becomes challenging.

Three typical clustering scenarios are presented: KPI fluctuation analysis, fine‑grained growth operations, and product‑market fit studies. These scenarios illustrate why segmenting users is essential for targeted strategy formulation.

Traditional clustering methods such as K‑means, DBSCAN, and hierarchical clustering are introduced, along with their commonly used features (demographic attributes, activity metrics, and consumption preferences). Their strengths and weaknesses are discussed, emphasizing issues like uneven cluster sizes, sensitivity to density parameters, and high memory consumption.
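As a point of reference for the traditional approach, here is a minimal sketch of K‑means (Lloyd's algorithm) over per‑user feature vectors. The two features, the sample values, and the deterministic seeding from the first k users are illustrative assumptions, not details from the talk.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids (illustrative choice)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each user joins the nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Hypothetical, already-scaled features per user:
# (active days per week, articles read per day)
users = [
    (0.90, 0.80), (0.10, 0.20), (0.80, 0.90),
    (0.20, 0.10), (0.85, 0.75), (0.15, 0.25),
]
labels, centroids = kmeans(users, k=2)
print(labels)
```

In this toy data the heavy and light users separate cleanly. On real traffic the result depends strongly on feature scaling and on the choice of k, which is part of why the coarse segments produced this way can be hard to act on.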

The limitations of conventional methods—coarse user granularity and inability to capture detailed behavior—lead to the exploration of path‑based clustering. User actions are broken down into five elements: action type (exposure, click, duration), scene, content type, content category, and consumption features.

Two advanced techniques are described:

N‑gram clustering: User sessions are transformed into sequences of actions, from which N‑grams (e.g., trigrams) are extracted. Counting N‑gram occurrences per user yields a user‑by‑N‑gram matrix; pairwise distances between users are then calculated from it (e.g., Normalized Polar Distance), and the users are clustered with a standard algorithm.
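The N‑gram steps can be sketched in a few lines of Python. This is an illustration under stated assumptions: the action names and sessions are made up, and cosine distance is swapped in as the distance measure because this summary does not define Normalized Polar Distance.

```python
from collections import Counter
import math

def extract_ngrams(actions, n=3):
    """Return the contiguous n-grams of an action sequence as tuples."""
    return [tuple(actions[i:i + n]) for i in range(len(actions) - n + 1)]

def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical sessions; the action names are invented for illustration.
sessions = {
    "u1": ["open", "feed_expose", "feed_click", "feed_expose", "feed_click"],
    "u2": ["open", "feed_expose", "feed_click", "feed_expose", "exit"],
    "u3": ["open", "novel_open", "novel_read", "novel_read", "novel_read"],
}

# One sparse row of trigram counts per user.
profiles = {user: Counter(extract_ngrams(path)) for user, path in sessions.items()}

# Pairwise distances, ready to hand to a standard clustering algorithm.
distances = {
    (a, b): cosine_distance(profiles[a], profiles[b])
    for a in profiles for b in profiles if a < b
}
```

The two feed users share two of their three trigrams and end up close together, while the novel reader shares none with either and sits at the maximum distance. Because every dimension is a readable action sequence, the resulting clusters can be explained by listing their most frequent N‑grams.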

action2vec clustering: Inspired by word2vec, actions are embedded into a low‑dimensional space using a Skip‑gram model. Each user’s path is represented by a weighted sum of action vectors, enabling clustering of large‑scale behavior data.
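The two building blocks of this approach can be sketched as follows: generating the (center, context) pairs that a Skip‑gram model is trained on, and collapsing a user's path into one vector. Training itself is omitted. The embedding values, the window size, and the choice of frequency weights normalized to sum to 1 are all assumptions made for illustration; the talk does not specify them here.

```python
from collections import Counter

def skipgram_pairs(path, window=2):
    """(center, context) pairs a Skip-gram model would be trained on."""
    pairs = []
    for i, center in enumerate(path):
        lo, hi = max(0, i - window), min(len(path), i + window + 1)
        pairs.extend((center, path[j]) for j in range(lo, hi) if j != i)
    return pairs

def user_vector(path, embeddings):
    """Frequency-weighted average of the action vectors on a user's path."""
    counts = Counter(a for a in path if a in embeddings)
    total = sum(counts.values())
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    for action, count in counts.items():
        weight = count / total  # assumed weighting: relative frequency
        for d in range(dim):
            vec[d] += weight * embeddings[action][d]
    return vec

# Hypothetical 2-dimensional embeddings standing in for a trained model.
embeddings = {
    "feed_click": [1.0, 0.0],
    "novel_read": [0.0, 1.0],
    "search": [0.5, 0.5],
}
path = ["feed_click", "feed_click", "novel_read", "search"]

pairs = skipgram_pairs(["a", "b", "c"], window=1)
vec = user_vector(path, embeddings)
```

Every user ends up as a fixed‑length dense vector regardless of how long or sparse the path is, so the output can be fed to any standard clustering algorithm. Other weighting schemes, such as TF‑IDF, would slot into `user_vector` in place of the relative frequency.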

Practical examples show how N‑gram clustering can identify users who mainly browse the feed or read novels, guiding product adjustments such as shortening transition paths or inserting targeted prompts. action2vec clustering, by contrast, copes with sparse, long‑term user paths; it gives deeper insight into high‑value user groups, and the resulting labeled segments can feed downstream models.

The Q&A section addresses common concerns: feature weighting (frequency or TF‑IDF), handling long inactivity periods, real‑time updates, and accounting for how recent an event is. The overall conclusion is that N‑gram clustering offers interpretability while action2vec offers scalability, and that both are valuable tools for data‑driven product optimization.

Tags: AI, Tencent, Behavior Analysis, N-gram, action2vec, user clustering
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
