Overview of Embedding Methods: From Word2Vec to Item2Vec and Dual‑Tower Models in Recommendation Systems
This article provides a comprehensive overview of embedding techniques, explaining their role in deep learning recommendation systems, detailing Word2Vec and its Skip‑gram model with negative sampling and hierarchical softmax, and extending the discussion to Item2Vec and dual‑tower architectures for item representation.
Embedding, often translated as "vectorization" or "vector mapping," is a fundamental operation in deep learning that converts high‑dimensional sparse features into low‑dimensional dense vectors, playing a crucial role in NLP, search ranking, recommendation systems, and CTR models.
In recommendation systems, embeddings serve four main purposes: (1) transforming sparse one‑hot encoded features into dense vectors via embedding layers; (2) providing pretrained feature vectors that are concatenated with other inputs; (3) enabling similarity‑based recall by computing user‑item embedding distances; and (4) acting as real‑time features for ranking models.
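Purpose (1) above is simply a table lookup: a one-hot vector times an embedding matrix selects one row of that matrix. A minimal sketch with NumPy (the vocabulary size and dimension here are illustrative choices, and the table is random in place of trained weights):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 10000, 8

# The embedding layer is just a (vocab_size, embed_dim) lookup table.
embedding_table = rng.normal(size=(vocab_size, embed_dim))

def embed(feature_id: int) -> np.ndarray:
    """Row lookup — equivalent to one_hot(feature_id) @ embedding_table."""
    return embedding_table[feature_id]

# The dense lookup matches the explicit one-hot matrix product.
one_hot = np.zeros(vocab_size)
one_hot[42] = 1.0
assert np.allclose(embed(42), one_hot @ embedding_table)
```

This equivalence is why embedding layers are cheap: no dense matrix multiply is ever materialized, only an indexed row read.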
The usual starting point for studying embeddings is Word2Vec. The Skip‑gram model, introduced by Mikolov et al., predicts surrounding context words from a central word and is typically trained with negative sampling or hierarchical softmax to avoid the costly full softmax over large vocabularies.
Negative sampling treats the training objective as a binary logistic regression problem that distinguishes target words from noise words, using a weighted sampling distribution based on word frequencies.
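A sketch of that objective: for each (center, target) pair, the loss rewards a high score for the true pair and low scores for a few noise words drawn from a frequency-weighted distribution. The 3/4 exponent below follows the original word2vec implementation; the vocabulary, vectors, and frequencies are toy stand-ins for trained values.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Noise distribution: unigram frequencies raised to the 3/4 power,
# a heuristic from the original word2vec implementation.
word_freq = np.array([50.0, 30.0, 15.0, 5.0])
noise_dist = word_freq ** 0.75
noise_dist /= noise_dist.sum()

dim = 4
in_vecs = rng.normal(size=(4, dim))   # "center word" vectors
out_vecs = rng.normal(size=(4, dim))  # "context word" vectors

def sgns_loss(center_id: int, target_id: int, k: int = 3) -> float:
    """Binary-logistic loss: one positive pair vs. k sampled noise words."""
    neg_ids = rng.choice(len(word_freq), size=k, p=noise_dist)
    pos = -np.log(sigmoid(in_vecs[center_id] @ out_vecs[target_id]))
    neg = -np.log(sigmoid(-(out_vecs[neg_ids] @ in_vecs[center_id]))).sum()
    return float(pos + neg)

loss = sgns_loss(0, 1)
assert loss > 0
```

Each training step thus touches only 1 + k output vectors instead of the whole vocabulary, which is the entire point of the approximation.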
Hierarchical softmax builds a Huffman tree according to word frequencies, placing high‑frequency words near the root so that their updates traverse fewer internal nodes, reducing the per‑step computation for the words that occur most often.
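The key property is that a word's code length equals its depth in the Huffman tree, so frequent words get short paths. A small sketch of building such a tree and reading off code lengths (the frequencies are made up for illustration):

```python
import heapq
import itertools

def huffman_code_lengths(freqs: dict[str, int]) -> dict[str, int]:
    """Return each word's Huffman code length (tree depth)."""
    counter = itertools.count()  # tie-breaker so heapq never compares dicts
    heap = [(f, next(counter), {w: 0}) for w, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every word in them one level deeper.
        merged = {w: depth + 1 for w, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, next(counter), merged))
    return heap[0][2]

lengths = huffman_code_lengths(
    {"the": 100, "on": 40, "cat": 20, "sat": 15, "zephyr": 1}
)
# The most frequent word sits closest to the root.
assert lengths["the"] < lengths["zephyr"]
```

During training, predicting a word costs one binary decision per internal node on its path, so the expected cost per update is roughly the entropy of the word distribution rather than the vocabulary size.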
Item2Vec extends the Word2Vec idea to recommendation scenarios by treating a user's interacted items as a "sentence" and applying Skip‑gram with negative sampling to learn item embeddings; user embeddings are often obtained by averaging or clustering item vectors.
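The averaging approach mentioned above, plus similarity-based recall, can be sketched as follows (the item embeddings here are random stand-ins for vectors learned by Skip-gram over interaction "sentences"):

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for Item2Vec output: a learned vector per item ID.
item_embeddings = {item_id: rng.normal(size=8) for item_id in range(100)}

def user_embedding(history: list[int]) -> np.ndarray:
    """Average the vectors of the user's interacted items."""
    return np.mean([item_embeddings[i] for i in history], axis=0)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Recall: rank all candidate items by cosine similarity to the user vector.
u = user_embedding([3, 17, 42])
scores = {i: cosine(u, v) for i, v in item_embeddings.items()}
top_items = sorted(scores, key=scores.get, reverse=True)[:5]
assert len(top_items) == 5
```

In production the ranking step is usually served by an approximate nearest-neighbor index rather than the brute-force scan shown here.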
Dual‑tower (or two‑tower) models, such as DSSM, further generalize Item2Vec by mapping queries and documents (or users and ads) into a shared semantic space using separate deep networks, after which cosine similarity or a softmax layer produces relevance scores.
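Structurally, the two towers are independent networks whose outputs meet only at a similarity function, which is what allows item vectors to be precomputed offline. A minimal sketch with one hidden layer per tower (weights are random stand-ins for trained parameters, and the feature and hidden sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

def relu(x):
    return np.maximum(x, 0.0)

# Separate parameters per tower; user and item features may differ in size.
W_user = [rng.normal(size=(16, 32)), rng.normal(size=(32, 8))]
W_item = [rng.normal(size=(24, 32)), rng.normal(size=(32, 8))]

def tower(x: np.ndarray, weights) -> np.ndarray:
    """Two-layer MLP mapping raw features into the shared 8-d space."""
    return relu(x @ weights[0]) @ weights[1]

def score(user_feats: np.ndarray, item_feats: np.ndarray) -> float:
    """Relevance = cosine similarity between the two tower outputs."""
    u = tower(user_feats, W_user)
    v = tower(item_feats, W_item)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

s = score(rng.normal(size=16), rng.normal(size=24))
assert -1.0 <= s <= 1.0
```

Because `score` decomposes into two independent encodings plus a dot product, the item tower can be run offline over the whole catalog, leaving only the user tower and a nearest-neighbor lookup for online inference.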
These dual‑tower architectures have been widely adopted in advertising, search, and recommendation pipelines, allowing efficient offline training of dense item embeddings and lightweight online inference.
The article concludes by noting that while Word2Vec and Item2Vec are foundational, graph‑based embedding methods are emerging to handle the increasingly networked nature of data in modern recommendation systems.
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.