The Evolution of Embedding Techniques: From Word2Vec to Graph Neural Networks
This article traces the development of embedding methods—from the early word2vec model through item2vec, DeepWalk, Node2vec, EGES, HERec, and GraphRT, alongside target‑fitting approaches such as DSSM and YouTube's deep recommendation model—highlighting how the sequence‑construction and target‑fitting paradigms have shaped modern recommendation systems and AI applications.
The author revisits the concept of embedding, emphasizing that while many associate it with the 2013 word2vec model for words, the principle extends to any item that can be represented through relational sequences.
Embedding Origins: Word2vec learns word vectors by training a shallow network to predict words that co‑occur within a sliding window; the resulting weight matrix is a lookup table of word embeddings, obtained as a by‑product of this prediction task on raw sentences.
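To make the sliding-window idea concrete, here is a minimal sketch (not the full word2vec training loop) of how skip‑gram (center, context) pairs are generated from a sentence, and how the learned weight matrix would then act as an embedding lookup table. The sentence, vector dimension, and random initialization are illustrative assumptions:

```python
import numpy as np

def skipgram_pairs(tokens, window=2):
    """Generate (center, context) pairs from a sliding window,
    as in word2vec's skip-gram formulation."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sentence = "the quick brown fox jumps".split()
pairs = skipgram_pairs(sentence, window=1)

# The embedding matrix is the "by-product": one row per vocabulary word.
# (Here it is only randomly initialized; training would update it.)
vocab = {w: idx for idx, w in enumerate(sorted(set(sentence)))}
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(vocab), 8))  # one 8-dim vector per word
vec = embeddings[vocab["fox"]]                 # lookup-table access
```

In the real model, each (center, context) pair drives a gradient step (typically with negative sampling) that pulls co‑occurring words' vectors together.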
Sequence‑Construction School: This paradigm treats items (apps, products, tags, etc.) as words in a sequence, using models such as Item2vec, DeepWalk, Node2vec, EGES, and HERec to capture item‑to‑item relationships. It evolves from simple co‑occurrence to graph‑based random walks and heterogeneous information networks, often incorporating side information (e.g., item attributes) to enrich embeddings.
Key papers: DeepWalk (2014), Node2vec (2016), EGES (2018), HERec (2017).
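The core trick in DeepWalk is to manufacture the "sentences" that word2vec needs: uniform random walks over the item graph become sequences of item IDs. A minimal sketch, with a tiny hypothetical co‑occurrence graph as the input (Node2vec would bias the neighbor choice with its return/in‑out parameters):

```python
import random

def random_walks(graph, num_walks=2, walk_len=5, seed=42):
    """DeepWalk-style uniform random walks: each walk becomes a
    'sentence' of item IDs to feed a word2vec-style model."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for start in graph:
            walk = [start]
            while len(walk) < walk_len:
                nbrs = graph[walk[-1]]
                if not nbrs:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks

# hypothetical item graph built from user behavior sequences
graph = {"A": ["B", "C"], "B": ["A"], "C": ["A", "B"]}
walks = random_walks(graph)  # 2 walks per node -> 6 sequences
```

Each walk is then treated exactly like a sentence in the word2vec sketch above, so items that co‑occur on walks end up with nearby embeddings.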
Target‑Fitting School: Here embeddings are learned directly to predict a downstream objective (CTR, click probability, tag relevance). Examples include the Facebook click‑prediction paper, YouTube's deep recommendation model, DSSM, and dual‑tower architectures for ad and video retrieval. These models jointly optimize user and item embeddings against the final task.
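A dual‑tower model can be sketched as two independent MLPs whose output embeddings meet only at a dot product, which is what makes large‑scale retrieval cheap. The sketch below is a forward pass only (no training loop), with made‑up feature dimensions and random weights standing in for learned ones:

```python
import numpy as np

rng = np.random.default_rng(0)

def tower(x, W1, W2):
    """One MLP tower: features -> hidden (ReLU) -> embedding."""
    h = np.maximum(x @ W1, 0.0)
    return h @ W2

# hypothetical dims: 16-dim user features, 12-dim item features, 8-dim embeddings
Wu1, Wu2 = rng.normal(size=(16, 32)), rng.normal(size=(32, 8))
Wi1, Wi2 = rng.normal(size=(12, 32)), rng.normal(size=(32, 8))

user = tower(rng.normal(size=(1, 16)), Wu1, Wu2)     # one user embedding
items = tower(rng.normal(size=(100, 12)), Wi1, Wi2)  # 100 item embeddings

# Retrieval score is a dot product; in training, a sigmoid over it
# is fit against click labels, so the target shapes both towers.
logits = (items @ user.T).ravel()
probs = 1.0 / (1.0 + np.exp(-logits))
top5 = np.argsort(-logits)[:5]  # candidates to send downstream
```

Because the towers never interact before the dot product, item embeddings can be precomputed and indexed for approximate nearest‑neighbor retrieval at serving time.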
GraphRT (2020) extends this idea by building a heterogeneous graph of users, videos, and tags, applying graph convolution to obtain video embeddings that implicitly yield tag embeddings, thereby merging sequence construction with target fitting.
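This is not the actual GraphRT architecture, but the underlying operation — a video embedding computed by aggregating its user and tag neighbors in a heterogeneous graph — can be sketched in one graph‑convolution step. All sizes, the neighbor lists, and the single weight matrix are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical heterogeneous graph: each video's neighbors are
# (user indices, tag indices)
users = rng.normal(size=(4, 8))  # 4 user embeddings
tags = rng.normal(size=(3, 8))   # 3 tag embeddings
video_nbrs = {0: ([0, 1], [0]), 1: ([2, 3], [1, 2])}

W = rng.normal(size=(8, 8)) * 0.1  # shared transform (would be learned)

def conv_video(v):
    """One graph-convolution step: a video embedding is the transformed
    mean of its user- and tag-neighbor embeddings."""
    u_idx, t_idx = video_nbrs[v]
    neigh = np.vstack([users[u_idx], tags[t_idx]])
    return np.tanh(neigh.mean(axis=0) @ W)

videos = np.vstack([conv_video(v) for v in video_nbrs])
```

Because gradients from the downstream objective flow through this aggregation back into the tag rows, tag embeddings are learned implicitly — which is the sense in which GraphRT merges sequence construction with target fitting.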
The article also discusses practical considerations such as the trade‑off between model complexity and deployment efficiency, the importance of embedding scalability for large‑scale retrieval, and the role of side information in mitigating cold‑start problems.
Conclusion: Embedding techniques continue to evolve, driven by the need to model richer item relationships and tighter integration with prediction targets. Future directions may combine heterogeneous graph structures with side‑information weighting, further blurring the line between sequence‑construction and target‑fitting approaches.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.