
Contrastive Learning: Foundations, Typical Models, and Applications to Weibo Content Representation

This article explains the concept of contrastive learning, its relationship to self‑supervised and metric learning, describes key system components and loss functions, reviews major image, NLP and multimodal models such as SimCLR, MoCo, SwAV, BYOL, and demonstrates how contrastive learning is applied to Weibo hashtag generation, similar‑post retrieval, and text‑image matching using CD‑TOM and W‑CLIP models.

DataFunTalk

What is Contrastive Learning? Contrastive learning is a self‑supervised paradigm that seeks to pull together representations of positive pairs while pushing apart those of negative pairs, effectively a self‑supervised version of metric learning.

Abstract System An abstract contrastive system first constructs positive and negative samples (often via data augmentations), encodes them with an encoder, and projects the embeddings onto a unit hypersphere. The optimization objective is to minimize distances for positives and maximize them for negatives, typically using the InfoNCE loss.
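The objective above can be sketched concretely. Below is a minimal NumPy version of the InfoNCE loss for a single anchor, assuming all embeddings have already been L2-normalized onto the unit hypersphere (function and variable names are illustrative, not from any particular library):

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE for one anchor: -log( exp(s+/t) / (exp(s+/t) + sum_k exp(s-_k/t)) ).

    Vectors are assumed L2-normalized, so dot products are cosine similarities.
    """
    pos_sim = anchor @ positive / temperature
    neg_sims = negatives @ anchor / temperature      # shape: (num_negatives,)
    logits = np.concatenate([[pos_sim], neg_sims])   # positive sits at index 0
    logits -= logits.max()                           # numerically stable log-softmax
    return -(logits[0] - np.log(np.exp(logits).sum()))
```

The loss shrinks when the positive is close to the anchor and the negatives are far away, which is exactly the pull-together / push-apart behavior described above.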

Key Design Questions

How to construct positives and negatives?

How to design the feature‑mapping function f (encoder + projector)?

How to design the loss function?

Good vs. Bad Systems A bad system suffers from collapse, where all inputs map to the same point on the hypersphere. A good system satisfies Alignment (similar instances have close embeddings) and Uniformity (embeddings are evenly spread), avoiding collapse.
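Alignment and uniformity can both be measured directly. The sketch below follows the commonly used definitions (mean positive-pair distance for alignment; log of the average Gaussian pairwise potential for uniformity), in plain NumPy; a collapsed system scores well on alignment but badly on uniformity:

```python
import numpy as np

def alignment(pos_a, pos_b, alpha=2):
    """Mean distance between positive-pair embeddings (lower is better)."""
    return (np.linalg.norm(pos_a - pos_b, axis=1) ** alpha).mean()

def uniformity(embs, t=2):
    """Log of the average pairwise Gaussian potential (lower = more spread out)."""
    sq_dists = ((embs[:, None, :] - embs[None, :, :]) ** 2).sum(-1)
    off_diag = ~np.eye(len(embs), dtype=bool)       # exclude self-pairs
    return np.log(np.exp(-t * sq_dists[off_diag]).mean())
```

For collapsed embeddings every pairwise distance is zero, so uniformity degenerates to log(1) = 0, its worst possible value; well-spread embeddings drive it negative.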

Typical Image Models

SimCLR: uses batch‑wise negatives, a dual‑tower encoder‑projector, and InfoNCE.
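"Batch‑wise negatives" means every other sample in the batch serves as a negative. A minimal NumPy sketch of SimCLR's NT‑Xent loss under this convention (rows 0..N‑1 are paired with rows N..2N‑1; names and layout are illustrative):

```python
import numpy as np

def nt_xent(z, temperature=0.5):
    """SimCLR-style batch loss over 2N views: row i and row i+N are a positive
    pair; all other rows in the batch act as negatives."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    n2 = len(z)
    n = n2 // 2
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)        # never contrast a view with itself
    pos_idx = np.concatenate([np.arange(n, n2), np.arange(0, n)])
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    return (-(sim[np.arange(n2), pos_idx] - logsumexp)).mean()
```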

MoCo / MoCo V2 / V3: maintains a large queue of negative keys and updates the key encoder with momentum; V3 replaces the ResNet backbone with a Vision Transformer.
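MoCo's two distinctive mechanisms are easy to sketch: a momentum (exponential moving average) update of the key encoder's parameters, and a FIFO queue that stores recent keys as negatives. A minimal NumPy illustration (class and method names are illustrative, not MoCo's actual code):

```python
import numpy as np

class MomentumQueue:
    """Sketch of MoCo's key-encoder momentum update and negative-key queue."""

    def __init__(self, dim, queue_size, momentum=0.999):
        self.m = momentum
        self.queue = np.zeros((queue_size, dim))  # stored negative keys
        self.ptr = 0                              # next slot to overwrite

    def momentum_update(self, query_params, key_params):
        """key = m * key + (1 - m) * query, applied parameter-wise."""
        return [self.m * k + (1 - self.m) * q
                for q, k in zip(query_params, key_params)]

    def enqueue(self, keys):
        """Overwrite the oldest queue entries with the newest batch of keys."""
        idx = (self.ptr + np.arange(len(keys))) % len(self.queue)
        self.queue[idx] = keys
        self.ptr = (self.ptr + len(keys)) % len(self.queue)
```

Because the queue is decoupled from the batch size, the number of negatives can be very large (tens of thousands) without enlarging the batch.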

SwAV: replaces explicit negatives with online clustering (k‑means) to pull together samples in the same cluster.
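The core of the clustering idea is assigning each embedding a soft code over a set of prototype vectors. A much-simplified NumPy sketch (SwAV additionally balances assignments with a Sinkhorn‑Knopp step, omitted here; names are illustrative):

```python
import numpy as np

def assign_to_prototypes(embs, prototypes):
    """Soft cluster assignment: similarity of each embedding to each prototype,
    softmaxed row-wise into a code distribution."""
    logits = embs @ prototypes.T
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    exps = np.exp(logits)
    return exps / exps.sum(axis=1, keepdims=True)
```

Two augmented views of the same image are then trained to predict each other's codes, so no explicit negative pairs are needed.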

BYOL and SimSiam: asymmetric architectures that rely only on positives, using a predictor head (plus a momentum target network in BYOL, stop‑gradient in SimSiam) to break symmetry and avoid collapse.

Barlow Twins: introduces a redundancy‑reduction loss on the cross‑correlation matrix of two views' embeddings, which works with positives only.
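The Barlow Twins objective pushes the cross‑correlation matrix of the two views toward the identity: diagonal entries toward 1 (invariance) and off‑diagonal entries toward 0 (redundancy reduction). A minimal NumPy sketch of that loss:

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """Cross-correlation loss: (diag - 1)^2 plus a weighted off-diagonal penalty."""
    n, d = z_a.shape
    # Standardize each embedding dimension across the batch.
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-9)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-9)
    c = (z_a.T @ z_b) / n                          # d x d cross-correlation
    on_diag = ((np.diag(c) - 1) ** 2).sum()
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()
    return on_diag + lam * off_diag
```

When the two views carry the same content, the diagonal is already near 1 and the loss is small; unrelated views decorrelate the diagonal and the loss grows.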

Typical NLP Models

SimCSE: creates positives by feeding the same sentence twice through a BERT encoder with different dropout masks.
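The trick is that dropout itself acts as the "augmentation." The toy NumPy sketch below applies two independent dropout masks to the same hidden vector to produce a positive pair; in real SimCSE the dropout lives inside the BERT encoder's layers, so this is only an illustration:

```python
import numpy as np

def dropout_views(hidden, p=0.1, rng=None):
    """Return two views of the same vector under independent dropout masks
    (inverted-dropout scaling keeps the expected magnitude unchanged)."""
    if rng is None:
        rng = np.random.default_rng(0)

    def drop(x):
        mask = rng.random(x.shape) >= p
        return x * mask / (1 - p)

    return drop(hidden), drop(hidden)
```

The two views are highly similar but not identical, which is exactly what a contrastive positive pair needs.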

Self‑Guided: combines SimCSE ideas with momentum updates.

Typical Multimodal Models

CLIP: aligns image and text embeddings via contrastive training on large‑scale web image–text pairs (natural‑language supervision).

WenLan: a multimodal MoCo variant with separate image and text encoders and a negative queue.

Experimental Findings Across benchmarks, BYOL, DeepCluster and SwAV achieve comparable top performance, outperforming earlier methods.

Weibo Applications

CD‑TOM: a multimodal SimCLR‑style model for hashtag generation. Positive pairs are (post text, user‑provided hashtag). After training, all hashtags are embedded and stored in a Faiss index; a new post retrieves the top‑3 most similar hashtags.
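The retrieval step amounts to a nearest-neighbor search over hashtag embeddings. A minimal NumPy sketch of the top‑k lookup (in production this would be served by a Faiss index; brute-force cosine search is shown here for clarity, and the names are illustrative):

```python
import numpy as np

def top_k_hashtags(post_emb, tag_embs, tag_names, k=3):
    """Return the k hashtags whose embeddings are most cosine-similar to the post."""
    post = post_emb / np.linalg.norm(post_emb)
    tags = tag_embs / np.linalg.norm(tag_embs, axis=1, keepdims=True)
    sims = tags @ post                      # cosine similarity per hashtag
    order = np.argsort(-sims)[:k]           # indices of the k best matches
    return [tag_names[i] for i in order]
```

The same lookup pattern also covers the similar‑post and text‑image retrieval applications below; only the indexed embeddings change.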

Similar‑Post Retrieval: the same CD‑TOM embeddings of posts are indexed; a new post finds its nearest neighbors in embedding space.

W‑CLIP: a dual‑tower model (BERT for text, ResNet for images) that aligns text–image pairs. After indexing image embeddings, a new post can retrieve a matching image, and vice versa.

Experimental results show that despite noisy training data, contrastive learning effectively reduces noise impact and yields high‑quality hashtag suggestions, similar‑post matches, and text‑image retrieval.

Q&A Highlights

Hashtags are matched, not generated, by embedding them and performing nearest‑neighbor search.

Negative‑free methods (SwAV, BYOL) succeed because they avoid treating true positives as negatives; SwAV does this via clustering, BYOL via asymmetric architecture.

SwAV uses online k‑means clustering; accuracy is less critical than speed.

Contrastive learning can be used in supervised settings (e.g., supervised SimCSE) but then it resembles metric learning.

Overall, the talk demonstrates how contrastive learning bridges theory and practice, providing robust representations for large‑scale social media data.

AI · contrastive learning · multimodal · self-supervised learning · representation learning · Weibo
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
