Interest Feeds: From Facebook NewsFeed and EdgeRank to Pinterest Smart Feed and General Techniques
This article explains why interest‑driven feeds are essential, reviews Facebook's NewsFeed evolution and EdgeRank algorithm, details Pinterest's Smart Feed architecture and Pinnability model, and provides a comprehensive guide to building, ranking, and monitoring generic interest‑feed systems for social platforms.
Why Pay Attention to Interest Feeds?
Twitter and Instagram are moving from purely chronological timelines to interest‑based feeds, and recommendation engineers increasingly see this format as the inevitable direction of the Internet.
Traditional recommendation systems are static modules tucked into a corner of the UI, whereas interest feeds have become the cash cow of many products.
Facebook NewsFeed
How NewsFeed Started
Launched in 2006, NewsFeed grew into a major revenue source. Early versions relied on simple hand‑set weight knobs (e.g., a photo was worth 5 points, joining a group 1 point); Facebook later introduced the EdgeRank algorithm.
EdgeRank Algorithm
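EdgeRank scores each story by summing, over the story's edges (interactions such as posts, likes, and comments), the product of three factors: Affinity (the strength of the relationship between the viewer and the edge's creator), Edge Weight (the cost of the action type; a comment takes more effort than a like and counts for more), and Time Decay (freshness; older edges count less).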
These factors were manually tuned and did not involve machine learning.
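The score described above is commonly written as the sum over edges of u_e · w_e · d_e. The following is a minimal sketch of that computation. Facebook never published its real values, so the edge‑type weights, the affinity numbers, and the exponential half‑life form of the decay are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# Hypothetical edge-type weights (higher-effort actions weigh more).
EDGE_WEIGHTS = {"share": 6.0, "comment": 4.0, "like": 1.0}

@dataclass
class Edge:
    actor: str         # user who created the interaction
    kind: str          # edge type, e.g. "comment"
    age_hours: float   # how long ago the interaction happened

def time_decay(age_hours: float, half_life_hours: float = 24.0) -> float:
    """Assumed exponential decay: an edge loses half its value every half_life_hours."""
    return 0.5 ** (age_hours / half_life_hours)

def edgerank(edges: list[Edge], affinity: dict[str, float]) -> float:
    """Sum over a story's edges of affinity * edge weight * time decay."""
    return sum(
        affinity.get(e.actor, 0.0) * EDGE_WEIGHTS.get(e.kind, 0.0) * time_decay(e.age_hours)
        for e in edges
    )
```

Because every term is hand‑set, changing the ranking means editing constants like `EDGE_WEIGHTS` by hand, which is exactly the limitation that pushed Facebook toward learned models.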
Post‑EdgeRank Era
Since around 2011, Facebook has shifted to machine‑learning‑driven ranking, using thousands of features, deep neural networks for image and text understanding, and extensive A/B testing.
Pinterest Smart Feed
Pinterest Product Characteristics
Pins, Pinners, Boards, and Interests form the core data model; the home feed mixes pins from followed users, similar content, and followed topics.
Architecture Overview
The backend consists of three modules: Worker (scores pins per user and stores them), Content Generator (selects and orders pins for a request), and Feed Service (merges new and old content for delivery).
Workers store scored pins in HBase pools (unseen and seen) as (user, pin, score) triples.
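To make the flow between the three modules concrete, here is a minimal in‑memory sketch. Python dictionaries stand in for the HBase pools, the class and method names are hypothetical, and the Feed Service merge policy (newly generated pins first, then a few previously seen pins ordered by score) is an assumption rather than Pinterest's documented behavior.

```python
import heapq
from collections import defaultdict

class PinPools:
    """In-memory stand-in for the unseen/seen pools (hypothetical API)."""

    def __init__(self) -> None:
        # user -> {pin: score}
        self.unseen: dict[str, dict[str, float]] = defaultdict(dict)
        self.seen: dict[str, dict[str, float]] = defaultdict(dict)

    def worker_store(self, user: str, pin: str, score: float) -> None:
        """Worker: store a scored (user, pin, score) triple unless already shown."""
        if pin not in self.seen[user]:
            self.unseen[user][pin] = score

    def generate(self, user: str, k: int) -> list[str]:
        """Content Generator: take the top-k unseen pins and move them to seen."""
        pool = self.unseen[user]
        top = heapq.nlargest(k, pool.items(), key=lambda item: item[1])
        for pin, score in top:
            del pool[pin]
            self.seen[user][pin] = score
        return [pin for pin, _ in top]

    def feed(self, user: str, k: int, old: int = 2) -> list[str]:
        """Feed Service: new pins first, then up to `old` previously seen pins."""
        fresh = self.generate(user, k)
        by_score = sorted(self.seen[user].items(), key=lambda item: item[1], reverse=True)
        previous = [pin for pin, _ in by_score if pin not in fresh]
        return fresh + previous[:old]
```

Keeping scoring (Worker) separate from selection (Content Generator) means expensive model inference happens ahead of time, and a feed request only has to read and merge pre‑scored pools.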
Pinnability Ranking Model
Pinnability predicts the probability that a user will interact with a pin. The models used include Logistic Regression, SVM, GBDT, and CNN, trained on thousands of features (pin attributes, user attributes, and interaction history).
General Interest Feed Technical Points
Interest feeds reorder content based on relevance rather than chronological order.
When looking for technical articles on this topic, search for "Activity Stream" rather than just "feed".
Data Model
Three core entities: User, Activity (the content), and Connection (social edges). An Activity follows the Activity Streams model, which originated as an Atom extension (time, actor, verb, object, target, title, summary).
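The three entities can be sketched as plain data classes. The field names mirror the list above (with `obj` standing in for the "object" field); the ID formats such as `"pin:42"` and the `describe` helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass(frozen=True)
class User:
    user_id: str
    name: str

@dataclass(frozen=True)
class Connection:
    follower_id: str   # user who follows
    followee_id: str   # user being followed

@dataclass(frozen=True)
class Activity:
    actor: str                    # user_id of who acted
    verb: str                     # e.g. "post", "like", "save"
    obj: str                      # the "object" acted on, e.g. "pin:42"
    time: datetime
    target: Optional[str] = None  # where it happened, e.g. "board:7"
    title: str = ""
    summary: str = ""

def describe(a: Activity) -> str:
    """Render an activity as 'actor verb object [in target]'."""
    text = f"{a.actor} {a.verb} {a.obj}"
    return f"{text} in {a.target}" if a.target else text
```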
Feed Publication Strategies
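Two main fan‑out models exist: pull (fan‑out‑on‑load), which assembles a feed from the followed users' activities when it is requested, and push (fan‑out‑on‑write), which copies each new activity into every follower's feed when it is published. Hybrid approaches combine both based on user activity levels.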
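A toy sketch of the hybrid: activities are pushed only into the inboxes of active followers, while inactive users pull from the outboxes of the people they follow when they load their feed. The `active_users` set, the class name, and the integer timestamps are assumptions for illustration; a real system would back this with a store such as Redis.

```python
from collections import defaultdict

class FeedStore:
    """Toy hybrid fan-out: push to active followers, pull for everyone else."""

    def __init__(self, active_users: set[str]) -> None:
        self.followers: dict[str, set[str]] = defaultdict(set)   # author -> followers
        self.following: dict[str, set[str]] = defaultdict(set)   # user -> authors
        self.outbox: dict[str, list[tuple[int, str]]] = defaultdict(list)  # author's own items
        self.inbox: dict[str, list[tuple[int, str]]] = defaultdict(list)   # pushed items
        self.active_users = active_users

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def publish(self, author: str, ts: int, activity: str) -> None:
        self.outbox[author].append((ts, activity))
        # Push (fan-out-on-write), but only for active followers.
        for follower in self.followers[author]:
            if follower in self.active_users:
                self.inbox[follower].append((ts, activity))

    def load(self, user: str, limit: int = 10) -> list[str]:
        if user in self.active_users:
            items = self.inbox[user]  # already materialized at write time
        else:
            # Pull (fan-out-on-load): merge followees' outboxes on request.
            items = [it for a in self.following[user] for it in self.outbox[a]]
        return [act for _, act in sorted(items, reverse=True)[:limit]]
```

Push makes reads cheap but multiplies writes by the follower count; pull avoids wasted writes for users who rarely log in, at the cost of slower reads for them.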
Ranking Algorithms
Feed ranking is usually formulated as a binary classification problem: predict the probability that a user will interact with an item, then sort candidates by that probability. Logistic Regression is a common choice because it is simple and effective.
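A minimal sketch of this formulation: sparse logistic regression trained with plain stochastic gradient descent on log loss, then used to order candidates. The feature names, learning rate, and the `rank` helper are hypothetical; production systems would use a library or a dedicated trainer instead.

```python
import math
import random

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

class LogisticRegression:
    """Sparse logistic regression trained with SGD on log loss."""

    def __init__(self, lr: float = 0.1, l2: float = 0.0) -> None:
        self.w: dict[str, float] = {}
        self.b = 0.0
        self.lr = lr
        self.l2 = l2

    def predict(self, x: dict[str, float]) -> float:
        z = self.b + sum(self.w.get(k, 0.0) * v for k, v in x.items())
        return sigmoid(z)

    def fit(self, data: list[tuple[dict[str, float], int]], epochs: int = 20, seed: int = 0) -> None:
        rng = random.Random(seed)
        data = list(data)
        for _ in range(epochs):
            rng.shuffle(data)
            for x, y in data:
                g = self.predict(x) - y  # d(log loss)/dz
                self.b -= self.lr * g
                for k, v in x.items():
                    w = self.w.get(k, 0.0)
                    self.w[k] = w - self.lr * (g * v + self.l2 * w)

def rank(model: LogisticRegression, candidates: list[tuple[str, dict[str, float]]]) -> list[str]:
    """Return item ids sorted by predicted interaction probability, highest first."""
    ordered = sorted(candidates, key=lambda c: model.predict(c[1]), reverse=True)
    return [item_id for item_id, _ in ordered]
```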
Feature Engineering
Features are grouped into user features, content features, and contextual features; feature selection reduces dimensionality and improves model performance.
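The sketch below shows one way to assemble the three feature groups into a single sparse vector with namespaced keys, plus a cross feature, followed by a very simple selection step that keeps only features seen in enough training rows. The schema (`top_topic`, `topic`, `hour`) is hypothetical, and frequency filtering is just one basic selection method, named here as a stand‑in for whatever selection a real system uses.

```python
from collections import Counter

def build_features(user: dict, item: dict, context: dict) -> dict[str, float]:
    """Merge user, content, and context features into one sparse vector."""
    x: dict[str, float] = {}
    for prefix, group in (("user", user), ("item", item), ("ctx", context)):
        for name, value in group.items():
            if isinstance(value, str):  # categorical -> one-hot key
                x[f"{prefix}:{name}={value}"] = 1.0
            else:                       # numeric -> used as-is
                x[f"{prefix}:{name}"] = float(value)
    # Hypothetical cross feature: item topic matches the user's top interest.
    if user.get("top_topic") is not None and user.get("top_topic") == item.get("topic"):
        x["cross:topic_match"] = 1.0
    return x

def select_by_frequency(rows: list[dict[str, float]], min_count: int) -> set[str]:
    """Keep only features that occur in at least min_count rows."""
    counts = Counter(name for row in rows for name in row)
    return {name for name, c in counts.items() if c >= min_count}
```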
Monitoring and Infrastructure
Typical components include Redis for feed storage, Celery for asynchronous tasks, Thrift/RPC for model serving, Vowpal Wabbit for training, and Hadoop/Hive for data pipelines.
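Since Vowpal Wabbit is listed for training, the data pipeline has to emit examples in its plain‑text input format: a label, then one `|namespace feature:value ...` group per feature namespace. Below is a minimal formatter, assuming binary interaction labels mapped to VW's -1/+1 convention for logistic loss; the namespace and feature names are hypothetical. Lines produced this way could be written to a file and trained with `vw --loss_function logistic`.

```python
def to_vw_line(label: int, namespaces: dict[str, dict[str, float]]) -> str:
    """Format one example in Vowpal Wabbit's text input format.

    label: 1 for an interaction, 0 otherwise (mapped to +1 / -1).
    """
    parts = ["1" if label == 1 else "-1"]
    for ns, feats in namespaces.items():
        tokens = []
        for name, value in feats.items():
            # Spaces, ':' and '|' are reserved in VW feature names.
            clean = name.replace(" ", "_").replace(":", "_").replace("|", "_")
            # A bare feature name means value 1.
            tokens.append(clean if value == 1.0 else f"{clean}:{value:g}")
        parts.append(f"|{ns} " + " ".join(tokens))
    return " ".join(parts)
```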
Other Practical Considerations
Provide a feed API, collect system metrics (StatsD, Graphite), ensure high availability (redundant HBase clusters), and continuously track model effectiveness (AUC, interaction rate).
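For the two model‑effectiveness metrics mentioned, here is a small sketch: AUC computed as the fraction of (positive, negative) pairs the model orders correctly (ties count as half), which is equivalent to the area under the ROC curve, and interaction rate as interactions per impression. The pairwise loop is O(P·N), so it suits small evaluation sets; larger ones would use a sort‑based computation.

```python
def auc(labels: list[int], scores: list[float]) -> float:
    """ROC AUC as the probability a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))

def interaction_rate(impressions: int, interactions: int) -> float:
    """Share of shown items that received an interaction."""
    return interactions / impressions if impressions else 0.0
```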
Before building an interest feed, verify the business need and be prepared to iterate on architecture and models.