Design and Analysis of a Weibo Feed System: Storage, Push vs. Pull Models, and Scaling Strategies
This article examines the architectural design of a Weibo-like feed system, covering storage schema, the trade‑offs between push (write‑amplification) and pull (read‑amplification) delivery models, caching techniques, and a hybrid approach for handling both small‑scale and large‑scale follower scenarios.
Weibo, WeChat Moments, Douyin and similar applications are typical feed‑driven products where users consume content posted by others; this article analyses the design of a Weibo feed system and discusses possible improvements.
1. Storage design – The system is divided into three main tables:
Feed storage (permanent):
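create table `t_feed`(
  `feedId` bigint not null PRIMARY KEY,
  `userId` bigint not null COMMENT 'creator ID',
  `content` text,
  `createTime` datetime not null COMMENT 'creation time',
  `recordStatus` tinyint not null default 0 comment 'record status (0 = normal)',
  KEY `idx_user_time` (`userId`, `createTime`)
)ENGINE=InnoDB;
Follow relationship (permanent):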
CREATE TABLE `t_like`(
  `id` bigint NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `userId` bigint NOT NULL COMMENT 'follower ID',
  `likerId` bigint NOT NULL COMMENT 'ID of the followed user',
  KEY `userId` (`userId`),
  KEY `likerId` (`likerId`)
)ENGINE=InnoDB;
Inbox (temporary, acts as a mailbox for each user):
create table `t_inbox`(
  `id` bigint not null AUTO_INCREMENT PRIMARY KEY,
  `userId` bigint not null comment 'recipient ID',
  `feedId` bigint not null comment 'feed ID',
  `createTime` datetime not null,
  KEY `idx_user_time` (`userId`, `createTime`)
)ENGINE=InnoDB;
2. Scenario characteristics – The workload is read‑heavy (many more reads than writes) and requires ordered display based on timestamps or ranking scores.
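Ordered display needs a deterministic sort key, otherwise items with equal timestamps can swap places between requests. A minimal sketch, assuming each item carries a creation time and an optional ranking score (the `FeedItem` type and its `score` field are hypothetical and not part of the schema above): sort by score, then by time, then by feedId as a final tie-breaker.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FeedItem:
    feed_id: int
    author_id: int
    create_time: datetime
    score: float = 0.0  # hypothetical ranking score; 0.0 for all items means "rank by time only"


def order_feed(items: list[FeedItem]) -> list[FeedItem]:
    """Highest score first, then newest first, then highest feedId (stable tie-break)."""
    return sorted(
        items,
        key=lambda it: (it.score, it.create_time, it.feed_id),
        reverse=True,
    )


if __name__ == "__main__":
    t = datetime(2021, 10, 31, 17, 0, 0)
    items = [
        FeedItem(10001, 4, t),
        FeedItem(10002, 5, t),  # same time as 10001: the higher feedId wins the tie
        FeedItem(10003, 6, datetime(2021, 10, 31, 16, 0, 0), score=5.0),
    ]
    print([it.feed_id for it in order_feed(items)])  # [10003, 10002, 10001]
```

When every score is equal, the ordering degrades to pure timeline order, which matches the timestamp-only case.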
3. Push (write‑amplification) model – When a user posts a feed, the system stores it once and then writes its feedId into the inbox of every follower:
/** Insert one feed record **/
insert into t_feed (`feedId`,`userId`,`content`,`createTime`) values (10001,4,'content','2021-10-31 17:00:00');
/** Find all followers of user 4 **/
select userId from t_like where likerId = 4;
/** Insert the feed into each follower's inbox **/
insert into t_inbox (`userId`,`feedId`,`createTime`) values (1,10001,'2021-10-31 17:00:00');
insert into t_inbox (`userId`,`feedId`,`createTime`) values (2,10001,'2021-10-31 17:00:00');
insert into t_inbox (`userId`,`feedId`,`createTime`) values (3,10001,'2021-10-31 17:00:00');
Problems: when a popular author has a very large number of followers, the fan-out writes take a long time, so the post reaches many inboxes late; storing one inbox row per follower duplicates data and drives up storage cost; and state changes such as deleting a feed or unfollowing an author must be propagated to every affected inbox, which is hard to keep consistent.
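The push flow can be modeled in memory to make the write amplification visible. This is a minimal sketch, assuming dictionaries stand in for the three tables (`feeds` for t_feed, `followers` derived from t_like, `inboxes` for t_inbox); the class and method names are hypothetical. One post produces one inbox write per follower, and the read path skips feeds whose `recordStatus` is no longer 0. Read-time filtering is one way to tolerate deletions that were never propagated to every inbox.

```python
from collections import defaultdict
from datetime import datetime


class PushFeedStore:
    """In-memory stand-in for t_feed, t_like and t_inbox (push / fan-out on write)."""

    def __init__(self) -> None:
        self.feeds: dict[int, dict] = {}                         # feedId -> t_feed row
        self.followers: dict[int, set[int]] = defaultdict(set)   # author -> follower IDs
        self.inboxes: dict[int, list[tuple[datetime, int]]] = defaultdict(list)

    def follow(self, follower_id: int, author_id: int) -> None:
        self.followers[author_id].add(follower_id)

    def post(self, feed_id: int, author_id: int, content: str, ts: datetime) -> int:
        """Store the feed once, then write its ID into every follower's inbox."""
        self.feeds[feed_id] = {"userId": author_id, "content": content,
                               "createTime": ts, "recordStatus": 0}
        targets = self.followers[author_id]
        for follower_id in targets:
            self.inboxes[follower_id].append((ts, feed_id))
        return len(targets)  # number of inbox writes = write amplification

    def delete(self, feed_id: int) -> None:
        # Soft delete; assumed convention: any non-zero recordStatus means "hidden".
        # Inbox rows are left in place and filtered out on read.
        self.feeds[feed_id]["recordStatus"] = 1

    def read_timeline(self, user_id: int, limit: int = 20) -> list[int]:
        entries = sorted(self.inboxes[user_id], reverse=True)  # newest first
        visible = [fid for _, fid in entries if self.feeds[fid]["recordStatus"] == 0]
        return visible[:limit]
```

For example, with followers 1, 2 and 3 on author 4, `post(10001, 4, "content", t)` returns 3 (three inbox writes), and after `delete(10001)` the feed disappears from every follower's timeline without touching their inboxes.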
Suggested mitigations include using a message queue for asynchronous insertion, employing high‑performance databases, and separating hot and cold data with periodic cleanup.
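The message-queue mitigation decouples posting from fan-out: the post request only enqueues a job, and a background worker writes the inbox entries later. A minimal sketch, assuming Python's standard `queue.Queue` stands in for a real message broker and a per-user inbox cap (`max_inbox`, hypothetical) stands in for the hot/cold split with cleanup, since the oldest entries are dropped once the cap is reached.

```python
import queue
import threading
from collections import defaultdict, deque


class AsyncFanout:
    """Posting enqueues a fan-out job; a worker thread fills the inboxes."""

    def __init__(self, followers: dict[int, list[int]], max_inbox: int = 1000) -> None:
        self.followers = followers
        # deque(maxlen=...) drops the oldest entry when full: a crude "keep only hot data" policy.
        self.inboxes: dict[int, deque] = defaultdict(lambda: deque(maxlen=max_inbox))
        self.jobs: queue.Queue = queue.Queue()
        self.lock = threading.Lock()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def post(self, author_id: int, feed_id: int) -> None:
        # The caller returns immediately; fan-out happens on the worker thread.
        self.jobs.put((author_id, feed_id))

    def _run(self) -> None:
        while True:
            job = self.jobs.get()
            if job is None:  # shutdown sentinel
                self.jobs.task_done()
                return
            author_id, feed_id = job
            with self.lock:
                for follower_id in self.followers.get(author_id, []):
                    self.inboxes[follower_id].appendleft(feed_id)  # newest first
            self.jobs.task_done()

    def inbox(self, user_id: int) -> list[int]:
        with self.lock:
            return list(self.inboxes.get(user_id, ()))

    def close(self) -> None:
        self.jobs.put(None)
        self.worker.join()
```

The trade-off is visible in the sketch: the author's request no longer waits for thousands of writes, but a follower may briefly read an inbox that does not yet contain the new post.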
4. Pull (read‑amplification) model – Users retrieve feeds by pulling content from the sources they follow:
/* Get the IDs of all users that user 1 follows */
select likerId from t_like where userId = 1;
/* Pull the feeds of those users (assuming the previous query returned 4, 5 and 6) */
select * from t_feed where userId in (4,5,6) and recordStatus = 0;
After fetching, the results are merged and sorted by creation time. This eliminates write amplification but moves the cost to reads: every timeline request must query every followed author, which becomes expensive when a user follows many accounts.
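The merge step does not require a full re-sort. If each author's feeds already come back newest first (for example via `order by createTime desc`), the per-author lists can be combined with a k-way merge and cut off at the page size. A minimal sketch using the standard library's `heapq.merge`; the `(createTime, feedId)` tuple layout is an assumption for illustration.

```python
import heapq
from datetime import datetime
from itertools import islice

Feed = tuple[datetime, int]  # (createTime, feedId), assumed row shape


def merge_timeline(per_author: dict[int, list[Feed]], limit: int = 20) -> list[int]:
    """K-way merge of per-author lists that are each sorted newest first."""
    merged = heapq.merge(*per_author.values(), reverse=True)
    # islice stops after `limit` items, so older feeds are never consumed.
    return [feed_id for _, feed_id in islice(merged, limit)]


if __name__ == "__main__":
    def at(hour: int) -> datetime:
        return datetime(2021, 10, 31, hour)

    feeds = {
        4: [(at(17), 10001), (at(9), 9001)],
        5: [(at(12), 10005)],
        6: [(at(18), 10006)],
    }
    print(merge_timeline(feeds, limit=3))  # [10006, 10001, 10005]
```

Because the merge is lazy, the work per request grows with the page size and the number of followed authors rather than with the total number of feeds pulled.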
To reduce latency, a caching layer is introduced:
Cache the follow‑list per user (key: userId, value: set of followed IDs).
Cache each author’s recent feeds (key: authorId, value: feed list).
When a user requests a feed, retrieve the relevant cached entries from multiple shard nodes, merge, and sort.
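The two caches described above can be sketched with plain dictionaries. This is a minimal in-process sketch, assuming a hypothetical cap of `RECENT_PER_AUTHOR` feeds per author kept newest first. In a real deployment each map would live in a distributed cache, and lookups for different authors would hit different shard nodes.

```python
import heapq
from collections import defaultdict, deque
from itertools import islice

RECENT_PER_AUTHOR = 100  # hypothetical cap on cached feeds per author


class FeedCache:
    """Follow-list cache plus per-author recent-feed cache (pull model)."""

    def __init__(self) -> None:
        # userId -> set of followed author IDs
        self.follow_cache: dict[int, set[int]] = defaultdict(set)
        # authorId -> (timestamp, feedId) pairs, newest first, capped in length
        self.recent_feeds: dict[int, deque] = defaultdict(
            lambda: deque(maxlen=RECENT_PER_AUTHOR))

    def follow(self, user_id: int, author_id: int) -> None:
        self.follow_cache[user_id].add(author_id)

    def unfollow(self, user_id: int, author_id: int) -> None:
        # Takes effect on the next read: there are no inbox rows to clean up.
        self.follow_cache[user_id].discard(author_id)

    def on_post(self, author_id: int, ts: float, feed_id: int) -> None:
        # One cache write per post, regardless of follower count.
        # Assumes an author's posts arrive in time order.
        self.recent_feeds[author_id].appendleft((ts, feed_id))

    def timeline(self, user_id: int, limit: int = 20) -> list[int]:
        authors = self.follow_cache.get(user_id, set())
        lists = [self.recent_feeds[a] for a in authors if a in self.recent_feeds]
        merged = heapq.merge(*lists, reverse=True)  # each deque is newest first
        return [feed_id for _, feed_id in islice(merged, limit)]
```

Compared with the push model, an unfollow here is a single set update, which sidesteps the consistency problem of stale inbox rows.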
Even with caching, read pressure can remain high, so the cache should be scaled horizontally across shards (for example, three masters, each with one replica); the replicas provide failover and can absorb part of the read load.
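Horizontal scaling only helps if every key is routed to a predictable shard, so that one feed request can batch its lookups into one round trip per shard instead of one per author. A minimal sketch, assuming simple hash-modulo routing over a fixed shard count (`NUM_SHARDS`, `shard_of` and `group_by_shard` are hypothetical names). Production cache clusters usually use hash slots or consistent hashing instead, so that adding a shard does not remap most keys.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 3  # e.g. three masters; replicas copy their master's keys and are not routed separately


def shard_of(author_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Stable hash-modulo routing (crc32 gives the same result in every process)."""
    return zlib.crc32(str(author_id).encode()) % num_shards


def group_by_shard(author_ids: list[int], num_shards: int = NUM_SHARDS) -> dict[int, list[int]]:
    """Group one request's keys so that each shard receives a single batched lookup."""
    batches: dict[int, list[int]] = defaultdict(list)
    for author_id in author_ids:
        batches[shard_of(author_id, num_shards)].append(author_id)
    return dict(batches)
```

A user following authors 4, 5 and 6 therefore costs at most three batched cache calls, issued in parallel, whose results are then merged as in the pull model.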
5. Hybrid approach – Define a follower‑count threshold. Below the threshold, use the push model (low duplication, good immediacy). Above the threshold, switch to the pull model with cached hot data, while still optionally pushing to a small “active‑fan” subset for real‑time updates.
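The hybrid rule can be expressed as a write path that decides per author whether to fan out, and a read path that merges the pushed inbox with feeds pulled from followed authors above the threshold. A minimal sketch, assuming a hypothetical `FANOUT_THRESHOLD` and in-memory dictionaries in place of the tables and caches. The optional push to active fans is modeled with a hypothetical `active_fans` argument. Because an active fan can receive the same feed both by push and by pull, the read path de-duplicates by feedId.

```python
import heapq
from collections import defaultdict

FANOUT_THRESHOLD = 10_000  # hypothetical follower-count cut-off


class HybridFeed:
    def __init__(self, threshold: int = FANOUT_THRESHOLD) -> None:
        self.threshold = threshold
        self.followers: dict[int, set[int]] = defaultdict(set)  # author -> followers
        self.following: dict[int, set[int]] = defaultdict(set)  # user -> followed authors
        # Both maps hold (timestamp, feedId) lists, newest first.
        self.author_feeds: dict[int, list[tuple[float, int]]] = defaultdict(list)
        self.inbox: dict[int, list[tuple[float, int]]] = defaultdict(list)

    def follow(self, user_id: int, author_id: int) -> None:
        self.followers[author_id].add(user_id)
        self.following[user_id].add(author_id)

    def is_big(self, author_id: int) -> bool:
        return len(self.followers[author_id]) > self.threshold

    def post(self, author_id: int, ts: float, feed_id: int,
             active_fans: frozenset[int] = frozenset()) -> None:
        # Assumes an author's posts arrive in time order.
        self.author_feeds[author_id].insert(0, (ts, feed_id))
        if self.is_big(author_id):
            targets = self.followers[author_id] & active_fans  # push only to active fans
        else:
            targets = self.followers[author_id]  # normal push to every follower
        for user_id in targets:
            self.inbox[user_id].insert(0, (ts, feed_id))

    def timeline(self, user_id: int, limit: int = 20) -> list[int]:
        pulled = [self.author_feeds[a] for a in self.following[user_id] if self.is_big(a)]
        merged = heapq.merge(self.inbox[user_id], *pulled, reverse=True)
        seen: set[int] = set()
        result: list[int] = []
        for _, feed_id in merged:
            if feed_id not in seen:  # skip a feed that arrived both pushed and pulled
                seen.add(feed_id)
                result.append(feed_id)
                if len(result) == limit:
                    break
        return result
```

With the threshold set to 2, an author with three followers is treated as a large account: only the listed active fan gets an inbox entry, while the other followers see the post through the pull path on their next read.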
In summary, push mode suits scenarios where each author has few followers (e.g., WeChat Moments), pull mode fits authors with very large audiences (e.g., Weibo celebrities), and a combined strategy can balance immediacy, storage cost, and performance.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.