Design and Analysis of a Weibo Feed System: Storage, Push vs. Pull Models, and Scaling Strategies

This article examines the architectural design of a Weibo-like feed system, covering storage schema, the trade‑offs between push (write‑amplification) and pull (read‑amplification) delivery models, caching techniques, and a hybrid approach for handling both small‑scale and large‑scale follower scenarios.

Architecture Digest

Weibo, WeChat Moments, Douyin and similar applications are typical feed‑driven products where users consume content posted by others; this article analyses the design of a Weibo feed system and discusses possible improvements.

1. Storage design – The system is divided into three main tables:

Feed storage (permanent):

create table `t_feed`(
  `feedId` bigint not null PRIMARY KEY,
  `userId` bigint not null COMMENT 'creator ID',
  `content` text,
  `createTime` datetime not null COMMENT 'publish time',
  `recordStatus` tinyint not null default 0 comment 'record status'
)ENGINE=InnoDB;

Follow relationship (permanent):

CREATE TABLE `t_like`(
    `id` bigint NOT NULL AUTO_INCREMENT PRIMARY KEY,
    `userId` bigint NOT NULL COMMENT 'follower ID',
    `likerId` bigint NOT NULL COMMENT 'ID of the user being followed',
    KEY `userId` (`userId`),
    KEY `likerId` (`likerId`)
)ENGINE=InnoDB;

Inbox (temporary, acts as a mailbox for each user):

create table `t_inbox`(
  `id` bigint not null AUTO_INCREMENT PRIMARY KEY,
  `userId` bigint not null comment 'recipient ID',
  `feedId` bigint not null comment 'feed ID',
  `createTime` datetime not null
)ENGINE=InnoDB;

2. Scenario characteristics – The workload is read‑heavy (many more reads than writes) and requires ordered display based on timestamps or ranking scores.

3. Push (write‑amplification) model – When a user posts a feed, the system pushes the content to all followers' inboxes:

/* Insert one feed row */
insert into t_feed (`feedId`,`userId`,`content`,`createTime`) values (10001,4,'content','2021-10-31 17:00:00');
/* Find all followers of user 4 */
select userId from t_like where likerId = 4;
/* Insert the feed into each follower's inbox */
insert into t_inbox (`userId`,`feedId`,`createTime`) values (1,10001,'2021-10-31 17:00:00');
insert into t_inbox (`userId`,`feedId`,`createTime`) values (2,10001,'2021-10-31 17:00:00');
insert into t_inbox (`userId`,`feedId`,`createTime`) values (3,10001,'2021-10-31 17:00:00');

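Problems: delivery is delayed for popular users, because a single post triggers one inbox write per follower; storage cost is high, because every post is duplicated into each follower's inbox; and data state is hard to keep consistent, because a deleted feed or an unfollow leaves stale inbox rows that must be found and cleaned up.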

Suggested mitigations include using a message queue for asynchronous insertion, employing high‑performance databases, and separating hot and cold data with periodic cleanup.
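
The asynchronous insertion suggested above can be sketched as follows. This is a minimal illustration, not the article's implementation: the in-memory `followers_of` and `inbox` dictionaries stand in for `t_like` and `t_inbox`, Python's standard `queue.Queue` stands in for a real message queue, and the names `publish`, `fanout_worker`, and `batch_size` are hypothetical.

```python
import queue
import threading
from collections import defaultdict

# Hypothetical in-memory stand-ins for t_like and t_inbox.
followers_of = {4: [1, 2, 3]}      # author ID -> follower IDs
inbox = defaultdict(list)          # follower ID -> [(createTime, feedId)]

# Stand-in for a message queue between the publish path and the fan-out path.
fanout_queue = queue.Queue()

def publish(feed_id, author_id, create_time):
    """Enqueue the fan-out task and return immediately to the author."""
    fanout_queue.put((feed_id, author_id, create_time))

def fanout_worker(batch_size=2):
    """Consume tasks and write inbox rows in batches until a None sentinel arrives."""
    while True:
        task = fanout_queue.get()
        if task is None:
            break
        feed_id, author_id, create_time = task
        fans = followers_of.get(author_id, [])
        # Each slice models one batched (multi-row) insert into t_inbox.
        for start in range(0, len(fans), batch_size):
            for fan_id in fans[start:start + batch_size]:
                inbox[fan_id].append((create_time, feed_id))

worker = threading.Thread(target=fanout_worker)
worker.start()
publish(10001, 4, "2021-10-31 17:00:00")
fanout_queue.put(None)             # sentinel: stop the worker once the queue drains
worker.join()
```

The author's request returns as soon as the task is queued, so posting latency no longer depends on the follower count; the cost is that followers may see the post slightly later.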

4. Pull (read‑amplification) model – Users retrieve feeds by pulling content from the sources they follow:

/* Get the IDs of all users that user 1 follows */
select likerId from t_like where userId = 1;
/* Pull feeds of those users */
select * from t_feed where userId in (4,5,6) and recordStatus = 0;

After fetching, the results are merged and sorted by timeline. This eliminates write‑amplification but introduces heavy read pressure, especially when a user follows many accounts.
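
The merge-and-sort step can be sketched as a k-way merge. This is a minimal illustration that assumes each author's feeds are already available newest first; `feeds_by_author` and `pull_timeline` are hypothetical names, and the sample rows are invented for the demo.

```python
import heapq
from itertools import islice

# Hypothetical per-author timelines, each sorted newest first,
# stored as (createTime, feedId) tuples.
feeds_by_author = {
    4: [("2021-10-31 17:00:00", 10001), ("2021-10-30 09:00:00", 9990)],
    5: [("2021-10-31 12:30:00", 10000)],
    6: [("2021-10-29 08:00:00", 9950)],
}

def pull_timeline(followed_ids, limit=20):
    """Merge the followed authors' timelines and return the newest feed IDs."""
    lists = [feeds_by_author.get(author_id, []) for author_id in followed_ids]
    # heapq.merge with reverse=True expects every input sorted from largest to smallest.
    merged = heapq.merge(*lists, reverse=True)
    return [feed_id for _, feed_id in islice(merged, limit)]

print(pull_timeline([4, 5, 6], limit=3))   # [10001, 10000, 9990]
```

Because the merge is lazy, only the first `limit` entries are consumed, which matters when a user follows many accounts.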

To reduce latency, a caching layer is introduced:

Cache the follow‑list per user (key: userId, value: set of followed IDs).

Cache each author’s recent feeds (key: authorId, value: feed list).

When a user requests a feed, retrieve the relevant cached entries from multiple shard nodes, merge, and sort.

Even with caching, read pressure can be high; horizontal scaling of cache shards (e.g., three‑master‑three‑slave) is recommended.
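
A rough sketch of this cache layout follows. It assumes plain dictionaries in place of real cache nodes, modulo sharding by ID, and a fixed cap on cached feeds per author; `SHARD_COUNT`, `RECENT_LIMIT`, and the function names are hypothetical and not taken from the article.

```python
from collections import deque

SHARD_COUNT = 3        # assumed number of cache shards
RECENT_LIMIT = 100     # assumed cap on cached feeds per author

# Each dict stands in for one cache node.
shards = [{} for _ in range(SHARD_COUNT)]

def shard_for(key_id):
    """Pick a shard by simple modulo hashing (an assumption for this sketch)."""
    return shards[key_id % SHARD_COUNT]

def cache_follow_list(user_id, followed_ids):
    shard_for(user_id)[("follow", user_id)] = set(followed_ids)

def cache_feed(author_id, create_time, feed_id):
    """Keep only the author's most recent feeds; the oldest entry drops off when full."""
    recent = shard_for(author_id).setdefault(
        ("feeds", author_id), deque(maxlen=RECENT_LIMIT)
    )
    recent.appendleft((create_time, feed_id))

def read_timeline(user_id, limit=20):
    """Fetch cached feeds of every followed author across shards, then merge and sort."""
    followed = shard_for(user_id).get(("follow", user_id), set())
    entries = []
    for author_id in followed:
        entries.extend(shard_for(author_id).get(("feeds", author_id), ()))
    entries.sort(reverse=True)     # newest first
    return [feed_id for _, feed_id in entries[:limit]]

cache_follow_list(1, [4, 5])
cache_feed(4, "2021-10-31 17:00:00", 10001)
cache_feed(5, "2021-10-31 12:30:00", 10000)
cache_feed(4, "2021-11-01 08:00:00", 10002)
```

A read for user 1 touches one shard for the follow list and up to one shard per followed author, which is why adding shards spreads the read load.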

5. Hybrid approach – Define a follower‑count threshold. Below the threshold, use the push model (low duplication, good immediacy). Above the threshold, switch to the pull model with cached hot data, while still optionally pushing to a small “active‑fan” subset for real‑time updates.
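
The threshold routing can be sketched as below. This is an illustration under stated assumptions: the threshold value is unrealistically low so the demo data exercises both paths, the `inbox` and `outbox` dictionaries are in-memory stand-ins, all names are hypothetical, and the optional push to an active-fan subset is left out.

```python
from collections import defaultdict

FOLLOWER_THRESHOLD = 2     # assumed cutoff, kept tiny for the demo

followers_of = {4: [1, 2, 3], 7: [1]}         # author ID -> follower IDs
following_of = {1: [4, 7], 2: [4], 3: [4]}    # user ID -> followed author IDs
inbox = defaultdict(list)     # push path: user ID -> [(createTime, feedId)]
outbox = defaultdict(list)    # pull path: author ID -> [(createTime, feedId)]

def is_big(author_id):
    """An author above the threshold is served by the pull model."""
    return len(followers_of.get(author_id, [])) > FOLLOWER_THRESHOLD

def publish(author_id, create_time, feed_id):
    if is_big(author_id):
        # Pull model: store once; followers fetch it when they read.
        outbox[author_id].append((create_time, feed_id))
    else:
        # Push model: copy the feed into every follower's inbox.
        for fan_id in followers_of.get(author_id, []):
            inbox[fan_id].append((create_time, feed_id))

def read_timeline(user_id, limit=20):
    """Combine pushed inbox entries with feeds pulled from big authors."""
    entries = list(inbox[user_id])
    for author_id in following_of.get(user_id, []):
        if is_big(author_id):
            entries.extend(outbox[author_id])
    entries.sort(reverse=True)     # newest first
    return [feed_id for _, feed_id in entries[:limit]]

publish(4, "2021-10-31 17:00:00", 10001)   # 3 followers: pull path
publish(7, "2021-10-31 18:00:00", 10002)   # 1 follower: push path
```

The read path always checks both sources, so an author who crosses the threshold only changes where new posts are written, not how followers read them.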

In summary, the push model suits accounts with few followers (e.g., personal Moments-style feeds), the pull model suits accounts with very large followings (e.g., Weibo celebrities), and a hybrid strategy can balance immediacy, storage cost, and performance.
