
Understanding Feed Stream Architecture: Models, Storage, and Optimization

This article explains the concept of feed streams, compares push and pull implementation models, discusses storage options such as MySQL, Redis SortedSet and Cassandra, and presents optimization techniques including online‑push/offline‑pull strategies, pagination methods, and deep‑paging solutions for large‑scale systems.

Architect

Feed streams are a common UI pattern in social and information applications, originating from RSS aggregation, where users subscribe to multiple sources and receive a combined list of items.

The core of a feed stream is a many‑to‑many relationship between M users and N information sources; the system must aggregate and order the resulting feed items for each user.

Two basic implementation modes exist:

Push mode – when a new feed is published, it is inserted into the feed streams of all followers.

Pull mode – when a user requests their feed, the system traverses the follow graph and aggregates the latest items in real time.

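Push keeps reads fast because each user's feed is precomputed, but an author with many followers triggers a burst of writes on every post, so the fan-out is usually handled asynchronously through a message queue. Pull avoids those write bursts but raises read latency, since every request has to aggregate items from all followed sources.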
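The trade-off between the two modes can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production design: the names `followers`, `following`, `outbox`, `inbox` and the helper functions are hypothetical, and a real system would back them with the stores discussed below.

```python
from collections import defaultdict

followers = defaultdict(set)   # author -> users who follow them
following = defaultdict(set)   # user -> authors they follow
outbox = defaultdict(list)     # author -> [(timestamp, feed_id)] published items
inbox = defaultdict(list)      # user -> [(timestamp, feed_id)], filled in push mode


def follow(user, author):
    followers[author].add(user)
    following[user].add(author)


def publish(author, feed_id, timestamp, push=True):
    """Record a new item; in push mode also fan it out to every follower."""
    outbox[author].append((timestamp, feed_id))
    if push:
        for user in followers[author]:  # write cost grows with follower count
            inbox[user].append((timestamp, feed_id))


def read_push(user, limit=10):
    """Push-mode read: the feed is already materialized in the inbox."""
    return [fid for _, fid in sorted(inbox[user], reverse=True)[:limit]]


def read_pull(user, limit=10):
    """Pull-mode read: merge the outboxes of all followed authors on demand."""
    merged = [item for author in following[user] for item in outbox[author]]
    return [fid for _, fid in sorted(merged, reverse=True)[:limit]]
```

Both read paths return the same items; the difference is whether the aggregation cost is paid at publish time (push) or at read time (pull).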

In practice a hybrid push‑pull approach is used, and the storage design is crucial. Three data groups need persistence:

Author‑published feed list – typically stored in a relational database such as MySQL, or an ordered NoSQL store.

User‑author follow relationships – also stored in MySQL or a KV‑style NoSQL database.

User feed streams – derived data that can be regenerated from the two groups above, so they are usually cached rather than durably persisted.

A lightweight solution is to cache feeds in Redis using a SortedSet, with the feed ID as the member and the publish timestamp as the score. Members are unique, each command executes atomically, and range queries by rank or score make both push and pull modes straightforward to implement.
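To make the SortedSet semantics concrete, here is a small in-memory stand-in written in Python. It is only a sketch: the class name `SortedSetFeed` and the `max_len` cap are assumptions for illustration, and the comments name the Redis commands each method is meant to mirror (with redis-py these would be `zadd`, `zremrangebyrank` and `zrevrange`).

```python
class SortedSetFeed:
    """In-memory stand-in for one Redis SortedSet holding a user's feed."""

    def __init__(self, max_len=1000):
        self.scores = {}          # member (feed ID) -> score (timestamp)
        self.max_len = max_len    # hypothetical cap on cached items

    def add(self, feed_id, timestamp):
        # Like ZADD: adding an existing member only updates its score,
        # so a feed ID can never appear twice.
        self.scores[feed_id] = timestamp
        # Like ZREMRANGEBYRANK key 0 -(max_len + 1): drop the oldest entries.
        if len(self.scores) > self.max_len:
            oldest_first = sorted(self.scores, key=self.scores.get)
            for stale in oldest_first[: len(self.scores) - self.max_len]:
                del self.scores[stale]

    def latest(self, limit=10):
        # Like ZREVRANGE key 0 limit-1: highest score (newest) first.
        ordered = sorted(self.scores.items(), key=lambda kv: kv[1], reverse=True)
        return [feed_id for feed_id, _ in ordered[:limit]]
```

Because re-adding a member is idempotent, a retried push from a message queue does not produce duplicate feed entries.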

To limit memory usage, a TTL is set on cached feeds. Persistent storage can be layered: active users’ feeds in Redis, monthly active users’ feeds in a database, and long‑inactive users’ feeds rebuilt from MySQL follow data when they log in again.
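The layered read path described above might look roughly like the following sketch. The function `load_feed`, its parameters, and the use of plain dictionaries for the cache and the database are all assumptions made for illustration; `rebuild` stands for whatever routine recomputes a feed from the follow data.

```python
import time


def load_feed(uid, cache, database, rebuild, ttl_seconds=3600, now=None):
    """Return (feed, source), trying the cheapest tier first.

    cache:    dict uid -> (expires_at, feed), standing in for Redis with a TTL
    database: dict uid -> feed, standing in for the persistent feed store
    rebuild:  callable uid -> feed, recomputes the feed from follow data
    """
    now = time.time() if now is None else now

    entry = cache.get(uid)
    if entry is not None and entry[0] > now:   # still within its TTL
        return entry[1], "cache"

    if uid in database:                        # recently active user
        feed, source = database[uid], "database"
    else:                                      # long-inactive user
        feed, source = rebuild(uid), "rebuilt"

    cache[uid] = (now + ttl_seconds, feed)     # warm the cache for next time
    return feed, source
```

The `source` value is returned only to make the tiering visible; a real implementation would likely log it as a metric instead.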

For large‑scale systems, Cassandra is recommended for persistent feed storage: the user UID serves as the partition key, while timestamp and feed ID form the clustering key, providing ordered, deduplicated storage with built‑in TTL support, which aligns well with the append‑only nature of feeds.
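The effect of that key layout can be modelled in a few lines of Python. This is an in-memory approximation, not Cassandra client code: the names `partitions`, `write_feed` and `read_feed` are hypothetical, and the comment shows the kind of primary-key definition the paragraph describes.

```python
from collections import defaultdict

# Roughly models a table declared with PRIMARY KEY ((uid), ts, feed_id):
# uid selects the partition, (ts, feed_id) orders rows inside it.
partitions = defaultdict(dict)   # uid -> {(ts, feed_id): payload}


def write_feed(uid, ts, feed_id, payload):
    # Writing the same primary key again overwrites the row (an upsert),
    # so a retried write does not create a duplicate.
    partitions[uid][(ts, feed_id)] = payload


def read_feed(uid, limit=10):
    # A single-partition read, newest clustering key first.
    rows = sorted(partitions.get(uid, {}).items(), reverse=True)
    return [(ts, feed_id, payload) for (ts, feed_id), payload in rows[:limit]]
```

Including the feed ID in the clustering key keeps two items with the same timestamp from overwriting each other.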

Optimization techniques include:

Online-push/offline-pull – push new items only to followers who are currently online; offline followers receive their feeds via pull when they log in. To find the online followers, intersect the follower set with the online-user set using a "small-table drives large-table" strategy: iterate over the smaller set and look up each entry in the larger one.
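A minimal sketch of the online-push step, assuming both the follower list and the online-user list fit in Python sets. The function names `online_followers` and `push_to_online` and the `inboxes` dictionary are hypothetical.

```python
def online_followers(followers, online_users):
    """Intersect two sets by iterating the smaller one and probing the larger."""
    small, large = sorted((followers, online_users), key=len)
    return {uid for uid in small if uid in large}


def push_to_online(feed_id, followers, online_users, inboxes):
    """Push a new item only to online followers; return how many were written."""
    targets = online_followers(followers, online_users)
    for uid in targets:
        inboxes.setdefault(uid, []).append(feed_id)
    return len(targets)
```

Iterating the smaller side bounds the work by the smaller set's size, which matters when an author with a few hundred followers is checked against millions of online users, or the reverse.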

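Pagination – offset-based paging returns duplicate items when new feeds shift the page boundaries. A cursor-based approach avoids this: the client sends the last feed it received plus a limit, and the server returns only older items, for example with Redis ZREVRANGEBYSCORE (or ZRANGEBYSCORE for ascending order) using timestamps as scores.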
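The difference between offset and cursor paging can be shown with a short Python sketch. The feed here is a plain list of `(timestamp, feed_id)` tuples and both function names are hypothetical; against Redis, the cursor query would correspond to a ZREVRANGEBYSCORE call with an exclusive upper bound and a LIMIT.

```python
def page_by_offset(feed, page, size):
    """Offset paging: positions shift whenever newer items arrive."""
    ordered = sorted(feed, reverse=True)
    return ordered[page * size:(page + 1) * size]


def page_by_cursor(feed, cursor=None, size=10):
    """Cursor paging: return items strictly older than the cursor.

    The cursor is the (timestamp, feed_id) of the last item already seen.
    Including the feed ID breaks ties between items with equal timestamps.
    """
    ordered = sorted(feed, reverse=True)
    if cursor is not None:
        ordered = [item for item in ordered if item < cursor]
    page = ordered[:size]
    next_cursor = page[-1] if page else None
    return page, next_cursor
```

If a new item is published between two requests, `page_by_offset` repeats the last item of the previous page, while `page_by_cursor` continues exactly where the reader left off.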

Deep pagination – cache recent data (e.g., one month) in Redis, progressively pull older data into cache as users scroll, reducing TTL for older segments to free memory.
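One way to picture the deep-paging idea is to split history into month-sized segments and load each segment into the cache only when a reader scrolls that far. The sketch below is an assumption-laden illustration: the segment size, the TTL values, and the names `segment_of` and `load_segment` are all hypothetical, and the TTL is only recorded here, whereas a Redis-backed version would pass it to EXPIRE.

```python
SEGMENT = 30 * 24 * 3600  # one month in seconds (window size is an assumption)


def segment_of(timestamp, now):
    """0 for the most recent month, 1 for the month before, and so on."""
    return int((now - timestamp) // SEGMENT)


def load_segment(index, cache, store, now, hot_ttl=86400, cold_ttl=600):
    """Return one month of items, pulling it into the cache on first access.

    store: full history as a list of (timestamp, feed_id) tuples
    cache: dict segment index -> {"ttl": seconds, "items": [...]}
    """
    if index not in cache:
        items = sorted(
            (item for item in store if segment_of(item[0], now) == index),
            reverse=True,
        )
        ttl = hot_ttl if index == 0 else cold_ttl  # older segments expire sooner
        cache[index] = {"ttl": ttl, "items": items}
    return cache[index]["items"]
```

Older segments are cached only for readers who actually reach them, and their shorter TTL releases the memory soon after.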

