
Technical Architecture Overview of Toutiao (Jinri Toutiao) News Platform

The article provides a comprehensive technical overview of Toutiao's growth, data collection, user modeling, recommendation engine, storage solutions, message push system, and its micro‑service and virtualized PaaS architecture, highlighting the massive scale and engineering practices behind the platform.

IT Architects Alliance

Product Background

Toutiao, founded in March 2012, grew from a handful of engineers to more than 200 staff. Its product lines expanded from Neihan Duanzi to news, special sales, and movies, and it now serves billions of registered users and hundreds of millions of daily active users.

Article Crawling and Analysis

The platform crawls roughly 10,000 original news articles a day from various sources, manually reviews sensitive content, and applies text analysis to classify articles, tag them, and extract topics, using signals such as region, popularity, and weight.
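To make the classification and tagging step concrete, here is a minimal sketch. It assumes a simple keyword-matching classifier and frequency-based tag extraction; `CATEGORY_KEYWORDS`, `classify`, and `extract_tags` are hypothetical names, and the article does not describe how Toutiao's own classifiers work.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical keyword lists; a production system would use trained classifiers instead.
CATEGORY_KEYWORDS = {
    "sports": {"match", "goal", "league", "coach"},
    "tech": {"smartphone", "chip", "software", "startup"},
    "finance": {"stock", "market", "bank", "inflation"},
}
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for", "is", "was", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def classify(text: str) -> Optional[str]:
    """Return the category whose keywords occur most often, or None if nothing matches."""
    counts = Counter(tokenize(text))
    scores = {cat: sum(counts[w] for w in words) for cat, words in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def extract_tags(text: str, k: int = 3) -> list[str]:
    """Return the k most frequent non-stopword tokens as candidate tags."""
    counts = Counter(t for t in tokenize(text) if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(k)]
```

Swapping the keyword lists for trained models would leave the two function signatures unchanged, which is the point of keeping classification and tagging as separate steps.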

User Modeling

Real‑time logs of user actions are collected and processed with tools such as Scribe, Flume, Kafka, Hadoop, and Storm. User models are stored in MySQL/MongoDB (with read‑write separation) and cached in Memcached/Redis; they cover dimensions such as subscriptions, tags, and partial article pushes.
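The incremental user-model update can be illustrated with a small sketch that folds action events into per-user tag weights. The action weights, the decay factor, and the `Event` and `InterestModel` classes are all hypothetical; in a deployment like the one described, events would arrive from a stream processor such as Storm rather than from direct method calls.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical weights per action type; real systems would tune these empirically.
ACTION_WEIGHTS = {"click": 1.0, "favorite": 2.0, "share": 3.0, "dislike": -2.0}


@dataclass(frozen=True)
class Event:
    """One parsed log record: a user acted on an article carrying this tag."""
    user_id: str
    action: str
    tag: str


class InterestModel:
    """Per-user tag weights, updated incrementally as log events arrive."""

    def __init__(self, decay: float = 0.99) -> None:
        # Multiplier applied to a user's existing weights on each new event,
        # so older interests fade relative to recent ones.
        self.decay = decay
        # user_id -> (tag -> weight)
        self.weights = defaultdict(lambda: defaultdict(float))

    def update(self, event: Event) -> None:
        profile = self.weights[event.user_id]
        for tag in profile:
            profile[tag] *= self.decay
        # Unknown action types contribute nothing.
        profile[event.tag] += ACTION_WEIGHTS.get(event.action, 0.0)

    def top_tags(self, user_id: str, k: int = 3) -> list[str]:
        """Return up to k tags with positive weight, strongest first."""
        profile = self.weights.get(user_id, {})
        ranked = sorted(profile.items(), key=lambda kv: kv[1], reverse=True)
        return [tag for tag, weight in ranked[:k] if weight > 0]
```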

Cold‑Start for New Users

New users are profiled from device information, OS version, and social account data (e.g., Weibo). Relationships, followers, tags, and installed‑app details are extracted from these sources to build an initial interest model.
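A cold-start profile can be sketched as a weighted vote over whatever signals are available at sign-up. The mappings from apps and OS to tags, and the relative signal strengths, are hypothetical choices for illustration only; the article does not say how Toutiao weights these sources.

```python
from collections import Counter

# Hypothetical mappings from cold-start signals to interest tags.
APP_TAGS = {"fitness_tracker": "sports", "stock_app": "finance", "game_hub": "games"}
OS_TAGS = {"ios": "tech"}

# Hypothetical signal strengths: explicit social-profile tags count most, the OS least.
SOCIAL_WEIGHT, APP_WEIGHT, OS_WEIGHT = 2.0, 1.0, 0.5


def cold_start_profile(os_name: str, installed_apps: list[str], social_tags: list[str]) -> dict[str, float]:
    """Combine device and social-account signals into tag weights that sum to 1."""
    scores: Counter = Counter()
    for tag in social_tags:
        scores[tag] += SOCIAL_WEIGHT
    for app in installed_apps:
        if app in APP_TAGS:
            scores[APP_TAGS[app]] += APP_WEIGHT
    os_tag = OS_TAGS.get(os_name.lower())
    if os_tag is not None:
        scores[os_tag] += OS_WEIGHT
    total = sum(scores.values())
    # With no usable signal, return an empty profile (the caller might fall back to popular articles).
    return {tag: s / total for tag, s in scores.items()} if total else {}
```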

Recommendation System

Toutiao's core recommendation engine combines automatic and semi‑automatic pipelines. The automatic pipeline generates candidate articles, matches them to users, and creates push tasks; the semi‑automatic pipeline selects content based on user actions. The system relies on more than 300 classifiers, and its models are updated continuously.
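The three automatic stages (candidate generation, user matching, push task creation) can be sketched as follows. The freshness cutoff and the dot-product scoring are a common baseline assumed here for illustration, not Toutiao's documented ranking; `Article`, `match_score`, and `create_push_tasks` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Article:
    article_id: int
    tags: dict[str, float]  # tag -> relevance score produced by the classifiers
    freshness: float        # 1.0 = just published, decaying toward 0 over time


def generate_candidates(articles: list[Article], min_freshness: float = 0.2) -> list[Article]:
    """Candidate generation: drop articles that are too old to push."""
    return [a for a in articles if a.freshness >= min_freshness]


def match_score(profile: dict[str, float], article: Article) -> float:
    """User matching: dot product of interest and tag weights, scaled by freshness."""
    overlap = sum(weight * article.tags.get(tag, 0.0) for tag, weight in profile.items())
    return article.freshness * overlap


def create_push_tasks(
    profiles: dict[str, dict[str, float]], articles: list[Article], per_user: int = 2
) -> list[tuple[str, int]]:
    """Push task creation: pair each user with their best-matching candidates."""
    candidates = generate_candidates(articles)
    tasks = []
    for user_id, profile in profiles.items():
        scored = [(match_score(profile, a), a.article_id) for a in candidates]
        scored = [s for s in scored if s[0] > 0]  # skip articles with no interest overlap
        scored.sort(key=lambda s: s[0], reverse=True)
        tasks.extend((user_id, article_id) for _, article_id in scored[:per_user])
    return tasks
```

Keeping the stages as separate functions mirrors the pipeline described above: each stage can be scaled or replaced (for example, swapping in a learned ranker for `match_score`) without touching the others.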

Data Storage

Persistent data is stored in MySQL or MongoDB with a Memcached/Redis cache in front, while images are kept in distributed databases and served through a CDN. SSD storage is also being evaluated.
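The combination of a database with a cache in front, plus the read‑write separation mentioned under user modeling, is usually implemented as the cache-aside pattern. Below is a minimal sketch under that assumption; plain dicts stand in for the database primary, its read replica, and Memcached/Redis, and `CacheAsideStore` is a hypothetical class.

```python
import time
from typing import Any, Callable, Optional


class CacheAsideStore:
    """Cache-aside reads plus read-write separation over a primary/replica pair.

    Plain dicts stand in for the database primary, its read replica, and the cache.
    """

    def __init__(self, ttl_seconds: float = 60.0, clock: Callable[[], float] = time.monotonic) -> None:
        self.primary: dict[str, Any] = {}
        self.replica: dict[str, Any] = {}
        self.cache: dict[str, tuple[Any, float]] = {}  # key -> (value, expires_at)
        self.ttl = ttl_seconds
        self.clock = clock
        self.db_reads = 0  # counts cache misses that reached the replica

    def write(self, key: str, value: Any) -> None:
        self.primary[key] = value
        # Real replication is asynchronous; copying immediately keeps the sketch simple.
        self.replica[key] = value
        # Invalidate rather than update the cache, so the next read reloads fresh data.
        self.cache.pop(key, None)

    def read(self, key: str) -> Optional[Any]:
        entry = self.cache.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]
        self.db_reads += 1
        value = self.replica.get(key)  # reads go to the replica, writes to the primary
        if value is not None:
            self.cache[key] = (value, self.clock() + self.ttl)
        return value
```

Invalidating on write instead of writing the new value into the cache avoids a class of races where a slow writer overwrites a newer cached value; the trade-off is one extra database read after each update.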

Message Push

Push notifications raise user activity (by roughly 20% of DAU) and are personalized by frequency, content, region, and interests. Monitored metrics include click‑through rate and the number of users who uninstall the app or disable notifications.
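A push gate and its guardrail metrics might look like the sketch below. The daily cap, the "empty region set means nationwide" convention, and the `PushMetrics` and `should_push` names are assumptions for illustration; the article lists the personalization dimensions and metrics but not how they are applied.

```python
from dataclasses import dataclass


@dataclass
class PushMetrics:
    """Counters for one push campaign, with the rates a push team would watch."""
    sent: int = 0
    clicks: int = 0
    uninstalls: int = 0
    disables: int = 0  # users who turned off notifications after the push

    def rate(self, count: int) -> float:
        return count / self.sent if self.sent else 0.0

    def summary(self) -> dict[str, float]:
        return {
            "ctr": self.rate(self.clicks),
            "uninstall_rate": self.rate(self.uninstalls),
            "disable_rate": self.rate(self.disables),
        }


def should_push(pushes_today: int, daily_cap: int, user_region: str, target_regions: set[str]) -> bool:
    """Apply a per-user frequency cap, then region targeting (empty set = nationwide)."""
    if pushes_today >= daily_cap:
        return False
    return not target_regions or user_region in target_regions
```

Tracking uninstall and disable rates alongside CTR matters because a push that earns clicks but drives users to turn notifications off can reduce long-term reach.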

System Architecture

The article includes several architecture diagrams (omitted here) illustrating the layered design, high‑throughput pipelines, and service separation.

Micro‑service Architecture

Toutiao decomposes large applications into smaller services, reuses common layers, and invests in shared infrastructure. This enables rapid iteration, fault tolerance, and independent development by business teams.

Virtualized PaaS Platform

A three‑layer model (IaaS, PaaS, SaaS) unifies SaaS services on top of the IaaS layer. The PaaS layer abstracts compute resources, integrates public‑cloud bandwidth, and provides shared services such as logging and monitoring.

Conclusion

The key components are data generation and collection, a Kafka‑based message bus, ETL pipelines, data warehousing, and query engines (batch, MPP, and cube). Together they support Toutiao's massive scale and real‑time personalization.
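The flow from message bus through ETL into a warehouse and then a query engine can be shown end to end in miniature. In this sketch a list of JSON strings stands in for a Kafka topic and an in-memory SQLite database stands in for the warehouse; the event schema and function names are hypothetical.

```python
import json
import sqlite3
from typing import Iterable, Iterator

# A plain list of JSON strings stands in for a message-bus topic (Kafka in the article).
RAW_EVENTS = [
    '{"user_id": "u1", "action": "click", "article": 101}',
    '{"user_id": "u2", "action": "click", "article": 101}',
    '{"user_id": "u1", "action": "share", "article": 102}',
    'not json',                              # malformed: dropped during ETL
    '{"user_id": "u3", "action": "click"}',  # missing field: dropped during ETL
]
REQUIRED_FIELDS = {"user_id", "action", "article"}


def extract_transform(lines: Iterable[str]) -> Iterator[tuple[str, str, int]]:
    """Parse each message and keep only well-formed events with every required field."""
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(event, dict) and REQUIRED_FIELDS.issubset(event):
            yield event["user_id"], event["action"], int(event["article"])


def load(conn: sqlite3.Connection, rows: Iterable[tuple[str, str, int]]) -> None:
    """Load cleaned rows into a warehouse table (SQLite stands in for the warehouse)."""
    conn.execute("CREATE TABLE IF NOT EXISTS events (user_id TEXT, action TEXT, article INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    conn.commit()


def clicks_per_article(conn: sqlite3.Connection) -> list[tuple[int, int]]:
    """A batch-style aggregate query over the warehouse table."""
    return conn.execute(
        "SELECT article, COUNT(*) FROM events WHERE action = 'click' "
        "GROUP BY article ORDER BY article"
    ).fetchall()


conn = sqlite3.connect(":memory:")
load(conn, extract_transform(RAW_EVENTS))
print(clicks_per_article(conn))  # [(101, 2)]
```

At Toutiao's scale each stage would be a separate distributed system, but the contract between stages is the same: the bus carries raw events, ETL enforces a schema, and query engines read only cleaned data.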
