Backend Development 20 min read

How a Monolith Redesign Boosted Content Ingestion Performance 13‑Fold

The article details how QQ Browser's content architecture was transformed from a fragmented micro‑service system into a single monolithic service with a plugin framework, dramatically improving processing speed, fault tolerance, and development efficiency while handling thousands of content types.

Sanyou's Java Diary

Content architecture is the content ingestion and computation layer of QQ Browser search, handling thousands of content types from many partners.

Problems of the old system: low development efficiency (adding a new data type required changes across 3‑4 services), poor performance (CPU utilization capped at roughly 40%, with many redundant JSON parses), complex fault‑tolerance logic, slow iteration, excessive serialization between services, and difficulty scaling horizontally.

To address these, the team rebuilt the system from scratch, moving from many microservices to a single monolithic service, introducing a plugin framework for flexible processing, and separating consumption threads from computation threads.

Key redesign points

Monolithic service to reduce RPC overhead and keep data in memory.

Plugin system for extensible handling of diverse content types.

Support both incremental updates and batch bulk loads ("刷库") with dedicated processing flows.

Fault‑tolerant design using Kafka for message buffering and peak‑shaving.

Horizontal scaling by decoupling consumption threads from processing threads.

The new architecture replaces numerous if‑else branches with table‑driven logic, uses modern C++20 features (e.g., std::atomic<std::shared_ptr<T>>), adopts a faster JSON library (Sonic‑JSON), and integrates jemalloc for better memory allocation.

CI/CD improvements include stricter code review, unified coding standards, automated pipelines, and dependency mirroring to speed up builds.

Performance results

| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| Single‑core processing QPS | 13 | 172 | 13× |
| Batch QPS | 13 | 230 | 17× |
| Cluster batch QPS | 500–1000 | 10,000 | 10× |
| Average latency | 2.7 s | 0.8 s | −71% |
| p99 latency | 17 s | 1.9 s | −88% |
| CPU utilization | ≤40% | ≈100% | 2.5× |

R&D efficiency also improved: lead‑time for business requirements dropped from 5.72 days to at most 1 day (−82%), known code issues were eliminated, unit‑test coverage rose to 77%, and the codebase shrank from 113k lines to 28k (−75%).

Tags: backend, performance optimization, microservices, plugin architecture, C++, system redesign
Written by Sanyou's Java Diary

Passionate about technology, though not great at solving problems; eager to share, never tiring of learning!