How We Revamped QQ Browser’s Content Engine: From Micro‑services Chaos to High‑Performance Monolith
This article details the complete redesign of QQ Browser Search's content ingestion system: why the original micro‑service architecture led to low development efficiency and poor performance, and how a from‑scratch redesign built on a monolithic service, a plugin framework, fault‑tolerant pipelines, and thread separation dramatically improved throughput, latency, and developer productivity.
1. Project Background
The content architecture of QQ Browser Search handles content ingestion and computation across thousands of content types, currently integrating many partners. The existing micro‑service system suffered from low development efficiency and poor performance due to excessive RPC calls, redundant JSON parsing, and string copying.
Low R&D efficiency: Adding a new data type required changes in 3‑4 services, making development cumbersome.
Poor system performance: Data traversed many small services; CPU utilization of core services capped at 40%, and a single message required over 20 JSON parses.
Business teams complained about slow throughput, e.g., processing 600 million documents took 12 days.
2. Overall Design
The new design focuses on five key points:
Monolithic service: Replaces fragmented micro‑services with an in‑memory data flow, reducing RPC overhead.
Plugin system: Introduces a flexible plugin architecture to replace hard‑coded if‑else logic.
Support for both incremental and batch‑refresh processing: Dedicated stream configurations improve batch performance.
Fault tolerance: Uses Kafka for message buffering and peak‑shaving, ensuring no data loss during failures.
Horizontal scalability: Separates consumption and processing threads, enabling scaling beyond Kafka partition limits.
3. Detailed Design
3.1 From Micro‑services to Monolith
The original system consisted of many tiny services, each handling a specific ingestion path (HTTP, Kafka, DB pull, etc.), resulting in 6 RPC hops per content item. The new monolithic design keeps data in memory, eliminating most RPC calls and simplifying the processing pipeline.
3.2 Plugin‑based Ingestion Flow
Three layers are defined: ingestion, processing, and distribution. Each layer’s functions are implemented as plugins, allowing new content types to be added by configuring plugins rather than writing code.
Examples include batch ingestion tasks and document processing pipelines, both visualized with diagrams.
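The layering above lends itself to a registry-plus-configuration design. The sketch below is a hypothetical C++ rendering — the class names, the Document shape, and the add_tag plugin are assumptions for illustration, not the project's actual API. Plugins self-register by name, and a pipeline is assembled from an ordered list of names taken from configuration.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Minimal stand-in for a content document (assumed shape).
struct Document { std::map<std::string, std::string> fields; };

// Every ingestion/processing/distribution stage implements this interface.
class Plugin {
public:
    virtual ~Plugin() = default;
    virtual void process(Document& doc) = 0;
};

// Registry maps a plugin name (as it appears in config) to a factory.
class PluginRegistry {
public:
    using Factory = std::function<std::unique_ptr<Plugin>()>;
    static PluginRegistry& instance() {
        static PluginRegistry r;
        return r;
    }
    void register_plugin(const std::string& name, Factory f) {
        factories_[name] = std::move(f);
    }
    std::unique_ptr<Plugin> create(const std::string& name) const {
        auto it = factories_.find(name);
        return it == factories_.end() ? nullptr : it->second();
    }
private:
    std::map<std::string, Factory> factories_;
};

// A pipeline is just the ordered list of plugin names from configuration.
class Pipeline {
public:
    explicit Pipeline(const std::vector<std::string>& names) {
        for (const auto& n : names)
            if (auto p = PluginRegistry::instance().create(n))
                stages_.push_back(std::move(p));
    }
    void run(Document& doc) {
        for (auto& s : stages_) s->process(doc);
    }
private:
    std::vector<std::unique_ptr<Plugin>> stages_;
};

// Example plugin (invented for the demo): tags each document.
struct AddTagPlugin : Plugin {
    void process(Document& doc) override { doc.fields["tag"] = "news"; }
};

inline bool demo() {
    PluginRegistry::instance().register_plugin(
        "add_tag", [] { return std::make_unique<AddTagPlugin>(); });
    Document doc;
    Pipeline({"add_tag"}).run(doc);
    return doc.fields["tag"] == "news";
}
```

Under this design, adding a new content type means registering a plugin and listing it in the pipeline configuration, with no changes to the pipeline code itself.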
3.3 Incremental Updates vs. Batch Refresh
Four processing streams are configured: source update, feature update, source batch refresh, and feature batch refresh. This separation removes unnecessary computation during batch jobs, achieving a 10× QPS increase for refresh operations.
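As a rough illustration, such a separation could be expressed in configuration along these lines — the stream and stage names below are invented for the example, not taken from the project:

```yaml
# Hypothetical per-stream configuration: batch-refresh streams omit
# stages that only incremental updates need.
streams:
  source_update:
    stages: [parse, dedup, feature_extract, distribute]
  feature_update:
    stages: [parse, feature_extract, distribute]
  source_batch_refresh:
    stages: [parse, distribute]            # skips dedup and feature extraction
  feature_batch_refresh:
    stages: [feature_extract, distribute]
```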
3.4 Fault‑tolerant Data Ingestion
All ingestion paths now funnel through Kafka, which buffers messages until they are successfully processed, guaranteeing no data loss even if a node crashes.
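The no-data-loss guarantee comes from at-least-once delivery: the consumer advances its committed offset only after a message is fully processed, so a crash causes replay rather than loss. The toy model below is not the project's code — it stands in for a real Kafka client purely to show the commit-after-process discipline:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy broker log: retains messages so they can be re-delivered from any
// committed offset, Kafka-style.
class TopicLog {
public:
    void append(std::string m) { log_.push_back(std::move(m)); }
    // Re-deliver everything after the committed offset.
    std::vector<std::string> fetch(std::size_t committed) const {
        return {log_.begin() + committed, log_.end()};
    }
private:
    std::vector<std::string> log_;
};

class Consumer {
public:
    explicit Consumer(const TopicLog& log) : log_(log) {}
    // Processes a batch; commits each offset only AFTER processing succeeds.
    void poll_and_process(bool crash_mid_batch) {
        for (auto& m : log_.fetch(committed_)) {
            processed_.push_back(m);       // "process" the document
            ++committed_;                  // commit only after success
            if (crash_mid_batch) return;   // simulate a node crash
        }
    }
    std::size_t committed() const { return committed_; }
private:
    const TopicLog& log_;
    std::size_t committed_ = 0;
    std::vector<std::string> processed_;
};

inline bool demo() {
    TopicLog log;
    log.append("a"); log.append("b"); log.append("c");
    Consumer c(log);
    c.poll_and_process(true);   // crash after "a": only offset 1 committed
    c.poll_and_process(false);  // restart replays from "b"; nothing is lost
    return c.committed() == 3;
}
```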
3.5 Consumer‑Processor Thread Separation
A lock‑free queue decouples Kafka consumption from document processing, allowing multiple processing threads per partition and improving CPU utilization and horizontal scalability.
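A minimal sketch of the separation: one thread plays the Kafka consumer and feeds a shared queue, while several processor threads drain it, so processing parallelism is no longer capped by partition count. The article mentions a lock-free queue; for brevity and portability this sketch uses a mutex-guarded queue, and the int document stand-in is illustrative.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class WorkQueue {
public:
    void push(int doc) {
        { std::lock_guard<std::mutex> lk(mu_); q_.push(doc); }
        cv_.notify_one();
    }
    void close() {                          // producer is done
        { std::lock_guard<std::mutex> lk(mu_); closed_ = true; }
        cv_.notify_all();
    }
    bool pop(int& doc) {                    // false once closed and drained
        std::unique_lock<std::mutex> lk(mu_);
        cv_.wait(lk, [&] { return !q_.empty() || closed_; });
        if (q_.empty()) return false;
        doc = q_.front(); q_.pop();
        return true;
    }
private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<int> q_;
    bool closed_ = false;
};

// One producer ("Kafka consumer" thread) and n_workers processors;
// returns how many documents were handled (should equal n).
inline std::size_t run(std::size_t n, std::size_t n_workers) {
    WorkQueue q;
    std::atomic<std::size_t> handled{0};
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < n_workers; ++i)
        workers.emplace_back([&] {
            int doc;
            while (q.pop(doc)) ++handled;   // "process" the document
        });
    for (std::size_t i = 0; i < n; ++i) q.push(static_cast<int>(i));
    q.close();
    for (auto& w : workers) w.join();
    return handled.load();
}
```

Scaling out then means adding processor threads (or nodes) behind the queue, independent of how many Kafka partitions exist.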
4. Diff Verification
A diff verification service aggregates logs from all 15 distribution endpoints, providing unified diff analysis and a recursive JSON comparison tool to handle complex data structures.
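A recursive comparison of this kind might look as follows. The minimal JsonValue type (objects and string scalars only) is an assumption made so the sketch stays self-contained; a real tool would walk RapidJSON or Sonic-JSON DOM nodes instead.

```cpp
#include <map>
#include <string>
#include <vector>

// Minimal JSON-like value for illustration: a scalar, or an object.
struct JsonValue {
    std::string scalar;                        // used when children is empty
    std::map<std::string, JsonValue> children;
};

// Recursively collect dotted paths where the two documents differ.
void diff(const JsonValue& a, const JsonValue& b,
          const std::string& path, std::vector<std::string>& out) {
    if (a.children.empty() && b.children.empty()) {
        if (a.scalar != b.scalar) out.push_back(path);
        return;
    }
    // Union of keys from both sides: bit 1 = present in a, bit 2 = in b.
    std::map<std::string, int> keys;
    for (auto& kv : a.children) keys[kv.first] |= 1;
    for (auto& kv : b.children) keys[kv.first] |= 2;
    for (auto& kv : keys) {
        std::string p = path.empty() ? kv.first : path + "." + kv.first;
        if (kv.second != 3) { out.push_back(p); continue; }  // one side only
        diff(a.children.at(kv.first), b.children.at(kv.first), p, out);
    }
}

inline std::vector<std::string> demo() {
    JsonValue a, b;
    a.children["id"].scalar = "1";
    b.children["id"].scalar = "1";
    a.children["title"].scalar = "old";
    b.children["title"].scalar = "new";
    std::vector<std::string> out;
    diff(a, b, "", out);
    return out;   // only the differing path remains
}
```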
5. Code Optimizations
5.1 Less Code
Adopted table‑driven programming to replace verbose if‑else chains, and used C++20 std::atomic<std::shared_ptr<T>> instead of double‑buffer designs.
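The pattern replaces a hand-rolled double buffer for hot-reloaded state: a writer publishes a freshly built object, readers grab consistent snapshots, and the old object is freed when its last reader releases it. The sketch below uses the C++11 atomic_load/atomic_store free functions for shared_ptr, which the C++20 std::atomic<std::shared_ptr<T>> specialization supersedes; the Config shape is an assumption for illustration.

```cpp
#include <map>
#include <memory>
#include <string>

struct Config { std::map<std::string, std::string> entries; };

std::shared_ptr<Config> g_config = std::make_shared<Config>();

// Writer: build a new config off to the side, then publish it atomically.
void publish(std::shared_ptr<Config> fresh) {
    std::atomic_store(&g_config, std::move(fresh));
}

// Reader: take a consistent snapshot; the old config stays alive until
// the last reader drops its shared_ptr, so no buffer swapping is needed.
std::shared_ptr<Config> snapshot() {
    return std::atomic_load(&g_config);
}

inline bool demo() {
    auto fresh = std::make_shared<Config>();
    fresh->entries["region"] = "cn";
    publish(fresh);
    return snapshot()->entries["region"] == "cn";
}
```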
5.2 Higher Performance
Replaced repeated RapidJSON lookups with iterators, eliminated redundant JSON serialization, and introduced Sonic‑JSON, which is 40% faster than RapidJSON.
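The lookup optimization generalizes beyond RapidJSON: find the member once and reuse the iterator, instead of re-traversing the document on every access. std::map stands in for a JSON object here so the sketch is self-contained (RapidJSON's FindMember plays the role of find in real code).

```cpp
#include <map>
#include <string>

// Anti-pattern: two separate traversals of the document for one field.
int title_len_repeated(const std::map<std::string, std::string>& doc) {
    if (doc.count("title") == 0) return 0;            // lookup #1
    return static_cast<int>(doc.at("title").size());  // lookup #2
}

// Optimized form: a single lookup whose iterator is reused.
int title_len_cached(const std::map<std::string, std::string>& doc) {
    auto it = doc.find("title");                      // only lookup
    return it == doc.end() ? 0 : static_cast<int>(it->second.size());
}

inline bool demo() {
    std::map<std::string, std::string> doc{{"title", "hello"}};
    return title_len_repeated(doc) == 5 && title_len_cached(doc) == 5;
}
```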
5.3 Better Foundations
Eliminated apparent memory leaks by switching to jemalloc and refining memory‑pool usage, reducing OOM incidents.
6. R&D Process
6.1 Overall Workflow
Standardized requirement gathering, code review, coding standards, static analysis, CI/CD pipelines, and versioning (MAJOR.MINOR.PATCH).
6.2 Code Review
Mandatory security and coding‑style exams for developers, with all changes undergoing rigorous code‑review checks.
6.3 Documentation
Comprehensive documentation of architecture, operational procedures, and module READMEs ensures knowledge transfer.
6.4 Pipeline Acceleration
Implemented stage‑level locking in BlueShield pipelines and used GitHub mirrors to speed up dependency fetching.
7. Business Impact
7.1 Performance Gains
Processing performance: Single‑core QPS increased from 13 to 172 (13× improvement).
Batch refresh: QPS rose from 1,000 to 10,000 (10×), limited only by storage throughput.
Latency reduction: Average processing time dropped from 2.7 s to 0.8 s (70%+ reduction).
7.2 R&D Efficiency Gains
Lead‑time reduction: Feature development time fell from 5.72 days to 1 day (82% decrease).
Code size reduction: Total lines of code shrank from 113 k to 28 k (75% reduction) due to monolith consolidation, plugin design, and modern C++ usage.
Overall, the redesign delivered a more reliable, scalable, and maintainable content ingestion platform for QQ Browser Search.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes widely read original technical articles. We focus on operations transformation, accompanying you throughout your operations career as we grow together.