
Backend Refactoring and Architecture Design of Tencent Docs Collection Form Service

Tencent Docs transformed its high‑traffic Collection Form by refactoring a monolithic C++‑style service into 19 loosely‑coupled vertical services with light‑heavy separation, database isolation, async Kafka pipelines, and full observability via Tianji, achieving dramatically improved stability, millisecond‑level sync, reliable export, and faster incident resolution.

Tencent Cloud Developer

Tencent Docs' Collection Form is a core product that drives user growth, especially during major social events. The original backend was a large, monolithic tRPC‑Go service written in a C++ style. This made agile multi‑person development difficult and releases risky, and the service suffered from severe coupling, performance bottlenecks, and poor observability.

Technical background highlighted three main problems: (1) a massive, non‑standard monolithic service with risky releases; (2) tightly coupled business logic without light‑heavy interface separation, leading to low stability and high latency; (3) lack of strict protocol constraints, causing dirty data from the front‑end and low success rates.

To address these issues, the team performed a complete backend reconstruction. Key design principles included:

Splitting the monolith into vertical service layers (service, logic, repo) using a common scaffolding framework, achieving high cohesion and low coupling.

Implementing light‑heavy separation: core interfaces stay synchronous, while non‑critical paths run asynchronously via message queues.

Adopting a flexible, loosely‑coupled architecture with 19 independent services (e.g., export/archive, new‑list, periodic collection, sync to sheets).

Ensuring storage isolation by vertically partitioning databases per module and separating production from test environments.
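The layering and light‑heavy separation described above can be sketched as follows. This is a minimal illustration, not the team's actual code: the type names (Submission, SubmitLogic, SubmitService), the in‑memory repo, and the buffered channel standing in for a message‑queue producer are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// Submission is a hypothetical form answer; the fields are illustrative.
type Submission struct {
	FormID string
	UserID string
	Answer string
}

// SubmissionRepo is the repo layer: storage access only.
type SubmissionRepo interface {
	Save(s Submission) error
}

// memoryRepo is an in-memory stand-in for a per-module database.
type memoryRepo struct {
	rows []Submission
}

func (r *memoryRepo) Save(s Submission) error {
	r.rows = append(r.rows, s)
	return nil
}

// SubmitLogic is the logic layer: business rules, no transport or storage details.
type SubmitLogic struct {
	repo   SubmissionRepo
	events chan<- Submission // assumption: stands in for a message-queue producer
}

// Submit runs the light, synchronous path (validate and save) and hands the
// heavy follow-up work (for example, sync to sheets) to the queue.
func (l *SubmitLogic) Submit(s Submission) error {
	if s.FormID == "" || s.UserID == "" {
		return errors.New("form id and user id are required")
	}
	if err := l.repo.Save(s); err != nil {
		return err
	}
	select {
	case l.events <- s: // enqueued for asynchronous processing
	default:
		// Queue full: the core write already succeeded, so the request is
		// not failed. A real system would record the event for retry.
	}
	return nil
}

// SubmitService is the service layer: it adapts a request to the logic layer.
type SubmitService struct {
	logic *SubmitLogic
}

func (s *SubmitService) HandleSubmit(formID, userID, answer string) string {
	sub := Submission{FormID: formID, UserID: userID, Answer: answer}
	if err := s.logic.Submit(sub); err != nil {
		return "error: " + err.Error()
	}
	return "ok"
}

func main() {
	events := make(chan Submission, 16)
	repo := &memoryRepo{}
	svc := &SubmitService{logic: &SubmitLogic{repo: repo, events: events}}

	fmt.Println(svc.HandleSubmit("form-1", "user-1", "hello"))
	fmt.Println("saved:", len(repo.rows), "queued:", len(events))
}
```

Because the logic layer depends only on the SubmissionRepo interface and a send‑only channel, each layer can be tested or replaced on its own, and a slow consumer cannot block the submit path.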

Observability was dramatically improved by fully integrating the internal Tianji Platform (天机阁) and adopting an Observability‑Driven Development (ODD) workflow: continuous planning → building → delivery → operation. The system now captures traces, metrics, logs, and profiles, supports intelligent alerts, and provides real‑time dashboards.
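As a rough illustration of the metrics part of this workflow, the sketch below wraps a call and records its count, error count, and cumulative latency per method. It uses only the Go standard library; the Recorder type and its methods are hypothetical and do not represent the Tianji Platform API, which the article does not describe.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// MethodStats holds basic per-method metrics.
type MethodStats struct {
	Calls   int
	Errors  int
	Elapsed time.Duration
}

// Recorder collects metrics in memory. Assumption: a real service would
// export these to its observability backend instead.
type Recorder struct {
	mu    sync.Mutex
	stats map[string]*MethodStats
}

func NewRecorder() *Recorder {
	return &Recorder{stats: map[string]*MethodStats{}}
}

// Observe runs fn and records its latency and outcome under the method name.
func (r *Recorder) Observe(method string, fn func() error) error {
	start := time.Now()
	err := fn()
	elapsed := time.Since(start)

	r.mu.Lock()
	defer r.mu.Unlock()
	s, ok := r.stats[method]
	if !ok {
		s = &MethodStats{}
		r.stats[method] = s
	}
	s.Calls++
	s.Elapsed += elapsed
	if err != nil {
		s.Errors++
	}
	return err
}

// Snapshot returns a copy of the stats for one method.
func (r *Recorder) Snapshot(method string) MethodStats {
	r.mu.Lock()
	defer r.mu.Unlock()
	if s, ok := r.stats[method]; ok {
		return *s
	}
	return MethodStats{}
}

// errInvalid is a sample error used by the demo below.
var errInvalid = errors.New("invalid submission")

func main() {
	rec := NewRecorder()
	_ = rec.Observe("SubmitForm", func() error { return nil })
	_ = rec.Observe("SubmitForm", func() error { return errInvalid })

	s := rec.Snapshot("SubmitForm")
	fmt.Printf("calls=%d errors=%d\n", s.Calls, s.Errors)
}
```

Error rate and average latency per method, the kind of signal an alert rule or dashboard would consume, can be derived from such a snapshot.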

Performance enhancements include:

Async processing of data sync via Kafka with cold‑hot queue isolation, reducing partition backlog.

Continuous profiling and time‑based performance analysis to identify and eliminate bottlenecks.

Migration of heavy export tasks to the desktop client, avoiding OOM in the import/export service.
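Cold‑hot queue isolation can be sketched as a routing decision made before publishing. The article does not say how tasks are classified, so the row‑count threshold, the topic names, and the in‑memory publisher standing in for a Kafka producer are all assumptions for illustration.

```go
package main

import "fmt"

// SyncTask is a hypothetical unit of "sync to sheet" work.
type SyncTask struct {
	FormID string
	Rows   int // number of rows to sync
}

// Topic names are illustrative, not taken from the article.
const (
	hotTopic  = "form-sync-hot"
	coldTopic = "form-sync-cold"
)

// Router picks a topic so that large tasks cannot delay small ones.
type Router struct {
	MaxHotRows int // assumption: tasks above this size count as "cold"
}

// TopicFor returns the hot topic for small tasks and the cold topic otherwise.
func (r Router) TopicFor(t SyncTask) string {
	if t.Rows > r.MaxHotRows {
		return coldTopic
	}
	return hotTopic
}

// Publisher abstracts a queue producer.
type Publisher interface {
	Publish(topic string, t SyncTask)
}

// memoryPublisher keeps one slice per topic in place of a real broker.
type memoryPublisher struct {
	queues map[string][]SyncTask
}

func (p *memoryPublisher) Publish(topic string, t SyncTask) {
	p.queues[topic] = append(p.queues[topic], t)
}

func main() {
	pub := &memoryPublisher{queues: map[string][]SyncTask{}}
	router := Router{MaxHotRows: 1000}

	tasks := []SyncTask{
		{FormID: "a", Rows: 10},
		{FormID: "b", Rows: 50000},
		{FormID: "c", Rows: 200},
	}
	for _, t := range tasks {
		pub.Publish(router.TopicFor(t), t)
	}
	fmt.Println("hot:", len(pub.queues[hotTopic]), "cold:", len(pub.queues[coldTopic]))
}
```

With separate topics, each class of task can have its own consumers, so a burst of large syncs builds backlog only in the cold queue.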

Operational practices were refined with automated code reviews (CR), single‑flight request de‑duplication, strict code standards, and scheduled data export to ClickHouse for long‑term SLA reporting.
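Single‑flight de‑duplication means that concurrent requests for the same key share one backend call. The sketch below is a minimal standard‑library version of the idea, written for illustration; the article does not show the team's implementation, and production Go code would more commonly use the golang.org/x/sync/singleflight package. The key and the loaded value are made up.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// call tracks one in-flight execution and its result.
type call struct {
	wg  sync.WaitGroup
	val string
	err error
}

// Group de-duplicates concurrent calls that share a key.
type Group struct {
	mu    sync.Mutex
	calls map[string]*call
}

// Do runs fn once per key at a time. Callers that arrive while a call for
// the same key is in flight wait for it and share its result.
func (g *Group) Do(key string, fn func() (string, error)) (string, error) {
	g.mu.Lock()
	if g.calls == nil {
		g.calls = make(map[string]*call)
	}
	if c, ok := g.calls[key]; ok {
		g.mu.Unlock()
		c.wg.Wait() // another caller is already loading this key
		return c.val, c.err
	}
	c := &call{}
	c.wg.Add(1)
	g.calls[key] = c
	g.mu.Unlock()

	c.val, c.err = fn()
	c.wg.Done()

	// Results are not cached: once the call finishes, the key is removed.
	g.mu.Lock()
	delete(g.calls, key)
	g.mu.Unlock()
	return c.val, c.err
}

func main() {
	var g Group
	var mu sync.Mutex
	loads := 0

	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, _ = g.Do("form-42", func() (string, error) {
				mu.Lock()
				loads++
				mu.Unlock()
				time.Sleep(50 * time.Millisecond) // simulate a slow backend read
				return "form-42 metadata", nil
			})
		}()
	}
	wg.Wait()
	// Usually 1, and never more than 5, depending on goroutine scheduling.
	fmt.Println("backend loads:", loads)
}
```

Unlike a cache, this only collapses requests that overlap in time, so it reduces load during spikes on a hot key without introducing stale data.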

Business outcomes after six months of refactoring:

Significant increase in service stability and availability, supporting million‑scale collections.

Improved user experience with faster sync (seconds instead of minutes) and reliable export.

Enhanced monitoring and faster MTTR thanks to intelligent alerts.

The project demonstrates how systematic backend refactoring, cloud‑native design, and observability can transform a high‑traffic product into a stable, scalable service.
