Modernizing Tencent Cloud Log Service (CLS): Cloud‑Native Architecture, Challenges, and Benefits
Tencent Cloud Log Service (CLS) was modernized by migrating over 95 % of its services to a cloud‑native stack of containers, Kubernetes, and declarative APIs. The migration addressed seven challenges: chaotic infrastructure, stateful‑to‑stateless conversion, configuration drift, upgrade risk, elastic scaling, traffic protection, and observability. It cut operating costs by more than 20 million CNY per year, reduced scaling latency by 90 %, and achieved over 99.99 % availability with petabyte‑scale burst handling.
The digital transformation of an enterprise is essentially a process of breaking internal barriers, which usually involves both technical and organizational reconstruction. This article focuses on the technical side, describing how to achieve application modernization and cloud‑native migration to create higher business value.
Business background and challenges of Tencent Cloud Log Service (CLS)
CLS is a one‑stop, high‑reliability, high‑performance log solution that supports petabyte‑scale data ingestion, collection, storage, retrieval, analysis, processing, and subscription. Rapid growth in log volume (from tens of millions to tens of trillions of records per day) caused performance bottlenecks, unstable architecture, and frequent firefighting, which impacted customer satisfaction and revenue.
Three "representatives" of cloud‑native technology
Cloud‑native technologies (containers, Kubernetes, Serverless, etc.) represent the most advanced engineering productivity available today.
Adopting cloud‑native technology is essential for product competitiveness and rapid iteration.
Cloud‑native provides cost reduction, efficiency improvement, and resource elasticity.
Challenge 1: Chaotic infrastructure
Legacy physical machines and VMs lead to inconsistent environments, long provisioning cycles, and wasted resources. The article traces the evolution from physical servers to virtual machines to containers, including the rich‑container and sidecar patterns.
Challenge 2: Converting stateful applications to stateless
Stateless services can scale horizontally and survive failures without impact.
Stateful services require complex data synchronization and are harder to scale.
Two common approaches are presented: (1) synchronize state among multiple instances, and (2) externalize state to a centralized storage system.
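The second approach can be sketched as follows. This is a minimal illustration, not the CLS implementation: `InMemoryStore` is a hypothetical stand-in for whatever centralized storage system is used, and `IngestWorker` and the `count:` key format are invented for the example.

```python
class InMemoryStore:
    """Hypothetical stand-in for a centralized store (database, cache service, etc.)."""

    def __init__(self) -> None:
        self._data = {}

    def get(self, key: str) -> int:
        return self._data.get(key, 0)

    def incr(self, key: str, amount: int = 1) -> int:
        # Add to the shared counter and return the new value.
        self._data[key] = self._data.get(key, 0) + amount
        return self._data[key]


class IngestWorker:
    """Stateless worker: it keeps no counters of its own, only a handle to the store."""

    def __init__(self, store: InMemoryStore) -> None:
        self.store = store

    def handle(self, topic: str, records: list) -> int:
        # All state lives in the external store, so any worker can serve any request.
        return self.store.incr(f"count:{topic}", len(records))
```

Because the workers hold no state, one can be destroyed and replaced, or more can be added, without losing the running count; the trade-off is that the store itself must now be reliable and scalable.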
Challenge 3: Configuration management
Modern cloud‑native applications have configuration scattered across networking, databases, middleware, and other components. A unified configuration center, version control, and CI/CD pipelines are required to avoid configuration drift and ensure consistent deployments.
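One way to detect configuration drift is to compare a fingerprint of the version‑controlled configuration with a fingerprint of what each instance is actually running. The sketch below assumes configurations can be represented as JSON‑serializable dictionaries; the function names and the sample keys are hypothetical and not taken from CLS.

```python
import hashlib
import json


def fingerprint(config: dict) -> str:
    """Hash a config in canonical form so that key order does not matter."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_drift(desired: dict, running: dict) -> list:
    """Return the names of instances whose running config differs from the desired one.

    `running` maps an instance name to the config that instance reports.
    """
    want = fingerprint(desired)
    return sorted(name for name, cfg in running.items() if fingerprint(cfg) != want)
```

For example, with a desired config of `{"retention_days": 30, "shards": 8}`, an instance reporting `retention_days` of 7 would be listed as drifted, while an instance reporting the same values in a different key order would not.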
Challenge 4: Smooth architecture upgrade
A smooth upgrade strategy minimizes risk by using canary releases and by keeping the old services running through a rollback window.
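A canary release can be illustrated with a small routing function that sends a fixed percentage of requests to the new version. This is a sketch under assumptions: the source does not say how CLS splits traffic, and the hash‑based bucketing, the `route` function, and the `"canary"`/`"stable"` labels are chosen here only for illustration.

```python
import hashlib


def route(request_id: str, canary_percent: int) -> str:
    """Return "canary" or "stable" for a request.

    The request ID is hashed into one of 100 buckets, so the same ID is always
    routed the same way for a given percentage.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "canary" if bucket < canary_percent else "stable"
```

Raising `canary_percent` step by step widens the release, and setting it back to 0 sends all traffic to the old service, which is why the old service has to stay deployed during the rollback window.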
Challenge 5: Elastic scaling
Handle traffic spikes automatically with the Kubernetes Horizontal Pod Autoscaler (HPA).
Reduce cost by scaling down when load subsides.
Maintain stability by coordinating scaling across upstream and downstream services and by scaling on custom metrics.
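The scaling decision behind these points can be sketched with the formula documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance of 1. The function below is a simplified illustration; the default bounds are arbitrary, and the real controller also handles pod readiness, missing metrics, and stabilization windows.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Simplified HPA calculation: scale in proportion to metric / target."""
    ratio = current_metric / target_metric
    # Within the tolerance band, keep the current size to avoid flapping.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

With 4 replicas and a metric at twice its target, the function asks for 8 replicas; with the metric at half its target, it asks for 2, which is where the cost saving on scale‑down comes from. The metric can be CPU usage or a custom metric such as queue depth.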
Challenge 6: Traffic protection and fault tolerance
CLS implements end‑to‑end observability, DNS‑based isolation, rate limiting, and rapid elastic scaling (up to ten thousand cores within minutes) to protect against attacks and failures.
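The source does not say which rate‑limiting algorithm CLS uses. As an illustration only, a token bucket is one common choice: tokens refill at a steady rate, each request spends tokens, and requests are rejected when the bucket is empty. The class name, parameters, and injectable `clock` below are assumptions made for the sketch.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, with a sustained rate of `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the elapsed time, never exceeding the capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A limiter like this protects the service during the minutes it takes for elastic scaling to add capacity: excess requests are rejected quickly instead of overloading the backend.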
Challenge 7: Observability and development efficiency
Build multi‑layer observability (user, application, middleware, infrastructure).
Automate issue detection, reduce mean time to resolution, and avoid frequent firefighting.
Development efficiency improvements
CI pipeline with >1000 automated test cases ensures compatibility and stability.
Automated release orchestration across dozens of regions reduces manual effort and error rates.
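Release orchestration across many regions is often done in waves, stopping as soon as one region fails so that a bad build does not spread. The sketch below shows that idea only; the `rollout` function, the `deploy` callback, and the wave layout are hypothetical, and the source does not describe how the CLS pipeline is actually structured.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def rollout(
    waves: Sequence[Sequence[str]],
    deploy: Callable[[str], bool],
) -> Tuple[List[str], Optional[str]]:
    """Deploy region by region, wave by wave.

    `deploy(region)` is expected to release to one region and return True if the
    region is healthy afterwards. Returns the regions completed and the first
    failed region, or None if every region succeeded.
    """
    done: List[str] = []
    for wave in waves:
        for region in wave:
            if not deploy(region):
                # Halt immediately; later waves are never touched.
                return done, region
            done.append(region)
    return done, None
```

Putting a single low‑traffic region in the first wave limits the impact of a bad release, and the returned list of completed regions tells the operator exactly what needs to be rolled back.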
Results of the cloud‑native transformation
The CLS architecture now fully embraces cloud‑native components (containers, Kubernetes, declarative APIs, elastic scaling). After nearly a year of migration, more than 95 % of services are containerized, operating costs are down by over 20 million CNY per year, resource usage is down by more than 100,000 cores, scaling latency is down by 90 %, and resource utilization is up by more than 40 %. Service availability exceeds 99.99 %, with PB‑level burst handling capability.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.