
How Tencent SNG Uses Business Profiling to Optimize Capacity, Activity, and Multi‑Region Deployment

This article explains how Tencent's SNG operations team builds and applies four business profiling models (capacity, activity, core-link, and SET planning) to predict performance, automate scaling, identify critical services, and distribute workloads efficiently across multiple regions.

Efficient Ops

Aug 21, 2018

Business Portrait Types

A business portrait (the business profile of the title) is a set of models that address recurring operational challenges: performance variation across device models, capacity planning for seasonal traffic spikes, telling core modules from peripheral ones, and multi-region deployment.

Capacity Model: Automated load testing across different configurations and versions yields TPS thresholds, enabling fine-grained capacity management based on actual transaction rates rather than raw CPU or memory metrics.

Activity Model: Analyzes historical activity impact on downstream modules, records growth in traffic, CPU, and TPS, and predicts future resource needs for upcoming events.

Core-Link View: Filters routing, packet capture, and call-graph data to produce a concise core service dependency graph useful for alarm convergence, root-cause analysis, and architectural optimization.

SET Planning: Provides visual core-link maps and multi-region data-sync strategies (single-write, multi-read) to achieve high-availability, low-latency deployments across geographically distributed data centers.

New services are usually performance-tested before launch, but how does that performance change on a different device model or after a version upgrade?

Capacity Portrait: Automated TPS benchmarks are run on each device model and version to establish thresholds. A scaling decision is triggered only when the aggregated TPS of a module exceeds its threshold, which avoids scaling decisions based on misleading CPU usage.

For example, traditional practice expands resources when CPU usage reaches 70%, yet many services start timing out at only 30% CPU, so a CPU-based rule would never fire for them. A TPS threshold measured under load catches the saturation point that CPU usage hides.
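The TPS-based rule can be sketched as follows. This is a minimal illustration, not the team's actual platform: the device model names, the threshold values, and the choice to derive a module's threshold by summing per-instance thresholds are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    device_model: str   # hypothetical hardware model label
    tps: float          # transactions per second observed right now
    cpu_percent: float  # recorded, but deliberately not used for the decision


def module_tps_threshold(instances: list[Instance],
                         threshold_by_model: dict[str, float]) -> float:
    """Sum the load-tested TPS threshold of every instance in the module.

    Assumption: the module threshold is the sum of per-instance thresholds.
    """
    return sum(threshold_by_model[i.device_model] for i in instances)


def should_scale_out(instances: list[Instance],
                     threshold_by_model: dict[str, float]) -> bool:
    """Scale out only when aggregated TPS exceeds the module threshold."""
    total_tps = sum(i.tps for i in instances)
    return total_tps > module_tps_threshold(instances, threshold_by_model)


# Hypothetical thresholds obtained from load tests, one per device model.
thresholds = {"model_a": 1000.0, "model_b": 1500.0}

# CPU is only around 30%, yet the module is already past its tested TPS limit.
busy_fleet = [Instance("model_a", 1100.0, 30.0),
              Instance("model_b", 1450.0, 28.0)]

# CPU is high, but TPS is still comfortably under the tested limit.
calm_fleet = [Instance("model_a", 400.0, 72.0),
              Instance("model_b", 600.0, 75.0)]
```

With these numbers, `should_scale_out(busy_fleet, thresholds)` is true even though CPU sits near 30%, while `calm_fleet` does not trigger scaling despite CPU above 70%.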

During one holiday promotion, user activity was expected to double but grew by only 50%, so the resources scaled up in advance were wasted. How can we analyze the differing impact of an activity on each service module?

Activity Portrait: For each activity, the user growth and its effect on downstream module traffic, CPU, and TPS are recorded. This data feeds a predictive model that estimates capacity changes for future events.

The model quantifies CPU usage as absolute core counts:

$$\text{CPU\_total\_core} = \sum_{i=1}^{n} A_{i,\text{core}} \times A_{i,\text{CPU\_average}}$$

Combining the growth in TPS, traffic, and absolute CPU cores yields a forecast of the capacity expansion needed for the next activity. When the forecast and the observed result differ by more than 20%, a secondary analysis is run to refine the model.
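The arithmetic behind this can be sketched in a few lines. The reading of the formula (each term is a machine's core count multiplied by its average CPU utilization), the linear relation between user growth and module growth, and measuring the 20% discrepancy relative to the observed value are assumptions of this sketch; the article does not spell them out.

```python
def absolute_cores(machines: list[tuple[int, float]]) -> float:
    """Return busy cores: sum of core_count * average utilization (0..1)."""
    return sum(cores * utilization for cores, utilization in machines)


def growth_ratio(before: float, during: float) -> float:
    """Relative growth of a metric during an activity, e.g. 0.5 for +50%."""
    return (during - before) / before


def forecast(baseline: float, expected_user_growth: float,
             past_user_growth: float, past_metric_growth: float) -> float:
    """Predict a metric for the next activity.

    Assumption: the metric grows linearly with user growth, with the slope
    taken from one recorded past activity.
    """
    sensitivity = past_metric_growth / past_user_growth
    return baseline * (1 + sensitivity * expected_user_growth)


def needs_review(predicted: float, actual: float,
                 tolerance: float = 0.20) -> bool:
    """Flag a forecast that misses the observed value by more than 20%."""
    return abs(predicted - actual) / actual > tolerance


# Hypothetical module with a 16-core and a 32-core machine.
before = absolute_cores([(16, 0.25), (32, 0.25)])   # 12 busy cores
during = absolute_cores([(16, 0.50), (32, 0.50)])   # 24 busy cores
cpu_growth = growth_ratio(before, during)           # +100% for +50% users

# Forecast busy cores for an upcoming activity expected to add 25% users.
predicted = forecast(before, expected_user_growth=0.25,
                     past_user_growth=0.5, past_metric_growth=cpu_growth)
```

Converting utilization to busy cores makes machines of different sizes comparable, which is why the forecast is expressed in cores rather than percentages. The same `forecast` call could be repeated for TPS and traffic.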

How can we identify core modules among hundreds of service components?

Core-Link View: Instead of reporting every trace node, we construct a three-layer call chain (access → logic → storage) from routing data, packet captures, and caller-callee relationships. Filtering the resulting edges by call frequency and packet volume produces a concise core dependency graph.

After filtering, the core link view supports alarm correlation, root‑cause analysis, and real‑time activity dashboards.
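A rough sketch of the filtering step is shown below. The module names, the thresholds, the rule that both call frequency and byte volume must pass, and the rule that only edges moving down the access → logic → storage chain are kept are assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Position of each layer in the access -> logic -> storage chain.
LAYER_ORDER = {"access": 0, "logic": 1, "storage": 2}


@dataclass(frozen=True)
class CallEdge:
    caller: str
    callee: str
    calls_per_min: int
    bytes_per_min: int


def core_links(edges: list[CallEdge], layer_of: dict[str, str],
               min_calls: int, min_bytes: int) -> list[CallEdge]:
    """Keep only busy edges that move one way down the three-layer chain."""
    kept = []
    for edge in edges:
        downward = (LAYER_ORDER[layer_of[edge.caller]]
                    < LAYER_ORDER[layer_of[edge.callee]])
        busy = (edge.calls_per_min >= min_calls
                and edge.bytes_per_min >= min_bytes)
        if downward and busy:
            kept.append(edge)
    return kept


# Hypothetical modules and the layer each belongs to.
layer_of = {"gateway": "access", "feed_logic": "logic",
            "stats_logic": "logic", "skin_logic": "logic",
            "feed_store": "storage"}

edges = [
    CallEdge("gateway", "feed_logic", 12000, 5_000_000),
    CallEdge("feed_logic", "feed_store", 9000, 8_000_000),
    CallEdge("feed_logic", "stats_logic", 20000, 9_000_000),  # same layer
    CallEdge("gateway", "skin_logic", 15, 2_000),             # low volume
]

core = core_links(edges, layer_of, min_calls=1000, min_bytes=1_000_000)
core_pairs = [(e.caller, e.callee) for e in core]
```

Here only the `gateway` → `feed_logic` → `feed_store` chain survives. In practice the thresholds would need tuning per service, since a rarely called module can still be critical.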

SET Planning: For services that need high availability and low latency, the SET portrait defines visual core-link maps and a multi-region synchronization strategy in which writes are centralized (e.g., in Shenzhen) and reads are served locally. This design tolerates regional failures while keeping data consistent.
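The single-write, multi-read idea can be illustrated with a small routing function. This is a sketch under stated assumptions: the fallback order for reads is arbitrary (alphabetical), and because the article does not describe what happens to writes when the write region itself is down, the sketch simply raises an error in that case.

```python
WRITE_REGION = "Shenzhen"  # centralized write site, as in the example above


class NoHealthyRegion(RuntimeError):
    """Raised when no region can serve the request."""


def route(operation: str, user_region: str, healthy: set[str]) -> str:
    """Pick the region that should serve a read or a write."""
    if operation == "write":
        if WRITE_REGION not in healthy:
            # Write failover is not covered by the article; not modeled here.
            raise NoHealthyRegion("write region is unavailable")
        return WRITE_REGION
    if operation != "read":
        raise ValueError(f"unknown operation: {operation}")
    if user_region in healthy:
        return user_region  # reads stay local for low latency
    if not healthy:
        raise NoHealthyRegion("no region available for reads")
    return sorted(healthy)[0]  # assumption: deterministic arbitrary fallback


all_up = {"Shenzhen", "Shanghai", "Tianjin"}
tianjin_down = {"Shenzhen", "Shanghai"}
```

For instance, `route("read", "Tianjin", all_up)` stays in Tianjin, while the same read with `tianjin_down` falls back to another healthy site, and every write goes to Shenzhen.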

Metrics such as user capacity, image upload volume, and module-level capacity are monitored per site. In one case study, the service is split across Shenzhen, Shanghai, and Tianjin, each handling roughly one-third of users; if any single site fails, its load shifts to the remaining two.

Regular SET scheduling drills identify bottleneck modules and allow proactive capacity adjustments.
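A drill of this kind can be approximated on paper before it is run in production. The sketch below assumes that a failed site's load is spread across the survivors in proportion to their current load, and the user counts and capacity figures are invented for the example.

```python
def redistribute(load_by_site: dict[str, float],
                 failed_site: str) -> dict[str, float]:
    """Return per-site load after one site fails.

    Assumption: survivors absorb the failed site's load in proportion
    to the load they already carry.
    """
    if failed_site not in load_by_site:
        raise KeyError(failed_site)
    survivors = {s: l for s, l in load_by_site.items() if s != failed_site}
    surviving_total = sum(survivors.values())
    if surviving_total <= 0:
        raise ValueError("no surviving site is carrying load")
    orphaned = load_by_site[failed_site]
    return {site: load + orphaned * load / surviving_total
            for site, load in survivors.items()}


def bottlenecks(load_by_site: dict[str, float],
                capacity_by_site: dict[str, float]) -> list[str]:
    """List sites whose load after failover exceeds their capacity."""
    return sorted(site for site, load in load_by_site.items()
                  if load > capacity_by_site[site])


# Hypothetical figures, in millions of concurrent users.
load = {"Shenzhen": 300.0, "Shanghai": 300.0, "Tianjin": 300.0}
capacity = {"Shenzhen": 500.0, "Shanghai": 400.0, "Tianjin": 500.0}

after_failure = redistribute(load, "Tianjin")
overloaded = bottlenecks(after_failure, capacity)
```

With equal thirds, losing one site raises each survivor's load by half, so every site needs at least 50% headroom; in this made-up example Shanghai would be the bottleneck to fix before a real failure.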

Conclusion

Business portraits turn years of operational experience into reusable models that automate problem solving. The capacity model feeds a platform that auto-scales hundreds of modules and keeps capacity utilization at roughly 45%. The activity model predicts module growth during events with about 80% accuracy, enabling pre-emptive scaling. Core-link views simplify the analysis of complex architectures for alarm convergence and cloud migration. SET planning provides multi-region high availability, as demonstrated when the service survived large-scale incidents such as a Tianjin data-center explosion and a Shenzhen fiber cut.

These portraits need continual refinement as services grow in scale and diversity, so teams are encouraged to capture their operational insights systematically.

