Optimizing Release Efficiency and Quality for Tencent Cloud Object Storage (COS)

Tencent Cloud Object Storage (COS) improved release efficiency and quality by moving to YottaStore's cluster-percentage change model, upgrading whole Management Zones (MZs) in parallel, automating gray testing and rollback, streamlining concurrency controls, visualizing the workflow to cut flow time, standardizing quality checks, and adopting a five-level maturity model that targets fully automated, reliable deployments.

Tencent Cloud Developer

Background

In the ToB (business-facing) era, cloud services must fix functional defects rapidly, roll back problematic releases quickly, and stay efficient without sacrificing quality. Tencent Cloud Object Storage (COS) faces all three demands, and its release process places a heavy workload on operations engineers.

1) COS Architecture Evolution

COS consists of three core sub-platforms: the logical access layer, the index storage layer, and the data storage layer. Each contains dozens of modules, and a failure in any link can affect the PUT, GET, LIST, and HEAD APIs. The fleet exceeds 100,000 nodes.

Traditional storage engines (TFS, LAVADB) required serial changes or data migration, which led to long change windows. YottaStore moves to a cluster-percentage change model, which eliminates set-based migrations and raises the efficiency ceiling.

2) Key Efficiency Measures

2.1 Management Zone (MZ) Adapted Publishing

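YottaStore introduces MZ tags to limit the blast radius: only nodes within the same MZ may be changed simultaneously, so nodes in different MZs are never changed at the same time. A cluster typically uses 20 MZs, so at most about 5% of machines (one MZ out of 20, assuming roughly equal MZ sizes) change concurrently.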
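
The 5% bound follows from changing one MZ at a time across 20 MZs of equal size. The sketch below illustrates that arithmetic under stated assumptions: the node names, MZ labels, and the `plan_mz_batches` helper are hypothetical and are not part of YottaStore or the COS change platform.

```python
from collections import defaultdict

def plan_mz_batches(node_to_mz):
    """Return one batch of nodes per Management Zone, in sorted MZ order."""
    by_mz = defaultdict(list)
    for node, mz in node_to_mz.items():
        by_mz[mz].append(node)
    return [sorted(by_mz[mz]) for mz in sorted(by_mz)]

def max_concurrent_fraction(batches):
    """Largest share of the fleet that any single batch touches."""
    total = sum(len(batch) for batch in batches)
    return max(len(batch) for batch in batches) / total

# Hypothetical fleet: 200 nodes spread evenly over 20 MZs.
fleet = {f"node-{i:03d}": f"mz-{i % 20:02d}" for i in range(200)}
batches = plan_mz_batches(fleet)
print(len(batches), max_concurrent_fraction(batches))  # 20 0.05
```

If the MZs are unevenly sized, the largest MZ sets the bound, so the fraction can exceed 5% even with 20 zones.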

Before optimization, a change proceeded in three stages: a single-node gray release, a gray release on one randomly chosen MZ, and then the remaining MZs one after another. After optimization, a whole MZ is upgraded at once; the platform concurrency limit was raised from about 100 to more than 500, so every node in an MZ can be upgraded in parallel.
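
To see why the concurrency limit matters, consider how many sequential waves one MZ needs under each limit. This is a minimal sketch: the MZ size of 400 nodes is an assumed figure, and `upgrade_node` is a hypothetical stand-in for whatever call the real change platform makes.

```python
from concurrent.futures import ThreadPoolExecutor

def waves_needed(mz_size, concurrency_limit):
    """Number of sequential waves needed to cover one MZ (ceiling division)."""
    return -(-mz_size // concurrency_limit)

def upgrade_node(node):
    """Hypothetical per-node upgrade; returns (node, succeeded)."""
    return node, True

def upgrade_mz(nodes, concurrency_limit):
    """Upgrade all nodes of one MZ with a bounded worker pool; return failed nodes."""
    workers = max(1, min(concurrency_limit, len(nodes)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = dict(pool.map(upgrade_node, nodes))
    return sorted(node for node, ok in results.items() if not ok)

# Assumed MZ of 400 nodes: a limit of 100 forces 4 waves, a limit of 500 needs 1.
print(waves_needed(400, 100), waves_needed(400, 500))  # 4 1
```

With a limit at or above the MZ size, the whole zone completes in a single wave, which is the "whole MZ at once" behavior described above.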

2.2 Gray‑Testing Capability

Before a node rejoins production, its logs are scanned to confirm that initialization succeeded. Automated rollback has been introduced, and test cases are continuously added to build a complete testing pipeline.
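
A log-scan gate of this kind could look roughly like the following. The log patterns, the `gate_node` helper, and the `rollback` callback are assumptions made for illustration; the article does not describe the actual log format or rollback interface.

```python
import re

# Assumed log markers; real COS log formats are not given in the article.
SUCCESS = re.compile(r"initialization (complete|succeeded)", re.IGNORECASE)
FAILURE = re.compile(r"\b(ERROR|FATAL|panic)\b")

def init_succeeded(log_lines):
    """True only if a success marker appears and no failure marker does."""
    saw_success = False
    for line in log_lines:
        if FAILURE.search(line):
            return False
        if SUCCESS.search(line):
            saw_success = True
    return saw_success

def gate_node(node, log_lines, rollback):
    """Let the node rejoin production, or trigger the rollback callback."""
    if init_succeeded(log_lines):
        return "rejoin"
    rollback(node)
    return "rolled_back"

rolled_back = []
good = ["loading config", "Initialization complete in 3.2s"]
bad = ["loading config", "ERROR: index shard missing"]
print(gate_node("node-001", good, rolled_back.append))  # rejoin
print(gate_node("node-002", bad, rolled_back.append))   # rolled_back
```

Requiring a positive success marker, rather than only the absence of errors, means a node whose process never started is also held back.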

2.3 Concurrency Strategy

Manual control points are added to the change system, allowing tasks to be confirmed and started directly, dramatically speeding up execution.

2.4 Flow‑Time Optimization

The release workflow is visualized to identify unnecessary delays. A unified pipeline enforces a fixed release order per region, reducing manual approvals and speeding up the overall process.
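
One way to picture a fixed per-region release order is a pipeline that walks the regions in sequence and stops at the first failure. The region names and the `release_region` callback below are hypothetical; the article does not list the actual order or the pipeline's interface.

```python
# Hypothetical fixed order; the real region sequence is not given in the article.
REGION_ORDER = ["region-a", "region-b", "region-c"]

def run_pipeline(release_region, order=REGION_ORDER):
    """Release regions in a fixed order; stop at the first region that fails.

    Returns (completed_regions, failed_region_or_None).
    """
    completed = []
    for region in order:
        if not release_region(region):
            return completed, region
        completed.append(region)
    return completed, None

# Example: a release that fails in region-b never reaches region-c.
print(run_pipeline(lambda region: region != "region-b"))
# (['region-a'], 'region-b')
```

Because the order is encoded once in the pipeline, no per-release approval is needed to decide which region goes next.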

3) Quality‑Focused Improvements

Quality issues are addressed through:

Instance management and OSS unification.

Configuration template and variable management to eliminate region‑specific inconsistencies.

Standardized release procedures with mandatory checks, double‑approval, and forced rollback.
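
As an illustration of the configuration template and variable approach, the sketch below renders one shared template with per-region variables and rejects a region whose variables are incomplete. The template keys and values are invented for the example and do not reflect real COS configuration.

```python
from string import Template

# One shared template; only the variables differ between regions.
CONFIG_TEMPLATE = Template(
    "region=$region\n"
    "listen_port=$port\n"
    "index_endpoint=$index_endpoint\n"
)

def render_config(region, variables):
    """Render the shared template for one region, failing loudly on gaps."""
    values = {**variables, "region": region}
    try:
        return CONFIG_TEMPLATE.substitute(values)
    except KeyError as missing:
        raise ValueError(f"{region}: missing variable {missing}") from None

print(render_config("region-a", {"port": 8080, "index_endpoint": "idx.a.example"}))
```

`Template.substitute` raises `KeyError` for any placeholder without a value, so a region with a missing variable is caught at render time instead of shipping a partial configuration.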

Metrics and monitoring are introduced to track release success rates, failure causes, and human effort, feeding back into continuous improvement.
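
The metrics named here could be aggregated from per-release records along these lines. The record fields (`ok`, `cause`, `manual_minutes`) and the sample data are assumptions for illustration, not the schema COS uses.

```python
from collections import Counter

def summarize(releases):
    """Aggregate success rate, failure causes, and manual effort."""
    total = len(releases)
    succeeded = sum(1 for r in releases if r["ok"])
    causes = Counter(r["cause"] for r in releases if not r["ok"])
    return {
        "success_rate": succeeded / total if total else None,
        "failure_causes": dict(causes),
        "manual_minutes": sum(r["manual_minutes"] for r in releases),
    }

# Invented sample records.
history = [
    {"ok": True, "cause": None, "manual_minutes": 5},
    {"ok": True, "cause": None, "manual_minutes": 0},
    {"ok": False, "cause": "config mismatch", "manual_minutes": 40},
    {"ok": True, "cause": None, "manual_minutes": 5},
]
print(summarize(history))
```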

4) Maturity Model

COS defines a five‑level maturity model aiming for fully automated releases. Indicators include release efficiency, audit feedback, and defect analysis.

5) Future Directions

Plans include intelligent release modes, automated quality gates, event‑driven rollback, and tighter integration with DevOps change boards to further reduce manual overhead and improve reliability.
