
How Tencent’s Cloud Disk Snapshots Enable 6000 SCF Servers in 1 Minute

This article explains how Tencent Cloud’s Serverless Cloud Function (SCF) leverages Cloud Disk Snapshot technology to achieve the concurrent creation of 6000 virtual machines within a minute, detailing the snapshot‑based creation method, system architecture, performance challenges, and the engineering solutions that dramatically improve latency and bandwidth usage.

Tencent Tech

Background

Serverless Cloud Function (SCF) is Tencent Cloud’s serverless execution environment. As SCF usage grows, the Cloud Virtual Machine (CVM) service that provisions the underlying servers faces surges of concurrent creation requests, up to dozens of times normal traffic.

Why Cloud Disk Snapshots Matter

Cloud Block Storage (CBS) is Tencent Cloud’s cloud disk service, and its snapshot feature provides point‑in‑time copies of cloud disks. The SCF team uses CVM servers together with CBS snapshots to batch‑create new instances when releasing a new version.

Two Ways to Create Cloud Servers

Image download: The full image is downloaded to a host, written to a CBS disk, then the server starts.

Snapshot rollback: The image is stored as a snapshot; when a server is needed, the snapshot is rolled back to a CBS disk in seconds, without involving the host.

The following image compares the two methods.

Comparison of image download and snapshot rollback

Snapshot Rollback Creation Process

Image data is stored in COS object storage.

When a server is created, the snapshot system copies the image from COS to the target CBS disk; the host does not see the data transfer.

Only a small portion of data is needed to start the server; the rest is copied asynchronously, with I/O isolation to avoid impact on user workloads.

If a user accesses data that has not yet been copied, the snapshot system prioritises copying the requested blocks.
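The steps above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the class name LazyRollback, the list of byte strings standing in for COS, and the dictionary standing in for the CBS disk are all hypothetical and not part of Tencent's implementation. It shows the idea of copying boot blocks first, copying the rest in the background, and letting a read of a missing block jump the queue.

```python
from collections import deque


class LazyRollback:
    """Toy model of a snapshot rollback that copies blocks on demand."""

    def __init__(self, snapshot_blocks, boot_blocks):
        self.snapshot = snapshot_blocks   # stands in for image data in COS (assumption)
        self.disk = {}                    # stands in for the target CBS disk (assumption)
        # Copy the small set of blocks needed to start the server first.
        for idx in boot_blocks:
            self.disk[idx] = self.snapshot[idx]
        # Everything else waits in a background queue.
        self.pending = deque(
            i for i in range(len(snapshot_blocks)) if i not in self.disk
        )

    def background_step(self):
        """Copy one pending block; return False when nothing is left."""
        while self.pending:
            idx = self.pending.popleft()
            if idx not in self.disk:      # skip blocks already pulled in by a read
                self.disk[idx] = self.snapshot[idx]
                return True
        return False

    def read(self, idx):
        """Serve a read, copying the block ahead of the queue if it is missing."""
        if idx not in self.disk:
            self.disk[idx] = self.snapshot[idx]   # prioritised copy
        return self.disk[idx]


rb = LazyRollback(snapshot_blocks=[b"boot", b"a", b"b", b"c"], boot_blocks=[0])
first_read = rb.read(3)        # block 3 is not copied yet, so it jumps the queue
while rb.background_step():    # drain the remaining background copies
    pass
fully_copied = len(rb.disk) == 4
```

The model leaves out I/O isolation and real block devices; it only captures the ordering of copies.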

System Architecture

The snapshot system consists of three modules: manager, scheduler, and transfer.

Manager handles snapshot task management.

Scheduler dispatches copy tasks to transfer nodes.

Transfer performs the actual data copy from COS to CBS.

When a creation request arrives, the manager forwards it to the scheduler, which assigns transfer nodes to copy the image to the target disks.
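The request flow can be sketched as three cooperating objects. The class names mirror the module names in the article, but every method name, the task status string, and the round-robin assignment are assumptions made for illustration; the article does not describe the actual interfaces or the dispatch policy.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class TransferNode:
    name: str
    copied: list = field(default_factory=list)

    def copy(self, image_id, disk_id):
        # A real node would stream data from COS to the CBS disk here.
        self.copied.append((image_id, disk_id))


class Scheduler:
    def __init__(self, nodes):
        self._next_node = cycle(nodes)   # round-robin is an assumption

    def dispatch(self, image_id, disk_ids):
        # Hand each disk's copy task to the next transfer node.
        for disk_id in disk_ids:
            next(self._next_node).copy(image_id, disk_id)


class Manager:
    def __init__(self, scheduler):
        self.scheduler = scheduler
        self.tasks = {}

    def create_servers(self, task_id, image_id, disk_ids):
        self.tasks[task_id] = "dispatched"   # hypothetical task bookkeeping
        self.scheduler.dispatch(image_id, disk_ids)


nodes = [TransferNode("t1"), TransferNode("t2")]
manager = Manager(Scheduler(nodes))
manager.create_servers("task-1", "img-scf", ["disk-a", "disk-b", "disk-c"])
```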

Challenges and Solutions

Challenge 1: Long copy time

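When many disks are created from the same image at once, every copy task reads the same data from COS, which overloads COS reads and lengthens copy time. Solution: add a local cache in each transfer node so that the same image is downloaded from COS only once per node.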
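A minimal sketch of such a cache, assuming a hypothetical fetch function in place of a real COS client and an in-memory dictionary in place of whatever storage the transfer node actually uses. The per-image lock makes concurrent requests for the same image wait for a single download instead of each going to COS.

```python
import threading


class ImageCache:
    """Per-node cache that downloads each image at most once."""

    def __init__(self, fetch_from_cos):
        self._fetch = fetch_from_cos      # hypothetical COS download function
        self._data = {}
        self._locks = {}
        self._guard = threading.Lock()    # protects the lock table

    def get(self, image_id):
        with self._guard:
            lock = self._locks.setdefault(image_id, threading.Lock())
        with lock:                        # only one downloader per image
            if image_id not in self._data:
                self._data[image_id] = self._fetch(image_id)
            return self._data[image_id]


cos_reads = []


def fake_cos_fetch(image_id):
    # Stand-in for a COS read; records how often COS is hit.
    cos_reads.append(image_id)
    return b"image-bytes-for-" + image_id.encode()


cache = ImageCache(fake_cos_fetch)
threads = [
    threading.Thread(target=cache.get, args=("img-scf",)) for _ in range(50)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After the 50 concurrent requests finish, cos_reads holds a single entry, which is the effect the cache is meant to have on COS read bandwidth.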

Challenge 2: Scheduler QPS limit

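Creating 6000 servers per minute requires hundreds of thousands of QPS from the scheduler, more than a single instance can sustain. Solution: make the scheduler horizontally scalable so that the load is spread across multiple instances.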
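One common way to spread work across scheduler instances is to partition requests by a hash of the disk ID. The article does not say how Tencent partitions the load, so the following is a generic sketch: the function names and the hash-modulo scheme are assumptions, not the production design.

```python
import hashlib
from collections import Counter


def pick_scheduler(disk_id, scheduler_count):
    """Map a disk ID to a scheduler index with a stable hash."""
    digest = hashlib.sha256(disk_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % scheduler_count


def spread(disk_ids, scheduler_count):
    """Count how many disks each scheduler instance would handle."""
    return Counter(pick_scheduler(d, scheduler_count) for d in disk_ids)


disk_ids = [f"disk-{i}" for i in range(6000)]
load_4 = spread(disk_ids, 4)   # per-instance load with 4 schedulers
load_8 = spread(disk_ids, 8)   # adding instances lowers the per-instance load
```

Plain modulo hashing remaps most keys when the instance count changes; a consistent-hashing ring is a frequent alternative when instances are added or removed often.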

Challenge 3: Single‑warehouse write bottleneck

Placing a large number of new disks in a single warehouse saturates that warehouse's write bandwidth. Solution: introduce a disk‑warehouse scheduling system that spreads disks across multiple warehouses.
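A minimal sketch of spreading disks across warehouses, assuming a simple least-loaded policy. The function name, the warehouse names, and the policy itself are hypothetical; the article only states that disks are spread across multiple warehouses, and a real scheduler would likely weigh bandwidth and capacity as well as disk counts.

```python
import heapq


def place_disks(disk_count, warehouses):
    """Assign each new disk to the warehouse with the fewest disks so far."""
    heap = [(0, name) for name in warehouses]   # (assigned disks, warehouse)
    heapq.heapify(heap)
    placement = {name: 0 for name in warehouses}
    for _ in range(disk_count):
        load, name = heapq.heappop(heap)        # least-loaded warehouse
        placement[name] += 1
        heapq.heappush(heap, (load + 1, name))
    return placement


placement = place_disks(6000, ["wh-1", "wh-2", "wh-3", "wh-4"])
```

With four warehouses, each receives 1500 of the 6000 disks, so no single warehouse absorbs the full write burst.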

Results

With these optimisations, the snapshot system can create 6000 servers within one minute, reduce I/O latency by 95.6 %, and cut COS read bandwidth by 89 %, delivering a faster SCF experience for enterprises and developers.

Scheduler horizontal scaling diagram
Snapshot system architecture diagram
Transfer node cache diagram