How Tencent’s Cloud Disk Snapshots Enable 6000 SCF Servers in 1 Minute
This article explains how Tencent Cloud’s Serverless Cloud Function (SCF) uses cloud disk snapshots to create 6000 virtual machines concurrently within one minute. It covers the snapshot-based creation method, the system architecture, the performance challenges, and the engineering solutions that cut I/O latency and COS read bandwidth.
Background
Serverless Cloud Function (SCF) is Tencent Cloud’s serverless execution environment. SCF instances run on CVM (Cloud Virtual Machine) servers, so as SCF usage grows, the CVM creation path faces bursts of concurrent creation requests that reach dozens of times normal traffic.
Why Cloud Disk Snapshots Matter
Cloud Block Storage (CBS) snapshots are point-in-time copies of cloud disks. The SCF team uses CVM servers together with CBS snapshots to batch-create new instances when releasing a new version.
Two Ways to Create Cloud Servers
Image download: The full image is downloaded to a host, written to a CBS disk, then the server starts.
Snapshot rollback: The image is stored as a snapshot; when a server is needed, the snapshot is rolled back to a CBS disk in seconds, without involving the host.
The following image compares the two methods.
Snapshot Rollback Creation Process
Image data is stored in COS object storage.
When a server is created, the snapshot system copies the image from COS to the target CBS disk; the host does not see the data transfer.
Only a small portion of data is needed to start the server; the rest is copied asynchronously, with I/O isolation to avoid impact on user workloads.
If a user accesses data that has not yet been copied, the snapshot system prioritises copying the requested blocks.
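The behaviour described above (background copy, with reads of uncopied blocks served first) can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class `LazyRollbackDisk`, its block-level granularity, and the in-memory lists standing in for COS and the CBS disk are all hypothetical and not taken from Tencent's implementation.

```python
from collections import deque


class LazyRollbackDisk:
    """Toy model of a disk that is filled from a snapshot in the background."""

    def __init__(self, snapshot_blocks):
        self.snapshot = snapshot_blocks      # stand-in for image data held in COS
        self.disk = {}                       # blocks already present on the target disk
        self.pending = deque(range(len(snapshot_blocks)))  # background copy order
        self.copy_log = []                   # order in which blocks were actually copied

    def _copy(self, index):
        # Copy a block only once, no matter who asked for it.
        if index not in self.disk:
            self.disk[index] = self.snapshot[index]
            self.copy_log.append(index)

    def background_step(self):
        """Copy the next not-yet-copied block; return False when nothing is left."""
        while self.pending:
            index = self.pending.popleft()
            if index not in self.disk:
                self._copy(index)
                return True
        return False

    def read(self, index):
        """Serve a read; an uncopied block is copied immediately, ahead of the queue."""
        self._copy(index)
        return self.disk[index]


disk = LazyRollbackDisk([b"boot", b"kernel", b"libs", b"data"])
disk.background_step()            # background copy handles block 0
first_read = disk.read(3)         # a read of block 3 jumps the queue
while disk.background_step():     # the remaining blocks follow in order
    pass
```

In this model the server can start as soon as the blocks it touches are present, while the rest of the image arrives later.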
System Architecture
The snapshot system consists of three modules: manager, scheduler, and transfer.
Manager handles snapshot task management.
Scheduler dispatches copy tasks to transfer nodes.
Transfer performs the actual data copy from COS to CBS.
When a creation request arrives, the manager forwards it to the scheduler, which assigns transfer nodes to copy the image to the target disks.
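A minimal sketch of this request flow follows. The class and method names (`Manager.create_from_snapshot`, `Scheduler.dispatch`, `TransferNode.copy`) and the round-robin dispatch policy are assumptions made for illustration; the article does not describe the real interfaces or the scheduling policy.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class TransferNode:
    name: str
    copied: list = field(default_factory=list)

    def copy(self, image_id, disk_id):
        # A real transfer node would stream data from COS to the CBS disk here.
        self.copied.append((image_id, disk_id))


class Scheduler:
    def __init__(self, nodes):
        self._nodes = cycle(nodes)   # round-robin dispatch is an assumption

    def dispatch(self, image_id, disk_id):
        node = next(self._nodes)
        node.copy(image_id, disk_id)
        return node.name


class Manager:
    def __init__(self, scheduler):
        self.scheduler = scheduler
        self.tasks = {}              # disk_id -> name of the transfer node handling it

    def create_from_snapshot(self, image_id, disk_ids):
        # The manager only tracks tasks; the copy work is delegated downstream.
        for disk_id in disk_ids:
            self.tasks[disk_id] = self.scheduler.dispatch(image_id, disk_id)
        return self.tasks


nodes = [TransferNode("transfer-0"), TransferNode("transfer-1")]
manager = Manager(Scheduler(nodes))
assignment = manager.create_from_snapshot("img-scf", ["disk-a", "disk-b", "disk-c"])
```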
Challenges and Solutions
Challenge 1: Long copy time
Under high concurrency, many copy tasks read the same image from COS at once, which overloads COS reads and lengthens copy time. Solution: add a local cache in each transfer node so that the same image is downloaded only once.
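The "download once" idea can be sketched as a cache that takes a per-image lock, so concurrent copy tasks on one transfer node trigger a single COS read. The names `ImageCache` and `fetch_from_cos` are hypothetical, and a real cache would also need eviction and on-disk storage, which this sketch leaves out.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ImageCache:
    """Per-node cache: concurrent requests for one image share a single download."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._data = {}
        self._locks = {}
        self._guard = threading.Lock()   # protects the lock table itself

    def get(self, image_id):
        with self._guard:
            lock = self._locks.setdefault(image_id, threading.Lock())
        with lock:
            # Only the first caller downloads; later callers find the cached copy.
            if image_id not in self._data:
                self._data[image_id] = self._fetch(image_id)
            return self._data[image_id]


cos_reads = []


def fetch_from_cos(image_id):
    # Hypothetical stand-in for a COS download; it just records the call.
    cos_reads.append(image_id)
    return b"image-bytes-for-" + image_id.encode()


cache = ImageCache(fetch_from_cos)
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(cache.get, ["img-scf"] * 100))
```

Here 100 concurrent requests for the same image result in one call to `fetch_from_cos`.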
Challenge 2: Scheduler QPS limit
Creating 6000 servers per minute requires the scheduler to sustain hundreds of thousands of QPS. Solution: make the scheduler horizontally scalable, so that capacity grows by adding scheduler instances.
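One common way to scale a scheduler horizontally is to shard requests across instances by a stable hash of the disk ID. The article does not say how the SCF scheduler partitions its work, so the function `pick_shard` and the modulo-hash scheme below are purely illustrative assumptions.

```python
import zlib
from collections import Counter


def pick_shard(disk_id: str, shard_count: int) -> int:
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    return zlib.crc32(disk_id.encode()) % shard_count


disk_ids = [f"disk-{i}" for i in range(6000)]

# Requests handled by each scheduler instance with 4 and with 8 instances.
load_4 = Counter(pick_shard(d, 4) for d in disk_ids)
load_8 = Counter(pick_shard(d, 8) for d in disk_ids)
```

With this scheme each instance handles only a fraction of the requests, and the busiest instance never gets busier when the instance count doubles. Plain modulo hashing remaps many keys when the count changes; consistent hashing is a usual alternative when that matters.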
Challenge 3: Single‑warehouse write bottleneck
Creating a large number of disks in a single warehouse runs into that warehouse’s write bandwidth limit. Solution: introduce a disk‑warehouse scheduling system that spreads disks across multiple warehouses.
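Spreading disks across warehouses can be sketched as least-loaded placement. The function `place_disks`, the warehouse names, and the policy of counting disks as the load metric are assumptions for illustration; a production scheduler would more likely weigh real bandwidth and capacity.

```python
import heapq


def place_disks(disk_ids, warehouses):
    """Assign each disk to the warehouse that currently holds the fewest new disks."""
    heap = [(0, name) for name in warehouses]   # (disks placed so far, warehouse)
    heapq.heapify(heap)
    placement = {}
    for disk_id in disk_ids:
        load, name = heapq.heappop(heap)        # least-loaded warehouse
        placement[disk_id] = name
        heapq.heappush(heap, (load + 1, name))
    return placement


placement = place_disks([f"disk-{i}" for i in range(6000)], ["wh-a", "wh-b", "wh-c"])

per_warehouse = {}
for name in placement.values():
    per_warehouse[name] = per_warehouse.get(name, 0) + 1
```

With three warehouses, the 6000 disks end up split evenly, so no single warehouse absorbs the whole write burst.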
Results
With these optimisations, the snapshot system can create 6000 servers within one minute, reduce I/O latency by 95.6%, and cut COS read bandwidth by 89%, delivering a faster SCF experience for enterprises and developers.
Tencent Tech
Tencent's official tech account. Delivering quality technical content to serve developers.