Artificial Intelligence 9 min read

Tencent Cloud's Next-Generation HCC High-Performance Computing Cluster for Large Model Training

Tencent Cloud's new HCC high-performance computing cluster delivers three times the performance of the previous generation. It combines 3.2T server access bandwidth, self-developed Xingsha servers, and NVIDIA H800 GPUs rated at up to 1979 TFLOPS per card with the Xingmai 3.2T ETH RDMA network, TB-level storage throughput via COS + GooseFS, and multiple access forms (bare metal, cloud servers, containers, cloud functions) for efficient large-model training.

Tencent Cloud Developer

This article introduces Tencent Cloud's newly released next-generation HCC high-performance computing cluster, designed for large model training. The cluster is a significant advance in AI computing infrastructure, delivering three times the performance of the previous generation.

Key technical specifications: server access bandwidth doubles from 1.6T to 3.2T; the cluster runs on the latest generation of Tencent Cloud's self-developed Xingsha servers; and the servers are equipped with NVIDIA H800 Tensor Core GPUs, with a single GPU card delivering up to 1979 TFLOPS of computing power.
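To put these figures in more familiar units, here is a minimal arithmetic sketch. It assumes that "1.6T" and "3.2T" mean terabits per second, the usual unit for network bandwidth, which the article does not state explicitly. The GPU count of 8 is a hypothetical value chosen for illustration, and the result is a theoretical peak rather than achievable training throughput.

```python
# Back-of-the-envelope arithmetic for the quoted specifications.
# Assumption: "1.6T" and "3.2T" are terabits per second (Tbps).

BITS_PER_BYTE = 8


def tbps_to_gigabytes_per_second(tbps: float) -> float:
    """Convert terabits per second to gigabytes per second (decimal units)."""
    return tbps * 1000 / BITS_PER_BYTE


def peak_pflops(num_gpus: int, tflops_per_gpu: float = 1979.0) -> float:
    """Theoretical peak in PFLOPS for num_gpus cards; real workloads achieve less."""
    return num_gpus * tflops_per_gpu / 1000


previous_gen = tbps_to_gigabytes_per_second(1.6)
new_gen = tbps_to_gigabytes_per_second(3.2)
print(f"Server access bandwidth: {previous_gen:.0f} GB/s -> {new_gen:.0f} GB/s")

# Hypothetical GPU count, not taken from the article.
print(f"Peak for 8 GPUs: {peak_pflops(8):.3f} PFLOPS")
```

Under the terabit assumption, 3.2T corresponds to roughly 400 gigabytes per second per server, which is a different quantity from 3.2 terabytes per second.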

The article explains why simply stacking GPUs cannot achieve linear growth in computing power, emphasizing the need for coordinated optimization across computing, storage, network, and framework layers. It introduces the self-developed Xingmai high-performance computing network, which provides 3.2T ETH RDMA high-performance networking to significantly reduce communication latency.
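The sub-linear scaling argument can be illustrated with a toy cost model. The sketch below uses the standard ring all-reduce estimate (2(N-1) steps, each moving 1/N of the gradient payload) and assumes communication is not overlapped with computation. The payload size, per-step compute time, and bandwidth values are hypothetical inputs for illustration, not measurements from HCC, and nothing here describes how Xingmai is actually implemented.

```python
# Toy model: why adding GPUs does not give linear speedup in data-parallel training.
# All numeric inputs below are hypothetical illustrations.

def ring_allreduce_seconds(num_gpus: int, payload_bytes: float,
                           bandwidth_bytes_per_s: float,
                           latency_s: float = 0.0) -> float:
    """Estimate ring all-reduce time: 2(N-1) steps, each sending payload/N bytes."""
    if num_gpus <= 1:
        return 0.0  # a single GPU has nothing to synchronize
    steps = 2 * (num_gpus - 1)
    chunk_bytes = payload_bytes / num_gpus
    return steps * (latency_s + chunk_bytes / bandwidth_bytes_per_s)


def scaling_efficiency(num_gpus: int, compute_s: float, payload_bytes: float,
                       bandwidth_bytes_per_s: float,
                       latency_s: float = 0.0) -> float:
    """Fraction of ideal linear speedup when communication is not overlapped."""
    comm_s = ring_allreduce_seconds(num_gpus, payload_bytes,
                                    bandwidth_bytes_per_s, latency_s)
    return compute_s / (compute_s + comm_s)


payload = 20e9      # hypothetical: 20 GB of gradients per step
compute = 1.0       # hypothetical: 1 second of pure compute per step
for bandwidth in (200e9, 400e9):  # bytes/s, assuming 1.6T and 3.2T are Tbps
    eff = scaling_efficiency(64, compute, payload, bandwidth, latency_s=5e-6)
    print(f"{bandwidth / 1e9:.0f} GB/s: efficiency {eff:.2%}")
```

In this model, doubling bandwidth shrinks the communication term but never removes it, which is consistent with the article's point that network, storage, and framework have to be optimized together with compute.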

The storage architecture is also upgraded with TB-level throughput and million-level IOPS capabilities, using COS+GooseFS and CFS Turbo solutions to provide massive, high-speed, and cost-effective storage for large model training.
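One way to see why storage throughput matters for large model training is to estimate how long a checkpoint takes to write. The sketch below is a rough calculation under stated assumptions: the parameter count, the bytes stored per parameter, and both throughput values are hypothetical, and it does not model COS, GooseFS, or CFS Turbo behavior.

```python
# Rough estimate of checkpoint write time at a given storage throughput.
# All numeric inputs are hypothetical illustrations.

def checkpoint_bytes(num_params: int, bytes_per_param: float) -> float:
    """Total checkpoint size, given how many bytes are saved per parameter."""
    return num_params * bytes_per_param


def transfer_seconds(total_bytes: float, throughput_bytes_per_s: float) -> float:
    """Time to move total_bytes at a sustained throughput."""
    return total_bytes / throughput_bytes_per_s


# Hypothetical model: 100 billion parameters, 14 bytes per parameter
# (weights plus optimizer state; the exact figure depends on the training setup).
size = checkpoint_bytes(100_000_000_000, 14)

for label, throughput in (("100 GB/s", 100e9), ("1 TB/s", 1e12)):
    seconds = transfer_seconds(size, throughput)
    print(f"{size / 1e12:.1f} TB checkpoint at {label}: {seconds:.1f} s")
```

Because GPUs often sit idle while a checkpoint is written or a dataset shard is loaded, higher sustained throughput translates directly into less stalled compute time.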

The article concludes by discussing how Tencent Cloud provides multi-level access to computing power through various service forms including bare metal, cloud servers, containers, and cloud functions, making high-performance computing more accessible to different types of customers.

High Performance Computing, Tencent Cloud, AI computing, GPU Cluster, Large Model Training, NVIDIA H800, Xingmai network, Xingsha servers
