Tag

RoCE


Tencent Tech
May 7, 2025 · Artificial Intelligence

How Tencent’s DeepEP Doubles GPU Communication Speed on RoCE Networks

Tencent engineers highlighted a massive speedup in DeepSeek’s open‑source DeepEP communication framework, revealing how their TRMT‑based optimizations—dynamic multi‑QP topology awareness, IBGDA‑driven CPU‑bypass, and atomic signaling—boost RoCE network throughput up to 300% and add another 30% gain when applied to InfiniBand, effectively doubling GPU communication performance for large AI models.

AI model training · DeepEP · GPU communication
8 min read
Architects' Tech Alliance
Sep 8, 2024 · Artificial Intelligence

Design and Architecture of Multi‑Million GPU Clusters for Large‑Scale AI Model Training

The article surveys the network architectures and congestion‑control techniques used in massive GPU clusters, such as ByteDance’s MegaScale, Baidu HPN, Alibaba HPN7, and Tencent Xingmai 2.0, and highlights how high‑bandwidth, low‑latency designs and advanced RDMA technologies enable training of trillion‑parameter multimodal AI models.

AI infrastructure · GPU clusters · HPN
11 min read
Architects' Tech Alliance
Aug 18, 2024 · Artificial Intelligence

RDMA, InfiniBand, RoCE, and iWARP: High‑Performance Networking for Large‑Scale Generative AI Model Training

The article explains how RDMA technologies (InfiniBand, RoCE, and iWARP) provide high‑throughput, low‑latency data transfer with minimal CPU involvement for massive generative AI model training, compares their architectures, and discusses modern network designs and load‑balancing strategies to optimize AI‑focused data‑center networks.

AI training · High Performance Computing · InfiniBand
11 min read
Architects' Tech Alliance
Jul 7, 2024 · Operations

Designing High‑Performance Cluster Networks for AI Large Models: InfiniBand vs RoCE

The article analyzes the networking challenges of AI super‑large models, comparing InfiniBand and RoCE technologies, and presents design guidelines for ultra‑scale, high‑bandwidth, low‑latency, and highly stable cluster interconnects to maximize GPU utilization and overall training efficiency.

AI · GPU interconnect · High Performance Computing
14 min read
Architects' Tech Alliance
Jul 7, 2024 · Operations

Overview of Popular GPU/TPU Cluster Networking Technologies: NVLink, InfiniBand, RoCE, and DDC

This article reviews the main GPU/TPU cluster networking solutions (NVLink, InfiniBand, RoCE Ethernet, and DDC fully scheduled fabrics), examining their latency, lossless transmission, congestion control, cost, scalability, and suitability for large‑scale LLM training workloads.

AI training · DDC · GPU networking
16 min read
Architects' Tech Alliance
May 23, 2024 · Cloud Computing

Design and Comparison of High‑Performance Cloud Data Center Networks for AI Computing

This article analyzes traditional cloud data center network limitations for AI workloads and compares various high‑bandwidth, low‑latency architectures—including two‑layer and three‑layer fat‑tree designs, InfiniBand, and RoCE—providing best‑practice recommendations for building scalable, non‑blocking AI‑Pool networks.

AI computing · Fat Tree · GPU clusters
12 min read
360 Smart Cloud
Apr 25, 2024 · Cloud Native

Building High‑Performance RoCE v2 and InfiniBand Networks in a Cloud‑Native Environment for Large‑Model Training

This article explains how to construct high‑performance RoCE v2 and InfiniBand networks within a cloud‑native Kubernetes environment, detailing the underlying technologies, required components, configuration steps, and performance test results that demonstrate significant communication speed improvements for large‑scale AI model training.

AI training · Cloud Native · InfiniBand
12 min read
Architects' Tech Alliance
Apr 21, 2024 · Fundamentals

Understanding RDMA: InfiniBand, RoCE, and Their Role in High‑Performance AI Model Training

This article explains how Remote Direct Memory Access (RDMA) technologies such as InfiniBand and RoCE bypass OS kernels to achieve ultra‑low latency and high bandwidth, discusses their hardware implementations, cost considerations, and their critical impact on large‑scale AI model training and HPC network design.

AI · GPU · High Performance Computing
11 min read
Architects' Tech Alliance
Dec 24, 2023 · Artificial Intelligence

Overview of Popular GPU/TPU Cluster Networking Technologies for LLM Training

This article examines the main GPU/TPU cluster networking options (NVLink, InfiniBand, RoCE Ethernet Fabric, and DDC fully scheduled networks), explaining their latency, lossless transmission, congestion control, cost, scalability, and suitability for large‑scale LLM training workloads.

GPU networking · High Performance Computing · InfiniBand
18 min read
Architects' Tech Alliance
Apr 12, 2023 · Fundamentals

Applying RoCE (RDMA over Converged Ethernet) to High‑Performance Computing: Benefits, Challenges, and Case Studies

This article examines the RoCE protocol—an RDMA‑enabled Ethernet technology—its evolution, technical details, congestion‑control mechanisms, performance comparisons with InfiniBand, practical deployment issues in HPC clusters, and real‑world case studies such as Slingshot and application benchmarks.

Ethernet · HPC · Performance
19 min read
Architects' Tech Alliance
Dec 18, 2022 · Cloud Computing

Hyper‑Converged Data Center Network Architecture and Its Impact on Computational Efficiency

The article explains how hyper‑converged, lossless Ethernet networks integrate storage, high‑performance and general‑purpose compute zones, improve computational efficiency (CE) by reducing latency and power consumption, and outlines emerging technologies such as RoCE, NVMe‑over‑Fabric, PCIe‑free CPU/GPU designs, IPv6 deployment, and AI‑driven traffic management for modern data centers.

NVMe over Fabrics · RoCE · cloud computing
11 min read
Architects' Tech Alliance
Oct 9, 2022 · Fundamentals

Understanding InfiniBand and RDMA: Concepts and Configuration Guide

This article provides an overview of InfiniBand and Remote Direct Memory Access (RDMA), explains their underlying protocols and hardware, and offers detailed step‑by‑step guidance for configuring InfiniBand, RDMA, RoCE, and related services on Red Hat Enterprise Linux systems.

InfiniBand · Linux · RDMA
9 min read
Architects' Tech Alliance
Sep 4, 2022 · Fundamentals

Applying RoCE (RDMA over Converged Ethernet) to High‑Performance Computing: Benefits, Challenges, and Case Studies

This article examines the RoCE protocol and its use in high‑performance computing, describing its low‑latency advantages, congestion‑control mechanisms, performance comparisons with InfiniBand, practical deployment issues, and real‑world case studies such as Slingshot and CESM/GROMACS benchmarks.

Ethernet · HPC · RDMA
18 min read
Architects' Tech Alliance
May 19, 2022 · Fundamentals

An Introduction to RDMA: Concepts, Advantages, Protocols, and Programming Basics

This article explains the fundamentals of Remote Direct Memory Access (RDMA), comparing it with traditional networking, outlining its core advantages, suitable use cases, the three main RDMA protocols (InfiniBand, RoCE, iWARP), deployment requirements, communication flow, and essential programming concepts.

InfiniBand · RDMA · RoCE
9 min read
Architects' Tech Alliance
May 14, 2022 · Fundamentals

High‑Performance Computing Network Solutions: RoCE v2, RDMA, and InfiniBand Overview

The article explains how high‑performance computing (HPC) networks overcome TCP/IP limitations by using RDMA‑based technologies such as RoCE v1/v2 and InfiniBand, detailing their architectures, advantages, vendor implementations, and cost‑effective migration to Ethernet‑based solutions for GPU‑driven workloads.

HPC · HighPerformanceComputing · RDMA
7 min read
Architects' Tech Alliance
Sep 27, 2021 · Fundamentals

High‑Performance Computing (HPC) Network Requirements and RDMA Technologies

The article explains how modern data‑center compute demands drive the need for high‑throughput, low‑latency networking, compares TCP/IP with RDMA‑based solutions such as InfiniBand, iWARP and RoCE, and recommends lossless Ethernet for large‑scale HPC deployments.

HPC · InfiniBand · RDMA
10 min read
Architects' Tech Alliance
Sep 9, 2021 · Fundamentals

Understanding DMA and RDMA: Principles, Advantages, and Protocols

This article explains the concepts of Direct Memory Access (DMA) and Remote Direct Memory Access (RDMA), compares traditional data transfer with DMA-enabled paths, outlines RDMA's advantages such as zero-copy and kernel bypass, and reviews the main RDMA protocols, standards bodies, and hardware ecosystem.

DMA · High Performance Computing · InfiniBand
14 min read
Architects' Tech Alliance
Mar 7, 2021 · Fundamentals

Understanding RDMA: InfiniBand, iWARP, and RoCE Technologies and Their Differences

This article explains Remote Direct Memory Access (RDMA), its origins in InfiniBand, and the Ethernet‑based variants iWARP and RoCE (including RoCEv1 and RoCEv2), and compares their architectures, performance characteristics, and deployment requirements for high‑performance computing and data‑center networks.

InfiniBand · RDMA · RoCE
11 min read