Tagged articles

24 articles

Dec 7, 2025 · Fundamentals

CUDA Optimization Basics: Understanding GPU Architecture and Warp Scheduling

This article explains the fundamentals of CUDA performance tuning, covering GPU architectures from Kepler to Volta, the role of SMX, warp schedulers, registers and memory hierarchies, and provides practical guidance on launch configuration, latency hiding, and thread‑block sizing to maximize throughput.

CUDA, GPU architecture, Performance Optimization

0 likes · 21 min read

Architects' Tech Alliance

Nov 3, 2025 · Artificial Intelligence

What Nvidia’s New Blackwell & Rubin GPUs Reveal About the Future of AI Compute

Nvidia’s latest GTC briefing details the Blackwell and Rubin GPU roadmaps, highlighting massive GPU shipments, new NVLink 6.0 interconnects, 448 Gbps SerDes, and architectural innovations aimed at boosting AI compute performance, efficiency, and scalability across data‑center workloads.

AI compute, Blackwell, GPU architecture

0 likes · 6 min read

Architects' Tech Alliance

Oct 30, 2025 · Artificial Intelligence

How to Pick the Perfect Nvidia GPU for AI Servers – From Tesla to Hopper

This article traces the evolution of Nvidia’s GPU architectures—from the early Tesla series through Fermi, Kepler, Maxwell, Pascal, Volta, Turing, Ampere, and the latest Hopper—detailing their specifications, key features, and offering a systematic decision‑making guide for AI server designers to select the optimal GPU based on workload, model size, precision, scalability, and total cost of ownership.

AI server, GPU architecture, GPU selection

0 likes · 16 min read

Tencent Cloud Developer

Sep 26, 2025 · Fundamentals

Why GPUs Really Matter: From Architecture Basics to CUDA Programming

This article explains why GPUs have become the preferred platform for high‑performance computing, covering Dennard scaling, GPU speed advantages, theoretical FLOPS calculations, CUDA programming examples like SAXPY, the SIMT execution model, instruction pipelines, and modern techniques for handling branch divergence and register bank conflicts.

CUDA programming, GPU architecture, GPU performance

0 likes · 38 min read

Architects' Tech Alliance

Sep 19, 2025 · Artificial Intelligence

Why Nvidia’s Rubin CPX GPU Could Revolutionize Long-Context AI Inference

Nvidia's Rubin CPX GPU, unveiled in September 2025, uses GDDR7 memory and a split‑stage architecture to dramatically boost token‑per‑second rates for long‑context inference, while its integration into third‑generation Oberon servers promises higher power density, improved ROI, and scalable data‑center deployments.

AI inference, Data Center, GPU architecture

0 likes · 9 min read

Architects' Tech Alliance

Sep 14, 2025 · Artificial Intelligence

Why Nvidia’s Blackwell GPUs Are Redefining AI Performance

The article analyzes Nvidia's 2024 Blackwell GPU series and GB200 NVL72 architecture, detailing their advanced 3‑4 nm manufacturing, redesigned CUDA cores, next‑gen ray‑tracing and DLSS upgrades, massive compute and memory bandwidth gains, NVLink Gen5 improvements, and the diverse GB200 product configurations for high‑performance AI workloads.

AI acceleration, Blackwell GPU, GPU architecture

0 likes · 7 min read

Refining Core Development Skills

Aug 26, 2025 · Fundamentals

How NVIDIA’s Fermi Architecture Revolutionized GPU Computing: Key Improvements Explained

Fermi, NVIDIA’s 2010 GPU architecture, introduced major upgrades over the Tesla line—including a 40 nm process, vastly increased transistor count, GDDR5 memory, L2 cache, enhanced FP64 performance, ECC support, and unified CPU‑GPU addressing—making it the first truly complete GPU computing platform.

CUDA optimization, ECC Memory, FP64 performance

0 likes · 12 min read

Architects' Tech Alliance

Aug 18, 2025 · Artificial Intelligence

How Large Model Training Dominates Compute and What New Techniques Can Change It

This article explains why pre‑training large AI models consumes 90‑99% of total compute, describes the full training and inference pipelines, introduces resource‑saving strategies such as prefill‑decode (PD) separation, and reviews market trends and infrastructure challenges shaping the next generation of AI systems.

AI infrastructure, AI training, GPU architecture

0 likes · 13 min read

Architects' Tech Alliance

Aug 10, 2025 · Artificial Intelligence

From Volta to Blackwell: How NVIDIA GPUs Evolved for Deep Learning

This article traces the evolution of NVIDIA's GPU architectures—from Volta's pioneering Tensor Cores through Turing, Ampere, Hopper, and the latest Blackwell—highlighting key innovations such as mixed‑precision support, NVLink, and specialized Tensor Core designs that have dramatically boosted AI training and inference performance.

AI hardware, GPU architecture, NVLink

0 likes · 10 min read

Refining Core Development Skills

Aug 7, 2025 · Fundamentals

Why NVIDIA’s First Data‑Center GPU Revolutionized Computing: Inside the Tesla G80 Architecture

This article explains how NVIDIA transitioned from gaming graphics cards to general‑purpose GPUs with the first data‑center Tesla GPU, detailing the unified shader architecture, the internal components of TPCs and SMs, CUDA 1.0 programming basics, and performance calculations that illustrate the massive computational advantage over contemporary CPUs.

CUDA, GPGPU, GPU architecture

0 likes · 23 min read

Open Source Linux

Jul 5, 2025 · Fundamentals

Why Nvidia’s Blackwell GPU Beats AMD RDNA4: Deep Dive into GB202 Architecture

This article examines Nvidia's massive GB202 Blackwell GPU—its 750 mm² die, 92.2 billion transistors, 192 SMs, and extensive memory subsystem—while comparing its compute units, instruction caches, atomics, and bandwidth against AMD's RDNA4‑based RX 9070, highlighting architectural trade‑offs, performance metrics, and future GPU competition.

AMD RDNA4, GB202, GPU architecture

0 likes · 20 min read

Architects' Tech Alliance

Jul 1, 2025 · Fundamentals

Why Nvidia’s Blackwell GPU Outshines AMD’s RDNA4 – A Deep Architectural Dive

This article provides a detailed technical comparison between Nvidia's Blackwell GB202 GPU and AMD's RDNA4 RX 9070, covering CPU and GPU updates, SM front‑end design, memory hierarchies, execution units, atomics, L2 performance, power consumption, and real‑world benchmark results such as FluidX3D.

AMD RDNA4, GPU architecture, Memory Hierarchy

0 likes · 21 min read

Architects' Tech Alliance

Jun 9, 2025 · Artificial Intelligence

What Makes Nvidia’s Blackwell GPUs a Game-Changer for AI Performance?

In March 2024 Nvidia unveiled the Blackwell GPU family and the GB200 NVL72 architecture, featuring 3‑4 nm processes, redesigned CUDA cores, next‑gen ray‑tracing, upgraded DLSS, massive FP16/FP8 compute gains, 8 TB/s memory bandwidth, and NVLink Gen5, while also presenting complex power, cooling, and packaging challenges for large‑scale AI deployments.

AI acceleration, Blackwell, GPU

0 likes · 6 min read

Architects' Tech Alliance

May 6, 2025 · Artificial Intelligence

Evolution of NVIDIA GPU Architectures for AI from Volta to Blackwell

The article reviews NVIDIA's GPU architecture progression—from Volta's pioneering Tensor Cores through Turing, Ampere, Hopper, and the latest Blackwell and Rubin designs—highlighting key innovations, performance gains for deep learning, and related resource updates for AI practitioners.

Artificial Intelligence, GPU architecture, High‑Performance Computing

0 likes · 9 min read

Infra Learning Club

Mar 18, 2025 · Fundamentals

Can You Direct a CUDA Kernel to a Specific SM?

The article explains CUDA’s architecture and SM basics, describes how the warp scheduler and dispatch units assign thread blocks to SMs, and concludes that a kernel cannot be directed to a specific SM from outside the hardware scheduler, while mentioning NanoFlow’s intra‑device parallelism as a possible indirect optimization.

CUDA, GPU architecture, Kernel Scheduling

0 likes · 7 min read

Architects' Tech Alliance

Dec 14, 2024 · Industry Insights

Inside the High‑Performance GPU Server: A Deep Dive into A100/A800 & H100 Topologies

This article provides a detailed technical analysis of multi‑GPU server architectures, covering component breakdowns, NVSwitch networking, bandwidth calculations, and the differences between NVIDIA A100, A800, and H100 configurations for large‑scale AI workloads.

AI hardware, GPU architecture, High Performance Computing

0 likes · 12 min read

Architects' Tech Alliance

Apr 8, 2024 · Fundamentals

Unlocking GPU Server Architecture: PCIe, NVLink, NVSwitch & HBM Explained

This article provides a comprehensive breakdown of high‑performance GPU server infrastructure, covering PCIe generations, NVLink evolution, NVSwitch and NVLink switches, HBM memory technologies, and bandwidth measurement units, helping readers understand the hardware connections and performance considerations essential for large‑scale model training.

GPU architecture, HBM, High Performance Computing

0 likes · 10 min read

Architects' Tech Alliance

Apr 2, 2024 · Artificial Intelligence

Evolution and Forecast of Nvidia NVLink, NVLink C2C, and B100/X100 GPU Architectures

The article analyses the historical evolution of Nvidia's NVLink and NVLink C2C interconnect technologies, compares them with PCIe, Ethernet and InfiniBand, and uses these trends to predict future AI‑chip architectures such as the B100 and X100 GPUs, highlighting design trade‑offs and packaging challenges.

AI Chip, B100, GPU architecture

0 likes · 15 min read

Architects' Tech Alliance

Feb 22, 2023 · Industry Insights

RDNA 2 vs Nvidia Ampere: Architecture, Cache, and Game Performance

This article provides an in‑depth technical analysis of AMD’s RDNA 2 GPU architecture, comparing its compute units, cache hierarchy, latency and bandwidth characteristics with Nvidia’s Ampere, and evaluates real‑world game performance in titles such as Cyberpunk 2077, Titanic: Honor and Glory, and GUNNER, HEAT, PC!.

AMD, GPU architecture, RDNA 2

0 likes · 30 min read

Architects' Tech Alliance

Jul 4, 2022 · Industry Insights

Inside NVIDIA Hopper H100: Architecture, Performance, and AI Breakthroughs

The article provides a detailed technical analysis of NVIDIA's Hopper‑based H100 GPU, covering its 4 nm process, 80 billion transistors, GPC/TPC hierarchy, new FP8 Tensor Cores, Transformer engine, Tensor Memory Accelerator, and the resulting six‑fold performance jump over the previous A100 generation.

AI acceleration, FP8, GPU architecture

0 likes · 8 min read

Architects' Tech Alliance

Mar 20, 2021 · Fundamentals

Evolution of NVIDIA GPU Architectures from Fermi to Ampere

This article outlines the progression of NVIDIA GPU architectures—from the early Fermi and Kepler designs through Maxwell, Pascal, Volta, Turing, and the latest Ampere—detailing compute capabilities, SM structures, FP64/FP32 ratios, Tensor Core introductions, and their impact on AI and high‑performance computing.

AI, CUDA, GPU architecture

0 likes · 19 min read

TAL Education Technology

May 14, 2020 · Artificial Intelligence

An Introduction to GPU Computing and CUDA Architecture

This article provides a concise overview of GPU computing fundamentals, covering GPU hardware components, memory hierarchy, parallel execution models, and the CUDA programming framework, illustrating how CPUs and GPUs cooperate in heterogeneous computing environments.

CUDA, CUDA programming, GPU

0 likes · 16 min read

Architects' Tech Alliance

Apr 18, 2019 · Fundamentals

What Powers Modern Graphics? A Deep Dive into GPU History and Architecture

This article traces the evolution of GPUs from early graphics chips to modern parallel processors, explains their internal pipeline, compares CPU and GPU architectures, and introduces key acceleration frameworks like CUDA and OpenCL for general‑purpose computing.

CUDA, GPU, GPU architecture

0 likes · 13 min read

Architects' Tech Alliance

Jul 2, 2017 · Fundamentals

Differences Between NVIDIA Tesla and GeForce GPUs: Architecture, Performance, and Use Cases

This article compares NVIDIA's Tesla and GeForce GPU families, detailing their target markets, design differences, core architectures, double‑precision performance, ECC support, memory bandwidth, interface options, software and OS compatibility, power efficiency, and management features to help readers choose the right GPU for HPC or gaming workloads.

GPU, GPU architecture, GeForce

0 likes · 11 min read
