DeepSeek Announces FlashMLA: An Efficient Multi‑head Latent Attention Decoding Kernel for Hopper GPUs

During its OpenSourceWeek, DeepSeek introduced FlashMLA, an MLA (Multi‑head Latent Attention) decoding kernel optimized for Hopper GPUs that builds on ideas from FlashAttention and CUTLASS to speed up large‑model inference. Early adopters report up to 30% higher compute utilization and, in some scenarios, doubled speed.

Artificial Intelligence · DeepSeek · FlashMLA

0 likes · 3 min read

Architects' Tech Alliance

Jul 23, 2024 · Industry Insights

Inside Los Alamos’ Venado Supercomputer: Architecture, Performance, and HPC Trends

The Venado supercomputer, unveiled at Los Alamos, combines Nvidia Grace CPUs, Hopper GPUs, HPE Slingshot interconnects, and massive memory bandwidth to achieve a 15.6‑petaflop FP64 peak, illustrating the evolving balance between CPU and GPU workloads in modern high‑performance computing.

CPU · GPU · Grace

0 likes · 14 min read

Architects' Tech Alliance

Jul 4, 2022 · Industry Insights

Inside NVIDIA Hopper H100: Architecture, Performance, and AI Breakthroughs

The article provides a detailed technical analysis of NVIDIA's Hopper‑based H100 GPU, covering its 4 nm process, 80 billion transistors, GPC/TPC hierarchy, new FP8 Tensor Cores, Transformer Engine, and Tensor Memory Accelerator, and the resulting performance jump of up to six times over the previous‑generation A100.

AI acceleration · FP8 · GPU architecture

0 likes · 8 min read

IT Services Circle

Mar 24, 2022 · Artificial Intelligence

NVIDIA Unveils H100 GPU with Hopper Architecture: Massive Performance Gains for AI

At the recent GTC event, NVIDIA introduced the H100 GPU, built on the Hopper architecture and TSMC's 4 nm process. It features 80 billion transistors, 16,896 CUDA cores, up to 700 W of power draw, 3 TB/s of memory bandwidth, and a specialized Transformer Engine that speeds up large‑model training by up to six times. NVIDIA also announced the Grace CPU Superchip and new AI supercomputing systems.

AI · GPU · Grace CPU

0 likes · 11 min read
