Tagged articles
4 articles
AI Explorer
Mar 16, 2026 · Artificial Intelligence

HyperOffload: A New Storage Paradigm Aiming to Break the AI Memory Wall

HyperOffload, a joint effort by Shanghai Jiao Tong University and Huawei’s MindSpore team, proposes a dynamic tensor offloading system that moves data between GPU memory, CPU RAM, and SSDs, aiming to overcome the “memory wall” that limits trillion‑parameter AI model training and deployment.

AI infrastructure · AI memory wall · GPU Memory Management
6 min read
AI2ML AI to Machine Learning
Dec 22, 2025 · Artificial Intelligence

The Core Ideas Behind Paged Attention for KV‑Caching

This article explains how Paged Attention, introduced by the vLLM team, applies virtual‑memory techniques, non‑contiguous block mapping, copy‑on‑write reuse, distributed scheduling, and hardware‑level optimizations to improve KV‑cache efficiency and reduce memory fragmentation in large language model serving.

Copy-on-Write · Distributed Scheduling · GPU Memory Management
6 min read
Baobao Algorithm Notes
Jun 3, 2025 · Artificial Intelligence

How to Train a 671B‑Scale Model with RL: Insights from a verl Internship

This article shares a detailed, first‑hand analysis of the technical challenges, framework choices, memory management, weight conversion, precision alignment, and efficiency optimizations encountered while building reinforcement‑learning pipelines for a 671‑billion‑parameter model using the verl ecosystem.

GPU Memory Management · Large Models · Megatron
16 min read
Infra Learning Club
Nov 1, 2024 · Artificial Intelligence

Configuring vLLM swap_space and cpu_offload_gb for Stable Large-Model Inference

The article explains vLLM’s GPU compute capability requirement, describes the swap_space and cpu_offload_gb parameters, outlines their ideal usage scenarios, and provides step‑by‑step code examples that demonstrate how adjusting these settings enables loading and running a 7B‑parameter model on a 16 GB T4 GPU.

GPU Memory Management · cpu_offload_gb · large language model inference
9 min read