Tagged articles
3 articles
Machine Learning Algorithms & Natural Language Processing
May 29, 2026 · Artificial Intelligence

RTPurbo: >97% Sparsity and 9× Faster Long-Context LLM Inference with Minimal Training

The article presents RTPurbo, a lightweight two-stage training method that converts full-attention LLMs into sparse models with over 97% sparsity. It achieves up to 9.36× faster prefill and 2.01× faster decoding while keeping accuracy near-lossless on long-context benchmarks of up to 512K tokens.

Dynamic Token Selection · Kernel Optimization · LLM inference
17 min read
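A sparsity above 97% means each query attends to fewer than 3% of the keys. The pure-Python sketch below illustrates generic top-k dynamic token selection for a single query. It is not RTPurbo's selection rule or kernel, and the function name `topk_sparse_attention` and its `keep_ratio` parameter are hypothetical.

```python
import math


def topk_sparse_attention(query, keys, values, keep_ratio=0.03):
    """Single-query attention over only the top-scoring keys (hypothetical sketch).

    query:      list of d floats
    keys:       list of n key vectors (each a list of d floats)
    values:     list of n value vectors (each a list of floats)
    keep_ratio: fraction of keys kept; 0.03 corresponds to 97% sparsity
    Returns (output vector, indices of the selected keys).
    """
    d = len(query)
    n = len(keys)
    # Scaled dot-product score for every key.
    scores = [sum(qi * ki for qi, ki in zip(query, key)) / math.sqrt(d) for key in keys]
    # Dynamic selection: the kept set depends on this query's scores.
    k = max(1, math.ceil(keep_ratio * n))
    selected = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the selected scores only (subtract the max for stability).
    m = max(scores[i] for i in selected)
    weights = {i: math.exp(scores[i] - m) for i in selected}
    total = sum(weights.values())
    # Weighted sum of the selected value vectors; unselected keys contribute nothing.
    out = [0.0] * len(values[0])
    for i, w in weights.items():
        for j, v in enumerate(values[i]):
            out[j] += (w / total) * v
    return out, selected
```

With 100 keys and the default `keep_ratio`, only 3 keys contribute to the output. The sketch still scores every key, so it shows the selection logic rather than the speedup; fast sparse-attention implementations generally avoid most of that work and fuse selection into GPU kernels.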
AI Frontier Lectures
Jul 13, 2025 · Artificial Intelligence

How HarmoniCa Boosts Diffusion Model Speed with Joint Training‑Inference Caching

HarmoniCa, a feature-caching framework developed jointly by HKUST, Beihang University, and SenseTime, targets inference bottlenecks in diffusion models. It aligns training with inference through Step-Wise Denoising Training and an Image Error Proxy Objective, achieving up to a 2× speedup while preserving image quality.

Performance Acceleration · diffusion models · feature caching
9 min read
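Feature caching in general trades accuracy for speed by reusing a block's output from an earlier denoising step instead of recomputing it. The sketch below uses a fixed-interval cache, a much simpler policy than HarmoniCa's trained approach; `CachedBlock`, `refresh_every`, and the toy `denoise` loop are hypothetical.

```python
class CachedBlock:
    """Fixed-interval feature cache around an expensive block (hypothetical sketch)."""

    def __init__(self, fn, refresh_every=2):
        self.fn = fn                        # stand-in for an expensive network block
        self.refresh_every = refresh_every  # recompute on every N-th step
        self.cache = None
        self.calls = 0                      # number of real (non-cached) evaluations

    def __call__(self, x, step):
        # Recompute on refresh steps or when nothing is cached; otherwise reuse.
        if self.cache is None or step % self.refresh_every == 0:
            self.cache = self.fn(x)
            self.calls += 1
        return self.cache


def denoise(x, num_steps, block):
    """Toy denoising loop: each step subtracts a fraction of the block's output."""
    for step in range(num_steps):
        x = x - 0.1 * block(x, step)
    return x
```

Running `denoise(1.0, 10, CachedBlock(lambda x: x, refresh_every=2))` evaluates the block on only 5 of the 10 steps and ends at 0.8**5 = 0.32768, versus 0.9**10 ≈ 0.3487 without caching. That gap illustrates the error introduced by reusing features, which HarmoniCa aims to contain while keeping the speedup.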
Code DAO
Dec 11, 2021 · Artificial Intelligence

Nimble: A Lightweight Parallel GPU Scheduler Boosting Deep Learning Performance

The article analyzes how Nimble cuts GPU scheduling overhead and enables parallel execution through ahead-of-time scheduling and automatic multi-stream assignment. It reports up to a 22.3× inference speedup over PyTorch and improved GPU utilization for deep learning workloads.

GPU scheduling · Parallel Execution · Performance Acceleration
9 min read
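Multi-stream assignment places independent operators on different GPU streams so they can overlap. The pure-Python sketch below shows one simple greedy way to do this on a dependency graph. It is not Nimble's actual assignment algorithm, `assign_streams` and its inputs are hypothetical, and it only computes the mapping plus the cross-stream dependencies that would need synchronization; it launches no GPU work.

```python
from collections import deque


def assign_streams(num_ops, edges):
    """Greedy stream assignment for a DAG of GPU operators (hypothetical sketch).

    num_ops: operators are numbered 0..num_ops-1
    edges:   list of (src, dst) dependency pairs
    Returns (stream_of, cross): the stream index of each operator, and the
    dependencies that cross streams and would therefore need an event-based sync.
    """
    children = [[] for _ in range(num_ops)]
    indegree = [0] * num_ops
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1

    stream_of = [None] * num_ops
    next_stream = 0
    # Kahn's algorithm: visit operators in a topological order.
    ready = deque(i for i in range(num_ops) if indegree[i] == 0)
    while ready:
        op = ready.popleft()
        if stream_of[op] is None:
            # No parent handed this operator a stream, so it opens a new one.
            stream_of[op] = next_stream
            next_stream += 1
        handed_down = False
        for child in children[op]:
            # Hand this stream to at most one still-unassigned child, continuing the
            # chain without a sync; siblings typically land on other streams.
            if not handed_down and stream_of[child] is None:
                stream_of[child] = stream_of[op]
                handed_down = True
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    cross = [(s, d) for s, d in edges if stream_of[s] != stream_of[d]]
    return stream_of, cross
```

For the diamond graph 0 → {1, 2} → 3, this yields streams `[0, 0, 1, 0]`: operators 1 and 2 land on different streams and can overlap, and the edges (0, 2) and (2, 3) are the cross-stream dependencies that need synchronization. In an ahead-of-time scheme, a mapping like this would be computed once before execution and reused rather than recomputed on every run.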