Tagged articles
2 articles
Infra Learning Club
Mar 18, 2025 · Fundamentals

Can You Direct a CUDA Kernel to a Specific SM?

The article explains CUDA’s architecture and the basics of the streaming multiprocessor (SM), describes how thread blocks are assigned to SMs and how each SM’s warp scheduler and dispatch units then issue their warps, and concludes that a kernel cannot be directed to a specific SM from outside the hardware scheduler. It also mentions NanoFlow’s intra-device parallelism as a possible indirect optimization.

CUDA · GPU architecture · Kernel Scheduling
7 min read
Alibaba Cloud Big Data AI Platform
Sep 17, 2024 · Artificial Intelligence

Boosting LLM Inference: How NanoFlow Doubles Throughput

The article introduces NanoFlow, a novel serving framework that combines intra-device parallelism, operation-based pipelining, and asynchronous scheduling to improve large language model serving throughput by up to 1.91×, and describes its integration with Alibaba Cloud PAI.

Alibaba Cloud PAI · GPU scheduling · LLM serving
7 min read