Tag

AI Service


Alimama Tech
Feb 12, 2025 · Artificial Intelligence

HighService: A High‑Performance Pythonic AI Service Framework for Model Inference and Global Resource Scheduling

HighService is Alibaba’s Pythonic AI service framework for accelerating large‑model inference and raising GPU utilization. It runs CPU and GPU work in separate processes, provides out‑of‑the‑box quantization, parallelism, and caching, and uses a master‑worker scheduler to dynamically reallocate idle GPUs across clusters, keeping online latency low while boosting offline throughput for diffusion and LLM workloads.

AI Service · High Performance · Model Inference
16 min read