Weibo Multimodal Content Understanding Service Architecture and GPU Heterogeneous Cluster Solutions
This article details Weibo's large‑scale multimodal content understanding platform, covering its background, data and model heterogeneity challenges, the end‑to‑end workflow, GPU‑heterogeneous cluster design, resource scheduling, performance optimization for distributed training and online inference, and comprehensive monitoring to ensure stable, low‑latency AI services.
Weibo supports over 40 business clusters and 120 deep‑learning models with peak QPS of 20,000, handling multimodal data (text, images, video, audio) for billions of daily posts.
The platform faces four main challenges: massive data volume; heterogeneous algorithms and frameworks (TensorFlow, PyTorch, Caffe, PaddlePaddle); standardizing model deployment through a C++ RPC framework with Python DAG pipelines; and platformizing the service so it meets real‑time latency and stability requirements.
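To make the DAG‑pipeline idea concrete, here is a minimal sketch of how multimodal stages might be chained and executed in dependency order. The stage names (`ocr`, `text_emb`, `fusion`) and the executor itself are illustrative assumptions, not Weibo's actual framework.

```python
from collections import deque

class DagPipeline:
    """Toy DAG executor: each stage is a function with named upstream deps."""
    def __init__(self):
        self.stages = {}  # name -> (fn, deps)

    def add_stage(self, name, fn, deps=()):
        self.stages[name] = (fn, tuple(deps))

    def run(self, inputs):
        # Kahn's algorithm: a stage runs once all its dependencies finished.
        indeg = {n: len(d) for n, (_, d) in self.stages.items()}
        children = {n: [] for n in self.stages}
        for n, (_, deps) in self.stages.items():
            for d in deps:
                children[d].append(n)
        ready = deque(n for n, k in indeg.items() if k == 0)
        results = {}
        while ready:
            n = ready.popleft()
            fn, deps = self.stages[n]
            results[n] = fn(inputs, {d: results[d] for d in deps})
            for c in children[n]:
                indeg[c] -= 1
                if indeg[c] == 0:
                    ready.append(c)
        return results

# Hypothetical stages: OCR on the image, text embedding, then late fusion.
pipe = DagPipeline()
pipe.add_stage("ocr", lambda x, up: f"ocr({x['image']})")
pipe.add_stage("text_emb", lambda x, up: f"emb({x['text']})")
pipe.add_stage("fusion", lambda x, up: [up["ocr"], up["text_emb"]],
               deps=("ocr", "text_emb"))

out = pipe.run({"image": "img.jpg", "text": "hello"})
print(out["fusion"])  # ['ocr(img.jpg)', 'emb(hello)']
```

The appeal of expressing serving logic as a DAG is that independent branches (here, OCR and text embedding) can later be parallelized without changing stage code.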
The end‑to‑end workflow covers sample generation, model training, evaluation, model registration, and online serving, with distributed training delivering up to a 9× efficiency gain over open‑source baselines.
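The train → evaluate → register gate can be sketched as follows. Everything here is a stand‑in: the model name, metric, and 0.90 threshold are hypothetical, and `REGISTRY` is a dict standing in for a real model registry.

```python
def train(samples):
    # Stand-in for distributed training; returns a trained "model".
    return {"name": "clip_v2", "weights": "weights-blob"}

def evaluate(model, eval_set):
    # Stand-in evaluation producing a single quality metric.
    return {"accuracy": 0.91}

REGISTRY = {}  # model registry: only gated models become servable

def register_if_better(model, metrics, threshold=0.90):
    """Gate: a model enters the serving registry only if it beats the bar."""
    if metrics["accuracy"] >= threshold:
        REGISTRY[model["name"]] = {"model": model, "metrics": metrics}
        return True
    return False

model = train(samples=["post_1", "post_2"])
ok = register_if_better(model, evaluate(model, eval_set=["held_out"]))
print(ok, list(REGISTRY))  # True ['clip_v2']
```

Keeping registration as an explicit, metric-gated step is what makes later gray releases safe: serving only ever pulls from the registry, never from a training run directly.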
GPU heterogeneous clusters (over 1,000 GPUs of various generations) are managed via Kubernetes, cGPU, and GPUManager, enabling fine‑grained allocation of memory and compute resources, automated scaling, gray‑release deployments, and full‑stack monitoring of GPU, CPU, network, and service metrics.
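The fine‑grained allocation described above can be illustrated with a toy first‑fit placer over a heterogeneous fleet, where each pod requests a slice of GPU memory and a fraction of compute, in the spirit of cGPU/GPUManager sharing. The GPU models, pod names, and first‑fit policy are all assumptions for illustration, not the production scheduler.

```python
from dataclasses import dataclass, field

@dataclass
class Gpu:
    model: str                 # e.g. "V100" or "T4" (illustrative generations)
    mem_free_gb: float
    compute_free: float = 1.0  # fraction of compute still unallocated
    pods: list = field(default_factory=list)

def place(pod_name, mem_gb, compute, fleet):
    """First-fit placement mimicking fractional-GPU sharing:
    a pod lands on the first GPU with enough free memory AND compute."""
    for gpu in fleet:
        if gpu.mem_free_gb >= mem_gb and gpu.compute_free >= compute:
            gpu.mem_free_gb -= mem_gb
            gpu.compute_free -= compute
            gpu.pods.append(pod_name)
            return gpu.model
    return None  # no capacity: the caller would trigger scale-out

fleet = [Gpu("V100", 32.0), Gpu("T4", 16.0)]
print(place("ocr-serving", mem_gb=8, compute=0.25, fleet=fleet))   # V100
print(place("nsfw-filter", mem_gb=20, compute=0.5, fleet=fleet))   # V100
print(place("asr", mem_gb=12, compute=0.5, fleet=fleet))           # T4
```

Tracking compute and memory as separate dimensions is the key point: two pods can fit memory‑wise yet still starve each other on compute, which is why memory‑only partitioning is insufficient.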
Performance optimizations cover distributed training acceleration, inference speed‑ups (20–40% gains), and GPU utilization balancing that goes beyond simple memory partitioning, emphasizing keeping workloads compute‑bound and tuning batch sizes.
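Batch size tuning for inference typically means dynamic micro‑batching: queue incoming requests and flush to the GPU when either a size limit or a small time budget is hit. A minimal sketch, assuming a simple in‑process queue (the 32/5 ms limits are illustrative, not Weibo's settings):

```python
import time
from queue import Queue, Empty

def collect_batch(requests: Queue, max_batch=32, max_wait_ms=5):
    """Drain up to max_batch requests, waiting at most max_wait_ms in total,
    so the GPU sees large batches under load yet stays low-latency when idle."""
    batch = []
    deadline = time.monotonic() + max_wait_ms / 1000
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(requests.get(timeout=remaining))
        except Empty:
            break
    return batch

q = Queue()
for i in range(40):
    q.put(f"req-{i}")
full = collect_batch(q)   # under load: fills to max_batch immediately
tail = collect_batch(q)   # drains the remaining 8, then hits the timeout
print(len(full), len(tail))  # 32 8
```

The size/timeout trade‑off is the whole tuning knob: a larger `max_batch` raises GPU throughput, while a smaller `max_wait_ms` caps the latency added to each request.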
Comprehensive monitoring tracks GPU usage, memory bandwidth, temperature, power, TCP connections, service latency, QPS, and error rates to maintain three‑nine (99.9%) stability, with mechanisms for automatic scaling during traffic spikes.
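The automatic‑scaling mechanism can be sketched as a policy that sizes the replica set from observed QPS and steps up further when the latency SLO is breached. The per‑replica capacity, SLO, and bounds below are invented for illustration:

```python
def desired_replicas(current, qps, latency_p99_ms, qps_per_replica=500,
                     latency_slo_ms=100, min_r=2, max_r=64):
    """Scale to carry the observed QPS; add headroom on an SLO breach."""
    need = -(-qps // qps_per_replica)      # ceil(qps / qps_per_replica)
    if latency_p99_ms > latency_slo_ms:
        need = max(need, current + 1)      # latency breach: step up by one
    return max(min_r, min(max_r, need))    # clamp to [min_r, max_r]

print(desired_replicas(current=10, qps=20000, latency_p99_ms=80))   # 40
print(desired_replicas(current=40, qps=20000, latency_p99_ms=140))  # 41
```

Combining a throughput target with a latency backstop is what lets the service ride out the traffic spikes mentioned above while still honoring the 99.9% stability goal.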
Overall, the system integrates multimodal AI models into Weibo's ecosystem, generating feature vectors for downstream recommendation, ranking, moderation, and de‑duplication, thereby closing the loop from user‑generated content to personalized consumption.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.