Tag

Volcano

1 views collected around this technical thread.

360 Zhihui Cloud Developer
May 15, 2025 · Cloud Native

How 360’s AI Platform Boosted GPU Utilization with Volcano Scheduler

360’s AI platform migrated its GPU clusters to a cloud‑native architecture and adopted the Volcano scheduler, achieving over 45% GPU utilization, less than 7% fragmentation, and more than 1,000,000 scheduled Pods, while using flexible plugins, hierarchical queues, and resource pooling to optimize AI and big‑data workloads.

AI Platform · Cloud Native · GPU scheduling
0 likes · 13 min read
360 Zhihui Cloud Developer
Sep 19, 2024 · Operations

How TAI Platform Optimizes Large‑Model Scheduling and Fault Recovery on Kubernetes

This article explains how the TAI platform leverages Kubernetes and Volcano to tackle fault, efficiency, and usability challenges in large‑model training and inference, detailing custom resources, automated fault detection, and advanced scheduling strategies that boost resource utilization and performance.

AI infrastructure · Large Models · Volcano
0 likes · 9 min read
Efficient Ops
May 22, 2022 · Cloud Native

How to Run Multiple Containers Sequentially in a Single Kubernetes Pod

This article explains how to execute several containers one after another within a single Kubernetes pod by leveraging initContainers and native Job mechanisms, compares alternative solutions such as Volcano and Argo, provides complete YAML examples, and discusses practical considerations like volume sharing, security contexts, and timeout settings.

Argo · InitContainers · Job
0 likes · 9 min read