
Koordinator v1.6 Release: Advanced Heterogeneous Device Scheduling and GPU Management Features

The Koordinator v1.6 release introduces a suite of innovations—including GPU topology‑aware scheduling, end‑to‑end GPU & RDMA joint allocation, strong GPU isolation, differentiated GPU scoring, fine‑grained resource reservation, mixed‑workload QoS, and extensive scheduler and rescheduler optimizations—to efficiently manage heterogeneous resources in Kubernetes clusters for AI and high‑performance computing workloads.

Alibaba Cloud Infrastructure

Background: With the rapid rise of large‑model AI and high‑performance computing, demand for efficient heterogeneous device scheduling (GPU, NPU, RDMA) has surged. Koordinator v1.6 responds by enhancing device topology awareness, GPU‑RDMA joint allocation, and GPU isolation to improve AI training and inference performance while boosting cluster utilization.

Core Feature Highlights

1. GPU Topology‑Aware Scheduling – Supports detailed GPU topology detection across various device models (e.g., NVIDIA L20/L40S, Huawei Ascend NPU) and provides APIs for NUMA‑aligned GPU placement. Example:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    scheduling.koordinator.sh/numa-topology-spec: '{"numaTopologyPolicy":"Restricted", "singleNUMANodeExclusive":"Preferred"}'
spec:
  containers:
  - resources:
      limits:
        koordinator.sh/gpu: 200
        cpu: 64
        memory: 500Gi
      requests:
        koordinator.sh/gpu: 200
        cpu: 64
        memory: 500Gi
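
The idea behind NUMA-aligned GPU placement can be sketched in a few lines of Python. This is an illustrative sketch only, not Koordinator's actual algorithm: the function name `pick_gpus`, the input shape (free GPU ids grouped by NUMA node), and the tie-breaking rule are all assumptions made for the example.

```python
def pick_gpus(free_by_numa, count):
    """Pick `count` free GPUs, keeping them on as few NUMA nodes as possible.

    free_by_numa: dict mapping a NUMA node id to a list of free GPU ids
    (hypothetical input shape). Returns a list of GPU ids, or None if the
    node cannot satisfy the request.
    """
    # Prefer a single NUMA node; among those that fit, take the one with the
    # fewest free GPUs so larger groups stay available for bigger requests.
    fitting = [(len(gpus), numa) for numa, gpus in free_by_numa.items()
               if len(gpus) >= count]
    if fitting:
        _, numa = min(fitting)
        return free_by_numa[numa][:count]

    # No single NUMA node fits: span nodes, largest free group first,
    # which minimizes the number of NUMA nodes touched.
    picked = []
    for numa in sorted(free_by_numa, key=lambda n: -len(free_by_numa[n])):
        picked.extend(free_by_numa[numa][: count - len(picked)])
        if len(picked) == count:
            return picked
    return None  # not enough free GPUs on this node


free = {0: ["gpu0", "gpu1", "gpu2"], 1: ["gpu4", "gpu5"]}
print(pick_gpus(free, 2))
print(pick_gpus(free, 4))
```

A strict policy would reject the second request instead of spanning NUMA nodes; the sketch falls back to spanning only to show the difference between the two outcomes.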

2. End‑to‑End GDR Support – Enables GPUDirect RDMA for cross‑node GPU communication, reducing CPU/memory overhead. Joint GPU & RDMA allocation example:

apiVersion: v1
kind: Pod
metadata:
  name: pod-vf01
  namespace: kubeflow
  annotations:
    scheduling.koordinator.sh/device-joint-allocate: |-
      {
        "deviceTypes": ["gpu","rdma"]
      }
    # The empty vfSelector requests an RDMA virtual function (VF).
    scheduling.koordinator.sh/device-allocate-hint: |-
      {
        "rdma": {
          "vfSelector": {}
        }
      }
spec:
  schedulerName: koord-scheduler
  containers:
  - name: container-vf
    resources:
      requests:
        koordinator.sh/gpu: 100
        koordinator.sh/rdma: 100
      limits:
        koordinator.sh/gpu: 100
        koordinator.sh/rdma: 100
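
To make the joint-allocation idea concrete, here is a minimal Python sketch. It assumes a simplified device model in which each GPU and RDMA NIC carries a NUMA node id and a PCIe switch id; the `Device` type, the `affinity` ranking, and `joint_allocate` are hypothetical names for illustration and do not reflect Koordinator's implementation. The ranking rests on the general observation that GPUDirect RDMA works best when the GPU and NIC sit close together in the PCIe topology.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Device:
    kind: str   # "gpu" or "rdma"
    id: str
    numa: int   # NUMA node the device is attached to
    pcie: str   # hypothetical PCIe switch identifier


def affinity(gpu, nic):
    """Rank a GPU/NIC pair: same PCIe switch > same NUMA node > unrelated."""
    if gpu.pcie == nic.pcie:
        return 2
    if gpu.numa == nic.numa:
        return 1
    return 0


def joint_allocate(free_devices):
    """Return the (gpu, nic) pair with the best affinity, or None."""
    gpus = [d for d in free_devices if d.kind == "gpu"]
    nics = [d for d in free_devices if d.kind == "rdma"]
    if not gpus or not nics:
        return None  # joint allocation needs both device types
    return max(product(gpus, nics), key=lambda pair: affinity(*pair))


free = [
    Device("gpu", "gpu0", numa=0, pcie="sw0"),
    Device("gpu", "gpu1", numa=1, pcie="sw2"),
    Device("rdma", "nic0", numa=1, pcie="sw2"),
]
gpu, nic = joint_allocate(free)
print(gpu.id, nic.id)
```

Allocating the two device types in one step, rather than independently, is what lets the allocator reject a pairing that would route GPU traffic across NUMA nodes.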

3. Strong GPU Isolation (Sharing) – Allows multiple Pods to share a single GPU with precise core and memory ratios, leveraging HAMi‑Core for isolation. Example manifests for the HAMi‑Core DaemonSet and a shared‑GPU Pod are provided.
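
The bookkeeping behind GPU sharing can be sketched as a simple fit check. The sketch below assumes each GPU exposes 100 units of compute and 100 units of memory (percentages), and that a Pod fits only when both ratios fit on the same GPU; `find_shared_gpu` and `place` are hypothetical helpers, not part of Koordinator or HAMi‑Core.

```python
def find_shared_gpu(gpus, core, mem_ratio):
    """Return the index of the first GPU that can host the request, else None.

    gpus: list of (used_core, used_mem_ratio) tuples, each in 0..100.
    core, mem_ratio: requested percentages of one GPU.
    """
    for index, (used_core, used_mem) in enumerate(gpus):
        if used_core + core <= 100 and used_mem + mem_ratio <= 100:
            return index
    return None


def place(gpus, core, mem_ratio):
    """Reserve the request on the first fitting GPU and return its index."""
    index = find_shared_gpu(gpus, core, mem_ratio)
    if index is not None:
        used_core, used_mem = gpus[index]
        gpus[index] = (used_core + core, used_mem + mem_ratio)
    return index


gpus = [(60, 60), (0, 0)]
print(place(gpus, 50, 50))  # does not fit on GPU 0, lands on GPU 1
print(place(gpus, 40, 40))  # fills the remainder of GPU 0
```

Scheduling only tracks these ratios; enforcing them at runtime is the job of the isolation layer.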

4. Differentiated GPU Scheduling Strategies – Introduces NodeResourcesFitPlus and ScarceResourceAvoidance plugins to apply distinct scoring policies for GPU versus CPU/MEM resources, reducing GPU fragmentation and preventing CPU‑heavy workloads from occupying GPU nodes.
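
A small Python sketch can show why per-resource scoring reduces GPU fragmentation. It assumes a policy that packs GPUs (most-allocated) while spreading CPU and memory (least-allocated), with weights chosen only for illustration. The `POLICY` table, `node_score`, and `wastes_scarce_resource` are hypothetical names; the real plugins' configuration and formulas may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    allocatable: dict                               # resource -> capacity
    requested: dict = field(default_factory=dict)   # resource -> already requested


# Assumed policy: (strategy, weight) per resource.
POLICY = {
    "gpu": ("MostAllocated", 2),
    "cpu": ("LeastAllocated", 1),
    "memory": ("LeastAllocated", 1),
}


def resource_score(strategy, used_after, capacity):
    """Score one resource in 0..100 given its usage after placing the pod."""
    if capacity <= 0 or used_after > capacity:
        return 0.0
    ratio = used_after / capacity
    if strategy == "MostAllocated":
        return 100.0 * ratio          # fuller is better: pack
    return 100.0 * (1.0 - ratio)      # emptier is better: spread


def node_score(node, pod_request, policy=POLICY):
    """Weighted average over the resources the pod actually requests."""
    total, weight_sum = 0.0, 0
    for resource, amount in pod_request.items():
        if amount == 0 or resource not in policy:
            continue
        strategy, weight = policy[resource]
        used_after = node.requested.get(resource, 0) + amount
        capacity = node.allocatable.get(resource, 0)
        total += weight * resource_score(strategy, used_after, capacity)
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


def wastes_scarce_resource(node, pod_request, scarce=("gpu",)):
    """True if the node offers a scarce resource that the pod does not request."""
    return any(node.allocatable.get(r, 0) > 0 and pod_request.get(r, 0) == 0
               for r in scarce)


capacity = {"gpu": 8, "cpu": 64, "memory": 512}
busy = Node("gpu-busy", capacity, {"gpu": 6, "cpu": 16, "memory": 128})
idle = Node("gpu-idle", capacity, {"cpu": 16, "memory": 128})
gpu_pod = {"gpu": 2, "cpu": 8, "memory": 64}
print(node_score(busy, gpu_pod), node_score(idle, gpu_pod))
print(wastes_scarce_resource(idle, {"cpu": 4, "memory": 8}))
```

With this policy the GPU pod prefers the node whose GPUs are already mostly allocated, leaving the idle node whole for larger jobs, while a CPU-only pod is flagged as a poor match for any GPU node.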

5. Fine‑Grained Resource Reservation – Enhances reservation APIs for exact‑match reservations, reservation‑ignored mode, and reservation affinity with taints/tolerations, enabling precise CPU‑GPU‑MEM alignment and pre‑emptive reservation handling.
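
The exact-match idea can be illustrated with a short check: a Pod may consume a reservation only if, for each resource named in the match list, its request equals the reserved amount. This is a sketch under that assumption; `exact_match` and its parameters are hypothetical and do not mirror the reservation API.

```python
def exact_match(pod_request, reserved, resource_names):
    """True if the pod's request equals the reservation for every named resource."""
    return all(pod_request.get(name, 0) == reserved.get(name, 0)
               for name in resource_names)


reserved = {"cpu": 32, "gpu": 4, "memory": 256}

# Same CPU and GPU amounts: the pod may use the reservation.
print(exact_match({"cpu": 32, "gpu": 4, "memory": 128}, reserved, ["cpu", "gpu"]))

# A smaller pod would leave reserved GPUs stranded, so it is rejected.
print(exact_match({"cpu": 8, "gpu": 1}, reserved, ["cpu", "gpu"]))
```

Restricting the comparison to the named resources lets a reservation stay strict about CPU and GPU alignment while remaining flexible on, for example, memory.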

6. Mixed‑Workload (Mid‑Tier) Enhancements – Improves resource over‑commit, node profiling, pod‑level QoS (Resctrl, CPU QoS), and metrics for better utilization of idle resources while preserving high‑priority task performance.

7. Scheduler & Rescheduler Optimizations – Moves PodGroup checks earlier, refines plugin state handling, adds latency metrics, and upgrades LowNodeLoad, MigrationController, and global eviction limits to boost scheduling throughput and stability in large clusters.

Future Plans: Continue strengthening GPU management, introduce NPU scheduling, develop rescheduling plugins for resource imbalance, and evolve end‑to‑end device management solutions.
