Cloud Computing

Multi-Region Serverless Compute Scheduling with Alibaba Cloud ACK One Registered Cluster

This guide explains how Alibaba Cloud's ACK One registered cluster provides multi‑region serverless GPU compute scheduling, addressing AI workload elasticity by using region‑specific labels, ResourcePolicy, and the ack‑co‑scheduler to automatically balance resources across regions.


As enterprises deepen their digital transformation, infrastructure flexibility and scalability become critical. Traditional IDC data centers lack elasticity, which is why many teams adopt Alibaba Cloud ACK One registered clusters: they offer minute-level access, full Kubernetes compatibility, and serverless elasticity.

In the AI era, massive model parameter counts drive up compute demand and expose the limits of single-region GPU supply: GPU types differ between regions and inventory fluctuates, which hinders high-concurrency inference workloads.

Alibaba Cloud introduces a multi-region serverless compute scheduling solution for ACK One, aiming to aggregate compute supply across regions and enable large-scale, low-latency AI inference deployments.

Users can create an ACK One registered cluster, enable the virtual node component, and label workloads with alibabacloud.com/serverless-region-id to target a specific region. Example deployment YAML:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx-gpu-specified-region
  name: nginx-gpu-deployment-specified-region
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-gpu-specified-region
  template:
    metadata:
      labels:
        alibabacloud.com/acs: "true"
        alibabacloud.com/compute-class: gpu
        alibabacloud.com/compute-qos: default
        alibabacloud.com/gpu-model-series: example-model  # replace with actual model, e.g., T4
        alibabacloud.com/serverless-region-id: <region-id>  # specify the target region
        app: nginx-gpu-specified-region
    spec:
      containers:
        - image: 'mirrors-ssl.aliyuncs.com/nginx:stable-alpine'
          imagePullPolicy: IfNotPresent
          name: nginx
          ports:
            - containerPort: 80
              protocol: TCP
          resources:
            limits:
              cpu: 1
              memory: 1Gi
              nvidia.com/gpu: "1"
            requests:
              cpu: 1
              memory: 1Gi
              nvidia.com/gpu: "1"
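When the same Deployment is templated once per region, the region-pinning labels above can be generated programmatically. The sketch below is illustrative: the label keys are taken from the manifest above, while the `<region-id>` placeholder and the helper function are assumptions, not part of any Alibaba Cloud SDK.

```python
# Build the pod-template labels that pin an ACS serverless pod to one region.
# Label keys come from the Deployment manifest above; the region id and GPU
# model values are placeholders to substitute for your environment.

def region_pinned_labels(region_id, gpu_model, app):
    return {
        "alibabacloud.com/acs": "true",
        "alibabacloud.com/compute-class": "gpu",
        "alibabacloud.com/compute-qos": "default",
        "alibabacloud.com/gpu-model-series": gpu_model,
        "alibabacloud.com/serverless-region-id": region_id,
        "app": app,
    }

labels = region_pinned_labels("<region-id>", "example-model",
                              "nginx-gpu-specified-region")
print(labels["alibabacloud.com/serverless-region-id"])
```

Rendering these labels into each per-region manifest keeps the region pin and the GPU model consistent across copies of the Deployment.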

To achieve dynamic, multi‑region scheduling, the ack‑co‑scheduler’s ResourcePolicy can be used. The policy selects pods with a specific label and defines units that first try a preferred region and fall back to others when resources are insufficient. Example policy YAML:

apiVersion: scheduling.alibabacloud.com/v1alpha1
kind: ResourcePolicy
metadata:
  name: multi-vk-gpu-resourcepolicy
  namespace: default
spec:
  selector:
    app: nginx-gpu-resourcepolicy
  units:
    - resource: acs
      nodeSelector:
        topology.kubernetes.io/region: <region-1>  # preferred region
        type: virtual-kubelet
      podLabels:
        alibabacloud.com/serverless-region-id: <region-1>
        alibabacloud.com/compute-class: gpu
        alibabacloud.com/compute-qos: default
        alibabacloud.com/gpu-model-series: example-model  # replace with actual model, e.g., T4
    - resource: acs
      nodeSelector:
        topology.kubernetes.io/region: <region-2>  # fallback region
        type: virtual-kubelet
      podLabels:
        alibabacloud.com/serverless-region-id: <region-2>
        alibabacloud.com/compute-class: gpu
        alibabacloud.com/compute-qos: default
        alibabacloud.com/gpu-model-series: example-model  # replace with actual model, e.g., T4
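The fallback behavior the units encode can be pictured as a simple ordered search: try the first unit's region, and move to the next unit only when capacity there is exhausted. The following is a conceptual model only, not the ack-co-scheduler's actual implementation; region names and capacity numbers are hypothetical.

```python
# Conceptual model of ResourcePolicy unit ordering: each pod is placed in
# the first unit's region that still has free GPU capacity.

def schedule_pod(units, free_gpus):
    """Return the region of the first unit with free GPUs, or None if all
    units are exhausted (the pod would stay pending)."""
    for unit in units:
        region = unit["region"]
        if free_gpus.get(region, 0) > 0:
            free_gpus[region] -= 1
            return region
    return None

units = [{"region": "region-1"}, {"region": "region-2"}]  # preferred, fallback
free = {"region-1": 1, "region-2": 2}                     # hypothetical inventory

placements = [schedule_pod(units, free) for _ in range(3)]
print(placements)  # ['region-1', 'region-2', 'region-2']
```

With one free GPU in the preferred region, the first replica lands there and the remaining replicas overflow into the fallback region, which is exactly the behavior the two units above describe.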

The business workload can then be deployed with the custom scheduler:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx-gpu-resourcepolicy
  name: nginx-gpu-deployment-resourcepolicy
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-gpu-resourcepolicy
  template:
    metadata:
      labels:
        app: nginx-gpu-resourcepolicy
    spec:
      schedulerName: ack-co-scheduler
      containers:
        - image: 'mirrors-ssl.aliyuncs.com/nginx:stable-alpine'
          imagePullPolicy: IfNotPresent
          name: nginx
          ports:
            - containerPort: 80
              protocol: TCP
          resources:
            limits:
              cpu: 1
              memory: 1Gi
              nvidia.com/gpu: "1"
            requests:
              cpu: 1
              memory: 1Gi
              nvidia.com/gpu: "1"
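After scaling this Deployment, one way to see how replicas spread across regions is to aggregate the serverless-region-id label over the scheduled pods. The pod data below is hypothetical; in a real cluster the labels would come from the Kubernetes API (for example, the output of `kubectl get pods -o json`).

```python
from collections import Counter

# Count replicas per region from pod labels. The pod list is hypothetical
# stand-in data for what the Kubernetes API would return.

def replicas_per_region(pods):
    return Counter(
        pod["labels"].get("alibabacloud.com/serverless-region-id", "unknown")
        for pod in pods
    )

pods = [
    {"name": "nginx-1", "labels": {"alibabacloud.com/serverless-region-id": "region-1"}},
    {"name": "nginx-2", "labels": {"alibabacloud.com/serverless-region-id": "region-2"}},
    {"name": "nginx-3", "labels": {"alibabacloud.com/serverless-region-id": "region-2"}},
]
print(replicas_per_region(pods))  # Counter({'region-2': 2, 'region-1': 1})
```

A skew toward the fallback region in this count is the expected signature of the preferred region running out of GPU inventory.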

These configurations enable automatic fallback to alternative regions when a region’s GPU capacity is exhausted, simplifying workload management while ensuring high availability for AI inference services.

For more details, refer to the official ACK One documentation, and submit a ticket to request ACS GPU resources.

Tags: serverless, Kubernetes, GPU, Alibaba Cloud, multi-region, ACK One, ResourcePolicy
Written by Alibaba Cloud Infrastructure