Alibaba Cloud Infrastructure
Author

For uninterrupted computing services

357 Articles · 0 Likes · 1.1k Views · 0 Comments
Recent Articles

Latest from Alibaba Cloud Infrastructure

Alibaba Cloud Infrastructure
Dec 27, 2025 · Cloud Native

How to Safely Deploy AI Inference Models Across Multi‑Cluster Environments with ACK One Fleet

This article explains why AI inference services require multi‑cluster gray‑release, outlines the risks of traditional updates, and details how ACK One Fleet combined with Kruise Rollout provides a controlled, observable, and rollback‑capable solution for deploying large AI models across hybrid cloud clusters.

ACK One · AI · Gray Release
0 likes · 10 min read
Alibaba Cloud Infrastructure
Dec 22, 2025 · Artificial Intelligence

Boost LLM Inference with KV‑Cache‑Aware Routing on Alibaba Cloud ACK GIE

This article explains why KV‑Cache hit rate is critical for large‑model inference, describes vLLM's automatic prefix caching, outlines the challenges of a distributed cache, and provides a step‑by‑step guide to deploying precise‑mode prefix‑cache‑aware routing with Alibaba Cloud ACK Gateway with Inference Extension, backed by benchmark results.

Alibaba Cloud · KV Cache · Kubernetes
0 likes · 18 min read
Alibaba Cloud Infrastructure
Dec 17, 2025 · Cloud Native

AI Training Revives Gang Scheduling in Kubernetes for Elastic Resource Orchestration

The article examines how the rise of large‑model AI training reintroduces the need for gang scheduling in Kubernetes, contrasting the rigid resource requirements of HPC‑style workloads with cloud‑native elasticity, and outlines the historical evolution, current implementations, and future directions for achieving more flexible, high‑throughput compute orchestration.

AI training · Cloud Native · Gang Scheduling
0 likes · 22 min read
Alibaba Cloud Infrastructure
Dec 9, 2025 · Cloud Native

How to Detect and Resolve Kernel Memory & CPU Latency in Kubernetes Clusters

In cloud‑native Kubernetes environments, resource over‑commit and mixed deployments can cause kernel‑level memory reclaim and CPU scheduling delays that manifest as application jitter. This article explains how to visualize, diagnose, and remediate those delays using the SysOM exporter and related metrics.

CPU scheduling · Kubernetes · Memory reclaim
0 likes · 13 min read
Alibaba Cloud Infrastructure
Nov 25, 2025 · Operations

How to Uncover Hidden Java Memory Leaks in Kubernetes Pods

This article explains why Java applications in cloud containers often encounter OOMKilled pods, details the hidden memory consumption from JNI, libc, and Transparent Huge Pages, and demonstrates step‑by‑step how to use Alibaba Cloud OS Console's memory panorama analysis to identify and mitigate the root causes.

JNI · Kubernetes · Memory Leak
0 likes · 11 min read