Tag

KServe


Alibaba Cloud Infrastructure
Feb 10, 2025 · Artificial Intelligence

Hybrid Cloud Elastic LLM Inference Solution with ACK Edge and KServe

This article presents a hybrid‑cloud solution that uses ACK Edge and KServe to dynamically allocate on‑premises and cloud GPU resources for large‑language‑model inference. It addresses tidal traffic patterns, reduces costs, and keeps the service highly available through elastic scaling and custom scheduling policies.

ACK Edge · Elastic Inference · KServe
13 min read

Alibaba Cloud Infrastructure
Feb 8, 2025 · Artificial Intelligence

Deploying a Production‑Ready DeepSeek‑R1 Inference Service on Alibaba Cloud ACK with KServe

This guide explains how to deploy a production‑ready DeepSeek‑R1 inference service on Alibaba Cloud ACK using KServe. It covers model preparation, storage configuration, service deployment, observability, autoscaling, model acceleration, gray releases, and GPU‑shared inference.

Alibaba Cloud · DeepSeek · GPU
13 min read

Alibaba Cloud Infrastructure
Sep 5, 2024 · Artificial Intelligence

Deploying NVIDIA NIM on Alibaba Cloud ACK with Cloud‑Native AI Suite: A Step‑by‑Step Guide

This guide explains how to build a high‑performance, observable, and elastically scalable LLM inference service by deploying NVIDIA NIM on an Alibaba Cloud ACK cluster with the Cloud‑Native AI Suite, KServe, Prometheus, and Grafana, using custom autoscaling driven by request‑queue metrics.

Alibaba Cloud ACK · Grafana · KServe
15 min read