DeWu's Cloud-Native Container Management Practices
Since August 2021, DeWu App has built a cloud‑native, multi‑cluster Kubernetes platform: applications are modeled OAM‑style on Kruise CloneSets with Helm‑generated resources, federated via Karmada, and scheduled with custom plugins for resource reservation and node balancing. Online‑offline colocation serves Flink workloads, a unified KubeAutoScaler handles elastic scaling, and a self‑built KubeAI stack supports AI services. The platform has delivered significant cost reductions and improved stability, with further middleware containerization and multi‑cloud expansion planned.
DeWu App has rapidly expanded its infrastructure, prompting a focus on efficiency and cost control. Since August 2021, the company has been building a cloud‑native service system with high availability, observability, and operational efficiency.
Application management follows an OAM‑like model: a CloneSet workload (Kruise) represents an "application cluster", each Pod is an instance, and Ingress/Service configurations are abstracted as "application routing". Configuration and feature layers are rendered with Helm to generate Kubernetes resources.
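The mapping from an abstract "application" to a concrete workload can be sketched as a Helm‑style render step. This is a hypothetical illustration, not DeWu's actual templates: the field names and the `render_cloneset` helper are assumptions, but the idea matches the text, in that a base application definition plus an environment override layer is rendered into a Kruise CloneSet manifest.

```python
def render_cloneset(app: dict, env: dict) -> dict:
    """Merge base app config with env overrides and emit a CloneSet manifest.

    Mimics Helm value layering: environment-specific values win over the base.
    """
    merged = {**app, **env}
    name = merged["name"]
    return {
        "apiVersion": "apps.kruise.io/v1alpha1",
        "kind": "CloneSet",
        "metadata": {"name": name, "namespace": merged.get("namespace", "default")},
        "spec": {
            "replicas": merged["replicas"],
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    "containers": [
                        {"name": name, "image": f'{merged["image"]}:{merged["tag"]}'}
                    ]
                },
            },
        },
    }

# Base definition plus a production override layer (illustrative values).
manifest = render_cloneset(
    {"name": "order-svc", "image": "registry.example.com/order-svc",
     "tag": "1.0.0", "replicas": 2},
    {"replicas": 6, "tag": "1.0.3"},
)
```

Because every application renders through the same template, Ingress/Service "application routing" objects can be generated from the same merged values in a parallel step.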
To address single‑cluster failures, DeWu adopts a multi‑cluster federation strategy, extending Karmada with custom CRDs for workload propagation and overrides, moving batch‑release logic to the host clusters.
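The core decision such propagation CRDs encode is how to divide an application's replicas across member clusters. The sketch below is an assumption about one plausible policy (static weights with largest‑remainder rounding), not DeWu's actual override logic; `split_replicas` and the cluster names are hypothetical.

```python
def split_replicas(total: int, weights: dict) -> dict:
    """Divide `total` replicas across clusters proportionally to static weights.

    Uses largest-remainder rounding so the per-cluster counts always sum
    back to `total`, even when the proportional shares are fractional.
    """
    wsum = sum(weights.values())
    shares = {c: total * w / wsum for c, w in weights.items()}
    out = {c: int(s) for c, s in shares.items()}
    remainder = total - sum(out.values())
    # Hand leftover replicas to the clusters with the largest fractional share.
    for c in sorted(shares, key=lambda c: shares[c] - out[c], reverse=True)[:remainder]:
        out[c] += 1
    return out

# 10 replicas split 2:1 across two clusters.
plan = split_replicas(10, {"cluster-a": 2, "cluster-b": 1})
```

With the split computed per cluster, batch‑release logic can then run inside each host cluster independently, which is what makes moving it out of the federation layer workable.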
Scheduling optimization includes application profiling, resource reservation, balanced scheduling, and both real‑time and offline colocation. Profiling collects Prometheus metrics, which a custom KubeRM service processes to compute each Pod's resource request as observed utilization divided by a safety watermark, applying different watermark policies per service grade.
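The request formula above can be made concrete. The grade names and watermark values here are assumptions for illustration; only the formula itself (request = utilization / safety watermark) comes from the text.

```python
# Assumed grade policy: higher-grade services get a lower (more
# conservative) watermark, so their requests carry more headroom.
SAFETY_WATERMARK = {"P0": 0.4, "P1": 0.5, "P2": 0.6}

def recommended_request(peak_utilization_cores: float, grade: str) -> float:
    """CPU cores to request so that observed peak load sits at the watermark."""
    return peak_utilization_cores / SAFETY_WATERMARK[grade]

# A P0 service peaking at 2 cores is asked to request 5 cores (2 / 0.4),
# while a P2 service with the same peak requests only about 3.3 cores.
p0_request = recommended_request(2.0, "P0")
p2_request = recommended_request(2.0, "P2")
```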
A custom scheduler plugin implements resource reservation, improving dispatch decisions. Additional plugins—CoolDownHotNode, HybridUnschedulable, NodeBalance, and NodeInfoRt—enhance node temperature balance, elastic‑resource handling, CPU‑request balancing, and real‑time scoring.
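A CPU‑request balancing plugin like NodeBalance ultimately reduces to a scoring function over candidate nodes. The sketch below is a minimal assumption of how such a score could work (real scheduler‑framework plugins are written in Go against the Score extension point); the function name and the 0-100 scale are hypothetical.

```python
def node_balance_score(cpu_requested: float, cpu_allocatable: float) -> int:
    """Score a node 0-100 by remaining CPU headroom.

    Emptier nodes score higher, so new Pods spread toward nodes with
    fewer CPU requests, evening out request utilization across the fleet.
    """
    utilization = cpu_requested / cpu_allocatable
    return round((1 - utilization) * 100)

# A node with 8 of 32 cores requested outscores a fully requested one.
light_node = node_balance_score(8, 32)
full_node = node_balance_score(32, 32)
```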
Online‑offline colocation targets Flink tasks, using CPU‑binding strategies (LSX, LSR, LS, BE) and custom BE‑CPU/BE‑Memory extended resource types. A Kube‑Agent DaemonSet reports BE resources via the Device‑Plugin mechanism and performs CPU pinning.
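Before the agent can advertise BE resources, it has to decide how much best‑effort capacity a node actually has. The formula below is an assumption about one common approach (total capacity minus online usage minus a system reserve), not a description of Kube‑Agent's internals; `be_cpu_capacity` is a hypothetical name.

```python
def be_cpu_capacity(total_cores: float, online_usage: float, reserve: float) -> float:
    """Best-effort CPU a node can advertise via the device-plugin API.

    Whatever the online (latency-sensitive) workloads are not using,
    minus a fixed system reserve, floored at zero so a saturated node
    advertises no BE capacity at all.
    """
    return max(0.0, total_cores - online_usage - reserve)

# A 64-core node with 40 cores of online usage and a 4-core reserve
# can lend 20 cores to Flink BE tasks; an oversubscribed node lends none.
roomy = be_cpu_capacity(64, 40, 4)
saturated = be_cpu_capacity(8, 10, 2)
```

Reporting this number as an extended resource lets ordinary Kubernetes scheduling place BE Pods, while the agent's CPU pinning keeps them off the cores the online workloads are bound to.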
Elastic scaling is unified by the KubeAutoScaler component, managing HPA, VPA, scheduled scaling, and GPU sidecar‑based scaling for AI services.
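Unifying metric‑driven and scheduled scaling mostly means reconciling two answers into one replica count. The sketch below combines the standard HPA rule (desired = ceil(current × metric/target)) with a scheduled minimum; the schedule format and `desired_replicas` helper are assumptions, not KubeAutoScaler's actual API.

```python
import datetime
import math

def desired_replicas(current: int, metric_pct: float, target_pct: float,
                     schedule: list, now: datetime.datetime) -> int:
    """Combine an HPA-style metric rule with scheduled scaling.

    `schedule` is a list of (start_hour, end_hour, min_replicas) windows,
    e.g. a pre-scale floor before a known evening traffic peak.
    """
    hpa = math.ceil(current * metric_pct / target_pct)
    floor = max((r for (start, end, r) in schedule if start <= now.hour < end),
                default=1)
    return max(hpa, floor)

# 4 replicas at 90% CPU against a 60% target wants 6 replicas; during the
# 20:00-22:00 window a scheduled floor of 12 wins instead.
evening = desired_replicas(4, 90, 60, [(20, 22, 12)],
                           datetime.datetime(2024, 5, 1, 21, 0))
midday = desired_replicas(4, 90, 60, [(20, 22, 12)],
                          datetime.datetime(2024, 5, 1, 10, 0))
```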
Model inference migrated from V100 to A10, reducing cost ~20% and improving stability.
CPU‑intensive services switched from Intel to AMD, cutting cost ~14%.
Resource pool management, redundancy control, cluster merging, and fragmentation cleanup improve overall utilization.
The AI scenario is supported by the self‑built KubeAI platform, covering model development, training, inference, versioning, and AIGC/GPT services.
Future work includes further containerization of middleware, strengthening mixing and scaling solutions, enhancing Kubernetes stability, and advancing multi‑cloud strategies for flexible, resilient infrastructure.
DeWu Technology