
Soul's Container Cluster Cost Governance: A Case Study on Resource Optimization

This case study describes how Soul governed the cost of its Kubernetes container clusters by improving resource utilization. It covers challenges such as resource fragmentation and the strategies that delivered significant cost reductions, including SNAS for elastic node scaling and coordinated HPA and CronHPA autoscaling.

Soul Technical Team

The governance effort had to overcome several obstacles: HPA scale-outs constrained by limited node capacity during traffic surges, services preempting one another's resources and degrading stability, idle capacity in resource pools during tidal (peak-and-trough) traffic, and the complexity of ongoing operations. The solutions combined service governance improvements (HPA+CronHPA coordination), resource pool elasticity upgrades (the SNAS rollout), and a mechanism for observing resource usage.

Key technical implementations included:

Service Governance: Optimized HPA+CronHPA coordination to handle traffic surges and ensure resource availability during peak periods.
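The article does not publish Soul's HPA or CronHPA settings, so the following is only a minimal sketch of one common way the two can cooperate. It assumes CronHPA raises the replica floor ahead of known peaks while HPA keeps applying the standard Kubernetes rule desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The `CronWindow` type, the function names, and all numbers are hypothetical, and HPA's tolerance band and stabilization windows are omitted.

```python
import math
from dataclasses import dataclass
from datetime import time
from typing import List


@dataclass
class CronWindow:
    """Hypothetical CronHPA-style rule: keep at least min_replicas during [start, end)."""
    start: time
    end: time
    min_replicas: int


def hpa_desired(current_replicas: int, current_metric: float, target_metric: float) -> int:
    # Standard HPA rule: desired = ceil(current * currentMetric / targetMetric).
    # Tolerance and stabilization windows are deliberately omitted.
    return math.ceil(current_replicas * current_metric / target_metric)


def effective_min(now: time, base_min: int, windows: List[CronWindow]) -> int:
    # Any active scheduled window raises the floor; windows crossing midnight are not handled.
    floors = [w.min_replicas for w in windows if w.start <= now < w.end]
    return max([base_min, *floors])


def reconcile(now: time, current_replicas: int, current_metric: float,
              target_metric: float, base_min: int, max_replicas: int,
              windows: List[CronWindow]) -> int:
    desired = hpa_desired(current_replicas, current_metric, target_metric)
    floor = effective_min(now, base_min, windows)
    # Clamp to [floor, max_replicas], mirroring HPA's minReplicas/maxReplicas bounds.
    return max(floor, min(desired, max_replicas))
```

With an evening window of 19:00 to 23:00 and a floor of 20, a service that HPA alone would shrink to 7 replicas stays at 20 during the peak, yet can still grow to 40 if load doubles; outside the window it falls back to 7. Taking the maximum of the scheduled floor and the reactive target is what keeps pre-warming from fighting metric-driven scaling.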

Resource Pool Elasticity: Deployed SNAS (Soul Node AutoScaler) to adjust node counts dynamically based on resource pool water levels (utilization), reducing waste while maintaining service continuity.
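SNAS is Soul's in-house component and its algorithm is not described here, so the sketch below only illustrates a generic water-level rule under stated assumptions: the water level is total CPU requests divided by total allocatable CPU, nodes are homogeneous, and the pool is resized toward a target level whenever it leaves a [low, high] band. `PoolState`, `plan_node_count`, and all thresholds are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PoolState:
    requested_cores: float  # sum of CPU requests of pods in the pool
    node_cores: float       # allocatable cores per node (homogeneous nodes assumed)
    nodes: int


def water_level(p: PoolState) -> float:
    capacity = p.node_cores * p.nodes
    return p.requested_cores / capacity if capacity else float("inf")


def plan_node_count(p: PoolState, low: float = 0.60, high: float = 0.85,
                    target: float = 0.75, min_nodes: int = 3,
                    max_nodes: int = 200) -> int:
    level = water_level(p)
    if low <= level <= high:
        # Inside the dead band: keep the current size to avoid flapping.
        return p.nodes
    # Outside the band: size the pool so the water level lands near `target`.
    needed = math.ceil(p.requested_cores / (p.node_cores * target))
    return max(min_nodes, min(needed, max_nodes))
```

For example, a pool of 30 nodes with 32 cores each and 900 requested cores sits at about 94%, so the rule asks for 38 nodes (900 / (32 × 0.75), rounded up). The dead band between `low` and `high` keeps small fluctuations from triggering constant scale-ups and scale-downs; a real scale-down would also need to drain nodes safely.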

Service Binding: Separated CPU and GPU services, optimized resource pool assignments, and implemented resource pool water level control.
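A minimal sketch of CPU/GPU separation with a per-pool water-level cap, assuming pods are routed by whether they request `nvidia.com/gpu` (the resource name exposed by NVIDIA's Kubernetes device plugin) and that binding is refused when it would push a pool past its cap. The pool names, the 85% cap, `Pool`, and `bind` are hypothetical; in a cluster this routing would normally be expressed with node labels, selectors, and taints rather than application code.

```python
from dataclasses import dataclass
from typing import Dict, Optional

GPU_RESOURCE = "nvidia.com/gpu"  # resource name advertised by the NVIDIA device plugin


@dataclass
class Pool:
    name: str
    capacity_cores: float
    requested_cores: float = 0.0
    max_water_level: float = 0.85  # hypothetical cap on requested / capacity

    def water_level_after(self, cores: float) -> float:
        return (self.requested_cores + cores) / self.capacity_cores


def bind(requests: Dict[str, float], pools: Dict[str, Pool]) -> Optional[str]:
    # GPU workloads go to the GPU pool; CPU-only workloads never occupy GPU nodes.
    wants_gpu = requests.get(GPU_RESOURCE, 0) > 0
    pool = pools["gpu-pool" if wants_gpu else "cpu-pool"]
    cores = requests.get("cpu", 0.0)
    if pool.water_level_after(cores) > pool.max_water_level:
        return None  # over the cap: the caller could queue the pod or trigger scale-out
    pool.requested_cores += cores
    return pool.name
```

Returning `None` stands in for whatever the real system does when a pool is full, such as queueing the pod or asking the node autoscaler for more capacity.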

Hotspot Rescheduling: Used koord-descheduler's low-node-load (LowNodeLoad) strategy to migrate pods from overloaded nodes to lightly loaded ones during resource contention.
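The following is a simplified illustration of the low-node-load idea, not koord-descheduler's implementation: nodes whose actual CPU usage exceeds a high threshold are treated as hotspots, nodes below a low threshold as migration targets, and pods are moved from hot nodes until they fall back under the high threshold without pushing a target above it. The thresholds, `Node`, and `plan_migrations` are hypothetical. In a real cluster the descheduler only evicts pods and the scheduler chooses where they land; the sketch assigns a destination directly to stay self-contained.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Node:
    name: str
    capacity: float            # allocatable CPU cores
    pods: Dict[str, float]     # pod name -> observed CPU usage in cores

    @property
    def usage(self) -> float:
        return sum(self.pods.values()) / self.capacity


def plan_migrations(nodes: List[Node], low: float = 0.3,
                    high: float = 0.7) -> List[Tuple[str, str, str]]:
    hot = [n for n in nodes if n.usage > high]
    cold = [n for n in nodes if n.usage < low]
    moves = []
    for src in sorted(hot, key=lambda n: n.usage, reverse=True):
        # Largest pods first, so each move relieves as much pressure as possible.
        for pod, cpu in sorted(src.pods.items(), key=lambda kv: kv[1], reverse=True):
            if src.usage <= high:
                break
            # First cold node that stays at or below `high` after taking the pod.
            dst = next((d for d in cold
                        if (sum(d.pods.values()) + cpu) / d.capacity <= high), None)
            if dst is None:
                continue
            del src.pods[pod]
            dst.pods[pod] = cpu
            moves.append((pod, src.name, dst.name))
    return moves
```

For example, with one 10-core node at 80% usage and another at 10%, moving a 4-core pod brings them to 40% and 50%, and a third node sitting at 50% is left untouched because it is neither hot nor cold.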

Cost Control: Established resource approval workflows, implemented cost monitoring dashboards, and created service load inspection mechanisms.
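Service load inspection is described only at a high level, so the following is a hypothetical rule for flagging over-provisioned services: compare each service's p95 CPU usage per replica with its CPU request, and when usage falls below a threshold (30% here), suggest a request with 50% headroom rounded up to the next 100 millicores. Working in integer millicores, the unit Kubernetes uses for fractional CPU, avoids floating-point rounding surprises. `ServiceLoad`, `inspect_services`, and the thresholds are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ServiceLoad:
    name: str
    cpu_request_m: int   # CPU request per replica, millicores
    replicas: int
    p95_usage_m: int     # observed p95 CPU usage per replica, millicores


def ceil_div(a: int, b: int) -> int:
    # Integer ceiling division for non-negative ints, avoiding float rounding.
    return -(-a // b)


def inspect_services(services: List[ServiceLoad], min_ratio_pct: int = 30,
                     headroom_pct: int = 150) -> List[Tuple[str, int, int, int]]:
    findings = []
    for s in services:
        ratio_pct = s.p95_usage_m * 100 // s.cpu_request_m
        if ratio_pct >= min_ratio_pct:
            continue  # usage is close enough to the request
        # p95 usage plus headroom, rounded up to the next 100m.
        suggested_m = ceil_div(ceil_div(s.p95_usage_m * headroom_pct, 100), 100) * 100
        if suggested_m >= s.cpu_request_m:
            continue
        saved_m = (s.cpu_request_m - suggested_m) * s.replicas
        findings.append((s.name, ratio_pct, suggested_m, saved_m))
    return findings
```

A service requesting 4000m per replica across 20 replicas but peaking at 800m would be flagged at 20% and given a suggested request of 1200m, freeing 56 cores of requests; such findings could then feed the approval workflow and cost dashboards.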

The outcomes: resource utilization reached 90%+, overall costs fell by 20%+, and systematic monitoring and optimization improved operational stability.
