
Capacity Management: Goals, Stages, Optimization Techniques, and Scaling Practices

The article explains how capacity management balances cost control and service quality through defined goals, three development stages, detailed resource optimization methods, stress‑testing metrics and standards, and automated scaling to achieve significant cost reductions while maintaining system stability.

Architect

Background: As ZhaiZhai's business expands, spending on hardware and infrastructure rises while resource utilization falls. Capacity management is needed to balance cost against service quality.

1. Goals of Capacity Management

Capacity management has two goals: cost control and business support. It ensures that services meet their SLAs while resources are used efficiently.

2. Development Stages

Capacity management went through three stages:
(1) No capacity management: services were deployed mixed across physical and KVM machines.
(2) Availability and performance analysis: mixed deployment was reduced, KVM machines were decommissioned, and utilization improved, cutting resource cost by about 50%.
(3) Cloud era: stress‑test standards were introduced, roughly halving cost again.

3. Capacity Management Practices

3.1 Capacity Water Level

The capacity water level is the ratio of resources actually consumed to total available resources. It is measured for cloud hosts (CPU, memory, disk, NIC) and for application services (JVM memory, threads, GC frequency, QPS, response time).
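As a minimal sketch of the ratio described above, the following Python snippet computes a water level per resource and picks the most constrained one. The function name `water_level`, the resource keys, and all sample readings are hypothetical and chosen only for illustration.

```python
def water_level(used: float, total: float) -> float:
    """Return the capacity water level: consumed resources / total available."""
    if total <= 0:
        raise ValueError("total capacity must be positive")
    return used / total


# Hypothetical readings for one cloud host, as (used, total) pairs.
host_usage = {
    "cpu_cores": (1.2, 4.0),
    "memory_gb": (6.0, 8.0),
    "disk_gb": (120.0, 200.0),
    "nic_mbps": (150.0, 1000.0),
}

# One water level per resource; the highest one is the current bottleneck.
levels = {name: water_level(used, total) for name, (used, total) in host_usage.items()}
bottleneck = max(levels, key=levels.get)

for name, level in levels.items():
    print(f"{name}: {level:.0%}")
print(f"most constrained resource: {bottleneck}")
```

Tracking each resource separately matters because a host can look idle on CPU while already being close to its memory limit.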

3.2 Resource Capacity Optimization

Examples include reducing a service's CPU allocation from 4 cores to 2 when average usage is low; sizing JVM memory with the formula JVM total memory = heap + thread stack size (-Xss) * thread count + constant overhead; and mixed deployment of high‑ and low‑priority services.
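The JVM memory formula above can be worked through with a few lines of Python. This is only a sketch: the function name and the sample values (2 GB heap, 1 MB thread stacks, 200 threads, 256 MB overhead) are assumptions for illustration, not figures from the article.

```python
def jvm_total_memory_mb(heap_mb: float, thread_stack_kb: float,
                        thread_count: int, overhead_mb: float) -> float:
    """Estimate JVM memory as heap + thread stack size * thread count + overhead."""
    thread_stacks_mb = thread_stack_kb * thread_count / 1024  # convert KB to MB
    return heap_mb + thread_stacks_mb + overhead_mb


# Hypothetical service: 2048 MB heap, 1024 KB stack per thread,
# 200 threads, and 256 MB of constant overhead.
estimate_mb = jvm_total_memory_mb(heap_mb=2048, thread_stack_kb=1024,
                                  thread_count=200, overhead_mb=256)
print(f"estimated JVM footprint: {estimate_mb:.0f} MB")
```

With these inputs the thread stacks alone account for 200 MB, which is why thread count belongs in the sizing decision alongside heap size.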

3.3 Cluster Capacity

Accurate cluster capacity is determined by combining stress‑test results with the capacity water level. Tests are run either per instance, using log replay or TCP‑Copy, or against the whole cluster.
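One way to combine the two inputs, sketched below, is to take the per-instance maximum QPS found by a stress test and discount it by a target water level to leave headroom. The article does not spell out its exact calculation, so the function names, the target water level of 0.5, and the traffic figures are all assumptions.

```python
import math


def usable_qps_per_instance(max_qps: float, target_water_level: float) -> float:
    """QPS one instance should serve so it stays at the target water level."""
    if not 0 < target_water_level <= 1:
        raise ValueError("target_water_level must be in (0, 1]")
    return max_qps * target_water_level


def instances_needed(peak_qps: float, max_qps_per_instance: float,
                     target_water_level: float) -> int:
    """Instances required to serve peak_qps without exceeding the target level."""
    usable = usable_qps_per_instance(max_qps_per_instance, target_water_level)
    return math.ceil(peak_qps / usable)


# Hypothetical numbers: a stress test shows 800 QPS per instance at saturation,
# the expected peak is 12,000 QPS, and instances should run at 50% of their limit.
print(instances_needed(peak_qps=12000, max_qps_per_instance=800,
                       target_water_level=0.5))
```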

3.4 Stress‑Test Metrics

System metrics: CPU, memory, disk I/O, NIC bandwidth. Service metrics: response time, latency percentiles, error rate, slow‑request ratio.
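The service metrics listed above can be derived from raw per-request latencies, as in this sketch. It uses the nearest-rank percentile definition, which is one of several common conventions; the function names, the slow-request threshold, and the sample data are hypothetical.

```python
def percentile(sorted_values: list, p: int):
    """Nearest-rank percentile of pre-sorted values; p is an integer in 1..100."""
    if not sorted_values:
        raise ValueError("no samples")
    n = len(sorted_values)
    rank = -(-(p * n) // 100)  # ceil(p * n / 100) using integer arithmetic
    return sorted_values[max(rank, 1) - 1]


def summarize(latencies_ms: list, error_count: int, slow_threshold_ms: float) -> dict:
    """Summarize one stress-test run; assumes one latency sample per request."""
    if not latencies_ms:
        raise ValueError("no samples")
    values = sorted(latencies_ms)
    total = len(values)
    slow = sum(1 for v in values if v > slow_threshold_ms)
    return {
        "avg_ms": sum(values) / total,
        "p50_ms": percentile(values, 50),
        "p90_ms": percentile(values, 90),
        "p99_ms": percentile(values, 99),
        "error_rate": error_count / total,
        "slow_ratio": slow / total,
    }


# Hypothetical run: 100 requests with latencies of 1..100 ms, 2 errors,
# and any request slower than 95 ms counted as slow.
report = summarize(list(range(1, 101)), error_count=2, slow_threshold_ms=95)
print(report)
```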

3.5 Stress‑Test Standards

The standards define acceptable error rates (≤1% for A‑level services, ≤3% for B‑level services, ≤5% for all others) and response‑time thresholds for the median, 90th, and 99th percentiles relative to the average.
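The error-rate limits above translate directly into a pass/fail check. The sketch below encodes only those three limits; the response-time thresholds are left out because the text does not give their values. The names `ERROR_RATE_LIMIT_PCT` and `error_rate_ok` are hypothetical.

```python
# Error-rate limits in percent, taken from the standard described above.
ERROR_RATE_LIMIT_PCT = {"A": 1, "B": 3}
DEFAULT_LIMIT_PCT = 5  # applies to every level other than A and B


def error_rate_ok(service_level: str, errors: int, total_requests: int) -> bool:
    """Return True if the observed error rate is within the limit for the level."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    limit_pct = ERROR_RATE_LIMIT_PCT.get(service_level, DEFAULT_LIMIT_PCT)
    # Compare errors / total <= limit / 100 without floating-point division.
    return errors * 100 <= limit_pct * total_requests


print(error_rate_ok("A", errors=10, total_requests=1000))
print(error_rate_ok("A", errors=11, total_requests=1000))
print(error_rate_ok("C", errors=40, total_requests=1000))
```

Using integer arithmetic keeps a run that sits exactly on the limit from failing because of rounding.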

4. Scaling Operations

Based on capacity data, automatic scaling is applied during promotional activities and for daily service quality assurance.
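A scaling decision driven by capacity data could look like the sketch below, which resizes a service in proportion to how far its water level is from a target. This proportional rule is an assumption for illustration and is not described in the article; the function name, the bounds, and the sample values are hypothetical.

```python
def desired_instances(current_instances: int, current_level_pct: int,
                      target_level_pct: int, min_instances: int = 1,
                      max_instances: int = 100) -> int:
    """Scale the instance count so the water level moves toward the target.

    Water levels are integer percentages, for example 80 for 80%.
    """
    if current_instances < 1 or target_level_pct <= 0:
        raise ValueError("need at least one instance and a positive target")
    # ceil(current_instances * current_level / target_level) in integer math
    scaled = -(-(current_instances * current_level_pct) // target_level_pct)
    # Clamp to the allowed range so a bad reading cannot empty or flood the cluster.
    return max(min_instances, min(max_instances, scaled))


# Promotion traffic pushes 10 instances to 80% against a 50% target: scale out.
print(desired_instances(10, current_level_pct=80, target_level_pct=50))
# Quiet period at 20%: scale in, but never below two instances.
print(desired_instances(10, current_level_pct=20, target_level_pct=50,
                        min_instances=2))
```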

5. Summary

Capacity management is a complex engineering discipline that combines strategies, processes, and standards to achieve cost reduction and efficiency while ensuring service stability.
