Auto Scaling (AS) in Cloud Services: Architecture, Use Cases, and Optimization Strategies
This article explains elastic auto scaling in cloud services, describes typical scenarios such as highly elastic web applications and compute‑intensive workloads, details the four‑layer architecture and workflow, and outlines functional features, stability improvements, and future optimization directions.
1. Introduction
Elasticity is a high‑level capability unique to cloud services. Elastic scaling, or Auto Scaling (AS), lets users define scaling rules based on business demand and policy: virtual resources are added automatically when demand rises and released when it falls, balancing cost and performance.
2. Practical Application Scenarios
2.1 Highly Elastic Web Application Service
Suitable for workloads with obvious peaks and valleys, such as short‑video services. Users can configure timed policies to achieve stable, periodic scaling, maximizing cost savings.
Figure 1
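A timed policy of the kind described above can be sketched as a minimal rule object. This is an illustrative sketch: `TimedPolicy`, `desired_capacity_at`, and all the numbers are assumptions, not part of any real AS API.

```python
from dataclasses import dataclass
from datetime import time

@dataclass
class TimedPolicy:
    """Hypothetical timed scaling rule: run more instances in a daily peak window."""
    start: time          # beginning of the daily peak window
    end: time            # end of the daily peak window (exclusive)
    peak_capacity: int   # instances to run during the peak
    base_capacity: int   # instances to run off-peak

    def desired_capacity_at(self, now: time) -> int:
        """Return the target instance count for a given time of day."""
        if self.start <= now < self.end:
            return self.peak_capacity
        return self.base_capacity

# Example: a short-video service that peaks in the evening (values are made up).
policy = TimedPolicy(start=time(19, 0), end=time(23, 0),
                     peak_capacity=20, base_capacity=5)
```

Because the schedule is periodic and known in advance, capacity can be raised shortly before the peak rather than reacting after load has already climbed.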
2.2 High‑availability Compute‑Intensive Service
Applicable to services requiring high availability and dynamic adjustment with load changes, such as distributed big‑data compute nodes. After setting monitoring‑based scaling policies, the system automatically expands or shrinks the cluster based on metrics like CPU usage, and replaces unhealthy instances.
Figure 2
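The monitoring-based policy above can be sketched as a simple threshold rule over a CPU metric. The thresholds, step size, and bounds below are assumptions for illustration; the article does not specify concrete values.

```python
def scaling_decision(avg_cpu: float, current: int,
                     high: float = 75.0, low: float = 25.0,
                     step: int = 2, min_size: int = 2, max_size: int = 50) -> int:
    """Return the new desired cluster size given average CPU usage (%).

    Scale out when CPU exceeds the upper threshold, scale in below the
    lower one, and stay put inside the comfort band.
    """
    if avg_cpu > high:
        return min(current + step, max_size)   # expand, capped at max_size
    if avg_cpu < low:
        return max(current - step, min_size)   # shrink, floored at min_size
    return current                             # no change needed
```

A real system would evaluate this against an aggregated metric window (e.g., 5-minute average) rather than a single sample, to avoid flapping.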
3. Architecture
The overall functional structure of auto scaling consists of four layers: Access, Driver, Adapter, and Operation.
Access Layer: the API side that receives and processes requests.
Driver Layer: the core, linking resource management, scheduling, host management, and workflow via MySQL and Redis; Redis provides distributed locks for scheduled tasks.
Adapter Layer: connects the driver to the operation layer, acting as a third‑party interface platform; it enables support for various cloud services, monitoring platforms (Prometheus, Open‑Falcon), RDS whitelists, and load‑balancer configuration.
Operation Layer: the concrete execution target; different adapters correspond to different operation objects, e.g., OpenStack clouds.
Figure 3
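The driver layer's use of Redis distributed locks ensures a scheduled task runs on only one driver node at a time. The sketch below shows the idea with an in-memory stand-in for Redis (a real deployment would use the atomic SET-with-NX pattern in redis-py); `FakeRedis` and `try_run_scheduled_task` are illustrative names.

```python
import threading
import uuid

class FakeRedis:
    """Tiny in-memory substitute for the two Redis operations the lock needs."""
    def __init__(self):
        self._store, self._mu = {}, threading.Lock()

    def set_nx(self, key, value):
        """Set key only if absent (Redis SET ... NX). Returns True if acquired."""
        with self._mu:
            if key in self._store:
                return False
            self._store[key] = value
            return True

    def delete_if_owner(self, key, value):
        """Delete key only if we still own it, so we never free another node's lock."""
        with self._mu:
            if self._store.get(key) == value:
                del self._store[key]
                return True
            return False

def try_run_scheduled_task(r, task_name, fn):
    """Run fn only on the node that wins the lock; release it afterwards."""
    token = str(uuid.uuid4())  # unique owner token
    if not r.set_nx(f"lock:{task_name}", token):
        return False  # another driver node holds the lock
    try:
        fn()
        return True
    finally:
        r.delete_if_owner(f"lock:{task_name}", token)
```

In production the lock would also carry an expiry (SET ... PX) so a crashed node cannot hold it forever.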
3.1 Workflow Mode
During scaling, multiple actions and instances are involved. To avoid interference, a workflow mode is designed where each instance executes actions sequentially; the result of each action (success or failure) determines the next step, ensuring reliable creation or termination.
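The per-instance workflow described above can be sketched as a chain of actions where each result decides whether to continue. This is a conceptual sketch; `run_workflow` and the sample actions are illustrative, not the system's actual implementation.

```python
def run_workflow(instance, actions):
    """Execute actions sequentially for one instance; stop at the first failure.

    Each action is a callable taking the instance dict and returning True/False.
    Returns (succeeded, completed_action_names) so the caller can roll back
    exactly the steps that ran.
    """
    done = []
    for action in actions:
        if not action(instance):
            return False, done  # failure decides the next step: abort and roll back
        done.append(action.__name__)
    return True, done
```

Because each instance runs its own chain independently, a failure in one instance's workflow does not interfere with the others.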
3.2 Data Associations
Data relationships are illustrated in Figure 4, where stackId serves as a foreign key linking various tables.
Figure 4
4. Functions
Auto scaling supports scaling operations, automatic load‑balancer configuration, RDS whitelist addition, and rolling updates.
4.1 Load‑Balancing / RDS Configuration
When scaling out, newly healthy instances are attached to the LVS load balancer and added to the RDS whitelist; when scaling in, instances are detached from LVS before termination to avoid traffic disruption.
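The ordering constraint above matters: on scale-out the instance joins the RDS whitelist and load balancer only once healthy, and on scale-in it leaves the load balancer before termination. A minimal sketch, with `LoadBalancer` and the hook functions as hypothetical stand-ins for the real LVS/RDS integrations:

```python
class LoadBalancer:
    """Toy stand-in for the LVS backend pool."""
    def __init__(self):
        self.backends = set()
    def attach(self, ip):
        self.backends.add(ip)
    def detach(self, ip):
        self.backends.discard(ip)

def scale_out_hooks(instance, lb, rds_whitelist):
    rds_whitelist.add(instance["ip"])   # let the new node reach the database first
    lb.attach(instance["ip"])           # only then start receiving traffic

def scale_in_hooks(instance, lb, rds_whitelist):
    lb.detach(instance["ip"])           # stop new traffic before anything else
    rds_whitelist.discard(instance["ip"])
```

Reversing either ordering risks serving traffic from a node that cannot reach the database, or killing a node that is still receiving requests.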
4.2 Scaling Process
The scaling workflow includes instance creation, status check, ping test, load‑balancer configuration, and RDS whitelist addition. Failure at any step triggers instance deletion and retry; repeated failures abort the process and raise alerts.
Figure 5
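The delete-and-retry behaviour of the scale-out workflow can be sketched as below. The `create`/`delete`/`alert` callables and the retry count are injected stand-ins, not the article's actual interfaces.

```python
def create_with_retry(create, delete, alert, steps, max_retries=3):
    """Create an instance and run health steps; on failure, delete and retry.

    steps: list of (name, check) pairs, each check taking the instance and
    returning True/False. After max_retries failures, abort and raise an alert.
    """
    for _ in range(max_retries):
        inst = create()
        if all(check(inst) for _, check in steps):
            return inst          # creation and every check (status, ping, ...) passed
        delete(inst)             # clean up the half-created instance before retrying
    alert(f"scaling failed after {max_retries} attempts")
    return None
```

Keeping the retry loop around the whole step chain means a transient failure at any step (e.g., a ping timeout) costs only one throwaway instance, not a stuck scaling activity.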
When scaling in, the instance weight in LVS is reduced, a 30‑second wait ensures in‑flight requests complete, then the instance is detached and destroyed.
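That drain sequence can be sketched directly. The 30-second wait from the text is a parameter here; `set_weight` and `detach` are hypothetical methods on the LVS wrapper.

```python
import time

def drain_and_destroy(lb, instance_ip, destroy, wait_seconds=30):
    """Gracefully remove one instance from service, then destroy it."""
    lb.set_weight(instance_ip, 0)   # stop routing new connections to the node
    time.sleep(wait_seconds)        # give in-flight requests time to complete
    lb.detach(instance_ip)          # remove it from the backend pool
    destroy(instance_ip)            # only now is it safe to terminate
```

Setting the weight to zero first (rather than detaching immediately) lets established connections finish while new ones go elsewhere.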
5. Stability Optimizations
5.1 Version Iteration
Initially, the system integrated with OpenStack Heat, but Heat offered no explicit shrink operation and only limited interaction with individual instances, so the architecture switched to direct Nova integration for more flexible VM management.
5.2 High‑Availability Solutions
To improve success rates under resource shortage or creation failures, several fallback strategies are offered, including switching between shared and dedicated packages, converting VLAN/VXLAN networks, creating instances in other clusters or regions, and even using alternative virtualization resources such as bare metal or KubeVirt.
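These fallback strategies can be sketched as an ordered cascade: try each capacity source in turn until one yields an instance. The strategy names below mirror the list above, but the cascade function and the create callables are illustrative assumptions.

```python
def create_with_fallbacks(strategies):
    """Try each (name, create_fn) in order; return the first success.

    create_fn returns an instance handle, or None on resource shortage or
    creation failure. Raises if every strategy is exhausted.
    """
    for name, create_fn in strategies:
        inst = create_fn()
        if inst is not None:
            return name, inst   # record which fallback actually satisfied the request
    raise RuntimeError("all fallback strategies exhausted")
```

Ordering the strategies from least to most disruptive (e.g., swap package type before crossing regions) keeps the common case cheap while still guaranteeing capacity in the worst case.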
5.3 Inspection and Statistics
An automated inspection robot periodically simulates scaling activities, detects anomalies early, and aggregates metrics such as package surplus and scaling peaks to help operators anticipate resource shortages.
6. Outlook
Current scaling uses create/force‑delete; a more efficient approach for OpenStack VMs is to employ shelve/unshelve, which retains the volume while releasing CPU/memory, reducing the time spent on load‑balancer and RDS configuration and accelerating the overall scaling process.
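The shelve/unshelve idea can be illustrated with a toy state machine. This is not the OpenStack API (with real Nova one would invoke the server's shelve/unshelve actions); it only shows why unshelving is cheaper: the volume, and thus existing LB/RDS configuration tied to the instance, survives.

```python
class Vm:
    """Toy VM model showing the state transition shelve/unshelve implies."""
    def __init__(self, name):
        self.name = name
        self.state = "ACTIVE"
        self.volume_attached = True   # root volume persists across shelving

    def shelve(self):
        # Release CPU/memory on the host, but keep the volume (and identity).
        self.state = "SHELVED"

    def unshelve(self):
        # Restore compute resources; no re-creation, no LB/RDS reconfiguration.
        self.state = "ACTIVE"
```

Compared with create/force-delete, the scale-out path becomes "unshelve an existing shelved VM" and skips most of the post-creation configuration work.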
360 Smart Cloud
Official service account of 360 Smart Cloud, dedicated to building a high-quality, secure, highly available, convenient, and stable one‑stop cloud service platform.