
Auto Scaling (AS) in Cloud Services: Architecture, Use Cases, and Optimization Strategies

This article explains elastic auto scaling in cloud services, describes typical scenarios such as highly elastic web applications and compute‑intensive workloads, details the four‑layer architecture and workflow, and outlines functional features, stability improvements, and future optimization directions.


1. Introduction

"Elastic" is a high‑level capability unique to cloud services. Elastic scaling, abbreviated AS (Auto Scaling), allows users to set scaling rules based on business demand and policies, automatically adding virtual resources when demand rises and releasing them when demand falls, thereby balancing cost and performance.

2. Practical Application Scenarios

2.1 Highly Elastic Web Application Services

Suitable for workloads with obvious peaks and valleys, such as short‑video services. Users can configure timed policies to achieve stable, periodic scaling, maximizing cost savings.

Figure 1
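The console configuration is not shown in the article, but the effect of a timed policy can be sketched in Python. The windows, capacities, and function names below are illustrative assumptions, not the actual AS API:

```python
from datetime import time

# Hypothetical timed scaling policy for a peaks-and-valleys workload.
# Each entry maps a daily time window to a desired instance count.
TIMED_POLICIES = [
    {"window": (time(19, 0), time(23, 0)), "desired": 20},  # evening peak
    {"window": (time(2, 0), time(7, 0)), "desired": 4},     # overnight valley
]

DEFAULT_DESIRED = 10  # capacity outside any configured window

def desired_capacity(now: time) -> int:
    """Return the instance count the timed policy prescribes for 'now'."""
    for policy in TIMED_POLICIES:
        start, end = policy["window"]
        if start <= now < end:
            return policy["desired"]
    return DEFAULT_DESIRED
```

Because the schedule is known in advance, scaling happens before load arrives, which is what makes the periodic scaling "stable" rather than reactive.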

2.2 High‑availability Compute‑Intensive Service

Applicable to services requiring high availability and dynamic adjustment with load changes, such as distributed big‑data compute nodes. After setting monitoring‑based scaling policies, the system automatically expands or shrinks the cluster based on metrics like CPU usage, and replaces unhealthy instances.

Figure 2
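A monitoring-based policy of this kind ultimately reduces to a threshold decision over recent metric samples. The sketch below illustrates the idea; the thresholds, step size, and size bounds are assumptions for illustration, not values from the article:

```python
def scaling_decision(cpu_samples, current, high=0.75, low=0.25,
                     min_size=2, max_size=32, step=2):
    """Decide a new cluster size from recent CPU-usage samples (0.0-1.0).

    Above 'high' on average: scale out by 'step'; below 'low': scale in;
    otherwise hold, always staying within [min_size, max_size]."""
    avg = sum(cpu_samples) / len(cpu_samples)
    if avg > high:
        return min(current + step, max_size)   # scale out
    if avg < low:
        return max(current - step, min_size)   # scale in
    return current                             # within band: no change
```

Averaging over a window of samples, rather than acting on a single reading, avoids flapping when the metric briefly crosses a threshold.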

3. Architecture

The overall functional structure of auto scaling consists of four layers: Access, Driver, Adapter, and Operation.

Access Layer: the API side that receives and processes requests.

Driver Layer: the core of the system; it links resource management, scheduling, host management, and workflow via MySQL and Redis, with Redis also providing distributed locks for scheduled tasks.

Adapter Layer: connects the driver to the operation layer as a third‑party interface platform; it enables support for various cloud services, monitoring platforms (Prometheus, Open‑Falcon), RDS whitelists, and load‑balancer configuration.

Operation Layer: the concrete execution target; different adapters correspond to different operation objects, e.g., OpenStack clouds.

Figure 3
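The article does not show how the Driver Layer's Redis lock works, but a common Redis pattern is `SET key token NX EX ttl` on acquire and a check-the-token-before-DEL on release, so that a scheduled task fires on only one node. The in-memory stand-in below mimics those semantics for illustration; the class and method names are assumptions:

```python
import time
import uuid

class DistributedLock:
    """In-memory stand-in for a Redis distributed lock.

    acquire() maps to Redis 'SET key token NX EX ttl';
    release() maps to deleting the key only if we still own the token."""
    _store = {}  # key -> (owner_token, expiry_timestamp)

    def __init__(self, key, ttl=30):
        self.key, self.ttl = key, ttl
        self.token = str(uuid.uuid4())  # unique per would-be owner

    def acquire(self):
        entry = self._store.get(self.key)
        if entry is None or entry[1] < time.time():  # free, or TTL expired
            self._store[self.key] = (self.token, time.time() + self.ttl)
            return True
        return False  # someone else holds the lock

    def release(self):
        entry = self._store.get(self.key)
        if entry and entry[0] == self.token:  # only the owner may release
            del self._store[self.key]
            return True
        return False
```

The TTL matters: if the node holding the lock crashes mid-task, the lock expires on its own and another node can take over the schedule.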

3.1 Workflow Mode

During scaling, multiple actions and instances are involved. To avoid interference, a workflow mode is designed where each instance executes actions sequentially; the result of each action (success or failure) determines the next step, ensuring reliable creation or termination.
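A minimal sketch of this workflow mode, assuming each action simply reports success or failure:

```python
def run_workflow(instance, actions):
    """Run each (name, fn) action in order for one instance.

    fn(instance) returns True on success; the first failure stops the
    workflow and is reported so the caller can roll back or retry."""
    completed = []
    for name, fn in actions:
        if not fn(instance):
            return completed, name  # failed step determines what happens next
        completed.append(name)
    return completed, None  # all actions succeeded
```

Running one such workflow per instance keeps concurrent scaling activities from interfering with one another, which is the point of the design.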

3.2 Data Associations

Data relationships are illustrated in Figure 4, where stackId serves as a foreign key linking various tables.

Figure 4

4. Functions

Auto scaling supports scaling operations, automatic load‑balancer configuration, RDS whitelist addition, and rolling updates.

4.1 Load‑Balancing / RDS Configuration

When scaling out, newly healthy instances are attached to the LVS load balancer and added to the RDS whitelist; when scaling in, instances are detached from LVS before termination to avoid traffic disruption.

4.2 Scaling Process

The scaling workflow includes instance creation, status check, ping test, load‑balancer configuration, and RDS whitelist addition. Failure at any step triggers instance deletion and retry; repeated failures abort the process and raise alerts.

Figure 5
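The retry-then-abort behavior of the scale-out path can be sketched as follows; the function names, retry limit, and alert mechanism are illustrative assumptions:

```python
def scale_out(create, steps, max_attempts=3):
    """Try to bring up one healthy instance.

    'create' returns a new instance; each step (status check, ping test,
    LB config, RDS whitelist) returns True on success. Any failure deletes
    the instance and retries from scratch; after max_attempts the whole
    operation aborts and an alert is raised."""
    for attempt in range(1, max_attempts + 1):
        inst = create()
        if all(step(inst) for step in steps):
            return inst               # fully configured and healthy
        inst["deleted"] = True        # failed somewhere: delete and retry
    raise RuntimeError("scale-out aborted after retries; alert raised")
```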

When scaling in, the instance weight in LVS is reduced, a 30‑second wait ensures in‑flight requests complete, then the instance is detached and destroyed.
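This drain sequence can be sketched as below; the LVS client and instance objects are mocked-up assumptions, not a real LVS API:

```python
import time

def scale_in(lvs, instance, drain_seconds=30, sleep=time.sleep):
    """Gracefully remove one instance from service.

    Zero the LVS weight so no new requests are routed to it, wait for
    in-flight requests to complete, then detach and destroy the VM."""
    lvs.set_weight(instance, 0)   # stop routing new traffic
    sleep(drain_seconds)          # let in-flight requests finish
    lvs.detach(instance)          # remove from the load balancer
    instance.destroy()            # safe to terminate now
```

The ordering is the whole point: detaching or destroying before the drain window would drop requests that are still being served.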

5. Stability Optimizations

5.1 Version Iteration

Initially, the system integrated with OpenStack Heat, but Heat's limitations (no explicit scale‑in operation and limited interaction with the rest of the stack) led to a switch to direct Nova integration for more flexible VM management.

5.2 High‑Availability Solutions

To improve success rates under resource shortage or creation failures, several fallback strategies are offered, including switching between shared and dedicated packages, converting VLAN/VXLAN networks, creating instances in other clusters or regions, and even using alternative virtualization resources such as bare metal or KubeVirt.

5.3 Inspection and Statistics

An automated inspection robot periodically simulates scaling activities, detects anomalies early, and aggregates metrics such as package surplus and scaling peaks to help operators anticipate resource shortages.

6. Outlook

Current scaling creates instances on scale‑out and force‑deletes them on scale‑in. A more efficient approach for OpenStack VMs is shelve/unshelve, which retains the instance's volume while releasing its CPU and memory; since the load‑balancer and RDS configuration can be kept alongside the shelved instance, less time is spent reconfiguring them and the overall scaling cycle is faster.
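As a rough illustration of why shelve/unshelve helps, the toy timing model below (all durations are made up, not measurements) compares the two paths for one scaling cycle:

```python
# Illustrative per-step durations in seconds for bringing one instance
# back into service. Create/force-delete must redo everything;
# unshelve skips boot-from-scratch and keeps LB/RDS configuration.
CREATE_PATH = {
    "boot_new_vm": 60,
    "configure_lb": 20,
    "add_rds_whitelist": 10,
}
SHELVE_PATH = {
    "unshelve_vm": 25,  # volume retained; LB/RDS config still in place
}

def cycle_seconds(path):
    """Total time for one scale-out cycle under the given path."""
    return sum(path.values())
```

Under any plausible numbers with this shape, the shelve path wins because two whole configuration steps disappear from the cycle.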


Tags: cloud computing, high availability, load balancing, auto scaling, elasticity
Written by

360 Smart Cloud

Official service account of 360 Smart Cloud, dedicated to building a high-quality, secure, highly available, convenient, and stable one‑stop cloud service platform.
