How We Automated Cloud Operations: Real-World AWS Scaling & Deployment Cases
This article shares two real-world operations-automation case studies: cloud-based scaling on AWS with Chef and Jenkins, and an automated deployment workflow for an advertising company's C++ RTB system, Java platform, and data clusters. It highlights the processes, tools, and key lessons from each.
Case 1: Cloud‑Based Operations Automation
We are a small company running services on AWS, mainly Ruby on Rails, with horizontal scaling.
We started with a single EC2 instance. As the project grew, we introduced horizontal scaling and a service-oriented architecture (SOA), using Chef for provisioning and CloudWatch for monitoring, and creating a new AMI on each deploy. When load reaches a threshold, Chef launches a new instance from the latest AMI and adds it to the ELB, which scales the application tier horizontally.
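The threshold rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not give the actual threshold, metric, or fleet limit, so `ScalePolicy`, its default values, and `should_scale_out` are hypothetical names, and the real launch step (Chef plus the latest AMI and the ELB registration) is deliberately left out.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ScalePolicy:
    """Hypothetical policy; the article does not state its real values."""
    cpu_threshold: float = 70.0  # percent CPU, assumed
    breach_periods: int = 3      # consecutive samples that must exceed the threshold
    max_instances: int = 10      # assumed upper bound on the fleet size


def should_scale_out(cpu_samples: Sequence[float], running: int,
                     policy: ScalePolicy) -> bool:
    """Return True when the latest samples all breach the threshold
    and the fleet is still below its maximum size."""
    if running >= policy.max_instances:
        return False
    recent = cpu_samples[-policy.breach_periods:]
    if len(recent) < policy.breach_periods:
        return False  # not enough monitoring data yet
    return all(sample > policy.cpu_threshold for sample in recent)
```

Requiring several consecutive breaches rather than a single one is a common way to avoid launching instances on a momentary spike; whether the original setup did this is not stated.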
The database is a PostgreSQL cluster (one master, multiple slaves) that uses AWS multi-AZ so that slaves can be added dynamically and load can be balanced across them; Redis follows the same pattern.
Jenkins handles CI; each run builds a Docker image for the staging environment. Staging currently runs on Docker, but Docker has not yet been rolled out to production.
Case 2: Automated Deployment in an Advertising Company
We describe several aspects of our automation.
Compilation
Our RTB system is C++ on Linux, requiring specific runtime libraries. Compilation and deployment are separated: after code is tested, the executable is placed in a designated location, and Jenkins triggers a pre‑tested deployment script, ensuring only verified binaries run. Monitoring scripts periodically check port availability and process health.
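A monitoring check of the kind mentioned above (port availability and process health) might look like the following minimal sketch. The function names are hypothetical and the article does not show its actual scripts; the process check relies on POSIX signal semantics, which matches the Linux environment described.

```python
import os
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def process_is_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only performs error checking; nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

A scheduler such as cron could call both functions periodically and raise an alert when either returns False.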
Business Platform
The platform is Java-based and built with Maven. After testing, the specific SVN revision is handed to the system team; Jenkins checks out the code at that revision, runs Maven to compile it, then deploys and starts the service on the target server.
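The checkout-and-build part of this flow can be sketched as a small Python helper that a Jenkins job could invoke. It is an illustration only: `build_commands`, `run_pipeline`, the repository URL, and the working directory are hypothetical, and the deploy and start steps are omitted because the article does not describe them.

```python
import subprocess
from typing import Callable, List


def build_commands(repo_url: str, revision: int, workdir: str) -> List[List[str]]:
    """Build the command lines for checking out a pinned SVN revision
    and packaging it with Maven."""
    return [
        ["svn", "checkout", "--revision", str(revision), repo_url, workdir],
        ["mvn", "--batch-mode", "--file", f"{workdir}/pom.xml", "clean", "package"],
    ]


def run_pipeline(commands: List[List[str]],
                 runner: Callable = subprocess.run) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        runner(cmd, check=True)  # check=True raises CalledProcessError on non-zero exit
```

Pinning the revision number means the build uses exactly the code that was tested, not whatever happens to be at the head of the repository.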
Data Layer
We use Redis and Tair clusters to store user attributes and cookie mappings, and Jenkins deploys these clusters. Daily profile data is imported automatically. Data migration is triggered manually when external monitoring detects node problems, and it typically runs during off-peak hours.
Process Planning
Manual deployment scripts are first created after development; Jenkins automation scripts are derived from these manual scripts.
Auto‑Scale
The cluster is auto‑scaled: a baseline number of machines runs the services, and additional standby machines are quickly deployed during traffic spikes, with traffic redirected to the new instances.
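The baseline-plus-standby arrangement implies a simple capacity calculation, sketched below. All names and numbers are assumptions for illustration: the article gives no per-machine capacity, headroom, or pool size, and `standby_to_activate` is a hypothetical helper.

```python
def standby_to_activate(current_qps: int, per_machine_qps: int,
                        baseline: int, standby_pool: int,
                        headroom_pct: int = 20) -> int:
    """Return how many standby machines to bring into service.

    headroom_pct is an assumed safety margin added on top of current traffic.
    Integer arithmetic keeps the ceiling division exact.
    """
    if per_machine_qps <= 0:
        raise ValueError("per_machine_qps must be positive")
    demand = current_qps * (100 + headroom_pct)
    capacity_unit = 100 * per_machine_qps
    needed = (demand + capacity_unit - 1) // capacity_unit  # ceiling division
    extra = max(0, needed - baseline)
    return min(extra, standby_pool)  # never request more than the pool holds
```

For example, with 10 baseline machines that each handle 1,000 requests per second, 10,000 requests per second plus 20% headroom calls for 12 machines, so 2 standby machines would be activated.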
Scale
Currently the RTB system runs on over 40 high‑spec servers (≈20 processes each). The business platform, click collection, and billing systems use about 20 servers, while the Hadoop‑based logging and profiling components run on more than 50 machines.
Key Takeaways
Automation frees people to focus on higher‑value work; without skill upgrades, operations staff risk obsolescence.
Automation is ultimately about people.
Based on actual conditions, define complete processes, turn repetitive tasks into tools, document them, and aim for minimal human intervention, so that technically competent staff can simply follow the documented procedures.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.