Automating Patch Management for 10,000+ Servers with StackStorm & SaltStack
This article explains how Ctrip’s senior engineer Hu Junya built an automated operations platform using SaltStack for remote control, StackStorm for workflow orchestration, and a custom Jobs tool for batch gray releases, enabling safe, scalable patch deployment across thousands of servers.
Speaker: Hu Junya, senior technical support engineer at Ctrip, responsible for SaltStack, StackStorm, and other operations platforms.
Topic: an automation platform based on StackStorm for Ctrip’s operations.
The 2021 ransomware outbreak highlighted the need for rapid, automated patching to protect business services.
When a company manages thousands of servers, manual patch updates are impossible; a coordinated, low‑impact, automated approach is required.
Typical single‑server patch flow: check whether the patch is installed; if it is not, pull the server out of the production cluster, install the patch, reboot, optionally warm up the application, and rejoin the cluster.
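The single‑server flow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Server` class and `patch_server` function are hypothetical in‑memory stand‑ins, and each step records a log entry instead of touching a real machine or load balancer.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Server:
    """Hypothetical in-memory stand-in for a real server."""
    name: str
    patched: bool = False
    in_cluster: bool = True
    log: List[str] = field(default_factory=list)


def patch_server(server: Server, warm_up: bool = True) -> bool:
    """Run the single-server flow; return True if a patch was installed."""
    # Step 1: skip servers that already have the patch.
    if server.patched:
        server.log.append("skip: already patched")
        return False

    # Step 2: take the server out of production before changing it.
    server.in_cluster = False
    server.log.append("pulled out of cluster")

    # Steps 3 and 4: install and reboot (simulated here).
    server.patched = True
    server.log.append("patch installed")
    server.log.append("rebooted")

    # Step 5: optional warm-up so the first real requests are not slow.
    if warm_up:
        server.log.append("application warmed up")

    # Step 6: put the server back into production.
    server.in_cluster = True
    server.log.append("rejoined cluster")
    return True
```

Calling `patch_server` a second time on the same server is a no‑op, which is what makes the flow safe to re‑run across a large fleet.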
Automation must address two aspects: (1) orchestrate the entire workflow, and (2) perform remote operations without logging into each server.
Remote Control
SaltStack is an open‑source remote management platform with a master‑minion architecture. The master issues tasks; minions execute them and return results. Similar tools include Ansible, Chef, and Puppet.
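To make the master‑minion pattern concrete, here is a toy model in plain Python. It does not use Salt itself: the `Master` and `Minion` classes are hypothetical, and the "remote" functions are local callables. Salt's own Python client (`salt.client.LocalClient`) exposes a similarly shaped `cmd(target, function)` call, but the code below is a sketch of the idea, not of Salt's implementation.

```python
import fnmatch
from typing import Any, Callable, Dict, Sequence


class Minion:
    """Toy minion: executes a named function locally and returns the result."""

    def __init__(self, minion_id: str, functions: Dict[str, Callable[..., Any]]):
        self.minion_id = minion_id
        self.functions = functions

    def execute(self, fun: str, args: Sequence[Any]) -> Any:
        return self.functions[fun](*args)


class Master:
    """Toy master: sends a task to every minion whose ID matches a glob."""

    def __init__(self, minions: Sequence[Minion]):
        self.minions = list(minions)

    def cmd(self, target: str, fun: str, args: Sequence[Any] = ()) -> Dict[str, Any]:
        # Collect one result per matching minion, keyed by minion ID.
        return {
            m.minion_id: m.execute(fun, args)
            for m in self.minions
            if fnmatch.fnmatchcase(m.minion_id, target)
        }


# Hypothetical function table; the names only mimic the style of remote modules.
functions = {
    "test.ping": lambda: True,
    "cmd.run": lambda command: f"ran {command}",
}
master = Master([Minion(name, functions) for name in ("web-01", "web-02", "db-01")])
results = master.cmd("web-*", "test.ping")
```

The operator never logs into a server: targeting by pattern and receiving a per‑minion result dictionary is the whole interface.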
Workflow Orchestration
Traditional operations rely on manual steps or ad‑hoc scripts, leading to low efficiency, error‑prone processes, duplicated code, and missing audit logs.
DevOps introduces a plethora of open‑source tools and custom utilities, but challenges remain: a complex change still spans many tools, effort is duplicated, the overall logic is unclear, and there is no unified log.
StackStorm, an event‑driven automation platform, solves these problems by turning tool APIs into actions, composing actions into workflows, providing a visual UI, and centralizing logs.
Users can trigger actions via the web UI or API, and integrate with ChatOps for rapid incident response.
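The idea of wrapping tool calls as actions, chaining them into a workflow, and keeping one central log can be sketched as follows. This is plain Python, not StackStorm's API: the `Workflow` class and the two example actions are hypothetical, and a real StackStorm workflow would be defined in the platform's own format.

```python
from typing import Any, Callable, Dict, List, Tuple

# An action takes the shared context and returns the values it adds to it.
Action = Callable[[Dict[str, Any]], Dict[str, Any]]


class Workflow:
    """Runs named actions in order, sharing a context and logging every step."""

    def __init__(self, name: str, steps: List[Tuple[str, Action]]):
        self.name = name
        self.steps = steps
        self.log: List[str] = []  # one central log for the whole run

    def run(self, context: Dict[str, Any]) -> Dict[str, Any]:
        for step_name, action in self.steps:
            try:
                context = {**context, **action(context)}
                self.log.append(f"{self.name}.{step_name}: succeeded")
            except Exception as exc:
                # Record the failure centrally, then stop the workflow.
                self.log.append(f"{self.name}.{step_name}: failed ({exc})")
                raise
        return context


def check_patch(ctx: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical action: decide whether the host still needs the patch."""
    return {"needs_patch": ctx["host"] not in ctx["patched_hosts"]}


def install_patch(ctx: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical action: 'install' only when the previous step asked for it."""
    return {"installed": bool(ctx["needs_patch"])}


workflow = Workflow("patch", [("check", check_patch), ("install", install_patch)])
result = workflow.run({"host": "web-01", "patched_hosts": []})
```

Because every action has the same shape, the same `check_patch` step can be reused in other workflows, which is the duplication problem the platform approach is meant to remove.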
Batch Gray Release
For operations affecting tens of thousands of servers, the custom Jobs tool adds batch‑gray capabilities on top of StackStorm:
Automatically split targets into configurable batches (e.g., 1%, 5%, 10%).
Delegate actual execution to StackStorm actions, keeping Jobs lightweight.
Collect and display success/failure statistics for each batch.
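A minimal sketch of the batching and statistics steps is shown below. It is not the Jobs tool itself: `split_batches` and `run_batches` are hypothetical names, the `execute` callable stands in for a StackStorm action, and putting the leftover servers into one final batch is an assumption made for this example.

```python
from typing import Callable, Dict, List, Sequence


def split_batches(targets: Sequence[str], percents: Sequence[int]) -> List[List[str]]:
    """Split targets into batches sized as percentages of the total.

    Servers not covered by the listed percentages form one final batch
    (an assumption of this sketch).
    """
    batches: List[List[str]] = []
    total = len(targets)
    start = 0
    for percent in percents:
        if start >= total:
            break
        # Integer ceiling of total * percent / 100, with at least one server.
        size = max(1, (total * percent + 99) // 100)
        batches.append(list(targets[start:start + size]))
        start += size
    if start < total:
        batches.append(list(targets[start:]))
    return batches


def run_batches(
    batches: List[List[str]], execute: Callable[[str], bool]
) -> List[Dict[str, int]]:
    """Run execute() on each target, batch by batch, and collect statistics."""
    stats: List[Dict[str, int]] = []
    for number, batch in enumerate(batches, start=1):
        succeeded = sum(1 for target in batch if execute(target))
        stats.append({
            "batch": number,
            "total": len(batch),
            "succeeded": succeeded,
            "failed": len(batch) - succeeded,
        })
    return stats


targets = [f"server-{i:04d}" for i in range(1000)]
batches = split_batches(targets, [1, 5, 10])
# Simulated action: pretend every server whose name ends in "7" fails.
stats = run_batches(batches, lambda target: not target.endswith("7"))
```

With 1,000 targets and batches of 1%, 5%, and 10%, the first three batches contain 10, 50, and 100 servers, so a bad patch is seen on a handful of machines before it reaches the rest.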
Conclusion
To build a robust automation platform: deploy a remote‑management framework (e.g., SaltStack or Ansible), implement atomic operations as StackStorm actions, compose them into workflows, and use a batch‑gray system like Jobs for large‑scale tasks.