Why Ops Engineers Are Always the Scapegoat—and How to Turn That Into Value
The article reflects on the challenges faced by operations engineers in small companies, illustrating why they often become scapegoats, and offers practical advice on learning, risk control, communication, and disaster‑recovery drills to increase their value and effectiveness.
Author's Preface
After many years in security, I accidentally fell into the operations "pit". While exploring operations work, I kept stumbling into problems and climbing back out. Our team dissolved on the first night of the company's founding, and each of us went our separate ways. Growth often requires experiences like this, so I am writing this down to settle my nerves.
1. What to Write at the Beginning?
An operations engineer, formally a "Reliability Support Engineer" and colloquially the "scapegoat", covers many roles in small and medium enterprises. Large BAT-type companies (Baidu, Alibaba, Tencent) have dedicated DBAs, system administrators, network engineers, and so on. In smaller firms, the scapegoat wears all of those hats and must handle a wide range of issues.
2. Why Is the Scapegoat Always Me?
I was once a scapegoat in a small company, handling everything from server procurement and rack mounting to PC OS installation, network wiring, performance testing, and documentation.
Example: a company ran a Java stack (Tomcat, MySQL). Developers built a WAR without testing it and handed it to ops. After ops changed the configuration, the upload feature crashed. Ops checked permissions and logs and reported the problem to the developers, who blamed the Tomcat container. Ops eventually discovered that a component was missing, repackaged the application, and fixed the issue, but the boss still blamed ops.
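A check like the one ops ended up doing by hand can be scripted. The sketch below is a minimal illustration, not the procedure used in the story: a WAR file is a ZIP archive, so Python's standard zipfile module can list its entries and report which expected ones are absent. The function name missing_war_entries and the entry names in the usage comment are hypothetical; the article does not say which component was missing.

```python
import zipfile


def missing_war_entries(war_path, required):
    """Return the required entries that are absent from the WAR archive.

    A WAR file is a ZIP archive, so zipfile can read its entry list.
    The order of the result follows the order of `required`.
    """
    with zipfile.ZipFile(war_path) as war:
        present = set(war.namelist())
    return [name for name in required if name not in present]


# Hypothetical usage (path and entry names are illustrative only):
#   missing = missing_war_entries(
#       "app.war",
#       ["WEB-INF/web.xml", "WEB-INF/lib/upload-component.jar"],
#   )
#   if missing:
#       print("WAR is incomplete:", missing)
```

Running such a check before deployment gives ops concrete evidence to send back to the developers, rather than an argument about whose fault the crash is.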
This pattern is common in small IT firms; the scapegoat role stems from cleaning up developers' mistakes. To change the boss’s perception, ops must become more valuable by expanding their skill set.
3. Company Needs and Views
What does a company really need from ops? How can ops satisfy both the boss and the team? The answer starts with empathy.
3.1 Learning Ability
Ops must master system installation and scripting (Bash, Python), understand the languages the developers use (PHP, Java, etc.), and have basic knowledge of security testing. Being able to write automation scripts, diagnose problems from logs, and respond to attacks is essential.
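As a small illustration of the kind of automation script meant here, the sketch below counts repeated error messages in a log so the most frequent failure stands out. It assumes a simple line format of date, time, level, and message separated by spaces. That format, the LINE_RE pattern, and the function name count_errors are assumptions made for the example, not something the article specifies.

```python
import re
from collections import Counter

# Assumed log line format: "<date> <time> <LEVEL> <message>"
LINE_RE = re.compile(r"^\S+ \S+ (?P<level>[A-Z]+) (?P<message>.*)$")


def count_errors(lines, level="ERROR"):
    """Count how often each message appears at the given log level."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        # Lines that do not fit the assumed format are skipped.
        if match and match.group("level") == level:
            counts[match.group("message")] += 1
    return counts


# Hypothetical usage with a log file path of your own:
#   with open("app.log", encoding="utf-8") as handle:
#       for message, count in count_errors(handle).most_common(5):
#           print(count, message)
```

Real application logs rarely match one pattern exactly, so the regular expression is the part most likely to need adapting.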
3.2 Risk Controllability
Risk controllability consists of three aspects: stability, performance, and security.
Stability: a service restarted every three days because of a memory-leak bug. Fixing the middleware bug extended the restart interval to weeks, a dramatic improvement in stability.
Performance: replacing five Apache servers with two Nginx servers handled the same load at a lower hardware cost, which shows how performance tuning saves money.
Security: penetration testing, both white-box and black-box, is necessary. Vulnerabilities often stem from bugs in developers' code, and ops can help identify and mitigate them. In one case, a Java manager's insecure code allowed a webshell to be deployed with root privileges.
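For the stability case, a leak like the one described usually shows up as steadily rising memory use long before the service falls over. The sketch below is one rough way to quantify that, assuming you already collect periodic memory samples as (hours since start, megabytes) pairs. It fits a least-squares line through the samples and estimates how long until a limit is reached. The function names, the sample format, and the limit are all hypothetical, and a straight-line fit is only a crude model of real leak behavior.

```python
def growth_per_hour(samples):
    """Least-squares slope of memory use, in MB per hour.

    `samples` is a list of (hours_since_start, memory_mb) pairs.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must cover more than one point in time")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t


def hours_until_limit(samples, limit_mb):
    """Estimate hours until memory reaches limit_mb, or None if not growing."""
    slope = growth_per_hour(samples)
    if slope <= 0:
        return None
    # Project forward from the most recent sample.
    _, latest_mb = max(samples, key=lambda sample: sample[0])
    return (limit_mb - latest_mb) / slope
```

An estimate like this turns "the service dies every three days" into a number that can be tracked before and after a fix.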
3.3 Skills Forced by Pressure
Ops engineers usually acquire communication skills under pressure, because they must deal with managers, developers, bosses, and customers. Retorts such as "you're the one who wrote the bug!" or "this isn't my problem!" are unproductive. Effective communication and emotional intelligence are crucial.
3.4 Daily Drills
Disaster‑recovery drills are as important as regular backups. Examples include simulating disk failures, GitLab deletions, and DDoS attacks. Regular drills improve detection speed, response time, and overall resilience.
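One drill that is easy to automate is checking that a restored backup actually matches the data it came from. The sketch below is a minimal example under stated assumptions: the backup has already been restored into a directory, and comparing SHA-256 digests file by file is an acceptable definition of "matches". The function names and directory paths are hypothetical, and the code reads each file fully into memory, so very large files would need chunked hashing instead.

```python
import hashlib
from pathlib import Path


def file_digests(root):
    """Map each file's path relative to root to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            # Reads the whole file at once; fine for a sketch, not for huge files.
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def restore_mismatches(source_dir, restored_dir):
    """Return sorted relative paths that are missing, extra, or different."""
    source = file_digests(source_dir)
    restored = file_digests(restored_dir)
    names = source.keys() | restored.keys()
    return sorted(name for name in names if source.get(name) != restored.get(name))


# Hypothetical usage after a restore drill:
#   problems = restore_mismatches("/data/live", "/tmp/restore-test")
#   print("restore OK" if not problems else f"mismatched files: {problems}")
```

An empty result means every file was restored intact; anything else is a finding to fix before a real failure happens.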
4. Conclusion
Every problem has a cause. We should anticipate what can go wrong and prepare adequate responses within our capabilities. Continuous learning widens our skill range, demonstrates the value of ops, and ultimately turns the scapegoat role into a strategic asset.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.