
What Ancient Medicine Teaches About Modern IT Risk Management

Using the classic tale of Bian Que, this article explains how proactive, mid‑stage, and reactive risk controls in IT operations prevent small issues from becoming catastrophic failures, illustrated with real‑world storage, cloud, and equipment‑selection case studies.

Efficient Ops

Introduction

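The well-known story of Bian Que and Duke Huan of Cai serves as an analogy for risk-management failure in IT operations. In the story, the physician warns the duke several times that his illness is progressing, and the duke dismisses each warning until the disease is beyond treatment. The operations version is familiar: an engineer reports a system risk early, but the company ignores it. The risk grows and is still ignored, and after the engineer has left, the system collapses.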

If you worry about a situation, it becomes more likely to happen.

In reality, Murphy’s law applies: things go wrong, devices fail, lines go down, and people make mistakes. Operations engineers deal with these ever-present failures every day.

Ensuring system availability is the core responsibility of operations. Rapid incident response showcases engineering skill, but by the time an incident needs a response, the business impact has already been incurred, however fast the recovery.

Strengthening fast response alone therefore cannot meet the broader goals set for operations.

Extended Bian Que Story

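A second story extends the analogy. The King of Wei asks Bian Que which of the three brothers in his family is the best doctor. Bian Que answers that his eldest brother treats disease before it appears, his second brother treats it at onset, and he himself treats it only once it has become severe. Their reputations run in the opposite order: the eldest is barely known outside the family, while Bian Que, who intervenes last and most visibly, is the most famous.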

Treating Before Illness (Pre‑emptive Control)

In operations, post-incident control is weaker than mid-stage control, which in turn is weaker than pre-emptive control. Post-incident control is fault handling, mid-stage control is architecture and process design, and pre-emptive control is company policy and principle. All three layers are essential, and none can stand in for another.

Case 1: Dimensions and Stages of Risk Control

High-end storage systems keep spare disks online as hot spares. When a disk fails, the system automatically switches to a spare, which addresses the risk at the technical and architecture level.

However, if operational processes are lax and failed disks are not replaced, the hot spares are eventually exhausted, and the next failure causes a severe business interruption. This shows that risk controls at different dimensions and stages cannot substitute for one another.
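The gap between the architecture layer and the process layer can be made visible with a simple check. The following is a minimal sketch, assuming a hypothetical inventory record per storage array; the `ArrayStatus` fields and the risk labels are illustrative and do not correspond to any vendor's API.

```python
from dataclasses import dataclass


@dataclass
class ArrayStatus:
    """Snapshot of one storage array (hypothetical fields, not a vendor API)."""
    name: str
    spares_available: int    # hot spares still unused
    failed_unreplaced: int   # failed disks not yet physically swapped


def spare_risk(status: ArrayStatus) -> str:
    """Classify hot-spare risk so the replacement process gets a signal."""
    if status.spares_available == 0:
        # The next disk failure has nothing to switch to.
        return "critical"
    if status.failed_unreplaced > 0:
        # The architecture absorbed the failure, but a replacement is still owed.
        return "warning"
    return "ok"


if __name__ == "__main__":
    fleet = [
        ArrayStatus("array-a", spares_available=2, failed_unreplaced=0),
        ArrayStatus("array-b", spares_available=1, failed_unreplaced=1),
        ArrayStatus("array-c", spares_available=0, failed_unreplaced=2),
    ]
    for status in fleet:
        print(status.name, spare_risk(status))
```

The "warning" state is the one that matters here: the array is still healthy from the architecture's point of view, and only the process layer can clear it by replacing the failed disk.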

Case 2: Effective Pre‑emptive Control

In the public IaaS cloud, over‑commitment (overselling) is common. When platform load spikes, customers experience performance degradation, which cannot be solved by mid‑stage or post‑incident measures.

Capital Online set a policy of “no overselling” at the principle level and, as a result, has seen very few performance-related incidents over the years. This is an example of successful pre-emptive control.
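One way such a principle could be enforced in software is as an admission check at placement time. The sketch below is an illustration only, not Capital Online's actual implementation: the `Host` record, the function names, and the reading of "no overselling" as "allocated vCPUs never exceed physical cores" are all assumptions.

```python
from dataclasses import dataclass

# Assumption: "no overselling" is modeled as an overcommit ratio of 1.0,
# meaning allocated vCPUs may never exceed physical cores.
MAX_OVERCOMMIT_RATIO = 1.0


@dataclass
class Host:
    """Hypothetical hypervisor host record."""
    physical_cores: int
    allocated_vcpus: int = 0


def can_place(host: Host, requested_vcpus: int,
              ratio: float = MAX_OVERCOMMIT_RATIO) -> bool:
    """Return True if the request fits within the allowed overcommit ratio."""
    return host.allocated_vcpus + requested_vcpus <= host.physical_cores * ratio


def place(host: Host, requested_vcpus: int) -> bool:
    """Allocate vCPUs on the host, refusing any request that would oversell."""
    if not can_place(host, requested_vcpus):
        return False
    host.allocated_vcpus += requested_vcpus
    return True


if __name__ == "__main__":
    host = Host(physical_cores=32, allocated_vcpus=30)
    print(place(host, 2))  # fits exactly within physical capacity
    print(place(host, 1))  # would exceed physical capacity, so it is refused
```

Because the check rejects the request before any capacity is handed out, the contention described above never arises, which is what distinguishes a policy-level control from mid-stage or post-incident measures.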

Case 3: Hidden Costs in Risk Control

Choosing equipment solely on price leads to a fragmented vendor landscape and to subtle compatibility issues that surface only when something fails, at which point they cause major business impact.

If not addressed pre‑emptively, these issues increase hidden costs in staff expertise, equipment maintenance, and automation.

Conclusion

Risk management consists of risk identification, control, and response, with control being the most challenging part.

Returning to the Bian Que analogy, the ruler’s attitude ("I have no illness; doctors like to treat the healthy") mirrors a management team that dismisses early warnings. Senior management’s understanding of operations determines how far a company can go.

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
