
Handling Interface‑Level Failures: Degradation, Circuit Breaking, Rate Limiting, and Queuing

The article explains how interface‑level faults—where the system stays up but business performance degrades—can be mitigated through four core techniques (degradation, circuit breaking, rate limiting, and queuing), detailing their principles, implementation patterns, and practical trade‑offs for backend services.

Top Architect

In real‑world operation, an interface‑level fault may not crash the system or break the network, but it can still severely affect the business: slow responses, massive timeouts, or error messages such as "cannot connect to database".

The root cause is usually system overload, often manifested as database slow queries that exhaust server resources, leading to read/write timeouts and intermittent failures.

These causes can be divided into two categories:

Internal causes: program bugs causing infinite loops, an interface triggering slow database queries, flawed logic exhausting memory, etc.

External causes: hacker attacks, promotional spikes or flash sales that multiply traffic, massive request volume from third‑party systems, slow third‑party responses, etc.

The core idea for handling interface‑level faults mirrors multi‑region active‑active design: prioritize core business and prioritize the majority of users. Four common mitigation techniques are presented below: degradation, circuit breaking, rate limiting, and queuing.

1. Degradation

Degradation means reducing or disabling certain functions of a service or interface, e.g., a forum may only allow reading posts, or an app may temporarily stop log uploading.

The principle is "sacrifice the rook to save the king": keep core functionality alive while shedding non‑essential features.

1.1 System‑backdoor degradation

A simple approach is to expose a special URL that triggers degradation commands, possibly protected by a password. It is cheap to implement but hard to operate at scale because each server must be accessed individually.
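A backdoor of this kind can be as small as one password‑protected handler that flips an in‑process feature flag. The sketch below illustrates the idea; the endpoint shape, password handling, and flag names are all assumptions, not from the original article.

```python
# Sketch of a password-protected degradation "backdoor".
# The feature names, password, and response strings are illustrative.
import hashlib

# In-process registry of feature flags; True means the feature is enabled.
_FEATURES = {"post_write": True, "log_upload": True}

# Store only a hash of the operator password, never the plain text.
_PASSWORD_SHA256 = hashlib.sha256(b"change-me").hexdigest()

def handle_degrade_request(feature: str, password: str) -> str:
    """Handle e.g. GET /admin/degrade?feature=post_write&pwd=... (hypothetical URL)."""
    if hashlib.sha256(password.encode()).hexdigest() != _PASSWORD_SHA256:
        return "403 forbidden"
    if feature not in _FEATURES:
        return "404 unknown feature"
    _FEATURES[feature] = False          # disable the non-essential feature
    return "200 degraded: " + feature

def is_enabled(feature: str) -> bool:
    """Business code checks this before running the optional feature."""
    return _FEATURES.get(feature, False)
```

The operational weakness mentioned above is visible here: the flag lives in one process, so degrading a fleet means hitting every server's backdoor URL separately.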

1.2 Independent degradation system

To overcome the backdoor's limitations, an independent degradation platform can be built to manage permissions, batch operations, and complex rules across many servers from a single control point. (The original article includes an architecture diagram here.)

2. Circuit Breaking

Circuit breaking stops calls to an external interface when that interface becomes slow or unresponsive, preventing the downstream failure from dragging the whole system down.

Example: Service A depends on Service B; if B's response time spikes, A fails its calls to B immediately instead of waiting, protecting A from resource exhaustion.

Key points:

A unified API call layer is needed to collect metrics and decide when to trip the breaker.

Threshold design (e.g., >30% requests slower than 1 s in a minute) determines when the circuit opens.
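The threshold example above can be expressed as a small sliding‑window breaker: trip when more than 30% of the calls recorded in the last 60 seconds took longer than 1 second. The class and method names below are illustrative, not from any specific library.

```python
# Minimal sliding-window circuit breaker (illustrative sketch).
import time
from collections import deque

class CircuitBreaker:
    def __init__(self, window_s=60.0, slow_s=1.0, ratio=0.30, now=time.monotonic):
        self.window_s, self.slow_s, self.ratio = window_s, slow_s, ratio
        self.now = now                     # injectable clock, eases testing
        self.calls = deque()               # (timestamp, was_slow) samples

    def record(self, duration_s: float) -> None:
        """Called by the unified API layer after each downstream call."""
        self.calls.append((self.now(), duration_s > self.slow_s))

    def allow_request(self) -> bool:
        cutoff = self.now() - self.window_s
        while self.calls and self.calls[0][0] < cutoff:
            self.calls.popleft()           # drop samples outside the window
        if not self.calls:
            return True                    # no data: breaker stays closed
        slow = sum(1 for _, was_slow in self.calls if was_slow)
        # Open (reject) when the slow-call ratio exceeds the threshold.
        return slow / len(self.calls) <= self.ratio
```

A real breaker would also add a half‑open probe state to recover gradually; here recovery happens implicitly as old samples age out of the window.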

3. Rate Limiting

Rate limiting controls the amount of traffic the system can accept, discarding requests that exceed the capacity. It can be classified into request‑based limiting and resource‑based limiting.

3.1 Request‑based limiting

Two common strategies:

Total quota: limit the total number of users or items (e.g., a live stream caps at 1 million viewers).

Time‑window quota: limit the number of requests per minute or per second.

Choosing the right threshold is difficult; performance testing and iterative tuning are typical approaches.
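A total quota reduces to a thread‑safe admission counter, as in this sketch (class and method names are assumptions):

```python
# Total-quota admission control, e.g. capping concurrent viewers.
import threading

class TotalQuota:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.current = 0
        self.lock = threading.Lock()

    def try_enter(self) -> bool:
        """Admit one user if the quota has room; otherwise reject."""
        with self.lock:
            if self.current >= self.capacity:
                return False       # over quota: fast reject
            self.current += 1
            return True

    def leave(self) -> None:
        """Release a slot when the user disconnects."""
        with self.lock:
            self.current = max(0, self.current - 1)
```

The tuning difficulty noted above shows up as the single `capacity` parameter: set it too low and you reject users a healthy system could serve; too high and the limit never protects you.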

3.2 Resource‑based limiting

Limits are set on internal resources such as connection count, file handles, thread count, or request queue length. For example, a Netty server may cap its inbound queue at 10 000 entries and reject further requests when full.
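The bounded‑queue pattern from the Netty example can be sketched in a few lines; the queue size is shrunk to 3 here for illustration, and the function names are assumptions:

```python
# Resource-based limit: reject work when a bounded request queue is full.
import queue

# A real server might cap this at 10 000; 3 keeps the sketch observable.
request_queue: "queue.Queue[str]" = queue.Queue(maxsize=3)

def submit(request_id: str) -> bool:
    """Enqueue a request without blocking; reject when the resource is exhausted."""
    try:
        request_queue.put_nowait(request_id)
        return True
    except queue.Full:
        return False    # fast rejection protects the workers behind the queue
```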

3.3 Limiting algorithms

Common algorithms include:

Fixed window: count requests in fixed intervals; simple, but suffers from boundary spikes.

Sliding window: overlapping windows smooth out the boundary effects.

Leaky bucket: requests enter a bucket (queue) and are drained at a constant rate; excess requests are dropped.

Token bucket: tokens are added at a controlled rate; a request proceeds only if a token is available, allowing short bursts while enforcing an average rate.

A leaky bucket smooths bursts into a constant processing rate, while a token bucket enforces an average rate yet tolerates short bursts, making it the better fit for throttling calls to a downstream service (e.g., limiting calls to a bank API).
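Of the algorithms listed, the token bucket is the one most often hand‑rolled; a minimal sketch follows (names and the injectable clock are illustrative):

```python
# Token-bucket sketch: tokens accrue at `rate` per second up to `capacity`;
# a request passes only if enough tokens are available.
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float, now=time.monotonic):
        self.rate, self.capacity = rate, capacity
        self.now = now                 # injectable clock, eases testing
        self.tokens = capacity         # start full: permits an initial burst
        self.last = now()

    def allow(self, cost: float = 1.0) -> bool:
        t = self.now()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False                   # not enough tokens: throttle
```

The `capacity` parameter bounds the burst size while `rate` sets the long‑run average, which is exactly the combination wanted when pacing calls to a rate‑limited third party.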

4. Queuing

Queuing is a variant of rate limiting in which excess requests are placed in a waiting line instead of being rejected. It typically relies on an external buffering system (e.g., Kafka) to absorb large volumes of requests.

Typical architecture (e.g., a Double‑11 flash‑sale queue) consists of:

Queue module: receives user purchase requests and stores them FIFO per product.

Scheduler module: monitors service capacity and pulls requests from the queue when resources are free.

Service module: executes the actual business logic and writes back the results.
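The three modules can be sketched as follows; this toy version uses an in‑process deque where a production system would use an external buffer such as Kafka, and all names are assumptions:

```python
# Toy sketch of the queue / scheduler / service split for a flash sale.
from collections import deque

class FlashSaleQueue:
    """Queue module: one FIFO line per product."""
    def __init__(self):
        self.queues: dict = {}

    def enqueue(self, product: str, user: str) -> None:
        self.queues.setdefault(product, deque()).append(user)

    def dequeue(self, product: str):
        q = self.queues.get(product)
        return q.popleft() if q else None   # None when the line is empty

def run_scheduler(fsq: FlashSaleQueue, product: str, free_slots: int, service) -> list:
    """Scheduler module: pull requests only while service capacity is free."""
    results = []
    for _ in range(free_slots):
        user = fsq.dequeue(product)
        if user is None:
            break                           # queue drained
        results.append(service(product, user))
    return results

def service(product: str, user: str) -> str:
    """Service module: execute the purchase logic (stubbed here)."""
    return f"order:{product}:{user}"
```

The key design point is that the scheduler, not the incoming traffic, sets the pace: the service module only ever sees as much work as it has declared capacity for.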

Conclusion

The article presented four strategies for handling interface‑level failures—degradation, circuit breaking, rate limiting, and queuing—each with its own use cases, advantages, and trade‑offs, helping backend engineers design more resilient systems.

Written by

Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
