Five Patterns to Make Your Microservice Fault‑Tolerant

This article explains essential fault‑tolerance patterns for microservices—including timeouts, retries, circuit breakers, distributed deadlines, and rate limiting—detailing their basic forms, drawbacks, and practical implementation strategies to improve reliability and prevent cascading failures.

Architects Research Society

Jul 13, 2023

This article introduces fault tolerance in microservices, defined as a system's ability to keep operating when some of its components fail.

Timeouts

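Timeouts specify the maximum time to wait for an event. The article discusses the shortcomings of the socket‑level SO_TIMEOUT, which bounds each blocking read rather than the whole exchange, so a peer that keeps trickling bytes can hold a call open far longer than the configured value. It recommends end‑to‑end request timeouts instead, with examples from the JDK 11 HTTP client, OkHttp, and Go’s standard library.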
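The difference between a per-read timeout and an end-to-end timeout can be sketched in a few lines of Python. This is a minimal illustration, not code from the article: `read_response`, `make_trickling_reader`, and `RequestTimeout` are hypothetical names, and the "server" is simulated with `time.sleep` so the example runs without a network.

```python
import time


class RequestTimeout(Exception):
    """Raised when the whole request exceeds its time budget."""


def read_response(read_chunk, total_timeout_s, clock=time.monotonic):
    """Read chunks until an empty one arrives, enforcing one overall deadline.

    read_chunk is any zero-argument callable returning bytes (b"" at the end).
    The deadline is checked between reads, so a single read can still overshoot
    it; real code would also cap each read at the remaining time.
    """
    deadline = clock() + total_timeout_s
    body = bytearray()
    while True:
        if clock() >= deadline:
            raise RequestTimeout(f"request exceeded {total_timeout_s}s")
        chunk = read_chunk()
        if not chunk:
            return bytes(body)
        body.extend(chunk)


def make_trickling_reader(chunks, delay_s):
    """Simulate a peer that sends one chunk every delay_s seconds."""
    remaining = iter(chunks)

    def read_chunk():
        time.sleep(delay_s)  # each read is quick enough to pass a per-read timeout
        return next(remaining, b"")

    return read_chunk
```

A reader that delivers 100 chunks 20 ms apart never trips a per-read timeout of, say, 100 ms, yet `read_response(reader, total_timeout_s=0.05)` gives up after a few chunks because the budget applies to the request as a whole.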

Retries

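Retries help when failures are transient. The article warns about retry storms in a chain of services: when every caller retries on its own, the attempts multiply, so if three successive callers each make up to three attempts, the service at the end of the chain can receive up to 27 requests for a single user request. It suggests distinguishing retryable from non‑retryable errors and using an error budget to limit retries.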
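One way to combine both suggestions is a token-based retry budget: every request earns a fraction of a token and every retry spends a whole one, so retries stay at roughly a fixed share of traffic. The sketch below is an assumption about how such a budget might look, not the article's implementation; `RetryableError`, `RetryBudget`, and `call_with_retries` are hypothetical names, and backoff between attempts is omitted for brevity.

```python
class RetryableError(Exception):
    """Hypothetical marker for transient failures, e.g. a timeout or HTTP 503."""


class RetryBudget:
    """Allows roughly `ratio` retries per request (not thread-safe)."""

    def __init__(self, ratio=0.1, initial=1.0, max_tokens=10.0):
        self.ratio = ratio
        self.tokens = initial
        self.max_tokens = max_tokens

    def record_request(self):
        # Each first attempt earns a fraction of a retry token.
        self.tokens = min(self.max_tokens, self.tokens + self.ratio)

    def try_spend(self):
        # Each retry costs one whole token.
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def call_with_retries(fn, budget, max_attempts=3):
    """Call fn, retrying only RetryableError and only while the budget allows."""
    budget.record_request()
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RetryableError:
            if attempt == max_attempts or not budget.try_spend():
                raise
    # Any other exception type is treated as non-retryable and propagates at once.
```

With `ratio=0.1`, a healthy service sees at most about 10% extra load from retries, and once a dependency is down the budget drains quickly and callers fail fast instead of amplifying traffic.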

Circuit Breaker

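Circuit breakers act as a stricter form of error budgeting: when the error rate exceeds a threshold, the breaker opens, calls are short‑circuited without reaching the downstream service, and a fallback is returned instead. The article mentions Hystrix, which is now in maintenance mode, and Resilience4j, the library commonly used in its place.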
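The mechanism can be sketched as a small state machine. The Python below is a simplified illustration under stated assumptions, not the Hystrix or Resilience4j API: the class name, parameters, and defaults are hypothetical, the error rate is computed over a sliding window of recent calls, every exception counts as a failure, and the class is not thread-safe.

```python
import time
from collections import deque


class CircuitBreaker:
    def __init__(self, failure_rate=0.5, window=10, min_calls=5,
                 open_seconds=30.0, clock=time.monotonic):
        self.failure_rate = failure_rate
        self.min_calls = min_calls
        self.open_seconds = open_seconds
        self.clock = clock
        self.outcomes = deque(maxlen=window)  # True marks a failed call
        self.opened_at = None                 # None means the breaker is closed

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.open_seconds:
                return fallback()  # open: short-circuit without calling fn
            # Wait time elapsed (half-open): let this one trial call through.
        try:
            result = fn()
        except Exception:
            self._record(failure=True)
            return fallback()
        self._record(failure=False)
        return result

    def _record(self, failure):
        if self.opened_at is not None:
            # Outcome of the half-open trial decides the next state.
            if failure:
                self.opened_at = self.clock()  # re-open and wait again
            else:
                self.opened_at = None          # close and start a fresh window
                self.outcomes.clear()
            return
        self.outcomes.append(failure)
        enough = len(self.outcomes) >= self.min_calls
        if enough and sum(self.outcomes) / len(self.outcomes) >= self.failure_rate:
            self.opened_at = self.clock()
```

A caller wraps each downstream request as `breaker.call(fetch_profile, lambda: cached_profile)`, where both callables are placeholders for application code. While the breaker is open the downstream service gets no traffic at all, which is what gives it room to recover.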

Deadlines / Distributed Timeouts

Distributed deadlines propagate either a deadline timestamp or the remaining timeout to downstream services, so each service can stop processing once the overall deadline has passed. The article explains how to calculate the remaining time and the challenge of clock skew, which affects the timestamp form because it compares clocks on different machines.
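A minimal sketch of the remaining-timeout variant follows. It assumes the budget travels in a request header; the header name `X-Request-Timeout-Ms`, the `Deadline` class, and the 1 s default are all hypothetical choices for illustration. Each service measures elapsed time on its own monotonic clock, so no cross-machine clock comparison is needed, at the cost of ignoring time spent on the network.

```python
import time

TIMEOUT_HEADER = "X-Request-Timeout-Ms"  # hypothetical header name


class DeadlineExceeded(Exception):
    """Raised when the overall request budget has been used up."""


class Deadline:
    def __init__(self, timeout_s, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + timeout_s

    def remaining(self):
        """Seconds left in the budget, never negative."""
        return max(0.0, self._expires_at - self._clock())

    def check(self):
        """Call before each expensive step to stop work nobody is waiting for."""
        if self.remaining() <= 0:
            raise DeadlineExceeded("overall deadline reached")

    def outgoing_headers(self):
        """Headers for a downstream call, carrying only the time still left."""
        self.check()
        # Truncating to whole milliseconds rounds down, which is the safe side.
        return {TIMEOUT_HEADER: str(int(self.remaining() * 1000))}

    @classmethod
    def from_headers(cls, headers, default_s=1.0, clock=time.monotonic):
        """Rebuild the deadline on the receiving side from the remaining budget."""
        raw = headers.get(TIMEOUT_HEADER)
        timeout_s = int(raw) / 1000 if raw is not None else default_s
        return cls(timeout_s, clock)
```

An edge service would create `Deadline(1.0)` per request, attach `outgoing_headers()` to every downstream call, and each receiver would rebuild its own `Deadline` with `from_headers` before doing any work.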

Rate Limiter

Rate limiting protects a service from overload by capping either the rate of inbound requests or the number of concurrent executions. Both static and dynamic limiters are described; dynamic limiters adjust the limit from metrics such as latency percentiles using an AIMD (additive increase, multiplicative decrease) algorithm:

if healthy {
    limit = limit + increase;
} else {
    limit = limit * decreaseRatio; // 0 < decreaseRatio < 1.0
}
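The update rule above can be turned into a small runnable limiter. The Python sketch below keeps the same additive-increase and multiplicative-decrease steps and adds two assumptions that the pseudocode leaves open: "healthy" is defined as a latency sample at or below a threshold, and the limit is clamped between a minimum and a maximum. The class name `AimdLimit` and all default values are hypothetical.

```python
class AimdLimit:
    """Adjusts a concurrency or rate limit with AIMD (not thread-safe)."""

    def __init__(self, initial=10.0, min_limit=1.0, max_limit=100.0,
                 increase=1.0, decrease_ratio=0.5, latency_threshold_s=0.2):
        if not 0.0 < decrease_ratio < 1.0:
            raise ValueError("decrease_ratio must be between 0 and 1")
        self.limit = initial
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.increase = increase
        self.decrease_ratio = decrease_ratio
        self.latency_threshold_s = latency_threshold_s

    def on_sample(self, latency_s):
        """Feed one latency measurement (e.g. a recent p99) and return the new limit."""
        healthy = latency_s <= self.latency_threshold_s
        if healthy:
            # Additive increase: probe for more capacity one step at a time.
            self.limit = min(self.max_limit, self.limit + self.increase)
        else:
            # Multiplicative decrease: back off sharply at the first sign of overload.
            self.limit = max(self.min_limit, self.limit * self.decrease_ratio)
        return self.limit
```

The asymmetry is the point of the design: the limit creeps up slowly while latency is good and drops by a large factor when it degrades, so the service sheds load quickly and regains it cautiously.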

The article concludes that applying these patterns together with good observability can greatly improve service reliability.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
