Design and Go Implementation of a Service Circuit Breaker

This article explains the design and Go implementation of a microservice circuit breaker, covering fault‑tolerance mechanisms, state transitions, configurable trip strategies, metrics collection, testing, and deployment patterns such as centralized gateways and service mesh.

High Availability Architecture

Dec 31, 2020

He Peng, currently working at an internet finance company, focuses on architecture and development management, especially in distributed systems and risk control.

I. Summary

In microservice architectures, service timeouts or communication failures often lead to cascading failures (avalanche effect). Rate limiting and circuit breaking are essential solutions. A previous article discussed various rate‑limiting implementations.

II. Microservice Fault‑Tolerance Mechanism

Microservice dependencies can cause complex failure cascades. When Service C fails, Service B may repeatedly retry, exhausting resources and causing Service A to become unavailable—a classic avalanche scenario.

To prevent this, a robust fault-tolerance mechanism is needed: redundancy through clustering, load balancing, and a strategy for handling failed calls. Common strategies are:

Failover – redirect to a healthy instance.

Failback – notify of failure.

Failsafe – ensure safe degradation.

Failfast – abort quickly on error.
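
To make the first and last strategies concrete, here is a minimal sketch of failover that degenerates to failfast when only one instance is configured. The names `instanceCall`, `callWithFailover`, and `ErrNoInstances` are hypothetical and do not come from the article's repository.

```go
package main

import "errors"

// instanceCall is a hypothetical stand-in for a request to one service instance.
type instanceCall func() (string, error)

// ErrNoInstances is returned when the instance list is empty.
var ErrNoInstances = errors.New("no instances available")

// callWithFailover tries the instances in order and returns the first
// successful result. If every instance fails, the last error is returned.
// With a single instance this behaves like failfast: one attempt, no retry.
func callWithFailover(instances []instanceCall) (string, error) {
	if len(instances) == 0 {
		return "", ErrNoInstances
	}
	var lastErr error
	for _, call := range instances {
		res, err := call()
		if err == nil {
			return res, nil
		}
		lastErr = err // remember the failure and move on to the next instance
	}
	return "", lastErr
}
```

Retrying across instances like this adds load whenever calls fail, which is one reason a breaker is needed in addition to these strategies.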

Besides clustering, both circuit breaking and rate limiting are required. Rate limiting protects a service from being overloaded by its callers, while circuit breaking stops a caller from sending requests to a downstream service that is failing.

III. Circuit Breaker Design and Implementation

Design Idea

The circuit breaker concept originates from electrical fuses and has been applied to financial markets. In microservices, the idea is similar: automatically stop calls when a service is unhealthy.

type ServiceBreaker struct {
    mu               sync.RWMutex
    name             string
    state            State
    windowInterval   time.Duration
    metrics          Metrics
    tripStrategyFunc TripStrategyFunc
    halfMaxCalls     uint64
    stateOpenTime    time.Time
    sleepTimeout     time.Duration
    stateChangeHook  func(name string, fromState State, toState State)
}

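The struct holds a read-write lock that guards the mutable fields, the breaker name, the current state, the metrics window interval and the metrics themselves, the configurable trip strategy, the maximum number of calls admitted in the half-open state, the time at which the breaker last opened, the cooldown (sleepTimeout) before an open breaker tries half-open, and an optional state-change hook.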

type State int
const (
    StateClosed State = iota
    StateOpen
    StateHalfOpen
)

func (s State) String() string {
    switch s {
    case StateClosed:
        return "closed"
    case StateHalfOpen:
        return "half-open"
    case StateOpen:
        return "open"
    default:
        return fmt.Sprintf("unknown state: %d", s)
    }
}

The Call method wraps the protected function: beforeCall decides whether the request may run, afterCall records the outcome, and a panic inside exec is counted as a failure before being re-raised.

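func (breaker *ServiceBreaker) Call(exec func() (interface{}, error)) (interface{}, error) {
    // Reject the call early if the breaker is open or the half-open quota is used up.
    err := breaker.beforeCall()
    if err != nil {
        return nil, err
    }
    // A panic in exec counts as a failure and is then re-raised to the caller.
    defer func() {
        if r := recover(); r != nil {
            breaker.afterCall(false)
            panic(r)
        }
    }()
    // Metrics is shared between goroutines, so count the call under the lock.
    breaker.mu.Lock()
    breaker.metrics.OnCall()
    breaker.mu.Unlock()
    result, err := exec()
    breaker.afterCall(err == nil)
    return result, err
}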
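
Because Call returns an untyped `interface{}`, callers have to convert the result back to a concrete type. The sketch below shows one way to do that. The `caller` interface, `fetchName`, `passThrough`, and `errUnexpectedType` are hypothetical names introduced for illustration; the only thing taken from the article is the signature of Call.

```go
package main

import "errors"

// caller is the subset of ServiceBreaker used here. The article's
// *ServiceBreaker would satisfy it through its Call method.
type caller interface {
	Call(exec func() (interface{}, error)) (interface{}, error)
}

var errUnexpectedType = errors.New("unexpected result type")

// fetchName runs lookup through the breaker and converts the untyped
// result back to a string.
func fetchName(b caller, lookup func() (string, error)) (string, error) {
	res, err := b.Call(func() (interface{}, error) {
		name, err := lookup()
		return name, err
	})
	if err != nil {
		// Either lookup failed or the breaker rejected the call.
		return "", err
	}
	name, ok := res.(string)
	if !ok {
		return "", errUnexpectedType
	}
	return name, nil
}

// passThrough is a trivial caller that always executes the function.
// It exists only so fetchName can be exercised without a real breaker.
type passThrough struct{}

func (passThrough) Call(exec func() (interface{}, error)) (interface{}, error) {
	return exec()
}
```

A wrapper like this keeps the type assertion in one place, so the rest of the code works with typed values.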

Before Call Check

func (breaker *ServiceBreaker) beforeCall() error {
    breaker.mu.Lock()
    defer breaker.mu.Unlock()
    now := time.Now()
    switch breaker.state {
    case StateOpen:
        // After the cooldown, let probe requests through in half-open.
        if breaker.stateOpenTime.Add(breaker.sleepTimeout).Before(now) {
            log.Printf("%s cooldown passed, trying half-open", breaker.name)
            breaker.changeState(StateHalfOpen, now)
            return nil
        }
        log.Printf("%s is open, request blocked", breaker.name)
        return ErrStateOpen
    case StateHalfOpen:
        // Admit at most halfMaxCalls probe requests.
        if breaker.metrics.CountAll >= breaker.halfMaxCalls {
            log.Printf("%s half-open, too many calls blocked", breaker.name)
            return ErrTooManyCalls
        }
    default: // Closed
        // Rotate the metrics window once it has expired.
        if !breaker.metrics.WindowTimeStart.IsZero() && breaker.metrics.WindowTimeStart.Before(now) {
            breaker.nextWindow(now)
            return nil
        }
    }
    return nil
}

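beforeCall decides whether a request may proceed. In the open state it rejects the call with ErrStateOpen until sleepTimeout has elapsed, then switches to half-open. In the half-open state it admits at most halfMaxCalls calls and rejects the rest with ErrTooManyCalls. In the closed state it always admits the call and only rotates the metrics window when that window has expired.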
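
The two admission checks can be written as pure functions that take the clock as a parameter, which makes the boundary behaviour easy to test. This is a sketch only: `cooldownPassed`, `halfOpenAllows`, and the sample values are hypothetical names, and the comparisons simply mirror the expressions in beforeCall.

```go
package main

import "time"

// Sample values used to exercise the helpers; they are illustrative only.
var sampleOpenedAt = time.Date(2020, 12, 31, 0, 0, 0, 0, time.UTC)

const sampleSleepTimeout = 6 * time.Second

// cooldownPassed reports whether an open breaker may move to half-open.
// It mirrors stateOpenTime.Add(sleepTimeout).Before(now), so it is true
// only strictly after the cooldown has elapsed.
func cooldownPassed(openedAt time.Time, sleepTimeout time.Duration, now time.Time) bool {
	return openedAt.Add(sleepTimeout).Before(now)
}

// halfOpenAllows reports whether one more probe call fits in the
// half-open quota. It is the negation of CountAll >= halfMaxCalls.
func halfOpenAllows(callsSoFar, halfMaxCalls uint64) bool {
	return callsSoFar < halfMaxCalls
}
```

Note that with a quota of 0 `halfOpenAllows` is never true, so a breaker configured that way would reject every probe.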

After Call Processing

func (breaker *ServiceBreaker) afterCall(success bool) {
    breaker.mu.Lock()
    defer breaker.mu.Unlock()
    if success {
        breaker.onSuccess(time.Now())
    } else {
        breaker.onFail(time.Now())
    }
}

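afterCall takes the lock and delegates to onSuccess or onFail. A success updates the success counters; a failure updates the failure counters and triggers a state transition when the trip strategy says so.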
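
The article does not show onSuccess and onFail, so the following sketch is an assumption based on the usual circuit-breaker pattern rather than the repository's code. It expresses the decision as a pure function: in half-open, enough consecutive successes close the breaker and any failure reopens it; in closed, a failure opens the breaker only if the trip strategy fired. The `outcome` type and `nextState` are hypothetical names.

```go
package main

// State mirrors the article's State type so the sketch compiles on its own.
type State int

const (
	StateClosed State = iota
	StateOpen
	StateHalfOpen
)

// outcome carries what the decision needs once the metrics have been
// updated for the call that just finished.
type outcome struct {
	success            bool
	consecutiveSuccess uint64 // successes in a row, including this call
	halfMaxCalls       uint64 // probe quota in the half-open state
	tripped            bool   // result of the configured trip strategy
}

// nextState returns the state the breaker should be in after the call.
func nextState(cur State, o outcome) State {
	switch {
	case o.success && cur == StateHalfOpen && o.consecutiveSuccess >= o.halfMaxCalls:
		return StateClosed // enough probes succeeded (assumed rule)
	case !o.success && cur == StateHalfOpen:
		return StateOpen // a failed probe reopens the breaker (assumed rule)
	case !o.success && cur == StateClosed && o.tripped:
		return StateOpen // the trip strategy fired
	default:
		return cur
	}
}
```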

Metrics and Sliding Window

type Metrics struct {
    WindowBatch        uint64
    WindowTimeStart    time.Time
    CountAll           uint64
    CountSuccess       uint64
    CountFail          uint64
    ConsecutiveSuccess uint64
    ConsecutiveFail    uint64
}

func (m *Metrics) NewBatch() { m.WindowBatch++ }
func (m *Metrics) OnCall() { m.CountAll++ }
func (m *Metrics) OnSuccess() { m.CountSuccess++; m.ConsecutiveSuccess++; m.ConsecutiveFail = 0 }
func (m *Metrics) OnFail() { m.CountFail++; m.ConsecutiveFail++; m.ConsecutiveSuccess = 0 }
func (m *Metrics) OnReset() {
    m.CountAll, m.CountSuccess, m.CountFail = 0, 0, 0
    m.ConsecutiveSuccess, m.ConsecutiveFail = 0, 0
}

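Metrics are collected per window, and each window is numbered by WindowBatch. Starting a new window resets all counters at once, so the window advances in fixed steps rather than sliding continuously. It also sets WindowTimeStart, which despite its name holds the time at which the current window expires: now plus windowInterval in the closed state, now plus sleepTimeout in the open state, and the zero time (never expires) in the half-open state or when windowInterval is 0.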

func (breaker *ServiceBreaker) nextWindow(now time.Time) {
    breaker.metrics.NewBatch()
    breaker.metrics.OnReset()
    var zero time.Time
    switch breaker.state {
    case StateClosed:
        if breaker.windowInterval == 0 {
            breaker.metrics.WindowTimeStart = zero
        } else {
            breaker.metrics.WindowTimeStart = now.Add(breaker.windowInterval)
        }
    case StateOpen:
        breaker.metrics.WindowTimeStart = now.Add(breaker.sleepTimeout)
    default: // HalfOpen
        breaker.metrics.WindowTimeStart = zero
    }
}
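
The closed-state rotation can be reproduced in isolation with an explicit clock. The sketch below is a reduced stand-in for Metrics, not the repository's code: `window`, `newWindow`, `rotate`, `observe`, and the sample values are hypothetical names.

```go
package main

import "time"

// Sample values used to exercise the sketch; they are illustrative only.
var sampleStart = time.Date(2020, 12, 31, 0, 0, 0, 0, time.UTC)

const sampleInterval = 5 * time.Second

// window is a reduced stand-in for Metrics in the closed state.
type window struct {
	interval time.Duration
	batch    uint64
	expires  time.Time // zero value: the window never expires
	countAll uint64
}

// newWindow opens the first window at time now.
func newWindow(now time.Time, interval time.Duration) *window {
	w := &window{interval: interval}
	w.rotate(now)
	return w
}

// rotate starts a new batch, clears the counter, and schedules the expiry.
func (w *window) rotate(now time.Time) {
	w.batch++
	w.countAll = 0
	if w.interval == 0 {
		w.expires = time.Time{}
		return
	}
	w.expires = now.Add(w.interval)
}

// observe counts one call at time now, rotating first if the window expired.
func (w *window) observe(now time.Time) {
	if !w.expires.IsZero() && w.expires.Before(now) {
		w.rotate(now)
	}
	w.countAll++
}
```

With a 5 second interval, two calls inside the first 5 seconds share a batch, while a call at 10 seconds starts a new batch with a fresh counter.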

State Transition Logic

func (breaker *ServiceBreaker) changeState(state State, now time.Time) {
    if breaker.state == state {
        return
    }
    prev := breaker.state
    breaker.state = state
    // Use the caller's timestamp so the new window and stateOpenTime agree.
    breaker.nextWindow(now)
    if state == StateOpen {
        breaker.stateOpenTime = now
    }
    if breaker.stateChangeHook != nil {
        breaker.stateChangeHook(breaker.name, prev, state)
    }
}

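When the state changes, a new metrics window is started and the optional hook is invoked. Because changeState runs while the breaker's mutex is held, the hook should return quickly and must not call back into the breaker.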
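
A hook that only records what happened is a safe pattern, since it does no I/O and never touches the breaker. The sketch below assumes the hook signature shown in the struct definition; `transition` and `recorder` are hypothetical names.

```go
package main

import "sync"

// State mirrors the article's State type so the sketch compiles on its own.
type State int

const (
	StateClosed State = iota
	StateOpen
	StateHalfOpen
)

// transition is one state change reported through the hook.
type transition struct {
	name     string
	from, to State
}

// recorder collects transitions. It has its own mutex because hooks of
// different breakers may run concurrently.
type recorder struct {
	mu   sync.Mutex
	seen []transition
}

// hook has the same parameter list as the stateChangeHook field.
func (r *recorder) hook(name string, from, to State) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.seen = append(r.seen, transition{name: name, from: from, to: to})
}

// transitions returns a copy of everything recorded so far.
func (r *recorder) transitions() []transition {
	r.mu.Lock()
	defer r.mu.Unlock()
	return append([]transition(nil), r.seen...)
}
```

The method value `r.hook` can be assigned to `Option.StateChangeHook`; slower work such as alerting can then read from the recorder outside the breaker's lock.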

Trip Strategies

type TripStrategyFunc func(Metrics) bool

func ConsecutiveFailTripFunc(threshold uint64) TripStrategyFunc {
    return func(m Metrics) bool { return m.ConsecutiveFail >= threshold }
}

func FailTripFunc(threshold uint64) TripStrategyFunc {
    return func(m Metrics) bool { return m.CountFail >= threshold }
}

func FailRateTripFunc(rate float64, minCalls uint64) TripStrategyFunc {
    return func(m Metrics) bool {
        if m.CountAll == 0 {
            return false
        }
        currRate := float64(m.CountFail) / float64(m.CountAll)
        return m.CountAll >= minCalls && currRate >= rate
    }
}

func ChooseTrip(op *TripStrategyOption) TripStrategyFunc {
    switch op.Strategy {
    case ConsecutiveFailTrip:
        return ConsecutiveFailTripFunc(op.ConsecutiveFailThreshold)
    case FailTrip:
        return FailTripFunc(op.FailThreshold)
    case FailRateTrip:
        fallthrough
    default:
        return FailRateTripFunc(op.FailRate, op.MinCall)
    }
}

Three strategies are supported: consecutive failures, total failures, and failure‑rate with a minimum call threshold.
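
The minimum-call guard in the failure-rate strategy is easy to overlook, so a few worked cases may help. The sketch repeats the body of FailRateTripFunc as a plain function (`failRateTrips`, a hypothetical name) and evaluates it with rate = 0.6 and minCalls = 3, the values used in the article's test setup. The `samples` table is illustrative.

```go
package main

// Metrics keeps only the two counters the failure-rate strategy reads.
type Metrics struct {
	CountAll  uint64
	CountFail uint64
}

// failRateTrips repeats the logic of FailRateTripFunc as a plain
// function so the arithmetic is easy to follow.
func failRateTrips(m Metrics, rate float64, minCalls uint64) bool {
	if m.CountAll == 0 {
		return false
	}
	currRate := float64(m.CountFail) / float64(m.CountAll)
	return m.CountAll >= minCalls && currRate >= rate
}

// samples pairs a metrics snapshot with the expected decision for
// rate = 0.6 and minCalls = 3.
var samples = []struct {
	m    Metrics
	trip bool
}{
	{Metrics{CountAll: 1, CountFail: 1}, false}, // 100% failed, but only 1 call
	{Metrics{CountAll: 3, CountFail: 1}, false}, // 1/3 ≈ 0.33 < 0.6
	{Metrics{CountAll: 3, CountFail: 2}, true},  // 2/3 ≈ 0.67 >= 0.6
	{Metrics{CountAll: 5, CountFail: 4}, true},  // 4/5 = 0.8 >= 0.6
}
```

Without the guard, a single failed call at the start of a window would give a 100% failure rate and open the breaker immediately.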

Configuration Options

type TripStrategyOption struct {
    Strategy                 uint
    ConsecutiveFailThreshold uint64
    FailThreshold            uint64
    FailRate                 float64
    MinCall                  uint64
}

type Option struct {
    Name            string
    WindowInterval  time.Duration
    HalfMaxCalls    uint64
    SleepTimeout    time.Duration
    StateChangeHook func(name string, fromState State, toState State)
    TripStrategy    TripStrategyOption
}

These options allow fine‑grained control over window size, half‑open call limits, cooldown periods, and the chosen trip strategy.
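
Two zero values deserve attention given the code shown earlier: with HalfMaxCalls set to 0 the half-open check rejects every probe, and with SleepTimeout set to 0 an open breaker moves to half-open on the next call. A constructor can guard against both. The sketch below is one possible approach; `normalize`, the error values, and the default constants are hypothetical, since the article does not say what NewServiceBreaker does with zero values.

```go
package main

import (
	"errors"
	"time"
)

// Option repeats the fields of the article's Option that matter here.
type Option struct {
	Name           string
	WindowInterval time.Duration
	HalfMaxCalls   uint64
	SleepTimeout   time.Duration
}

var (
	ErrEmptyName        = errors.New("breaker name must not be empty")
	ErrNegativeDuration = errors.New("durations must not be negative")
)

// Hypothetical defaults chosen for this sketch.
const (
	defaultHalfMaxCalls = 1
	defaultSleepTimeout = 5 * time.Second
)

// normalize validates opt and replaces problematic zero values with the
// defaults above. It returns a copy and leaves the argument untouched.
func normalize(opt Option) (Option, error) {
	if opt.Name == "" {
		return Option{}, ErrEmptyName
	}
	if opt.WindowInterval < 0 || opt.SleepTimeout < 0 {
		return Option{}, ErrNegativeDuration
	}
	if opt.HalfMaxCalls == 0 {
		opt.HalfMaxCalls = defaultHalfMaxCalls
	}
	if opt.SleepTimeout == 0 {
		opt.SleepTimeout = defaultSleepTimeout
	}
	return opt, nil
}
```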

Testing the Breaker

func initBreaker() *ServiceBreaker {
    opt := Option{
        Name:            "breaker1",
        WindowInterval:  5 * time.Second,
        HalfMaxCalls:    3,
        SleepTimeout:    6 * time.Second,
        StateChangeHook: stateChangeHook,
        TripStrategy:    TripStrategyOption{Strategy: FailRateTrip, FailRate: 0.6, MinCall: 3},
    }
    breaker, _ := NewServiceBreaker(opt)
    return breaker
}

Unit tests simulate successful calls, a burst of failures, and recovery, both sequentially and with five concurrent goroutines, demonstrating state transitions and the effect of the configured thresholds.
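
The concurrent part of such a test needs little more than a helper that fires the same call from several goroutines and counts the outcomes. The sketch below is a generic harness, not the repository's test code; `runConcurrently` and `errRejected` are hypothetical names.

```go
package main

import (
	"errors"
	"sync"
)

// errRejected stands in for any error a guarded call might return.
var errRejected = errors.New("call rejected")

// runConcurrently invokes call from n goroutines, waits for all of them,
// and reports how many calls succeeded and how many failed.
func runConcurrently(n int, call func() error) (succeeded, failed int) {
	var (
		mu sync.Mutex
		wg sync.WaitGroup
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			err := call()
			// The counters are shared, so update them under the mutex.
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				failed++
			} else {
				succeeded++
			}
		}()
	}
	wg.Wait()
	return succeeded, failed
}
```

In a real test, `call` would wrap `breaker.Call`, and running the test with `go test -race` helps catch unsynchronized access to the metrics.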

Deployment Patterns

Where the breaker runs depends on how services call each other. Three patterns are common:

Direct calls between services.

Centralized gateway (proxy) where all traffic passes through a gateway that can enforce rate limiting and circuit breaking.

Service‑mesh (side‑car) architecture, where a lightweight proxy runs alongside each service instance, providing transparent fault tolerance without code intrusion.

Both gateway‑based and mesh‑based approaches can offload metrics collection and decision making to asynchronous components, reducing latency impact on the request path.
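
In the gateway pattern, the gateway typically keeps one breaker per downstream service, so that a failing service is cut off while routes to healthy services stay open. The sketch below shows that bookkeeping only. Everything in it is hypothetical (`gateway`, `register`, `forward`, the `caller` interface, and the two stand-in breakers); the one detail taken from the article is the signature of Call.

```go
package main

import (
	"errors"
	"sync"
)

// caller is the subset of the article's ServiceBreaker the gateway needs.
type caller interface {
	Call(exec func() (interface{}, error)) (interface{}, error)
}

var (
	errUnknownService = errors.New("no breaker registered for service")
	errOpen           = errors.New("breaker is open")
)

// gateway keeps one breaker per downstream service name.
type gateway struct {
	mu       sync.RWMutex
	breakers map[string]caller
}

func newGateway() *gateway {
	return &gateway{breakers: make(map[string]caller)}
}

// register attaches a breaker to a downstream service.
func (g *gateway) register(service string, b caller) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.breakers[service] = b
}

// forward runs call through the breaker that guards service.
func (g *gateway) forward(service string, call func() (interface{}, error)) (interface{}, error) {
	g.mu.RLock()
	b, ok := g.breakers[service]
	g.mu.RUnlock()
	if !ok {
		return nil, errUnknownService
	}
	return b.Call(call)
}

// passThrough and rejecting stand in for a closed and an open breaker.
type passThrough struct{}

func (passThrough) Call(exec func() (interface{}, error)) (interface{}, error) {
	return exec()
}

type rejecting struct{ err error }

func (r rejecting) Call(func() (interface{}, error)) (interface{}, error) {
	return nil, r.err
}
```

A sidecar in a service mesh could use the same bookkeeping, with the map living in the proxy next to each instance instead of in a central gateway.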

Conclusion

The circuit breaker design presented combines state management, configurable trip strategies, and metrics windows to protect microservices from cascading failures. The implementation is available at https://github.com/skyhackvip/service_breaker
