Foundations of High Availability: Defining and Managing Strong and Weak Service Dependencies
This article defines strong versus weak service dependencies, outlines governance through dependency discovery, fault injection, and refactoring, recommends front-end and back-end fault-tolerance measures such as timeouts and circuit breakers, describes isolation and manual degradation switches, explains how classifications are verified, and closes with current gaps (middleware, start-up/shutdown) and hiring information.
1. Definition of Strong and Weak Dependencies
As the company’s business expands, the system becomes increasingly complex, with front‑end reliance on back‑end services and inter‑service dependencies. Without clear strong/weak dependency definitions, it is difficult to perform circuit breaking, degradation, or rate limiting, and to continuously improve system stability.
Services are tiered by business impact:
S1 (Core): affects core business processes and user experience.
S2 (Secondary Core): not core, but a service outage causes widespread user impact.
S3 (Non-core): negligible user impact (e.g., avatar, profile edit).
S4 (Others): almost no impact on online services (e.g., internal operation back-ends).
Strong dependency: when an exception affects core business processes or system availability. Weak dependency: when an exception does not affect core processes or overall availability.
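As a rough sketch, the tiering and strong/weak notions above could be encoded as a simple lookup. The service names and the tier-to-strength mapping below are illustrative assumptions, not the article's actual registry:

```python
from enum import Enum

class Tier(Enum):
    S1 = "core"            # affects core flows and user experience
    S2 = "secondary core"  # outage causes widespread user impact
    S3 = "non-core"        # negligible user impact (avatar, profile edit)
    S4 = "others"          # internal back-ends, no online impact

# Hypothetical service-to-tier registry
SERVICE_TIERS = {
    "create_order": Tier.S1,
    "pricing_rules": Tier.S3,
    "ops_backoffice": Tier.S4,
}

def is_strong_dependency(service: str) -> bool:
    """One plausible mapping: treat S1/S2 as strong, S3/S4 as weak."""
    return SERVICE_TIERS.get(service, Tier.S3) in (Tier.S1, Tier.S2)
```

In practice strength is a property of each caller-to-dependency edge rather than of the tier alone, but a registry like this gives circuit-breaking and degradation logic a single place to consult.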
2. Governance of Strong and Weak Dependencies
Governance means continuously obtaining dependency relationships, traffic, and strength data, detecting potential failure points early, and preventing dependency‑related incidents from degrading user experience.
2.1 Discovery
2.1.1 Manual Review
Initially, a large amount of manpower was invested to read code and list all dependencies in the core ride‑service chain.
Identify the main business and evaluate whether each dependent service impacts it. Example: For the user‑scan QR code flow, many pre‑checks (user eligibility, vehicle status, etc.) are involved.
Through manual analysis, it was found that only a few APIs (create order, start order, end order, query order) are core, while dozens of other services (Redis, DB, MQ, etc.) are weak dependencies that unnecessarily increase the failure surface.
2.1.2 Fault Injection
Service configuration files are used to list dependencies. Fault injection is performed offline by injecting exceptions into each dependency to verify whether the main business remains functional, thereby distinguishing strong from weak dependencies.
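The offline fault-injection step could look like the following minimal sketch: each dependency call is wrapped so an exception can be forced, and the main flow is then exercised to see whether it survives. All names (`unlock_bike`, `create_order`, `pricing_rules`) are illustrative, not the actual service interfaces:

```python
import contextlib

# Dependencies currently forced to fail (in practice, driven by the
# dependency list read from the service configuration file)
FAULTY = set()

@contextlib.contextmanager
def inject_fault(dependency: str):
    """Force calls to `dependency` to raise within the with-block."""
    FAULTY.add(dependency)
    try:
        yield
    finally:
        FAULTY.discard(dependency)

def call(dependency: str) -> str:
    """Simulated RPC: raises if a fault is injected for this dependency."""
    if dependency in FAULTY:
        raise ConnectionError(f"{dependency} unavailable (injected)")
    return f"{dependency}: ok"

def unlock_bike() -> bool:
    """Main flow: order creation is required, pricing rules are optional."""
    call("create_order")          # strong dependency: let failures propagate
    try:
        call("pricing_rules")     # weak dependency: degrade gracefully
    except ConnectionError:
        pass
    return True

# Classification rule: if the main flow survives the injected fault,
# the dependency is weak; if it breaks, the dependency is strong.
with inject_fault("pricing_rules"):
    pricing_is_weak = unlock_bike()
```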
Fault injection is currently being run against two non-S1 services to confirm their strong/weak classification and keep unnecessary strong dependencies out of the core chain.
2.2 Refactoring & Contingency Plans
2.2.1 Front‑end Fault Tolerance
Decouple non‑core backend calls so that failures do not block core flows. Example: The “Confirm Unlock” page fetches pricing rules; if this call is a strong dependency, a failure blocks the button and the core flow. If treated as a weak dependency, the page can still proceed with partial data.
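A minimal sketch of treating the pricing call as a weak dependency, so the page renders with partial data instead of blocking the button. The function names and default payload are hypothetical:

```python
def fetch_pricing_rules() -> dict:
    """Simulated non-core backend call that happens to be failing."""
    raise TimeoutError("pricing service slow")

# Placeholder shown when pricing cannot be fetched
DEFAULT_PRICING = {"note": "Standard rates apply; see in-app pricing page."}

def render_confirm_unlock_page() -> dict:
    """Render the page even when the non-core pricing call fails."""
    try:
        pricing = fetch_pricing_rules()
    except Exception:
        pricing = DEFAULT_PRICING   # degrade: keep the core flow usable
    return {"unlock_button_enabled": True, "pricing": pricing}
```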
2.2.2 Back‑end Fault Tolerance
For weak dependencies, implement proper timeout, circuit breaking, and rate limiting.
Timeout: configure based on 95th/99th-percentile response times, allowing for serialization and network latency.
Circuit Breaker: when a dependency keeps failing, open the circuit to avoid cascading timeouts and return a fallback value.
Rate Limiting: protect core service interfaces from traffic spikes to maintain stability.
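The circuit-breaker idea above can be sketched as a small failure-counting breaker: after a threshold of consecutive errors it fails fast with a fallback instead of letting timeouts cascade, and it retries one trial call after a cool-down. Thresholds and the reset window are illustrative; in practice the per-call timeout would be derived from p95/p99 latencies as described:

```python
import time

class CircuitBreaker:
    """Open after `max_failures` consecutive errors; retry after `reset_after` s."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None   # time the circuit opened, or None if closed

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                return fallback          # open: fail fast, no cascading waits
            self.opened_at = None        # half-open: allow one trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            return fallback
        self.failures = 0                # success closes the circuit again
        return result
```

Production frameworks add failure-rate windows and concurrency-aware half-open probes, but the state machine is the same.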
2.3 Isolation of Core and Non‑core Business
Techniques include thread‑level isolation (semaphores, thread pools), process‑level isolation, business splitting, and group deployment.
In the ride‑service case, thread‑level isolation was infeasible due to the SOA framework, and group deployment could not avoid instability caused by code changes. Therefore, business splitting was chosen: core business was extracted from the service while non‑core logic remained.
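For teams where thread-level isolation is viable, the semaphore approach mentioned above amounts to a bulkhead that caps how many concurrent calls a non-core dependency may consume. This is a generic sketch, not the ride-service solution (which used business splitting):

```python
import threading

class Bulkhead:
    """Cap concurrent calls to a dependency; shed load instead of queueing."""

    def __init__(self, max_concurrent: int):
        self._sem = threading.Semaphore(max_concurrent)

    def call(self, fn, fallback):
        if not self._sem.acquire(blocking=False):
            return fallback              # saturated: reject immediately so
        try:                             # core-business threads are not tied up
            return fn()
        finally:
            self._sem.release()
```

Because rejected calls return the fallback at once, a slow non-core dependency can never exhaust the threads needed by core business.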
2.4 Artificial Degradation Switch
For each scenario, a controllable fallback is configured via dynamic switches, allowing a one‑click switch to fallback logic when exceptions occur.
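A degradation switch of this kind can be sketched as a thread-safe flag consulted on every call; flipping it (in practice via a dynamic-configuration push) routes traffic to the fallback logic. Names are illustrative:

```python
import threading

class DegradationSwitch:
    """One-click, thread-safe switch between normal and fallback logic."""

    def __init__(self):
        self._degraded = threading.Event()

    def flip(self, on: bool):
        if on:
            self._degraded.set()     # e.g. triggered from a config console
        else:
            self._degraded.clear()

    def run(self, normal, fallback):
        return fallback() if self._degraded.is_set() else normal()
```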
3. Verification
Service configuration files are used again to inject failures offline and verify that the main business remains available, confirming the classification of dependencies.
4. Current Status & Issues
Too much focus on service‑to‑service dependencies, neglecting middleware (Redis/HBase/MQ) dependencies.
Emphasis is placed on runtime, overlooking start‑up and shutdown phases.
Recruitment Information
We are the “Two‑Round Technical Risk” team at HelloBike, focusing on high‑traffic, high‑concurrency system stability. We are hiring; interested candidates can send resumes to [email protected].
HelloTech
Official Hello technology account, sharing tech insights and developments.