Design and Implementation of a Flexible Service Governance and Traffic Distribution Engine at Dada Group
This article describes how Dada Group tackled rapid micro‑service growth by designing a language‑agnostic, lightweight service governance and traffic routing engine that uses Consul metadata, configurable routing rules, and link‑based isolation to enable online pressure testing, parallel development, hot‑cold service separation, gray releases, and dynamic rate limiting.
Background: Dada Group (NASDAQ: DADA) experienced explosive growth in its micro‑service count and cloud host numbers, expanding from a single monolithic service to hundreds of services running on thousands of machines over six years.
Challenges: The company faced several pain points, including the inability to isolate online pressure‑test traffic from production traffic, lack of parallel testing for different service versions, difficulty separating hot and cold traffic for core services, limited control over gray‑release routing, and insufficient mechanisms for protecting core services from overload caused by non‑core projects.
Requirements and Design Principles: To address these issues, the team set three main principles: the engine should solve current problems while being extensible to future similar scenarios, be language‑agnostic to fit Dada’s diverse tech stack, and remain lightweight with minimal external dependencies.
Architecture Overview: Building on an existing Consul‑based service discovery framework, the new solution moves all traffic‑control logic to the client side and uses Consul only for health checks and metadata storage. Routing policies and metadata definitions are stored in the configuration center, so they can be updated in real time without any server‑side changes.
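As a rough illustration of the client-side approach, the sketch below parses a trimmed payload shaped like the response of Consul's health endpoint for a service, where each entry carries a `Service.Meta` map. The metadata keys `link` and `group`, the default values, and the sample addresses are assumptions made for this example; the article does not state the actual key names.

```python
import json

# Trimmed sample shaped like a Consul health response for one service.
# The Meta keys "link" and "group" are hypothetical names for this sketch.
SAMPLE_HEALTH_RESPONSE = """
[
  {"Service": {"ID": "order-1", "Service": "order-service",
               "Address": "10.0.0.11", "Port": 8080,
               "Meta": {"link": "base", "group": "hot"}}},
  {"Service": {"ID": "order-2", "Service": "order-service",
               "Address": "10.0.0.12", "Port": 8080,
               "Meta": {"link": "pressure-test"}}}
]
"""


def parse_instances(payload: str) -> list:
    """Turn health entries into plain dicts the client can filter locally."""
    instances = []
    for entry in json.loads(payload):
        service = entry["Service"]
        meta = service.get("Meta") or {}
        instances.append({
            "id": service["ID"],
            "endpoint": f'{service["Address"]}:{service["Port"]}',
            # Assumed defaults for instances registered without metadata.
            "link": meta.get("link", "base"),
            "group": meta.get("group", "default"),
        })
    return instances


INSTANCES = parse_instances(SAMPLE_HEALTH_RESPONSE)
```

Because the filtering happens in the caller's process, changing a routing policy only requires pushing new configuration to clients, not redeploying a central proxy.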
Metadata Model (A): Two extensions were added to the service registration model – “link” (a virtual isolated environment composed of specific service instances) and “service instance grouping” (to separate hot and cold instances). Links can be “strong” (no traffic crossing) or “weak” (allow crossing), supporting scenarios such as production‑vs‑pressure‑test isolation, parallel branch testing, and gray releases.
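The link and grouping model can be sketched in a few lines. This is a minimal illustration, not Dada's implementation: the `Instance` fields, the `base` link name, and the rule that a weak link falls back to the base link when it has no instance of the target service are all assumptions chosen to show the strong/weak distinction.

```python
from dataclasses import dataclass
from typing import List, Optional

BASE_LINK = "base"  # hypothetical name for the default (shared) link


@dataclass(frozen=True)
class Instance:
    service: str
    address: str
    link: str = BASE_LINK   # virtual isolated environment the instance belongs to
    group: str = "default"  # instance grouping, e.g. "hot" or "cold"


def candidates(instances: List[Instance], service: str, caller_link: str,
               strong: bool, group: Optional[str] = None) -> List[Instance]:
    """Return the instances a caller on `caller_link` may reach."""
    pool = [i for i in instances
            if i.service == service and (group is None or i.group == group)]
    same_link = [i for i in pool if i.link == caller_link]
    if same_link or strong:
        # Strong link: never cross, even if that leaves no candidates.
        return same_link
    # Weak link (assumed behaviour): cross over to the base link.
    return [i for i in pool if i.link == BASE_LINK]


INSTANCES = [
    Instance("order-service", "10.0.0.11"),
    Instance("order-service", "10.0.0.12", link="pressure-test"),
    Instance("order-service", "10.0.0.13", group="cold"),
]
```

Under these assumptions, a pressure-test link would be declared strong so that test traffic can never reach production instances, while a feature-branch link would be weak so that it only needs to deploy the services it actually changes.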
Routing Rules and Logic (B): Routing rules are stored as JSON in the configuration center and edited via an online tool. The client evaluates rules based on the caller’s link, service name, IP, and request interface, then selects the appropriate target instance.
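The rule evaluation described above might look like the following sketch. The JSON schema (`match` and `target` objects), the field names, the wildcard syntax, and the first-match-wins ordering are assumptions for illustration; the article does not publish the actual rule format.

```python
import json
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical rule document, as it might be stored in the configuration center.
RULES_JSON = """
[
  {"match": {"link": "pressure-test", "service": "order-service"},
   "target": {"link": "pressure-test"}},
  {"match": {"service": "order-service", "interface": "/order/history*"},
   "target": {"group": "cold"}},
  {"match": {"service": "order-service", "caller_ip": "10.0.1.*"},
   "target": {"group": "gray"}}
]
"""

RULES = json.loads(RULES_JSON)


def first_match(rules: list, request: dict) -> Optional[dict]:
    """Return the target of the first rule whose conditions all match.

    Each condition is a shell-style wildcard pattern compared against the
    request field of the same name; a missing field is treated as "".
    """
    for rule in rules:
        conditions = rule["match"]
        if all(fnmatchcase(str(request.get(field, "")), pattern)
               for field, pattern in conditions.items()):
            return rule["target"]
    return None  # no rule applies; the caller falls back to default routing


REQUEST = {
    "link": "base",
    "service": "order-service",
    "caller_ip": "10.0.2.7",
    "interface": "/order/history/list",
}
TARGET = first_match(RULES, REQUEST)
```

Here `TARGET` selects the cold group, because the request is not on the pressure-test link but its interface matches the second rule. The returned target would then be used to narrow the instance list before load balancing.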
Real‑World Application: The solution successfully isolated pressure‑test traffic from production, enabled parallel development testing, and facilitated gray releases. It reduced the effort for online pressure testing from 250 person‑days in 2018 to 70 person‑days in 2019, a 72% reduction, or roughly 3.6 times the output per person‑day.
Conclusion: By introducing a simple yet extensible data model and lightweight routing logic, Dada’s system solved a wide range of traffic‑control problems, saved significant technical costs, improved productivity, and mitigated potential incidents.
Team Members: bowl‑gu, doubleMing, and superbool – all architects at Dada Group responsible for micro‑service governance, high‑availability data sources, and monitoring platforms.
Dada Group Technology
Sharing insights and experiences from Dada Group's R&D department on product refinement and technology advancement, connecting with fellow geeks to exchange ideas and grow together.