Implementing a Coloring Environment for Test Environment Stability
DeWu addressed chronic test-environment instability in three stages: isolated ECS machines, then container clusters, and finally a traffic-tagging "coloring" environment in which an x-infr-flowtype header routes requests to dedicated coloring nodes. The result is over 95% demand coverage, fewer conflicts, lower costs, and a roadmap toward production gray-release.
Test environment stability is a critical issue for many companies because instability directly impacts development iteration and testing efficiency.
The main causes of instability are non‑final changes (code or configuration releases that break services), frequent changes that make permission control difficult, and parallel demands that lead to resource contention.
DeWu (得物) has gone through three stages to address these problems:
2020-2021: Physical isolation based on ECS. Three physically isolated test environments (T0, T1, T2) were created, but heavy parallel testing caused frequent conflicts, and no single environment remained stable.
2021‑2022: MF full‑link container environment. Ten container‑based MF environments were built on top of T0, sharing the DB with T0 while keeping other resources independent. This reduced conflicts but introduced high maintenance costs and residual stability issues.
2022: Coloring environment (traffic isolation). Traffic is split by a coloring tag, allowing parallel tests to run without affecting each other while keeping a single baseline environment.
Basic idea: A traffic tag (x-infr-flowtype) is added to the HTTP header and propagated downstream via OpenTracing baggage. Services read the tag, register themselves in a special "coloring arena" in the service registry, and route requests to the appropriate coloring node. If a coloring node is missing, the request falls back to the baseline environment.
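The routing-with-fallback rule described above can be sketched as follows. This is a minimal illustration, not DeWu's implementation: the registry layout (service name to environment to instance list), the `baseline` key, and the function name `select_instances` are all assumptions made for the example.

```python
from typing import Dict, List, Mapping

FLOW_HEADER = "x-infr-flowtype"   # header name taken from the article
BASELINE = "baseline"             # hypothetical key for the baseline environment

# Hypothetical registry shape: service -> environment tag -> instance addresses
Registry = Dict[str, Dict[str, List[str]]]


def select_instances(registry: Registry, service: str,
                     headers: Mapping[str, str]) -> List[str]:
    """Pick the coloring nodes for a request, or fall back to baseline."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    tag = normalized.get(FLOW_HEADER)

    by_env = registry.get(service, {})
    colored = by_env.get(tag, []) if tag else []
    if colored:
        return colored
    # No coloring node registered for this tag: use the baseline environment.
    return by_env.get(BASELINE, [])
```

With this shape, only the services changed by a given requirement need coloring nodes; every other hop in the call chain resolves to baseline while the tag keeps travelling with the request.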
Implementation details:
Flow tag propagation: x-infr-flowtype:<CE_ColoringEnv> (CE_ is a fixed prefix that distinguishes coloring tags from stress-test tags)
Service registration: each service instance adds the COLORING_ENV environment variable and registers a node named CE_<ServiceName> in the registry.
MQ handling: messages carry a DMQ_ENV_TAG field. Consumers compare the tag with their local COLORING_ENV and process only matching messages; otherwise they ACK the message without running any business logic.
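The MQ rule in the list above might look like the sketch below. It assumes messages are plain dictionaries and that the handler is an ordinary callable; `consume` and `local_coloring_env` are hypothetical names, and treating an untagged message as matching a baseline consumer (no COLORING_ENV set) is an assumption the article does not spell out.

```python
import os
from typing import Any, Callable, Dict, Optional

ENV_TAG_FIELD = "DMQ_ENV_TAG"   # message field named in the article


def local_coloring_env() -> Optional[str]:
    """Read this instance's COLORING_ENV; None means a baseline instance."""
    return os.environ.get("COLORING_ENV") or None


def consume(message: Dict[str, Any],
            handler: Callable[[Dict[str, Any]], None],
            local_env: Optional[str]) -> str:
    """Run the handler only when the message tag matches the local environment."""
    message_env = message.get(ENV_TAG_FIELD) or None
    if message_env == local_env:
        handler(message)
        return "processed"
    # Mismatch: acknowledge so the broker does not redeliver, but skip the
    # business logic entirely.
    return "acked"
```

A consumer loop would call `consume(msg, handle_order, local_coloring_env())` for each delivery and ACK in both outcomes.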
The solution also defines how different traffic entry points (App, Web, Feishu callbacks, Job tasks, Canal subscriptions) attach the coloring tag, typically by adding the x-infr-flowtype header or URL parameter.
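As a rough sketch of how an entry point could attach the tag, the helpers below add it either as a header or as a URL query parameter. The helper names are hypothetical, and reusing `x-infr-flowtype` as the query-parameter name is an assumption, since the article does not state the parameter name.

```python
from typing import Dict, Mapping
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

FLOW_HEADER = "x-infr-flowtype"   # header name taken from the article
COLORING_PREFIX = "CE_"           # fixed prefix separating coloring from stress-test tags


def flow_tag(env_name: str) -> str:
    """Return the tag value, adding the CE_ prefix when it is missing."""
    if env_name.startswith(COLORING_PREFIX):
        return env_name
    return COLORING_PREFIX + env_name


def with_flow_header(headers: Mapping[str, str], env_name: str) -> Dict[str, str]:
    """Copy the headers and set the coloring tag (App or Web requests)."""
    tagged = dict(headers)
    tagged[FLOW_HEADER] = flow_tag(env_name)
    return tagged


def with_flow_param(url: str, env_name: str) -> str:
    """Append the tag as a query parameter (e.g. for callback URLs)."""
    parts = urlsplit(url)
    # Drop any existing tag so the URL never carries two conflicting values.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != FLOW_HEADER]
    query.append((FLOW_HEADER, flow_tag(env_name)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

The URL form suits entry points where the caller cannot set custom headers, such as third-party callbacks; a gateway would then need to copy the parameter into the header before forwarding.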
The implementation roadmap ran from April to November and covered project initiation, middleware refactoring, gradual gray-release, adoption by independent projects, and full-business rollout.
Business impact: Over 95% of demands now use the coloring environment for testing, with significant cost savings compared to maintaining multiple isolated physical environments. Remaining challenges involve data isolation and front-end coloring.
Conclusion: The coloring environment effectively solves test environment conflicts and stability issues while reducing costs, and it is expected to be extended to production gray-release scenarios.
DeWu Technology
A platform for sharing and discussing tech knowledge, guiding you toward the cloud of technology.