IP‑Tag Based Traffic Routing and Distributed Tracing System for Test Environments
This article describes how a microservice architecture uses IP‑tag traffic routing to separate dynamic test environments from stable ones, implements RPC and MQ routing, and introduces a custom Zipkin‑based distributed tracing system called Tianwang to improve deployment efficiency, resource utilization, and debugging in large‑scale backend services.
Background – Parallel development leads to multiple test branches being deployed simultaneously, making test‑environment service governance far more complex than production. Traditional isolated test clusters cause low deployment efficiency, resource waste, and management overhead as microservice scale grows.
Solution Overview – The architecture introduces IP‑tag traffic routing: only the modified service set X is deployed in a dynamic test environment, while all other services run in a shared stable environment. Requests involving X are automatically routed to the dynamic environment, minimizing deployment size and conserving resources.
RPC and MQ Routing – RPC routing replaces host‑file mapping with lookups through the service management platform: for each call, the framework prefers an instance whose IP matches the request's SRC_IP and otherwise uses a stable instance. MQ routing adds an IP prefix to topics (or consumer groups) so that messages produced in a dynamic environment are consumed in that same environment when it hosts a matching consumer.
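The routing rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Instance` type, the `pick_instance` and `routed_topic` helpers, and the underscore-based prefix format are hypothetical and do not come from the article, which does not show the platform's actual API or naming scheme.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Instance:
    """One registered service instance (hypothetical shape)."""
    service: str
    ip: str
    stable: bool  # True if the instance lives in the shared stable environment


def pick_instance(instances: List[Instance], src_ip: Optional[str]) -> Instance:
    """Prefer a dynamic instance whose IP equals src_ip, else fall back to stable."""
    if src_ip is not None:
        for inst in instances:
            if not inst.stable and inst.ip == src_ip:
                return inst
    for inst in instances:
        if inst.stable:
            return inst
    raise LookupError("no routable instance")


def routed_topic(topic: str, src_ip: Optional[str],
                 dynamic_consumer_ips: Set[str]) -> str:
    """Prefix the topic with the IP only when that dynamic environment consumes it.

    The prefix format (dots replaced by underscores) is an assumption.
    """
    if src_ip is not None and src_ip in dynamic_consumer_ips:
        return f"{src_ip.replace('.', '_')}_{topic}"
    return topic
```

With one dynamic instance at 10.0.0.5 and one stable instance, a request tagged with SRC_IP 10.0.0.5 reaches the dynamic instance, while a request from any other dynamic environment is served by the stable one.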
Dynamic vs. Stable Environments – Environments are distinguished by tags in /opt/system.env: testserver (dynamic) and teststable (stable). Dynamic environments host the entry point and modified services; stable environments provide common services identical to production.
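A minimal sketch of how a process might classify its own environment from that file follows. The article names the file and the two tags but not the file's format, so the code assumes the tag appears as a standalone token (for example on a `key=value` line); the function names are hypothetical.

```python
import re
from pathlib import Path


def classify_environment(env_text: str) -> str:
    """Return 'dynamic' or 'stable' based on the tag found in the file text.

    Assumption: the tag is a standalone token separated by whitespace or
    one of the characters = : , ;
    """
    tokens = set(re.split(r"[\s=:,;]+", env_text))
    if "testserver" in tokens:
        return "dynamic"
    if "teststable" in tokens:
        return "stable"
    raise ValueError("no known environment tag found")


def load_environment(path: str = "/opt/system.env") -> str:
    """Read the tag file named in the article and classify the host."""
    return classify_environment(Path(path).read_text())
```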
Distributed Tracing – Tianwang – To address cross‑environment debugging, a customized tracing system built on Zipkin (named Tianwang) is deployed. It samples 100% of test traffic, records <TraceId, SpanId, ParentSpanId>, and captures request/response metadata, enabling full‑stack visibility.
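To show why the <TraceId, SpanId, ParentSpanId> triple is enough to see which environment served each hop, here is a small sketch that rebuilds a call tree from recorded spans. The `Span` fields `service` and `env` and the `render_trace` helper are hypothetical; Tianwang's real storage schema is not described in the article.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Span:
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]  # None marks the root span
    service: str                   # hypothetical metadata field
    env: str                       # hypothetical: "dynamic" or "stable"


def render_trace(spans: List[Span], trace_id: str) -> List[str]:
    """Return an indented call tree for one trace, in recorded order."""
    children: Dict[Optional[str], List[Span]] = defaultdict(list)
    for span in spans:
        if span.trace_id == trace_id:
            children[span.parent_span_id].append(span)

    lines: List[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        # Visit every span whose parent is parent_id, then recurse into it.
        for span in children.get(parent_id, []):
            lines.append("  " * depth + f"{span.service} [{span.env}]")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines
```

Grouping spans by ParentSpanId and walking from the root gives one line per hop, so a request that crosses from a dynamic environment into the stable one is visible at a glance.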
Implementation Details – The entry service receives requests, stores its IP in thread‑local storage as SRC_IP, and propagates it to downstream services. Both the RPC and MQ frameworks use this SRC_IP to decide whether to stay within the dynamic environment or fall back to the stable one. The system also integrates transmittable‑thread‑local so that the value survives hand‑offs to other threads, such as thread‑pool workers.
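The cross-thread propagation problem can be illustrated with Python's standard `contextvars` module. This is an analogue, not the article's implementation: the original system uses the Java transmittable-thread-local library, and the `SRC_IP` header key and helper names below are assumptions made for the sketch.

```python
import contextvars
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict

# Request-scoped value, playing the role of the thread-local SRC_IP.
SRC_IP = contextvars.ContextVar("SRC_IP", default=None)


def submit_with_context(pool: ThreadPoolExecutor, fn: Callable, *args) -> Future:
    """Snapshot the caller's context and run fn inside it on a pool thread."""
    ctx = contextvars.copy_context()
    return pool.submit(ctx.run, fn, *args)


def outgoing_headers() -> Dict[str, str]:
    """Headers a downstream RPC or MQ call would carry (hypothetical key name)."""
    src_ip = SRC_IP.get()
    return {"SRC_IP": src_ip} if src_ip else {}


def handle_request(entry_ip: str) -> Dict[str, str]:
    """Entry point: record the IP, then build downstream headers on a worker."""
    token = SRC_IP.set(entry_ip)
    try:
        with ThreadPoolExecutor(max_workers=1) as pool:
            return submit_with_context(pool, outgoing_headers).result()
    finally:
        SRC_IP.reset(token)  # do not leak the value into the next request
```

The explicit snapshot in `submit_with_context` plays the part that the transmittable-thread-local wrappers play for Java executors: the worker sees the value that was current when the task was submitted, not whatever a reused pool thread held before.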
Results – After switching to traffic routing in December 2020, average services per dynamic environment dropped from ~30 to ~10, and average memory allocation fell from 16 GB to under 12 GB, despite a 50% increase in total services. The approach achieved significant resource savings and faster deployments.
Conclusion – IP‑tag traffic routing combined with the Tianwang tracing system provides a scalable, efficient solution for test‑environment service governance, reducing resource consumption while maintaining full observability, and sets the foundation for future enhancements.
Zhuanzhuan Tech
A platform for Zhuanzhuan R&D and industry peers to learn and exchange technology, regularly sharing frontline experience and cutting‑edge topics. We welcome practical discussions and sharing; contact waterystone with any questions.