
Traffic Replay Testing: Architecture, Implementation, and the Pandora Platform

This article explains the concept, black‑box and white‑box approaches, and the end‑to‑end technical solution of traffic replay testing for microservice back‑ends, detailing recording and playback processes, a Kubernetes‑based distributed execution platform, result calibration, and future enhancements.

Xueersi Online School Tech Team

Background

As internet competition intensifies, product iterations grow more frequent, making regression testing both heavier and more urgent and putting pressure on test quality and efficiency. Traditional interface automation testing carries high maintenance costs, prompting the need for a reliable, low-maintenance alternative: traffic replay testing.

Testing Approaches

Black‑Box

Copy online requests and responses, recreate the environment offline, replay the requests, and assert that the replayed responses match the recorded ones. This suits read-only (GET) APIs; testing write APIs incurs extra data-cleaning and mapping costs.
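The black-box assertion step can be sketched as a comparison between the recorded and replayed responses, ignoring headers that legitimately differ between runs. The Response type and the volatile-header list here are illustrative assumptions, not part of any specific tool:

```go
package main

import "reflect"

// Response is a simplified recorded or live HTTP response.
type Response struct {
	Status  int
	Headers map[string]string
	Body    string
}

// volatileHeaders are headers expected to differ between recording and replay.
var volatileHeaders = map[string]bool{"Date": true, "X-Request-Id": true}

// matches reports whether a replayed response is equivalent to the recorded
// one, ignoring headers that legitimately change between runs.
func matches(recorded, replayed Response) bool {
	if recorded.Status != replayed.Status || recorded.Body != replayed.Body {
		return false
	}
	// stable strips the volatile headers before comparing.
	stable := func(h map[string]string) map[string]string {
		out := map[string]string{}
		for k, v := range h {
			if !volatileHeaders[k] {
				out[k] = v
			}
		}
		return out
	}
	return reflect.DeepEqual(stable(recorded.Headers), stable(replayed.Headers))
}
```

In practice the volatile-header list grows as false positives surface; trace IDs, timestamps, and load-balancer headers are typical entries.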

White‑Box

Record both inbound requests/responses and outbound service calls, then mock downstream dependencies during replay, allowing focus on the service’s own logic. Tools such as Alibaba’s Doom, jvm‑sandbox, and Didi’s RDebug implement this approach.
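The mocking of downstream dependencies can be sketched as a lookup table keyed by the normalized outbound request; the names OutboundCall and MockStore are illustrative, not the API of Doom, jvm-sandbox, or RDebug:

```go
package main

// OutboundCall identifies one recorded downstream interaction, e.g. a SQL
// statement or a normalized HTTP request line.
type OutboundCall struct {
	Protocol string // "mysql", "redis", "http", ...
	Request  string // normalized request payload
}

// MockStore replays recorded downstream responses instead of touching real
// dependencies during playback.
type MockStore struct {
	recorded map[OutboundCall]string
}

func NewMockStore() *MockStore {
	return &MockStore{recorded: map[OutboundCall]string{}}
}

// Record stores the response observed online for a given outbound call.
func (m *MockStore) Record(c OutboundCall, response string) {
	m.recorded[c] = response
}

// Respond returns the recorded response for an outbound call made during
// replay; ok is false when the service issues a call that was never recorded,
// which usually signals a behavior change worth investigating.
func (m *MockStore) Respond(c OutboundCall) (response string, ok bool) {
	response, ok = m.recorded[c]
	return
}
```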

Why Traffic Replay Works

In a microservice architecture, if a new service produces identical responses to the old one for all possible consumer calls, functional equivalence is guaranteed. By covering all consumer‑contract calls, the test ensures the service’s correctness without exhaustive API combinatorial testing.

Technical Solution

The overall scheme uses traffic recording and playback: PHP services employ RDebug, while Go services use Sharingan. Recorded traffic is stored in Elasticsearch as sessions.

Recording Process

Requests arrive via Nginx and are forwarded to php-fpm, which may invoke downstream services (MySQL, Redis, HTTP/RPC). All network interactions are captured by a recorder and saved together as one traffic case.
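The capture step can be sketched as a session that accumulates the inbound request/response pair plus every outbound interaction, in arrival order. The field names below are assumptions for illustration, not RDebug's actual on-disk schema:

```go
package main

import "encoding/json"

// Action is one captured network interaction on either side of the service.
type Action struct {
	Direction string `json:"direction"` // "inbound" or "outbound"
	Protocol  string `json:"protocol"`  // "http", "mysql", "redis", ...
	Request   string `json:"request"`
	Response  string `json:"response"`
}

// Session is one traffic case: the original request, its response, and all
// downstream calls made while serving it.
type Session struct {
	SessionID string   `json:"session_id"`
	Actions   []Action `json:"actions"`
}

// Capture appends one interaction to the session in arrival order.
func (s *Session) Capture(a Action) { s.Actions = append(s.Actions, a) }

// Marshal serializes the session, e.g. before indexing it into Elasticsearch.
func (s *Session) Marshal() ([]byte, error) { return json.Marshal(s) }
```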

Playback Process

Recorded traffic is replayed by matching inbound requests to the service under test and mocking outbound calls based on recorded responses, then comparing the service’s output with the original.
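The playback loop described above can be sketched as a driver that feeds the recorded inbound request to the service under test, answers its downstream calls from the recording, and compares the result. The Service and RecordedSession shapes here are simplified assumptions:

```go
package main

// Service abstracts the system under test: it handles one request, reaching
// downstream dependencies only through the supplied resolver.
type Service func(request string, downstream func(call string) string) (response string)

// RecordedSession pairs the inbound request/response with the responses for
// each downstream call, keyed by the normalized call.
type RecordedSession struct {
	Request  string
	Response string
	Outbound map[string]string
}

// Replay runs the service under test against one recorded session, serving
// downstream calls from the recording, and reports whether the new response
// matches the originally recorded one.
func Replay(svc Service, s RecordedSession) (match bool, got string) {
	resolver := func(call string) string { return s.Outbound[call] }
	got = svc(s.Request, resolver)
	return got == s.Response, got
}
```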

Impact on Code

Using Didi’s RDebug transport‑layer recording yields zero code intrusion; memory usage roughly doubles but response latency remains unaffected.

Pandora Platform

Pandora adds four key capabilities:

Code‑coverage‑based traffic deduplication to select minimal yet sufficient test cases.

Kubernetes‑distributed jobs for parallel execution, reducing regression time to 6‑20 minutes.

Result calibration feedback loop to classify failures (BUG, playback error, expected new feature, unexpected new feature, defect verification).

Web UI for easy report viewing, automatic CI trigger, and batch calibration.
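The coverage-based deduplication can be sketched as a greedy set cover over per-case coverage signatures: keep picking the case that adds the most uncovered lines until nothing new is gained. This is a common approach; Pandora's exact algorithm is not specified in the article:

```go
package main

// Case pairs a traffic case ID with the set of code locations it covers,
// e.g. "order.php:42".
type Case struct {
	ID       string
	Coverage map[string]bool
}

// Dedup greedily selects cases until no remaining case adds new coverage,
// yielding a small subset with the same total coverage as the full corpus.
func Dedup(cases []Case) []string {
	covered := map[string]bool{}
	var picked []string
	for {
		bestIdx, bestGain := -1, 0
		for i, c := range cases {
			gain := 0
			for line := range c.Coverage {
				if !covered[line] {
					gain++
				}
			}
			if gain > bestGain {
				bestIdx, bestGain = i, gain
			}
		}
		if bestIdx < 0 {
			return picked // no case adds coverage; done
		}
		for line := range cases[bestIdx].Coverage {
			covered[line] = true
		}
		picked = append(picked, cases[bestIdx].ID)
	}
}
```

Greedy set cover is not guaranteed minimal, but it is simple, fast, and close enough for trimming a replay corpus.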

Coverage and Results

Pandora now covers all PHP 1-on-1 projects: it has supported more than 800 iterations, detected 14 defects, and handles 50+ replay tasks per day.

Open Issues

Challenges remain for traffic that cannot be captured online, limited Golang support, and lack of full‑link replay.

Future Plans

Implement precise replay based on Git code changes: after a push, identify affected functions, locate corresponding traffic, and replay only the impacted flows.
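The planned selection step can be sketched as an intersection between the functions changed in a push (from the git diff) and an index mapping each function to the traffic cases that exercise it. Both inputs are assumed here; building the function-to-case index is the hard part in practice:

```go
package main

import "sort"

// SelectCases returns the traffic case IDs that exercise at least one changed
// function, given an index from function name to the cases covering it.
func SelectCases(changedFuncs []string, index map[string][]string) []string {
	seen := map[string]bool{}
	var out []string
	for _, fn := range changedFuncs {
		for _, id := range index[fn] {
			if !seen[id] {
				seen[id] = true
				out = append(out, id)
			}
		}
	}
	sort.Strings(out) // deterministic order for reporting
	return out
}
```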

Tags: microservices, kubernetes, traffic replay, backend testing, continuous integration, API testing
Written by

Xueersi Online School Tech Team

The Xueersi Online School Tech Team, dedicated to innovating and promoting internet education technology.
