How LLMs Transform Traffic Replay Testing for Backend Services
This article walks through the challenges of traditional traffic replay, explains the design and benefits of a conventional replay system, and then details how integrating large language models can automate data preparation, script generation, and validation to make backend testing more accurate, scalable, and efficient.
Introduction
Many engineers have heard of traffic replay, but it is harder to implement than most testing techniques because it depends heavily on internal backend services, environment conditions, and system architecture.
Why Traffic Replay Is Needed
It provides three main benefits: ensuring functional correctness of APIs with real user traffic, offering diverse data for test script creation, and enabling large‑scale regression testing when services change.
Key Challenges
Authenticity: User‑agent diversity, user‑profile complexity, and identifying hot interfaces are difficult to simulate with synthetic data.
Data Reference: Real traffic supplies varied request parameters and headers for manual script authoring.
Scalability: Service merges, migrations, technology upgrades, and database or message‑queue changes require massive regression effort.
Testing Confidence: Over‑strict or over‑lenient validation leads to false positives or missed bugs, reducing trust in automation.
Traditional Traffic Replay System
Our team built a conventional system that records traffic via Nginx, streams logs to Kafka, stores them in a database, and flattens JSON responses into a one‑dimensional response_shape for aggregation and comparison.
<code>{
"host": "xxxx",
"request_path": "/a/b/c",
"request_headers": [...],
"request_params": [...],
"request_method": "POST",
"response_shape": ["data.user", "data.name", "data.age"]
}</code>
The replay process consists of three steps: collecting test cases, deduplicating response shapes, request headers, and request parameters, and finally executing the requests with celery and comparing the results.
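To make the flattening and comparison steps concrete, here is a minimal sketch of how a nested JSON response could be reduced to a one‑dimensional response_shape and how a replayed response could be checked against the recorded shape. The function names are illustrative, not the production implementation:

```python
def flatten_shape(obj, prefix=""):
    """Flatten a nested JSON object into a sorted list of dotted key paths,
    e.g. {"data": {"user": ...}} -> ["data.user"]."""
    paths = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            child = flatten_shape(value, path)
            # Leaf values contribute their own path; containers contribute children.
            paths.extend(child if child else [path])
    elif isinstance(obj, list):
        # List items share the same prefix, so repeated shapes collapse together.
        for item in obj:
            paths.extend(flatten_shape(item, prefix))
    return sorted(set(paths))


def compare_shapes(recorded, replayed):
    """Compare the recorded response_shape against a replayed one and report
    fields that disappeared or newly appeared."""
    recorded, replayed = set(recorded), set(replayed)
    return {"missing": sorted(recorded - replayed),
            "extra": sorted(replayed - recorded)}
```

Comparing shapes rather than raw values keeps validation tolerant of dynamic data (timestamps, IDs) while still catching dropped or renamed fields.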
LLM‑Powered Traffic Replay System
To overcome the limitations of the traditional approach, we introduced large language models (LLMs) to automate data preparation, script generation, and validation.
Data Preparation: Every night the platform aggregates traffic, extracts unique response shapes from Elasticsearch, joins them with request data in MySQL, deduplicates tokens, headers, and parameters, and randomly supplements a small number of records (5‑6) to keep the dataset manageable for the LLM.
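The deduplication idea can be sketched as keeping one representative record per unique combination of path, method, and response shape. Field names follow the JSON structure shown earlier; the real pipeline runs across Elasticsearch and MySQL rather than in memory:

```python
def dedupe_cases(records):
    """Keep one representative request per unique
    (request_path, request_method, response_shape) combination."""
    seen = {}
    for rec in records:
        key = (rec["request_path"],
               rec["request_method"],
               tuple(rec["response_shape"]))  # tuple so the shape is hashable
        if key not in seen:
            seen[key] = rec  # first occurrence wins as the representative
    return list(seen.values())
```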
Data Storage: The prepared JSON structures are fed into a DIFY workflow that decides whether a new script is needed based on previous versions.
Script Generation: The LLM produces Python test scripts from the structured data; if a script already exists, the model evaluates whether an update is required.
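A hypothetical sketch of how the prompt for the generation step could be assembled. This is not the actual DIFY workflow; the helper name and prompt wording are illustrative, assuming the recorded case is the JSON structure shown earlier:

```python
import json


def build_generation_prompt(case, existing_script=None):
    """Assemble an LLM prompt: generate a new test script, or ask the model
    to decide whether an existing one needs updating for the latest traffic."""
    if existing_script:
        task = ("Update the Python test script below only if the recorded "
                "traffic no longer matches it; otherwise reply NO_CHANGE.")
    else:
        task = ("Write a Python test script that replays this recorded request "
                "and asserts the response contains every field in response_shape.")
    parts = [task, "Recorded traffic (JSON):", json.dumps(case, indent=2)]
    if existing_script:
        parts += ["Current script:", existing_script]
    return "\n\n".join(parts)
```

Branching the instruction on whether a script already exists mirrors the workflow's create-versus-update decision while keeping a single entry point.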
Execution & Analysis: Generated scripts run in the backend, logs are collected, and another LLM analyzes any errors to produce readable alerts.
Images illustrate the service call hierarchy, the traditional replay pipeline, deduplication logic, and the AI workflow.
Future Plans
We have integrated the system into our DevOps pipeline, recorded 257 interfaces, generated 583 scripts, and plan to reduce manual review time, close the three‑day data gap, and improve stability for non‑idempotent services.
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.