How LLMs Transform Traffic Replay Testing for Backend Services
This article walks through the challenges of traditional traffic replay, explains the design and benefits of a conventional replay system, and then details how integrating large language models can automate data preparation, script generation, and validation to make backend testing more accurate, scalable, and efficient.
Introduction
Many engineers have heard of traffic replay, but it is harder to implement than most testing techniques because it depends heavily on internal backend services, environment conditions, and system architecture.
Why Traffic Replay Is Needed
It provides three main benefits: ensuring functional correctness of APIs with real user traffic, offering diverse data for test script creation, and enabling large‑scale regression testing when services change.
Key Challenges
Authenticity: User‑agent diversity, user‑profile complexity, and identifying hot interfaces are difficult to simulate with synthetic data.
Data Reference: Real traffic supplies varied request parameters and headers for manual script authoring.
Scalability: Service merges, migrations, technology upgrades, and database or message‑queue changes require massive regression effort.
Testing Confidence: Over‑strict or over‑lenient validation leads to false positives or missed bugs, reducing trust in automation.
Traditional Traffic Replay System
Our team built a conventional system that records traffic via Nginx, streams logs to Kafka, stores them in a database, and flattens JSON responses into a one‑dimensional response_shape for aggregation and comparison.
<code>{
"host": "xxxx",
"request_path": "/a/b/c",
"request_headers": [...],
"request_params": [...],
"request_method": "POST",
"response_shape": ["data.user", "data.name", "data.age"]
}</code>
The replay process consists of three steps: collecting test cases, deduplicating response shapes, request headers, and request parameters, and finally executing the requests with celery and comparing the results.
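To make the flattening and comparison steps concrete, here is a minimal sketch of how a nested JSON response could be reduced to a one‑dimensional response_shape and how a replayed response could be checked against the recorded shape. The function names are illustrative, not the production implementation:

```python
def flatten_shape(obj, prefix=""):
    """Flatten a nested JSON object into a sorted list of dotted key paths,
    e.g. {"data": {"user": ...}} -> ["data.user"]."""
    paths = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            child = flatten_shape(value, path)
            # Leaf values contribute their own path; containers contribute children.
            paths.extend(child if child else [path])
    elif isinstance(obj, list):
        # List items share the same prefix, so repeated shapes collapse together.
        for item in obj:
            paths.extend(flatten_shape(item, prefix))
    return sorted(set(paths))


def compare_shapes(recorded, replayed):
    """Compare the recorded response_shape against a replayed one and report
    fields that disappeared or newly appeared."""
    recorded, replayed = set(recorded), set(replayed)
    return {"missing": sorted(recorded - replayed),
            "extra": sorted(replayed - recorded)}
```

Comparing shapes rather than raw values keeps validation tolerant of dynamic data (timestamps, IDs) while still catching dropped or renamed fields.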
LLM‑Powered Traffic Replay System
To overcome the limitations of the traditional approach, we introduced large language models (LLMs) to automate data preparation, script generation, and validation.
Data Preparation: Every night the platform aggregates traffic, extracts unique response shapes from Elasticsearch, joins them with request data in MySQL, deduplicates tokens, headers, and parameters, and randomly supplements a small number of records (5‑6) to keep the dataset manageable for the LLM.
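The deduplication idea can be sketched as keeping one representative record per unique combination of path, method, and response shape. Field names follow the JSON structure shown earlier; the real pipeline runs across Elasticsearch and MySQL rather than in memory:

```python
def dedupe_cases(records):
    """Keep one representative request per unique
    (request_path, request_method, response_shape) combination."""
    seen = {}
    for rec in records:
        key = (rec["request_path"],
               rec["request_method"],
               tuple(rec["response_shape"]))  # tuple so the shape is hashable
        if key not in seen:
            seen[key] = rec  # first occurrence wins as the representative
    return list(seen.values())
```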
Data Storage: The prepared JSON structures are fed into a DIFY workflow that decides whether a new script is needed based on previous versions.
Script Generation: The LLM produces Python test scripts from the structured data; if a script already exists, the model evaluates whether an update is required.
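A hypothetical sketch of how the prompt for the generation step could be assembled. This is not the actual DIFY workflow; the helper name and prompt wording are illustrative, assuming the recorded case is the JSON structure shown earlier:

```python
import json


def build_generation_prompt(case, existing_script=None):
    """Assemble an LLM prompt: generate a new test script, or ask the model
    to decide whether an existing one needs updating for the latest traffic."""
    if existing_script:
        task = ("Update the Python test script below only if the recorded "
                "traffic no longer matches it; otherwise reply NO_CHANGE.")
    else:
        task = ("Write a Python test script that replays this recorded request "
                "and asserts the response contains every field in response_shape.")
    parts = [task, "Recorded traffic (JSON):", json.dumps(case, indent=2)]
    if existing_script:
        parts += ["Current script:", existing_script]
    return "\n\n".join(parts)
```

Branching the instruction on whether a script already exists mirrors the workflow's create-versus-update decision while keeping a single entry point.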
Execution & Analysis: Generated scripts run in the backend, logs are collected, and another LLM analyzes any errors to produce readable alerts.
Images illustrate the service call hierarchy, the traditional replay pipeline, deduplication logic, and the AI workflow.
Future Plans
We have integrated the system into our DevOps pipeline, recorded 257 interfaces, generated 583 scripts, and plan to reduce manual review time, close the three‑day data gap, and improve stability for non‑idempotent services.
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.