WebRTC Stress Testing Methodology and Findings for 300 Concurrent Connections
This article describes a comprehensive approach to stress‑testing a WebRTC SFU service with 300 concurrent connections, covering client generation via Playwright, monitoring setup, test execution, encountered issues, tuning steps, and final performance conclusions.
The team needed to validate that their WebRTC service could handle at least 300 concurrent connections, so they designed a stress test focused on media streams rather than traditional HTTP requests. The test targeted an SFU (Selective Forwarding Unit) architecture, which forwards media streams between participants without decoding or re-encoding them, making it well suited to high-scale scenarios.
To generate many WebRTC clients, the team initially considered a Linux C++ client but switched to Playwright-driven browsers, feeding local video and audio files as fake media sources. Playwright was installed with `pip install playwright` followed by `playwright install`, and the core test code uses the async API.
Key Playwright launch arguments include:

```
--use-fake-device-for-media-stream
--use-fake-ui-for-media-stream
```

When feeding local media files, additional flags are added:

```
--use-fake-device-for-media-stream
--use-fake-ui-for-media-stream
--no-sandbox
--use-file-for-fake-video-capture=./webrtctest/1.y4m
--use-file-for-fake-audio-capture=webrtctest/2.wav
```

The script reads a CSV of authentication codes, opens a corresponding number of browser tabs, and performs asynchronous actions such as joining rooms, navigating pages, and handling pagination. Timeouts were raised with `await page.goto(url, timeout=10000000)` (milliseconds) to avoid navigation failures.
Test infrastructure involved 50‑60 Windows machines, each running a limited number of tabs (e.g., 4 tabs on an 8 GB machine, 7 tabs on a 20 GB machine) due to CPU and memory constraints. The team considered Playwright Grid for distributed execution but abandoned it because each machine required unique parameters.
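Since each machine ran its own copy of the script with unique parameters, one simple way to split the work is to slice the shared code list by machine index and tab capacity. This is a sketch of that idea, not the team's actual tooling:

```python
def codes_for_machine(codes, machine_index, tabs_per_machine):
    """Give each machine its own contiguous block of auth codes,
    sized to that machine's tab capacity.

    tabs_per_machine[i] is how many tabs machine i can run
    (e.g. 4 on an 8 GB box, 7 on a 20 GB box); offsets are cumulative.
    """
    start = sum(tabs_per_machine[:machine_index])
    end = start + tabs_per_machine[machine_index]
    return codes[start:end]
```

For example, with capacities `[4, 7]`, machine 0 takes the first 4 codes and machine 1 the next 7, so no two tabs anywhere share an authentication code.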
During testing, several issues surfaced: inaccurate server IP scheduling, high memory usage and leaks, client disconnections, and rooms locking up beyond a certain participant count. Mitigations included switching the server’s memory allocator to tcmalloc, extending the server timeout to 10 seconds, and adding reconnection logic on the client side.
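The article does not show the reconnection code; a generic client-side retry loop with exponential backoff, capped to stay inside the 10-second server timeout mentioned above, might look like this (`join_room` is a placeholder for whatever coroutine performs the actual join):

```python
import asyncio

async def join_with_retry(join_room, max_attempts=5, base_delay=1.0):
    """Retry a WebRTC join after a disconnection, with exponential backoff.

    `join_room` is any awaitable factory; re-raises after max_attempts
    consecutive failures.
    """
    for attempt in range(max_attempts):
        try:
            return await join_room()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            # 1 s, 2 s, 4 s, ... between attempts, capped at the server's
            # 10-second timeout window.
            await asyncio.sleep(min(base_delay * 2 ** attempt, 10.0))
```

Backoff keeps 300 dropped clients from hammering the SFU simultaneously, while the cap ensures a client retries before the server gives up on it.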
Results showed that under 300 concurrent connections the SFU handled the load with acceptable CPU, memory, and bitrate usage, and subjective quality scores for video and audio ranged from good to perfect. The final report summarises conclusions, resource usage metrics, functional verification, and a list of problems with their root causes and mitigations.
360 Quality & Efficiency
360 Quality & Efficiency focuses on seamlessly integrating quality and efficiency in R&D, sharing 360’s internal best practices with industry peers to foster collaboration among Chinese enterprises and drive greater efficiency value.