Mastering WebRTC: Build Real-Time Video Calls with STUN, TURN, and Signaling
This article explains WebRTC fundamentals, its architecture, signaling, ICE, STUN/TURN mechanisms, and provides step‑by‑step guidance to build a peer‑to‑peer video chat using JavaScript APIs, Koa and Socket.io, including coturn server setup and essential code snippets.
Zhang Yuhang, a front‑end engineer at WeDoctor Cloud Services, shares his experience.
Preface
At the beginning of 2020 the COVID‑19 pandemic cut off most offline medical channels. WeDoctor, a leader in digital health, quickly responded with online consultation services, and the video consultation between doctors and patients relied on the WebRTC technology described below.
What Is WebRTC?
WebRTC (Web Real‑Time Communication) originated from Google’s acquisition of the VoIP software developer Global IP Solutions in 2010. The GIPS engine was open‑sourced in 2011 under the name WebRTC, providing a platform for real‑time audio, video, and data communication between browsers.
Beyond medical tele‑consultation, WebRTC is used for e‑commerce live streaming, education solutions, and, with the rise of 5G, it also supports cloud gaming.
WebRTC Architecture
The architecture diagram on the official WebRTC site divides the stack into three parts:
APIs (purple) exposed to web front‑end developers.
Browser‑provided APIs (solid blue).
Audio engine, video engine, and transport layer (dashed blue), all of which can be customized.
WebRTC Peer‑to‑Peer Communication Principles
To achieve real-time audio/video communication between two clients on different networks, each with a microphone and camera, three problems must be solved:
How to discover each other?
How to negotiate audio/video codec capabilities?
How to transmit media data so each side can see the other?
Problem 1 – Discovery and Signaling : Although WebRTC supports direct peer‑to‑peer communication, a signaling server is still required to exchange metadata such as media capabilities and network information. This server (often called a “room server”) also manages room membership and notifies participants of joins, leaves, and capacity.
Problem 2 – Media Negotiation : Browsers support different codecs (e.g., H264, VP8, VP9). The common subset (often H264) is chosen via the Session Description Protocol (SDP). Exchanging SDP information is called media negotiation.
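To make this concrete, an SDP blob is plain text describing media capabilities. A heavily truncated offer might look like the following (the values are illustrative, not from a real session):

```
v=0
o=- 4611731400430051336 2 IN IP4 127.0.0.1
...
m=video 9 UDP/TLS/RTP/SAVPF 96 98
a=rtpmap:96 VP8/90000
a=rtpmap:98 H264/90000
```

The `m=` and `a=rtpmap` lines are what the two browsers compare to find a codec both sides support.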
Problem 3 – Network Negotiation : Each side must learn the other's network conditions to establish a connection. When both peers have public IPs, a direct connection is possible. In most cases NAT is involved, requiring ICE (Interactive Connectivity Establishment) which combines STUN and TURN.
ICE is not a single protocol; it integrates:
STUN – discovers the public IP and port of a client behind NAT (hole punching).
TURN – provides a relay when direct traversal fails, forwarding media through a TURN server.
WebRTC defines three types of ICE candidates:
Host candidate – local LAN IP address (highest priority).
Server‑reflexive candidate – public IP discovered via STUN.
Relay candidate – address of a TURN server used as a fallback.
When a direct P2P connection cannot be established, media is relayed through the TURN server.
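These candidate types map directly onto the `iceServers` configuration passed to `RTCPeerConnection`. A minimal sketch follows; the TURN host, username, and credential are placeholders that would match the coturn setup described later in the article:

```javascript
// Placeholder STUN/TURN entries; replace the TURN host, username and
// credential with your own coturn deployment's values.
const rtcConfig = {
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' },   // yields server-reflexive candidates
    {
      urls: 'turn:YOUR_DOMAIN.com:3478',        // yields relay candidates (fallback)
      username: 'USERNAME',
      credential: 'PASSWORD'
    }
  ]
};
// In the browser: const peer = new RTCPeerConnection(rtcConfig);
```

With no `iceServers` at all, only host candidates are gathered, which is enough for LAN testing but not for peers behind different NATs.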
In short, each peer obtains its SDP and ICE candidates via the WebRTC APIs, exchanges them through a signaling server, and then establishes a direct media channel.
Key WebRTC APIs
Audio/Video Capture
MediaDevices.getUserMedia()
<code>
const constraints = {
  video: true,
  audio: true
};

// In insecure contexts navigator.mediaDevices may be undefined
try {
  const stream = await navigator.mediaDevices.getUserMedia(constraints);
  document.querySelector('video').srcObject = stream;
} catch (error) {
  console.error(error);
}
</code>

Enumerating Devices
MediaDevices.enumerateDevices()
<code>
try {
  const devices = await navigator.mediaDevices.enumerateDevices();
  this.videoinputs = devices.filter(device => device.kind === 'videoinput');
  this.audiooutputs = devices.filter(device => device.kind === 'audiooutput');
  this.audioinputs = devices.filter(device => device.kind === 'audioinput');
} catch (error) {
  console.error(error);
}
</code>

RTCPeerConnection
RTCPeerConnection is the core API for establishing a peer‑to‑peer connection.
Important media‑negotiation methods:
createOffer
createAnswer
setLocalDescription
setRemoteDescription
Key events:
onicecandidate
onaddstream (deprecated; modern code uses the ontrack event instead)
The media negotiation process can be simplified to three steps:
The caller (Amy) creates an offer with createOffer, stores it via setLocalDescription, and sends the SDP to the callee (Bob) through the signaling server.
Bob receives the offer, stores it with setRemoteDescription, creates an answer with createAnswer, stores the answer with setLocalDescription, and sends it back.
Amy receives the answer and calls setRemoteDescription. Both sides have now exchanged SDP and ICE candidates, establishing the P2P channel and rendering each other's video streams.
WebRTC Practice
Setting Up a coturn Server
For local LAN testing a coturn server is unnecessary; for external access you need a cloud host with an HTTPS‑enabled domain.
Installation steps:
<code>
git clone https://github.com/coturn/coturn.git
cd coturn/
./configure --prefix=/usr/local/coturn
make -j 4
make install

# Generate key
openssl req -x509 -newkey rsa:2048 -keyout /etc/turn_server_pkey.pem -out /etc/turn_server_cert.pem -days 99999 -nodes
</code>

coturn Configuration
<code>vim /usr/local/coturn/etc/turnserver.conf
listening-port=3478
external-ip=YOUR_PUBLIC_IP # replace with your host IP
user=USERNAME:PASSWORD # account credentials
realm=YOUR_DOMAIN.com # your domain
</code>Starting coturn
<code>
# Ensure TCP and UDP port 3478 are open on the cloud host
cd /usr/local/coturn/bin/
./turnserver -c ../etc/turnserver.conf
</code>

Signaling Server (Koa + Socket.io)
<code>// server side (server.js)
const Koa = require('koa');
const socket = require('socket.io');
const http = require('http');
const app = new Koa();
const httpServer = http.createServer(app.callback()).listen(3000, () => {});
socket(httpServer).on('connection', (sock) => {
// handle userLeave, checkRoom, etc.
});
// client side (socket.js)
import io from 'socket.io-client';
const socket = io.connect(window.location.origin);
export default socket;
</code>Implementation steps:
Both peers connect to the signaling server, which records room information.
The caller creates an offer, sets the local description, and sends the SDP to the callee via the signaling server.
During setLocalDescription each peer starts gathering ICE candidates; if direct traversal fails, STUN/TURN is used.
The callee receives the offer, sets it as remote description, creates an answer, sets the local description, and sends the answer back.
The caller receives the answer and sets it as remote description, completing SDP exchange.
Both peers exchange ICE candidates; once a viable candidate is found, media streams are attached to the video elements.
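The room bookkeeping behind these steps can be sketched as a few plain functions. The names (`joinRoom`, `leaveRoom`, `peersOf`) are hypothetical; the real handlers would live inside the socket.io `connection` callback shown above:

```javascript
// Minimal in-memory room state for the signaling server (a sketch;
// function names are illustrative, not from the original project).
const rooms = new Map(); // roomId -> Set of user ids

// A 1-to-1 call allows at most two peers per room
function joinRoom(roomId, userId) {
  if (!rooms.has(roomId)) rooms.set(roomId, new Set());
  const members = rooms.get(roomId);
  if (members.size >= 2 && !members.has(userId)) return false; // room is full
  members.add(userId);
  return true;
}

function leaveRoom(roomId, userId) {
  const members = rooms.get(roomId);
  if (!members) return;
  members.delete(userId);
  if (members.size === 0) rooms.delete(roomId);
}

// Everyone in the room except the sender; used when relaying
// offers, answers and ICE candidates to the other peer
function peersOf(roomId, userId) {
  const members = rooms.get(roomId) || new Set();
  return [...members].filter((id) => id !== userId);
}
```

The server never inspects SDP or candidates; it only forwards them to `peersOf(roomId, sender)`.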
<code>
// Example of handling ICE candidates
initPeerListen() {
  this.peer.onicecandidate = (event) => {
    if (event.candidate) {
      socket.emit('addIceCandidate', { candidate: event.candidate, user: this.user });
    }
  };
  // ... other listeners
}
</code>

Conclusion
By following the six steps above you can achieve a complete P2P video call. The source code is available in the learn-webrtc repository. In production, WeDoctor's SDK also supports multi-person calls and screen sharing, all built on top of WebRTC.
References
WebRTC in the real world: STUN, TURN and signaling – https://www.html5rocks.com/en/tutorials/webrtc/infrastructure
WebRTC signaling and STUN/TURN server setup – https://juejin.im/post/6844903844904697864