Design and Implementation of Low‑Latency Real‑Time Streaming Protocols: RTP, RTCP, and Packet‑Loss Solutions

The article explains why TCP‑based protocols cannot meet low‑latency requirements for live‑streaming conferences and introduces RTP, RTCP, jitter, round‑trip time, and three packet‑loss mitigation strategies—retransmission, forward error correction, and cross‑transport—along with a brief overview of DCCP for congestion control.

Architecture Digest

Author: Wang Yuhang, co‑founder & CTO of Hongdian Live, graduate of University of Science and Technology of China, former member of the Windcloud Live founding team, involved in designing the proprietary RTMFP protocol for large‑scale video bullet‑screen systems.

The live‑streaming market is booming with many cloud service providers and products such as Inke, YY, and Yizhibo. Traditional CDN solutions mainly use RTMP (Adobe) and HLS, which work well for playback but cannot satisfy the ultra‑low latency needed in conference scenarios.

Why TCP-based protocols fall short: TCP is designed for reliable, in-order delivery and high bandwidth utilization. It treats packet loss as a sign of congestion and shrinks its congestion window, and it holds back later data until the lost segment has been retransmitted; both add delay. Loss handling is therefore the biggest factor affecting latency, which makes TCP unsuitable for real-time media where timely delivery matters more than complete delivery.

RTP (Real-time Transport Protocol): Provides end-to-end transport of real-time data, including payload type identification, sequence numbering, timestamping, and delivery monitoring. It does not itself guarantee delivery, ordering, or timeliness, which leaves room for custom congestion-control and quality-of-service strategies.

RTP Header: Contains version, padding flag, extension flag, CSRC count, marker, payload type, sequence number, timestamp, and SSRC (synchronization source) which supports forwarding and mixing.
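To make the field list concrete, here is a minimal Python sketch that decodes the 12-byte fixed RTP header as laid out in RFC 3550. The names RtpHeader and parse_rtp_header are illustrative, the sample packet is made up, and CSRC entries and header extensions are deliberately ignored.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed RTP header (network byte order)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,              # top 2 bits
        padding=bool(b0 & 0x20),      # P flag
        extension=bool(b0 & 0x10),    # X flag
        csrc_count=b0 & 0x0F,         # CC, low 4 bits
        marker=bool(b1 & 0x80),       # M flag
        payload_type=b1 & 0x7F,       # PT, low 7 bits
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


# Hypothetical sample: version 2, marker set, dynamic payload type 96.
sample = struct.pack("!BBHII", 0x80, 0x80 | 96, 1234, 90000, 0xDEADBEEF)
header = parse_rtp_header(sample)
print(header)
```

The sequence number and timestamp decoded here are the inputs for the loss and jitter statistics that RTCP reports.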

RTP Network Example: Three roles – endpoints (participants), a mixer (combines multiple audio streams into one, reducing bandwidth and aligning timestamps), and a forwarder (distributes the mixed stream).

RTCP: Complements RTP by providing control information. It defines five packet types: Sender Report, Receiver Report, Source Description, Bye, and Application‑Specific messages.
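As a rough illustration, the five packet types can be told apart by the packet-type byte in the 4-byte RTCP common header. The sketch below uses the type values from RFC 3550; the function name and the sample packet are assumptions made for this example, and packet types added by later extensions are not handled.

```python
import struct
from enum import IntEnum


class RtcpType(IntEnum):
    SR = 200    # Sender Report
    RR = 201    # Receiver Report
    SDES = 202  # Source Description
    BYE = 203   # Goodbye
    APP = 204   # Application-specific


def parse_rtcp_common_header(packet: bytes):
    """Return (version, count, packet_type, total_length_in_bytes)."""
    if len(packet) < 4:
        raise ValueError("RTCP packet shorter than the 4-byte common header")
    b0, pt, length_words = struct.unpack("!BBH", packet[:4])
    version = b0 >> 6
    count = b0 & 0x1F  # report count (or source count / subtype, depending on type)
    # The length field counts 32-bit words minus one, header included.
    # RtcpType(pt) raises ValueError for types outside the five listed above.
    return version, count, RtcpType(pt), (length_words + 1) * 4


# Hypothetical sample: an empty Receiver Report (header + sender SSRC only).
empty_rr = struct.pack("!BBHI", 0x80, 201, 1, 0x1234)
print(parse_rtcp_common_header(empty_rr))
```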

Receiver Reports convey packet loss, jitter, and the fields needed to compute round-trip time (RTT). Loss is derived from RTP sequence numbers, and jitter from RTP timestamps compared with arrival times.
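The loss figure can be sketched as "packets expected minus packets seen", where the expected count comes from the span of sequence numbers. The following is a simplified sketch, not the exact RFC 3550 bookkeeping: the helper names are illustrative, duplicates are ignored, and 16-bit wraparound is handled by extending each sequence number relative to the highest one seen so far.

```python
def extend_seq(seq: int, reference: int) -> int:
    """Map a 16-bit sequence number onto the extended value closest to reference."""
    delta = (seq - reference) & 0xFFFF
    if delta >= 0x8000:       # more than half the range ahead means it is behind
        delta -= 0x10000
    return reference + delta


def loss_stats(received_seqs):
    """Return (expected, lost) for 16-bit RTP sequence numbers in arrival order."""
    if not received_seqs:
        return 0, 0
    highest = received_seqs[0]
    seen = {highest}
    for seq in received_seqs[1:]:
        ext = extend_seq(seq, highest)
        seen.add(ext)
        highest = max(highest, ext)
    expected = highest - min(seen) + 1
    return expected, expected - len(seen)


# Sequence wraps from 65534 to 0; packets 65535 and 2 never arrive.
expected, lost = loss_stats([65533, 65534, 0, 1, 3])
print(expected, lost, lost / expected)
```

In this made-up trace seven packets were expected and two are missing, a loss fraction of 2/7.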

Jitter: Variation in packet transit time. Example: packet A is sent at t=1 s and B at t=2 s, but the receiver gets A at t=3 s and B at t=5 s. A took 2 s in transit and B took 3 s, so the jitter between them is 1 s. RFC 3550 smooths successive samples with a running average that moves the estimate 1/16 of the way toward each new sample.
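The smoothing can be sketched with the interarrival jitter estimator from RFC 3550, J = J + (|D| - J) / 16, where D is the difference in transit time between consecutive packets. The function name is illustrative, and times are plain seconds here for readability; a real receiver works in RTP timestamp units.

```python
def update_jitter(jitter, prev_transit, send_time, recv_time):
    """Feed one packet into the estimator; return (new_jitter, transit)."""
    transit = recv_time - send_time
    if prev_transit is None:
        # First packet: nothing to compare against yet.
        return jitter, transit
    d = abs(transit - prev_transit)
    jitter += (d - jitter) / 16.0   # move 1/16 of the way toward the new sample
    return jitter, transit


# Packets A and B from the example: sent at 1 s and 2 s, received at 3 s and 5 s.
jitter, transit = update_jitter(0.0, None, send_time=1.0, recv_time=3.0)
jitter, transit = update_jitter(jitter, transit, send_time=2.0, recv_time=5.0)
print(jitter)
```

The transit times are 2 s and 3 s, so D is 1 s, and a single sample moves the estimate from 0 to 1/16 s. The small gain keeps one late packet from dominating the reported value.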

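Round-Trip Time (RTT): Measures the time from sending a Sender Report to receiving the Receiver Report that answers it. The receiver echoes the report's timestamp in the LSR (last SR) field and records how long it held the report in the DLSR (delay since last SR) field, so the sender computes RTT = arrival time - LSR - DLSR. It is similar to a network ping.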
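A small sketch of that calculation, assuming the RFC 3550 convention that LSR, DLSR, and the arrival time are all expressed as 32-bit values in units of 1/65536 s. The helper names and the timings are invented for the example.

```python
def to_compact_ntp(seconds: float) -> int:
    """Convert seconds to the 32-bit 16.16 fixed-point form used by LSR and DLSR."""
    return int(seconds * 65536) & 0xFFFFFFFF


def rtt_seconds(arrival: int, lsr: int, dlsr: int) -> float:
    """RTT = arrival - LSR - DLSR, computed modulo 2**32, returned in seconds."""
    return ((arrival - lsr - dlsr) & 0xFFFFFFFF) / 65536.0


# Hypothetical timings: the SR leaves at 100.0 s, the receiver waits 0.5 s
# before replying, and the RR reaches the original sender at 100.625 s.
lsr = to_compact_ntp(100.0)
dlsr = to_compact_ntp(0.5)
arrival = to_compact_ntp(100.625)
print(rtt_seconds(arrival, lsr, dlsr))
```

Subtracting DLSR removes the time the receiver spent holding the report, so only the network's share of the delay, 0.125 s in this case, remains.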

Media Packet Loss Handling Strategies (three):

Retransmission: Resend lost packets; precise but adds extra traffic.

Forward Error Correction (FEC): Adds redundant data so lost packets can be reconstructed without retransmission. Two types: media‑independent and media‑dependent. The article focuses on media‑independent FEC.

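Cross-Transport (commonly called interleaving): Re-arranges the sending order (e.g., sending 1-3-5-7-9 instead of 1-2-3-4-5) so that losing a contiguous run of packets in transit leaves isolated gaps in the media rather than one long gap. The decoder can conceal isolated gaps far more easily, allowing near-lossless playback.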

FEC uses layered data protection, assigning different importance levels to packet parts. The FEC packet format extends the RTP header with additional redundancy fields.

FEC Algorithm (XOR example): With four data blocks (A, B, C, D) and an extra parity block E = A⊕B⊕C⊕D, any single lost block can be recovered by XOR‑ing the remaining blocks with E. The scheme can tolerate one lost packet per group; multiple losses require more sophisticated codes.
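The XOR example can be tried directly in a few lines of Python. This is a minimal sketch that assumes all blocks have the same length (a real FEC scheme pads shorter packets); xor_blocks and recover are illustrative names.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def recover(received, parity):
    """Rebuild the single missing block (marked None) from the rest plus parity."""
    missing = [i for i, block in enumerate(received) if block is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one missing block")
    present = [block for block in received if block is not None]
    return xor_blocks(present + [parity])


a, b, c, d = b"AAAA", b"BBBB", b"CCCC", b"DDDD"
e = xor_blocks([a, b, c, d])              # parity block E = A ^ B ^ C ^ D
rebuilt = recover([a, None, c, d], e)     # block B was lost in transit
print(rebuilt)
```

Recovery works because XOR-ing a value with itself cancels out: A, C, and D each appear twice in the combined XOR, which leaves only B.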

Cross‑transport changes the transmission pattern (e.g., sending 1‑4‑7, 2‑5‑8, 3‑6‑9) so that a burst loss affects dispersed fragments, allowing the decoder to reconstruct the stream with minimal visual artifacts.
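The 1-4-7, 2-5-8, 3-6-9 pattern can be sketched as a pair of list transforms. The function names and the burst-loss scenario are made up for illustration; None stands for a packet lost in transit.

```python
def interleave(packets, depth):
    """Reorder for sending: depth=3 turns 1..9 into 1,4,7,2,5,8,3,6,9."""
    return [p for start in range(depth) for p in packets[start::depth]]


def deinterleave(received, depth):
    """Put received slots back into playback order."""
    order = interleave(list(range(len(received))), depth)
    restored = [None] * len(received)
    for slot, original_index in enumerate(order):
        restored[original_index] = received[slot]
    return restored


sent = interleave(list(range(1, 10)), 3)
# A burst wipes out three consecutive slots on the wire (packets 2, 5, 8).
on_wire = sent[:3] + [None, None, None] + sent[6:]
playback = deinterleave(on_wire, 3)
print(sent)
print(playback)
```

After deinterleaving, the three lost packets sit at positions 2, 5, and 8, so no two adjacent packets are missing. The cost is added delay, because the receiver must wait for a whole group before it can restore the order.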

Datagram Congestion Control Protocol (DCCP): Provides a congestion‑control‑aware transport that can be used with RTP. It includes session establishment (similar to TCP handshake), data windows, ACK feedback, and explicit congestion control, allowing finer control over latency and bandwidth for real‑time media.

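In conference scenarios, the data window must be kept under one second because older packets are useless by the time they arrive. DCCP itself does not retransmit, but its acknowledgment feedback reports which packets were lost, so the sender can resend only the fragments that are still worth delivering and avoid unnecessary load.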
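One way to picture the one-second window is a sender-side buffer that forgets packets once they are too old to be useful. The sketch below is application-level logic, not part of any DCCP API; the class name, the method names, and the idea that loss reports arrive as a list of sequence numbers are all assumptions made for illustration.

```python
from collections import OrderedDict


class RetransmitWindow:
    """Keep recently sent packets and hand back only those still worth resending."""

    def __init__(self, max_age=1.0):
        self.max_age = max_age            # seconds a packet stays useful
        self._packets = OrderedDict()     # seq -> (sent_at, payload), oldest first

    def record(self, seq, payload, now):
        self._packets[seq] = (now, payload)
        self._expire(now)

    def _expire(self, now):
        # Drop packets from the old end until the oldest is within max_age.
        while self._packets:
            sent_at, _ = next(iter(self._packets.values()))
            if now - sent_at <= self.max_age:
                break
            self._packets.popitem(last=False)

    def to_retransmit(self, lost_seqs, now):
        self._expire(now)
        return [(s, self._packets[s][1]) for s in lost_seqs if s in self._packets]


window = RetransmitWindow(max_age=1.0)
window.record(1, b"frame-1", now=0.0)
window.record(2, b"frame-2", now=0.6)
window.record(3, b"frame-3", now=1.2)
# Feedback says packets 1 and 2 were lost; only packet 2 is still fresh enough.
resend = window.to_retransmit([1, 2], now=1.3)
print(resend)
```

Packet 1 is 1.2 s old when packet 3 is recorded, so it has already been dropped and is never resent.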

