Audio‑Video Synchronization Techniques and ExoPlayer Implementation Details
The article explains audio‑video synchronization fundamentals, user tolerance limits, four sync strategies, and then details ExoPlayer’s audio‑master architecture—two clocks, media‑clock selection, frame‑dropping logic, and how to inject custom clocks—while also addressing Android AudioTrack jitter and work‑arounds.
Audio‑video synchronization (AV sync) aligns the playback timestamps of audio, video, lyrics and other media streams using a reference clock, ensuring that picture and sound are presented together. In media player development, AV sync is crucial for user experience because humans are more sensitive to audio timing than visual timing.
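The ITU‑R BT.1359‑1 standard (1998) defines acceptable AV sync tolerances for television broadcasting; they are still used today and have been carried over to internet live streaming. Offsets are signed: a negative value means the audio lags the video, and a positive value means the audio leads it. User‑perceived acceptable deviation ranges are: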
User cannot perceive: –100 ms ~ 25 ms
User can recognize: –125 ms ~ 45 ms
Maximum acceptable deviation: greater than –185 ms and less than 90 ms
Unacceptable deviation: less than –185 ms or greater than 90 ms
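To make the four bands concrete, the sketch below maps a measured offset onto them. The class and enum names are hypothetical, and the sign convention (positive means audio ahead of video) is an assumption taken from how BT.1359 is usually read; the numeric limits are the ones listed above.

```java
/** Bands from the list above; the constant names are illustrative only. */
enum SyncQuality { IMPERCEPTIBLE, DETECTABLE, TOLERABLE, UNACCEPTABLE }

final class AvSyncTolerance {
    private AvSyncTolerance() {}

    /**
     * Classifies an audio/video offset in milliseconds.
     * Assumed convention: negative = audio lags video, positive = audio leads.
     */
    static SyncQuality classify(long offsetMs) {
        if (offsetMs >= -100 && offsetMs <= 25) {
            return SyncQuality.IMPERCEPTIBLE;   // user cannot perceive
        }
        if (offsetMs >= -125 && offsetMs <= 45) {
            return SyncQuality.DETECTABLE;      // user can recognize
        }
        if (offsetMs > -185 && offsetMs < 90) {
            return SyncQuality.TOLERABLE;       // maximum acceptable deviation
        }
        return SyncQuality.UNACCEPTABLE;        // at or beyond -185 ms / 90 ms
    }
}
```

The ranges are asymmetric because late audio is tolerated better than early audio, so a player that must err should err toward slightly late sound.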
The core logic of mainstream AV sync is the “Audio Master” approach: audio playback runs at a constant rate, and its current timestamp controls video rendering. Typical thresholds are:
syncTime ≤ PTS ≤ syncTime → display frame
PTS < syncTime → drop video frame
PTS ≥ unexpectTime → drop or adapt frame
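As a rough illustration of the audio‑master idea, the sketch below compares a video frame's PTS with the audio clock and picks an action. The two thresholds and all names are hypothetical values chosen for the example, not the article's or any player's actual limits.

```java
enum FrameAction { RENDER, WAIT, DROP }

final class AudioMasterSync {
    // Hypothetical thresholds, for illustration only.
    static final long MAX_LATE_US = 30_000;   // frames later than this are dropped
    static final long MAX_EARLY_US = 50_000;  // frames earlier than this are held back

    private AudioMasterSync() {}

    /** Decides what to do with a video frame, using the audio position as the master clock. */
    static FrameAction decide(long framePtsUs, long audioClockUs) {
        long earlyUs = framePtsUs - audioClockUs; // > 0: frame is ahead of the audio
        if (earlyUs < -MAX_LATE_US) {
            return FrameAction.DROP;   // too late to be worth showing
        }
        if (earlyUs > MAX_EARLY_US) {
            return FrameAction.WAIT;   // too early; try again on a later pass
        }
        return FrameAction.RENDER;     // close enough to the audio position
    }
}
```

Audio is never touched in this scheme: only the video side waits or drops, which is why the audio keeps playing at a constant rate.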
Four common synchronization strategies are described:
Seek‑based sync: obtain audio playback time, seek video to that position, then seek audio back. Simple to implement but causes noticeable stutter and buffering.
Wait‑until‑aligned: let the faster stream pause until timestamps match. Moderate difficulty; still may cause stutter if the time gap is large.
Drop‑frame / wait‑align: used by mainstream players (MediaPlayer, ExoPlayer). Video either waits for audio or drops frames to stay in sync, offering good audio quality and acceptable visual experience.
Speed‑change sync: keep audio as reference and adjust video playback speed. Provides smooth experience but requires careful handling of player states and speed‑related logic.
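The speed‑change strategy can be pictured as a small proportional correction. The sketch below is one possible way to derive a video speed multiplier from the current drift; the correction window, the clamp, and the class name are all assumptions made for the example.

```java
final class SpeedChangeSync {
    // Hypothetical tuning values, for illustration only.
    static final double MAX_ADJUST = 0.05;              // never change speed by more than 5 %
    static final long CORRECTION_WINDOW_US = 1_000_000; // aim to close the gap in about 1 s

    private SpeedChangeSync() {}

    /**
     * Returns the video speed multiplier, with audio as the reference.
     * Greater than 1.0 when video is behind the audio, less than 1.0 when it is ahead.
     */
    static double videoSpeed(long videoPositionUs, long audioPositionUs) {
        long lagUs = audioPositionUs - videoPositionUs; // > 0: video is behind
        double adjust = (double) lagUs / CORRECTION_WINDOW_US;
        // Clamp so that large gaps do not produce a visibly wrong playback rate.
        adjust = Math.max(-MAX_ADJUST, Math.min(MAX_ADJUST, adjust));
        return 1.0 + adjust;
    }
}
```

With the clamp in place, a large gap takes several windows to close, so a real player would probably fall back to dropping or waiting beyond some drift.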
The article then analyses ExoPlayer’s AV sync mechanism in depth.
Audio Master in ExoPlayer
ExoPlayer maintains two clocks: MediaCodecAudioRenderer provides an Audio Master clock, and StandaloneMediaClock serves as a fallback. The audio renderer’s getMediaClock() returns the renderer itself, confirming that when an audio renderer exists, ExoPlayer uses audio as the master clock.

    @Override
    @Nullable
    public MediaClock getMediaClock() {
        return this;
    }

The DefaultMediaClock selects the first non‑null renderer clock and ensures only one clock is active. If multiple clocks are enabled, an exception is thrown.

    public void onRendererEnabled(Renderer renderer) throws ExoPlaybackException {
        @Nullable MediaClock rendererMediaClock = renderer.getMediaClock(); // only audio returns non‑null
        if (rendererMediaClock != null && rendererMediaClock != rendererClock) {
            if (rendererClock != null) {
                throw ExoPlaybackException.createForUnexpected(
                    new IllegalStateException("Multiple renderer media clocks enabled."));
            }
            this.rendererClock = rendererMediaClock;
            this.rendererClockSource = renderer;
            rendererClock.setPlaybackParameters(standaloneClock.getPlaybackParameters());
        }
    }

The MediaClock interface provides two essential functions: getPositionUs() to obtain the current playback position and setPlaybackParameters() to adjust playback speed.

    public interface MediaClock {
        long getPositionUs();
        void setPlaybackParameters(PlaybackParameters playbackParameters);
        PlaybackParameters getPlaybackParameters();
    }

During playback, ExoPlayer periodically calls updatePlaybackPositions(), which synchronizes the renderer positions using mediaClock.syncAndGetPositionUs(). The video renderer (MediaCodecVideoRenderer) then processes each video buffer in processOutputBuffer(), calculating the early/late offset (earlyUs) and deciding whether to render, drop, or skip the frame.

    private void updatePlaybackPositions() throws ExoPlaybackException {
        // ...
        rendererPositionUs = mediaClock.syncAndGetPositionUs(
            /* isReadingAhead= */ playingPeriodHolder != queue.getReadingPeriod());
        // ...
    }

Key decisions in processOutputBuffer() include:
Skipping decode‑only buffers.
Dropping frames that are too late.
For Android 5.0+ (API 21+), rendering frames whose early offset is less than 50 ms.
For older versions, sleeping until the frame is within a 10‑30 ms window before rendering.

    if (Util.SDK_INT >= 21) {
        if (earlyUs < 50000) {
            renderOutputBufferV21(codec, bufferIndex, presentationTimeUs, adjustedReleaseTimeNs);
            return true;
        }
    } else {
        if (earlyUs < 30000) {
            if (earlyUs > 11000) {
                Thread.sleep((earlyUs - 10000) / 1000);
            }
            renderOutputBuffer(codec, bufferIndex, presentationTimeUs);
            return true;
        }
    }

The article also shows how to inject a custom MediaClock into ExoPlayer by extending DefaultRenderersFactory and creating a custom video renderer that returns the custom clock.
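The article does not show the KtvMediaClock class used in the renderer listing that follows. As a rough idea of what a clock with start() and stop() needs to track, here is a minimal system‑time‑based sketch, similar in spirit to ExoPlayer's StandaloneMediaClock. The class name, the injectable time source, and resetPosition() are assumptions made for the example. To stay self‑contained it does not implement ExoPlayer's MediaClock interface and it ignores playback speed.

```java
final class SystemTimeClock {
    private final java.util.function.LongSupplier nanoTime; // injectable so the clock can be tested
    private boolean started;
    private long basePositionUs; // media position at the last start/stop/reset
    private long baseTimeNs;     // time-source reading at the last start/reset

    SystemTimeClock() {
        this(System::nanoTime);
    }

    SystemTimeClock(java.util.function.LongSupplier nanoTime) {
        this.nanoTime = nanoTime;
    }

    /** Starts advancing the position from where it currently stands. */
    void start() {
        if (!started) {
            baseTimeNs = nanoTime.getAsLong();
            started = true;
        }
    }

    /** Freezes the position at its current value. */
    void stop() {
        if (started) {
            basePositionUs = getPositionUs();
            started = false;
        }
    }

    /** Jumps to a new position, for example after a seek. */
    void resetPosition(long positionUs) {
        basePositionUs = positionUs;
        if (started) {
            baseTimeNs = nanoTime.getAsLong();
        }
    }

    /** Current media position in microseconds. */
    long getPositionUs() {
        if (!started) {
            return basePositionUs;
        }
        return basePositionUs + (nanoTime.getAsLong() - baseTimeNs) / 1000;
    }
}
```

A clock meant for ExoPlayer would additionally implement MediaClock, including setPlaybackParameters() and getPlaybackParameters(). The article's own renderer, which hands its clock to ExoPlayer through getMediaClock(), follows.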

    public class MusicMediaCodecVideoRenderer extends MediaCodecVideoRenderer {
        private KtvMediaClock mediaClock;

        // constructors omitted for brevity

        public void setMediaClock(KtvMediaClock mediaClock) {
            this.mediaClock = mediaClock;
        }

        @Override
        protected void onStarted() {
            super.onStarted();
            if (this.mediaClock != null) {
                this.mediaClock.start();
            }
        }

        @Override
        protected void onStopped() {
            super.onStopped();
            if (this.mediaClock != null) {
                this.mediaClock.stop();
            }
        }

        @Override
        public MediaClock getMediaClock() {
            return mediaClock;
        }
    }

Finally, the article discusses limitations of Android’s AudioTrack. AudioTrack cannot seek directly, and its getPlaybackHeadPosition() may exhibit jitter on some devices. A simulated AudioClockOutputDevice is provided to illustrate how one might compute playback progress using byte offsets and sample rates, while recommending jitter detection and fallback to a custom clock when the jitter exceeds a threshold.
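The jitter check itself is not shown in the article. One way to approach it, sketched below under stated assumptions, is to compare how far the reported playback head advanced with how much wall‑clock time passed between two reads. The class name and threshold are hypothetical, and the sketch assumes continuous playback at normal speed with the caller passing an already unwrapped frame count.

```java
final class PlaybackHeadJitterDetector {
    private final int sampleRate;
    private final long thresholdUs;
    private boolean hasPrevious;
    private long prevPositionUs;
    private long prevSystemUs;

    PlaybackHeadJitterDetector(int sampleRate, long thresholdUs) {
        this.sampleRate = sampleRate;
        this.thresholdUs = thresholdUs;
    }

    /**
     * Feeds one reading of the playback head (in frames) together with the
     * system time at which it was taken. Returns true when the head advanced
     * by an amount that differs from the elapsed time by more than the threshold.
     */
    boolean isJittery(long headPositionFrames, long systemTimeUs) {
        long positionUs = headPositionFrames * 1_000_000L / sampleRate;
        boolean jittery = false;
        if (hasPrevious) {
            long advancedUs = positionUs - prevPositionUs;
            long elapsedUs = systemTimeUs - prevSystemUs;
            jittery = Math.abs(advancedUs - elapsedUs) > thresholdUs;
        }
        hasPrevious = true;
        prevPositionUs = positionUs;
        prevSystemUs = systemTimeUs;
        return jittery;
    }
}
```

A player could count consecutive jittery readings and switch to a custom clock once the count passes some limit. The article's simulated AudioClockOutputDevice follows.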

    public class AudioClockOutputDevice extends AudioOutput {
        private final AudioParams params;
        private boolean isResumed = false;
        int playState = 0;
        int writeFrameSize = 0;

        // start/stop/pause/resume implementations omitted

        @Override
        public int write(AudioFrame audioFrame) throws IOException {
            int toTimeMillis = AudioUtils.byteSizeToTimeMillis(audioFrame.size,
                (int) this.params.sampleRate, this.params.channelCount, this.params.bitDepth);
            // simulate timing by sleeping in steps of 10 ms
            // ...
            writeFrameSize = totalWrittenFrameSize;
            return audioFrame.size;
        }

        @Override
        public int getPlaybackHeadPosition() throws IOException {
            return AudioUtils.byteSizeToSamplePosition(writeFrameSize,
                this.params.channelCount, this.params.bitDepth);
        }

        // other methods omitted
    }

In summary, the article provides a comprehensive overview of AV sync concepts, user tolerance thresholds, common synchronization strategies, an in‑depth examination of ExoPlayer’s audio‑master implementation, and practical guidance for extending ExoPlayer with custom clocks while handling the quirks of AudioTrack on various Android devices.
Tencent Music Tech Team
Public account of Tencent Music's development team, focusing on technology sharing and communication.