iOS Audio Playback: Demuxing, Decoding, and Playing AAC Files

This article explains how to play an AAC (ADTS) file on iOS by walking through the complete pipeline: demuxing the frame headers to extract the raw data, decoding with AudioConverterRef, playing back through AudioUnit and AUGraph, and basic transcoding.

Sohu Tech Products

Aug 25, 2021

Common audio files such as MP3 and AAC can be handed directly to the system player (AVPlayer) or to a third-party player such as IJKPlayer, both of which hide the demuxing and decoding steps. This article instead walks through the whole process of playing an AAC (ADTS) file on iOS by hand.

Preface

The demo is based on iOS and will illustrate the audio playback flow, including demuxing and decoding.

_______              ______________              ________
|       |            |              |            |        |
|  aac  |  demuxer   |    audio     |  decoder   | audio  |
| file  | ---------> | encoded data | ---------> | raw    |
|_______|            |______________|            |________|

AAC

AAC (Advanced Audio Coding) is an audio compression format that is widely used in live streaming (RTMP, HTTP-FLV). It uses newer algorithms than MP3 and provides better quality at lower bitrates.

Demo AAC file information:

Input #0, aac, from 'video.aac':
  Duration: 00:00:30.45, bitrate: 133 kb/s
  Stream #0:0: Audio: aac (LC), 44100 Hz, stereo, fltp, 133 kb/s

Demuxing

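The goal of demuxing is to separate the audio metadata from the encoded data. An ADTS stream is a sequence of frames, each made of a 7-byte header (9 bytes when a CRC is present) followed by the ES body. The header stores the length of the whole frame, so subtracting the header size from that length gives the size of the raw audio block to read.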

// Read header (7 bytes, assuming the stream carries no CRC)
int head_buf_size = 7;
uint8_t *head_buf = malloc(head_buf_size);
fread(head_buf, 1, head_buf_size, _in_file);

// Read size: the 13-bit frame length is spread over bytes 3, 4 and 5
int s1 = (head_buf[3] & 0x3) << 11;
int s2 = head_buf[4] << 3;
int s3 = head_buf[5] >> 5;
int size = s1 + s2 + s3;   // whole frame, header included

// Read raw: the payload is the frame minus the header
int raw_buf_size = size - head_buf_size;
uint8_t *raw_buf = malloc(raw_buf_size);
fread(raw_buf, 1, raw_buf_size, _in_file);
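
The header read above can be made a little more defensive. The following is a minimal sketch, not the demo's code: it checks the 12-bit syncword before trusting the length field and uses the protection_absent bit to decide whether the header is 7 or 9 bytes. The names `adts_frame_info` and `adts_parse_frame` are hypothetical and introduced only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of one ADTS header that a demuxer needs (hypothetical struct). */
typedef struct {
    int protection_absent; /* 1 = no CRC (7-byte header), 0 = CRC (9-byte header) */
    int header_size;       /* 7 or 9 */
    int frame_length;      /* header + payload, in bytes */
} adts_frame_info;

/* Returns 0 on success, -1 if buf does not start with a plausible ADTS header. */
int adts_parse_frame(const uint8_t *buf, size_t len, adts_frame_info *out)
{
    if (len < 7) return -1;

    /* 12-bit syncword: all ones */
    if (buf[0] != 0xFF || (buf[1] & 0xF0) != 0xF0) return -1;

    out->protection_absent = buf[1] & 0x01;
    out->header_size = out->protection_absent ? 7 : 9;

    /* 13-bit frame length: low 2 bits of byte 3, all of byte 4, top 3 bits of byte 5 */
    out->frame_length = ((buf[3] & 0x03) << 11) | (buf[4] << 3) | (buf[5] >> 5);

    /* A frame can never be shorter than its own header. */
    if (out->frame_length < out->header_size) return -1;
    return 0;
}
```

The payload size is then `frame_length - header_size`, which matches the subtraction in the snippet above when no CRC is present.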

The header also carries the sampling frequency index and the channel configuration:

int head_buf_size = 7;
uint8_t *head_buf = malloc(head_buf_size);
fread(head_buf, 1, head_buf_size, file);

// Sampling frequency index: 4 bits in byte 2
int freqIdx = (head_buf[2] & 0x3C) >> 2;
// Channel configuration: 1 bit in byte 2, 2 bits in byte 3
int c1 = (head_buf[2] & 0x1) << 2;
int c2 = (head_buf[3] & 0xC0) >> 6;
int chanCfg = c1 + c2;

// The demo only distinguishes index 3 (48000 Hz) from index 4 (44100 Hz)
complete(freqIdx == 3 ? 48000 : 44100, chanCfg);
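
The demo only handles two sampling frequency indices. As a sketch of a more general lookup, the index can be mapped through the standard MPEG-4 sampling frequency table. The helper names `adts_sample_rate` and `adts_channel_config` are hypothetical, and the code assumes `hdr` points at a header of at least 4 bytes.

```c
#include <stdint.h>

/* MPEG-4 sampling frequency table; indices 13 to 15 are not usable rates. */
static const int kAdtsSampleRates[13] = {
    96000, 88200, 64000, 48000, 44100, 32000, 24000,
    22050, 16000, 12000, 11025, 8000, 7350
};

/* Returns the sample rate in Hz, or -1 for an index outside the table. */
int adts_sample_rate(const uint8_t *hdr)
{
    int idx = (hdr[2] & 0x3C) >> 2;
    return idx < 13 ? kAdtsSampleRates[idx] : -1;
}

/* Returns the 3-bit channel configuration (2 means stereo). */
int adts_channel_config(const uint8_t *hdr)
{
    return ((hdr[2] & 0x01) << 2) | ((hdr[3] & 0xC0) >> 6);
}
```

For the demo file (44100 Hz, stereo) this yields index 4 and channel configuration 2.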

Decoding

Decoding is performed with AudioConverterRef. By configuring input and output AudioStreamBasicDescription structures, raw AAC data can be converted to PCM.

// Input description
- (AudioStreamBasicDescription)createAACAduioDes {
    UInt32 channels = _channels;
    AudioStreamBasicDescription audioDes ={0};
    audioDes.mSampleRate = _sampleRate;
    audioDes.mFormatID = kAudioFormatMPEG4AAC;
    audioDes.mFormatFlags = kMPEG4Object_AAC_LC;
    audioDes.mFramesPerPacket = 1024;
    audioDes.mChannelsPerFrame = channels;
    return audioDes;
}

// Output description
- (AudioStreamBasicDescription)createPCMAduioDes {
    UInt32 channels = _channels;
    UInt32 bytesPerSample = sizeof(Float32); // 4 bytes, matches kAudioFormatFlagIsFloat
    AudioStreamBasicDescription audioDes ={0};
    audioDes.mSampleRate = _sampleRate;
    audioDes.mFormatID = kAudioFormatLinearPCM;
    audioDes.mFormatFlags = kLinearPCMFormatFlagIsNonInterleaved | kAudioFormatFlagIsFloat | kAudioFormatFlagIsPacked;
    audioDes.mBytesPerPacket = bytesPerSample;
    audioDes.mFramesPerPacket = 1;
    audioDes.mBytesPerFrame = bytesPerSample;
    audioDes.mChannelsPerFrame = channels;
    audioDes.mBitsPerChannel = 8 * bytesPerSample;
    return audioDes;
}

// Decode call
AudioConverterFillComplexBuffer(self->_audioConverter,
          inputDataProc,/*input function*/
          (__bridge void * _Nullable)(self),
          &ioOutputDataPacketSize,
          outAudioBufferList,/*output buffer*/
          NULL);
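
The output buffers passed to the decode call have to be large enough for the packets requested. As a rough sketch of the arithmetic, assuming AAC-LC with 1024 frames per packet and the non-interleaved 32-bit float output described above (so each channel gets its own buffer), the sizes work out as follows. Both helper names are hypothetical.

```c
#include <stddef.h>

/* Bytes needed in EACH per-channel buffer for non-interleaved PCM. */
size_t pcm_bytes_per_channel(size_t frames, size_t bytes_per_sample)
{
    return frames * bytes_per_sample;
}

/* Duration of `frames` PCM frames in milliseconds at the given sample rate. */
double pcm_duration_ms(size_t frames, double sample_rate)
{
    return 1000.0 * (double)frames / sample_rate;
}
```

One decoded packet is therefore 1024 * 4 = 4096 bytes per channel, and at 44100 Hz it covers a little over 23 ms of audio.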

Playback

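Playback uses AudioUnit and AUGraph, the low-level audio APIs on iOS. The demo sets up the graph in a fixed order: create the graph, add the nodes, open the graph, fetch the AudioUnits from the nodes, set their properties, connect the nodes, and initialize the graph. Once running, the AudioUnit requests decoded PCM data as it needs it.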

OSStatus status;
status = NewAUGraph(&_auGraph);
[self addAUNode];
status = AUGraphOpen(_auGraph);
[self getAUsFromNodes];
[self setAUProperties];
[self makeAUConnects];
CAShow(_auGraph);
status = AUGraphInitialize(_auGraph);

The data flow is:

___________              ______________  
|           | 1.playback |              |  
|  demuxer  | <--------- |  AudioUnit   |  
|  decoder  | ---------> |              |  
|___________|   2.pcm    |______________|

If the playback buffer holds fewer bytes than the AudioUnit requests, the player reads more data from the demuxer, decodes it, and appends it to the playback buffer.

// While the buffered data is insufficient
while (self.data.length < size) {
    // Demux
    NSData *d = [_reader read_aac_raw_buf];
    if (d == nil) {return nil;}
    // Decode
    AudioBufferList *b = [_decoder decodeAudioSamepleBuffer:d];
    // Append
    [self.data appendBytes:b->mBuffers[0].mData length:b->mBuffers[0].mDataByteSize];
}
// Return the required bytes and drop them from the buffer
NSData *b = [self.data subdataWithRange:NSMakeRange(0, size)];
[self.data replaceBytesInRange:NSMakeRange(0, size) withBytes:NULL length:0];
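
The same refill-then-consume pattern can be shown without Foundation. The following is a minimal sketch of a byte FIFO in plain C, assuming a single thread. The `pcm_fifo` type and its functions are hypothetical names, not part of the demo.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned char *bytes;
    size_t length;    /* bytes currently stored */
    size_t capacity;  /* bytes allocated */
} pcm_fifo;

/* Append n bytes to the back; returns 0 on success, -1 on allocation failure. */
int pcm_fifo_append(pcm_fifo *f, const void *src, size_t n)
{
    if (n == 0) return 0;
    if (f->length + n > f->capacity) {
        size_t cap = f->capacity ? f->capacity : 4096;
        while (cap < f->length + n) cap *= 2;
        unsigned char *p = realloc(f->bytes, cap);
        if (!p) return -1;
        f->bytes = p;
        f->capacity = cap;
    }
    memcpy(f->bytes + f->length, src, n);
    f->length += n;
    return 0;
}

/* Copy up to n bytes from the front into dst and drop them; returns bytes copied. */
size_t pcm_fifo_read(pcm_fifo *f, void *dst, size_t n)
{
    if (n > f->length) n = f->length;
    if (n == 0) return 0;
    memcpy(dst, f->bytes, n);
    memmove(f->bytes, f->bytes + n, f->length - n);
    f->length -= n;
    return n;
}
```

A caller would keep appending decoded PCM until `length` reaches the requested size, then call `pcm_fifo_read`. Because this sketch allocates and moves memory on every call, a production render path would more likely use a preallocated ring buffer.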

Transcoding

For speech-to-text services (e.g., iFLYTEK), the audio must be 16 kHz, 16-bit, mono PCM. The same AudioConverterRef can be used to resample and reformat the decoded audio. For comparison, the equivalent conversion with ffmpeg, from 44.1 kHz 32-bit float mono PCM, is:

ffmpeg -ac 1 -ar 44100 -f f32le -i of111.pcm -ac 1 -ar 16000 -f s16le ff_out.pcm
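
The sample-format half of that conversion is simple enough to sketch by hand. The function below, with the hypothetical name `f32_to_s16`, converts 32-bit float samples to signed 16-bit and clamps out-of-range input. It deliberately does not resample: going from 44100 Hz to 16000 Hz needs a low-pass filter and is better left to AudioConverterRef or ffmpeg.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert n float samples (nominally in [-1, 1]) to signed 16-bit PCM. */
void f32_to_s16(const float *in, int16_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float s = in[i];
        /* Clamp so that loud input saturates instead of wrapping around. */
        if (s > 1.0f) s = 1.0f;
        if (s < -1.0f) s = -1.0f;
        out[i] = (int16_t)(s * 32767.0f);
    }
}
```

Scaling by 32767 keeps the output symmetric around zero. Other converters may scale by 32768 or add dither, so results can differ by one step from what ffmpeg produces.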
