iOS Audio Playback: Demuxing, Decoding, and Playing AAC Files
This article explains how to play AAC (ADTS) audio on iOS by walking through the complete pipeline: demuxing the frame header and extracting the raw data, decoding with AudioConverterRef, playing back through AudioUnit and AUGraph, and finally some basic transcoding.
Common audio files such as MP3 and AAC wrap the encoded audio in a framing format and are normally handed straight to the system audio libraries (for example AVPlayer) or to third‑party players such as IJKPlayer; this article instead performs each stage of playing an AAC (ADTS) file on iOS by hand.
Preface
The demo is based on iOS and will illustrate the audio playback flow, including demuxing and decoding.
 _______              ______________              ________ 
|       |            |              |            |        |
|  aac  |  demuxer   |    audio     |  decoder   | audio  |
| file  | ---------> | encoded data | ---------> |  raw   |
|_______|            |______________|            |________|
AAC
AAC (Advanced Audio Coding) is a lossy compression format designed for audio data. Its coding tools are newer than MP3's, so it generally delivers better quality at the same bitrate, and it is widely used in live streaming (RTMP, HTTP‑FLV).
Demo AAC file information:
Input #0, aac, from 'video.aac':
Duration: 00:00:30.45, bitrate: 133 kb/s
Stream #0:0: Audio: aac (LC), 44100 Hz, stereo, fltp, 133 kb/s
Demuxing
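The goal of demuxing is to separate the audio metadata from the encoded data. An ADTS stream is a sequence of frames, each made up of a 7‑byte header (9 bytes when the optional CRC is present) followed by the ES (elementary stream) payload. The header carries a 13‑bit frame length that includes the header itself, so reading the header and then (length - 7) more bytes yields one raw AAC block.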
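As a self-contained illustration of that header layout, the following C sketch parses the fixed header fields in one place. The struct `AdtsHeader` and the function `adts_parse_header` are hypothetical names introduced here and are not part of the demo; the sampling-frequency table is the standard MPEG-4 audio index table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of one ADTS header; the struct and its names are illustrative. */
typedef struct {
    int profile;        /* MPEG-4 audio object type (2 = AAC LC) */
    int sample_rate;    /* Hz, or 0 if the index is reserved */
    int channel_config; /* 1 = mono, 2 = stereo, ... */
    int header_size;    /* 7, or 9 when a CRC follows */
    int frame_length;   /* header + payload, in bytes */
} AdtsHeader;

/* Sampling-frequency index table from the MPEG-4 audio specification. */
static const int kAdtsSampleRates[13] = {
    96000, 88200, 64000, 48000, 44100, 32000, 24000,
    22050, 16000, 12000, 11025, 8000, 7350
};

/* Parses the 7 fixed bytes at `b`. Returns 0 on success, or -1 if the
   12-bit syncword 0xFFF is missing. */
static int adts_parse_header(const uint8_t *b, AdtsHeader *out) {
    assert(b != NULL && out != NULL);
    if (b[0] != 0xFF || (b[1] & 0xF0) != 0xF0) {
        return -1;
    }
    int protection_absent = b[1] & 0x01;
    int freq_idx = (b[2] & 0x3C) >> 2;

    out->profile = ((b[2] & 0xC0) >> 6) + 1;
    out->sample_rate = freq_idx < 13 ? kAdtsSampleRates[freq_idx] : 0;
    out->channel_config = ((b[2] & 0x01) << 2) | ((b[3] & 0xC0) >> 6);
    out->header_size = protection_absent ? 7 : 9;
    out->frame_length = ((b[3] & 0x03) << 11) | (b[4] << 3) | (b[5] >> 5);
    return 0;
}
```

For the header bytes FF F1 50 80 2E 7F FC, this sketch reports AAC LC, 44100 Hz, stereo, and a 371-byte frame, which leaves 364 bytes of payload after the 7-byte header.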
// Read header (7 bytes; assumes the frame carries no CRC)
int head_buf_size = 7;
uint8_t *head_buf = malloc(head_buf_size);
if (fread(head_buf, 1, head_buf_size, _in_file) != head_buf_size) {
    free(head_buf);
    return nil; // end of file or read error; the reader signals this with nil
}
// Read size: 13-bit frame length spread over bytes 3..5 (includes the header)
int s1 = (head_buf[3] & 0x3) << 11;
int s2 = head_buf[4] << 3;
int s3 = head_buf[5] >> 5;
int size = s1 + s2 + s3;
free(head_buf);
// Read raw
int raw_buf_size = size - head_buf_size;
uint8_t *raw_buf = malloc(raw_buf_size);
fread(raw_buf, 1, raw_buf_size, _in_file);
The header also contains the sampling rate and channel configuration:
int head_buf_size = 7;
uint8_t *head_buf = malloc(head_buf_size);
fread(head_buf, 1, head_buf_size, file);
// Sampling rate index: 4 bits in the middle of byte 2
int freqIdx = (head_buf[2] & 0x3C) >> 2;
// Channel configuration: low bit of byte 2 plus the top two bits of byte 3
int c1 = (head_buf[2] & 0x1) << 2;
int c2 = (head_buf[3] & 0xC0) >> 6;
int chanCfg = c1 + c2;
free(head_buf);
// The demo only distinguishes 48 kHz (index 3) from 44.1 kHz (index 4);
// other indices need the full sampling-frequency table.
complete(freqIdx == 3 ? 48000 : 44100, chanCfg);
Decoding
Decoding is performed with AudioConverterRef. Given an input and an output AudioStreamBasicDescription, the converter turns raw AAC packets into PCM.
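The size fields of the PCM output description are easy to get wrong, because their meaning depends on whether the channels are interleaved. The plain C sketch below shows how they relate; `PcmLayout` and `pcm_layout` are hypothetical stand-ins for illustration and do not use the Core Audio types.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for the size fields of a linear PCM description. */
typedef struct {
    uint32_t bytes_per_frame;
    uint32_t frames_per_packet;
    uint32_t bytes_per_packet;
    uint32_t buffer_count; /* number of buffers the decoder fills */
} PcmLayout;

static PcmLayout pcm_layout(uint32_t bits_per_channel, uint32_t channels,
                            int interleaved) {
    assert(bits_per_channel % 8 == 0 && channels > 0);
    uint32_t bytes_per_sample = bits_per_channel / 8;
    PcmLayout l;
    /* Interleaved: a frame holds one sample per channel, in a single buffer.
       Non-interleaved: each channel has its own buffer, and a frame within
       that buffer is a single sample. */
    l.bytes_per_frame = interleaved ? bytes_per_sample * channels
                                    : bytes_per_sample;
    l.frames_per_packet = 1; /* uncompressed PCM: one frame per packet */
    l.bytes_per_packet = l.bytes_per_frame * l.frames_per_packet;
    l.buffer_count = interleaved ? 1 : channels;
    return l;
}
```

With 32-bit non-interleaved stereo, as in the demo's output description, each frame is 4 bytes per buffer, so one AAC packet of 1024 frames decodes to 4096 bytes in each of the two channel buffers.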
// Input description
- (AudioStreamBasicDescription)createAACAduioDes {
    UInt32 channels = _channels;
    AudioStreamBasicDescription audioDes = {0};
    audioDes.mSampleRate = _sampleRate;
    audioDes.mFormatID = kAudioFormatMPEG4AAC;
    audioDes.mFormatFlags = kMPEG4Object_AAC_LC;
    audioDes.mFramesPerPacket = 1024; // one AAC packet holds 1024 frames
    audioDes.mChannelsPerFrame = channels;
    return audioDes;
}
// Output description
- (AudioStreamBasicDescription)createPCMAduioDes {
    UInt32 channels = _channels;
    UInt32 bytesPerSample = sizeof(Float32); // 32-bit float samples
    AudioStreamBasicDescription audioDes = {0};
    audioDes.mSampleRate = _sampleRate;
    audioDes.mFormatID = kAudioFormatLinearPCM;
    audioDes.mFormatFlags = kLinearPCMFormatFlagIsNonInterleaved | kAudioFormatFlagIsFloat | kAudioFormatFlagIsPacked;
    // Non-interleaved: each buffer holds one channel, so a frame is one sample
    audioDes.mBytesPerPacket = bytesPerSample;
    audioDes.mFramesPerPacket = 1;
    audioDes.mBytesPerFrame = bytesPerSample;
    audioDes.mChannelsPerFrame = channels;
    audioDes.mBitsPerChannel = 8 * bytesPerSample;
    return audioDes;
}
// Decode call
OSStatus status = AudioConverterFillComplexBuffer(self->_audioConverter,
    inputDataProc, /* input callback that supplies AAC packets */
    (__bridge void * _Nullable)(self),
    &ioOutputDataPacketSize,
    outAudioBufferList, /* output buffer */
    NULL);
Playback
Playback uses AudioUnit and AUGraph, the low‑level audio APIs in AudioToolbox. The demo builds the graph in a fixed order and then feeds decoded PCM data to the output AudioUnit.
OSStatus status;
status = NewAUGraph(&_auGraph);
[self addAUNode];
status = AUGraphOpen(_auGraph);
[self getAUsFromNodes];
[self setAUProperties];
[self makeAUConnects];
CAShow(_auGraph);
status = AUGraphInitialize(_auGraph);
The data flow is:
 ___________              ______________ 
|           | 1.playback |              |
|  demuxer  | <--------- |  AudioUnit   |
|  decoder  | ---------> |              |
|___________| 2.pcm      |______________|
When the playback buffer holds fewer bytes than the AudioUnit requests, the player reads more data from the demuxer, decodes it, and appends the resulting PCM to the playback buffer.
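The same refill-then-consume pattern can be sketched in plain C, independent of the demo's Objective-C classes. All names below (`PcmFifo`, `PcmProducer`, `fifo_read`, and the toy producer) are hypothetical; the producer callback stands in for "demux one frame and decode it".

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Growable byte FIFO; all names here are illustrative. */
typedef struct {
    uint8_t *bytes;
    size_t length;   /* bytes currently buffered */
    size_t capacity;
} PcmFifo;

/* Writes one decoded block into `dst` (at most `max` bytes) and returns its
   size, or 0 at end of stream. */
typedef size_t (*PcmProducer)(void *ctx, uint8_t *dst, size_t max);

static int fifo_append(PcmFifo *f, const uint8_t *src, size_t n) {
    if (f->length + n > f->capacity) {
        size_t cap = (f->length + n) * 2;
        uint8_t *p = realloc(f->bytes, cap);
        if (p == NULL) return -1;
        f->bytes = p;
        f->capacity = cap;
    }
    memcpy(f->bytes + f->length, src, n);
    f->length += n;
    return 0;
}

/* Copies exactly `size` bytes into `out`, refilling from the producer as
   needed. Returns 0 on success, -1 if the stream ends first. */
static int fifo_read(PcmFifo *f, PcmProducer produce, void *ctx,
                     uint8_t *out, size_t size) {
    assert(f != NULL && produce != NULL && out != NULL);
    uint8_t block[4096];
    if (size == 0) return 0;
    while (f->length < size) {
        size_t n = produce(ctx, block, sizeof block);
        if (n == 0) return -1;
        if (fifo_append(f, block, n) != 0) return -1;
    }
    memcpy(out, f->bytes, size);
    /* Drop the consumed bytes so the next call starts at fresh data. */
    memmove(f->bytes, f->bytes + size, f->length - size);
    f->length -= size;
    return 0;
}

/* Toy producer for demonstration: emits 3-byte blocks counting up from
   `next` until `remaining` blocks have been produced. */
typedef struct { int remaining; uint8_t next; } ToySource;

static size_t toy_produce(void *ctx, uint8_t *dst, size_t max) {
    ToySource *s = ctx;
    if (s->remaining == 0 || max < 3) return 0;
    for (int i = 0; i < 3; i++) dst[i] = s->next++;
    s->remaining--;
    return 3;
}
```

The loop matters because a single decoded block may be smaller than the request; leftover bytes stay in the FIFO and are served first on the next call.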
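// Refill until enough data is buffered (one decoded packet may not be enough)
while (self.data.length < size) {
    // Demux
    NSData *d = [_reader read_aac_raw_buf];
    if (d == nil) { return nil; }
    // Decode
    AudioBufferList *b = [_decoder decodeAudioSamepleBuffer:d];
    // Append (mBuffers[0] is the first channel, since the output is non-interleaved)
    [self.data appendBytes:b->mBuffers[0].mData length:b->mBuffers[0].mDataByteSize];
}
// Return required bytes; the consumed range must then be removed from self.data (not shown)
NSData *b = [self.data subdataWithRange:NSMakeRange(0, size)];
Transcoding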
For speech‑to‑text services (e.g., iFLYTEK), the audio must be PCM with 16 kHz sample rate, 16‑bit mono. The same AudioConverterRef can be used to resample and reformat the audio.
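To make the two steps concrete (changing the sample rate and changing the sample format), here is a minimal C sketch that converts mono 32-bit float PCM to 16-bit PCM at a different rate. It is not how AudioConverterRef works internally: it uses plain linear interpolation with no low-pass filter, so downsampling can alias, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamps a float sample to [-1, 1] and scales it to signed 16-bit. */
static int16_t float_to_s16(float x) {
    if (x > 1.0f) x = 1.0f;
    if (x < -1.0f) x = -1.0f;
    return (int16_t)(x * 32767.0f);
}

/* Resamples mono float PCM from in_rate to out_rate by linear interpolation
   and writes 16-bit samples. Returns the number of samples written.
   No low-pass filter is applied, so this is a sketch, not production DSP. */
static size_t resample_to_s16(const float *in, size_t in_count, int in_rate,
                              int16_t *out, size_t out_cap, int out_rate) {
    assert(in != NULL && out != NULL && in_rate > 0 && out_rate > 0);
    size_t written = 0;
    if (in_count == 0) return 0;
    for (;;) {
        /* Position of output sample `written` on the input time axis,
           computed with integers to avoid accumulating drift. */
        uint64_t num = (uint64_t)written * (uint64_t)in_rate;
        size_t i = (size_t)(num / (uint64_t)out_rate);
        if (i >= in_count || written >= out_cap) break;
        float frac = (float)(num % (uint64_t)out_rate) / (float)out_rate;
        float a = in[i];
        float b = (i + 1 < in_count) ? in[i + 1] : in[i];
        out[written++] = float_to_s16(a + (b - a) * frac);
    }
    return written;
}
```

With these rates, 10 ms of 44.1 kHz input (441 samples) becomes 160 samples at 16 kHz. Stereo input would first need to be mixed down to mono, which this sketch does not cover.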
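The equivalent conversion with ffmpeg, from 44.1 kHz 32‑bit float mono PCM to 16 kHz 16‑bit mono PCM:
ffmpeg -ac 1 -ar 44100 -f f32le -i of111.pcm -ac 1 -ar 16000 -f s16le ff_out.pcm
References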
Audio attribute details: channels, sample rate, bit depth, bitrate – https://www.cnblogs.com/yongdaimi/p/10722355.html#_label5
AAC ADTS format analysis – https://zhuanlan.zhihu.com/p/162998699
AAC ADTS format analysis (CSDN) – https://blog.csdn.net/tantion/article/details/82743942