
Technical Guidelines for High-Quality Mobile Recording and Audio Processing in Quanmin K Song

Quanmin K Song’s decade‑long mobile recording platform combines 48 kHz/16‑bit dry‑signal capture, sub‑70 ms round‑trip latency via OpenSL ES/AAudio, real‑time clipping and noise detection, lyric‑to‑accompaniment and vocal‑to‑accompaniment alignment, pitch shifting, adaptive vocal enhancement, DSP‑ and AI‑based 3A processing, and AI‑driven pitch correction to deliver an industry‑leading, high‑quality mobile singing experience.

Tencent Music Tech Team

Feb 4, 2024


With the widespread adoption of the mobile internet, more and more users create and record music on mobile devices. However, hardware and software limitations make a high‑quality recording experience difficult to achieve. Since 2014, Quanmin K Song has steadily developed its mobile recording technology, building a comprehensive system that covers low‑latency recording, lyric‑accompaniment alignment, vocal‑accompaniment alignment, monitoring, pitch shifting, vocal enhancement, 3A processing, pitch correction, and more.

Low‑latency, high‑quality recording experience

High‑quality dry‑signal (unprocessed vocal) capture depends on choosing the right sampling rate, bit depth, and channel configuration. Quanmin K Song recommends recording at 48 kHz, 16‑bit, mono, with real‑time algorithmic detection of problems such as silence or excessive loudness. The captured dry signal is then converted to stereo, resampled, and quality‑checked before being passed to the business layer.
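The channel conversion and quality checks described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Quanmin K Song's implementation: the function names and the -50 dBFS silence threshold are hypothetical, stereo conversion simply duplicates the mono channel, and resampling is omitted.

```python
import math
from array import array

SILENCE_DBFS = -50.0  # hypothetical threshold: quieter blocks are flagged as silent
CLIP_LEVEL = 32767    # positive full scale for signed 16-bit PCM


def rms_dbfs(samples):
    """RMS level of signed 16-bit samples in dB relative to full scale (32768)."""
    if not samples:
        return float("-inf")
    mean_sq = sum(s * s for s in samples) / len(samples)
    if mean_sq == 0:
        return float("-inf")
    return 10.0 * math.log10(mean_sq / (32768.0 ** 2))


def mono_to_stereo(samples):
    """Duplicate each mono sample into an interleaved L/R frame."""
    out = array("h")
    for s in samples:
        out.append(s)  # left
        out.append(s)  # right
    return out


def check_block(samples):
    """Return simple quality flags for one block of mono 16-bit samples."""
    level = rms_dbfs(samples)
    # Samples pinned at either end of the 16-bit range are counted as clipped.
    clipped = sum(1 for s in samples if s >= CLIP_LEVEL or s <= -CLIP_LEVEL - 1)
    return {"rms_dbfs": level, "silent": level < SILENCE_DBFS, "clipped_samples": clipped}
```

For example, a block of 480 zero samples (10 ms at 48 kHz) is reported as silent, while a block containing 32767 and -32768 reports two clipped samples.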

Latency is critical on mobile. The Android client uses the high‑performance, low‑latency OpenSL ES and AAudio APIs and tunes sample rate, bit depth, and buffer size to reach a round‑trip latency of 30‑70 ms, which is industry‑leading.
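To see how buffer size drives latency, the arithmetic below converts buffer lengths into milliseconds. It is a simplified model under stated assumptions: one queued input buffer, two output buffers, and a fixed driver/hardware overhead are hypothetical parameters, and real round‑trip latency must be measured on each device (for example with a loopback test).

```python
def buffer_ms(frames, sample_rate_hz):
    """Duration of one audio buffer in milliseconds."""
    return 1000.0 * frames / sample_rate_hz


def round_trip_estimate_ms(frames_per_buffer, sample_rate_hz,
                           input_buffers=1, output_buffers=2,
                           fixed_overhead_ms=0.0):
    """Rough round-trip latency: all queued input and output buffers plus a
    fixed hardware/driver overhead. Buffer counts and overhead are assumptions."""
    per_buffer = buffer_ms(frames_per_buffer, sample_rate_hz)
    return (input_buffers + output_buffers) * per_buffer + fixed_overhead_ms
```

For example, 240‑frame buffers at 48 kHz last 5 ms each; with three buffers in flight and an assumed 15 ms fixed overhead, the estimate is 30 ms, the low end of the range quoted above. Halving the buffer size halves the buffered part of the latency but gives the audio thread less time per callback, which is why buffer size has to be tuned per device.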

Higher sampling rates preserve more high‑frequency content. The app is moving from 48 kHz toward 96 kHz to further improve dry‑signal quality.
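The Nyquist relation makes this trade‑off concrete: the highest frequency a sampled signal can represent is half its sampling rate $f_s$, while the raw PCM data rate grows in proportion to $f_s$. With bit depth $b$ and channel count $c$, the figures for mono 16‑bit recording are:

```latex
f_{\max} = \frac{f_s}{2}: \qquad 48\,\mathrm{kHz} \mapsto 24\,\mathrm{kHz}, \qquad 96\,\mathrm{kHz} \mapsto 48\,\mathrm{kHz}

R = f_s \cdot b \cdot c: \qquad 48\,000 \times 16 \times 1 = 768\ \mathrm{kbit/s}, \qquad 96\,000 \times 16 \times 1 = 1536\ \mathrm{kbit/s}
```

Moving to 96 kHz therefore doubles both the representable bandwidth and the data that every downstream processing stage must handle.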

Monitoring over Bluetooth headsets has traditionally suffered from high latency. Through close collaboration with hardware vendors, Quanmin K Song achieves monitoring latency under 40 ms on Huawei FreeBuds, with all monitoring effects available. Similar customizations for Meizu, Vivo, and other brands let users switch monitoring on or off and adjust its volume directly on the device.

Multi‑dimensional techniques enhancing recording detail

Advanced detection, lyric‑accompaniment alignment, and vocal‑accompaniment alignment improve precision and user interaction.

Recording detection monitors clipping, volume, and background noise in real time, using decibel statistics and the MCRA (Minima‑Controlled Recursive Averaging) noise‑estimation algorithm to provide 0.5 s background‑noise forecasts.
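A heavily simplified, single‑band version of MCRA illustrates the idea: track the minimum of the smoothed frame power, treat frames well above that minimum as containing singing, and update the noise estimate mainly during the remaining frames. Real MCRA runs per frequency bin on a short‑time spectrum; the function name and parameter defaults here are illustrative assumptions, not the app's settings.

```python
def mcra_noise_track(frame_powers, alpha_s=0.8, alpha_d=0.95, alpha_p=0.2,
                     delta=5.0, window=50):
    """Simplified single-band MCRA noise tracker.

    frame_powers: per-frame signal power (e.g. mean square of each frame).
    Returns the noise-power estimate after each frame.
    """
    if not frame_powers:
        return []
    smoothed = s_min = s_tmp = noise = frame_powers[0]
    presence = 0.0  # smoothed probability that singing is present
    estimates = []
    for i, power in enumerate(frame_powers):
        # 1. Recursively smooth the frame power.
        smoothed = alpha_s * smoothed + (1 - alpha_s) * power
        # 2. Track the minimum of the smoothed power over blocks of `window` frames.
        s_min = min(s_min, smoothed)
        s_tmp = min(s_tmp, smoothed)
        if (i + 1) % window == 0:
            s_min = min(s_tmp, smoothed)
            s_tmp = smoothed
        # 3. Frames far above the tracked minimum are assumed to contain singing.
        indicator = 1.0 if smoothed > delta * s_min else 0.0
        presence = alpha_p * presence + (1 - alpha_p) * indicator
        # 4. Update the noise estimate slowly when singing is likely present.
        alpha = alpha_d + (1 - alpha_d) * presence
        noise = alpha * noise + (1 - alpha) * power
        estimates.append(noise)
    return estimates
```

Because the update weight approaches 1 while singing is detected, a sudden loud phrase barely raises the noise estimate, whereas a genuine rise in room noise (which also raises the tracked minimum) is followed within about one window.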

Lyric‑accompaniment alignment uses timestamped lyrics in QRC format and keeps the displayed lyric progress synchronized with playback in real time.
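Once lyrics are parsed, synchronizing them with playback reduces to finding the line whose time span contains the current playback position. The sketch assumes the QRC file has already been parsed into sorted `(start, duration, text)` entries; parsing QRC itself is not shown, and `LyricLine`, `build_index`, and `current_line` are hypothetical names.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class LyricLine(NamedTuple):
    start_ms: int     # when the line starts, relative to the accompaniment
    duration_ms: int  # how long the line stays active
    text: str


def build_index(lines: List[LyricLine]) -> List[int]:
    """Precompute sorted start times once per song (lines must be sorted)."""
    return [line.start_ms for line in lines]


def current_line(lines: List[LyricLine], starts: List[int],
                 position_ms: int) -> Optional[int]:
    """Index of the line active at position_ms, or None before the first
    line or in a gap between lines. Binary search keeps each lookup O(log n)."""
    i = bisect_right(starts, position_ms) - 1
    if i < 0:
        return None
    line = lines[i]
    return i if position_ms < line.start_ms + line.duration_ms else None
```

A UI would call `current_line` on every frame or playback tick with the player's current position, highlighting the returned line.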

Vocal‑accompaniment alignment aligns the recorded vocal with the accompaniment using audio fingerprinting, and also offers a manual adjustment panel with a ±600 ms range.
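Automatic offset estimation combined with the manual panel can be sketched as below. As an assumption for illustration, plain cross‑correlation of energy envelopes stands in for the audio fingerprinting the article names; the function names are hypothetical, and only the ±600 ms limit comes from the text.

```python
MAX_MANUAL_OFFSET_MS = 600  # range of the manual adjustment panel


def estimate_offset(vocal_env, accomp_env, max_lag):
    """Lag (in envelope frames) maximizing the cross-correlation of two energy
    envelopes. Positive lag means the vocal is late relative to the
    accompaniment. Stand-in for fingerprint matching."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, a in enumerate(accomp_env):
            j = i + lag
            if 0 <= j < len(vocal_env):
                score += a * vocal_env[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def apply_manual_offset(auto_offset_ms, user_offset_ms):
    """Add the user's adjustment, clamped to +/-600 ms, to the automatic offset."""
    clamped = max(-MAX_MANUAL_OFFSET_MS, min(MAX_MANUAL_OFFSET_MS, user_offset_ms))
    return auto_offset_ms + clamped
```

Converting a frame lag to milliseconds depends on the envelope hop size; with a hypothetical 10 ms hop, a lag of 2 frames corresponds to 20 ms.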

Intelligent accompaniment provides real‑time guidance for the singer, dynamically adjusting the accompaniment based on analysis of the singer's pitch and loudness.

Pitch shifting changes the musical key without smearing transients, preserving the character of percussive instruments such as drums.
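The size of a pitch shift follows from 12‑tone equal temperament: each semitone multiplies frequency by 2^(1/12). The helpers below (hypothetical names) also show why pitch shifting by plain resampling is not enough: it changes duration, so a time‑stretching stage is needed, and keeping drum transients intact is exactly where such algorithms differ. The article does not say which algorithm Quanmin K Song uses.

```python
def semitones_to_ratio(semitones):
    """Frequency ratio for a shift of `semitones` in 12-tone equal temperament."""
    return 2.0 ** (semitones / 12.0)


def naive_resample_length(num_samples, semitones):
    """Sample count after pitch-shifting by plain resampling. Resampling also
    changes duration, so a time-stretch step (e.g. a phase vocoder or WSOLA)
    must restore the original length while keeping the new pitch."""
    return round(num_samples / semitones_to_ratio(semitones))
```

Shifting up a full octave (12 semitones) doubles every frequency, and naive resampling would squeeze one second of 48 kHz audio into 24,000 samples, i.e. half a second.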

Headphone monitoring supports both hardware and software paths. Hardware monitoring relies on custom HAL (hardware abstraction layer) integrations with brands such as Huawei, Oppo, and Vivo, achieving latency under 40 ms. Software monitoring uses OpenSL ES/AAudio on Android and AudioUnit on iOS, with audio callback intervals as short as 5 ms.
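In the software path, each audio callback mixes the newly captured microphone block into the headphone output. A minimal sketch, assuming equal‑length blocks of signed 16‑bit samples and hypothetical gain parameters; at 48 kHz, a 5 ms callback carries 240 frames.

```python
def monitor_callback(mic_block, accomp_block, monitor_gain=0.8, accomp_gain=1.0):
    """Mix one callback's worth of mic input into the headphone output.

    Both blocks are sequences of signed 16-bit samples of equal length.
    The result is saturated to the 16-bit range instead of wrapping around."""
    out = []
    for m, a in zip(mic_block, accomp_block):
        v = int(round(monitor_gain * m + accomp_gain * a))
        out.append(max(-32768, min(32767, v)))  # saturate, never wrap
    return out
```

Saturating rather than letting the integer wrap matters here: a wrapped sample produces a loud click directly in the singer's ears.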

Multidimensional scoring evaluates pitch, breath, rhythm, and technique using reference‑based and reference‑free neural‑network models.
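The models are neural networks, but the reference‑based pitch dimension can be illustrated with a hand‑written rule: measure each frame's deviation from the reference melody in cents and count the frames within a tolerance. This is a simplified stand‑in, not the app's scoring model; the 50‑cent tolerance and the function names are assumptions.

```python
import math


def cents(f_hz, ref_hz):
    """Interval from ref_hz to f_hz in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(f_hz / ref_hz)


def pitch_score(user_f0, ref_f0, tolerance_cents=50.0):
    """Share of voiced reference frames where the user's pitch is within
    `tolerance_cents` of the reference, scaled to 0-100.
    Frames with f0 <= 0 are treated as unvoiced."""
    voiced = hits = 0
    for u, r in zip(user_f0, ref_f0):
        if r <= 0:
            continue  # reference is silent here: not scored
        voiced += 1
        if u > 0 and abs(cents(u, r)) <= tolerance_cents:
            hits += 1
    return 100.0 * hits / voiced if voiced else 0.0
```

Measuring in cents rather than hertz makes the tolerance equally strict for low and high notes.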

Audio processing technologies

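3A processing, comprising automatic gain control (AGC), automatic noise suppression (ANS), and acoustic echo cancellation (AEC), is implemented with both traditional DSP and neural‑network models trained on large‑scale singing data.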
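Of the three components, AGC is the simplest to illustrate. The sketch below is a conventional frame‑based AGC, not the app's DSP or neural implementation; the target level, gain limit, and smoothing coefficients are hypothetical values, and noise suppression and echo cancellation are not shown.

```python
import math


def agc(frames, target_dbfs=-20.0, max_gain_db=20.0, attack=0.5, release=0.05):
    """Frame-based automatic gain control on float samples in [-1, 1].

    For each frame, compute the gain that would bring its RMS to target_dbfs,
    limit it to +/-max_gain_db, and smooth gain changes: reduce gain quickly
    (attack) when the signal gets louder, raise it slowly (release) otherwise."""
    gain_db = 0.0
    out = []
    for frame in frames:
        rms = math.sqrt(sum(x * x for x in frame) / len(frame)) if frame else 0.0
        if rms > 1e-6:  # leave the gain unchanged on (near-)silent frames
            level_db = 20.0 * math.log10(rms)
            wanted = max(-max_gain_db, min(max_gain_db, target_dbfs - level_db))
            coeff = attack if wanted < gain_db else release
            gain_db += coeff * (wanted - gain_db)
        g = 10.0 ** (gain_db / 20.0)
        # Hard-clip as a safety net in case the gain overshoots.
        out.append([max(-1.0, min(1.0, x * g)) for x in frame])
    return out
```

The asymmetric attack/release avoids two typical AGC artifacts: sudden loud notes are tamed quickly, while quiet passages are not pumped up abruptly along with the background noise.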

Audio effects such as spatial rendering, EQ, filtering, and delay are configured dynamically through presets delivered from the server; convolution reverb uses sampled impulse responses.
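Convolution reverb gives a vocal a room's character by convolving the dry signal with a sampled impulse response (IR). The direct‑form sketch below shows the operation; the `wet` mix ratio stands in for a hypothetical server‑delivered preset field. Because real IRs are often seconds long, production code typically uses FFT‑based (partitioned) convolution instead of this O(N·M) loop.

```python
def convolve(dry, impulse_response):
    """Direct-form linear convolution: each input sample launches a scaled
    copy of the impulse response into the output."""
    if not dry or not impulse_response:
        return []
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for i, x in enumerate(dry):
        if x == 0.0:
            continue  # silent samples contribute nothing
        for k, h in enumerate(impulse_response):
            out[i + k] += x * h
    return out


def convolution_reverb(dry, impulse_response, wet=0.3):
    """Mix the dry signal with its convolution by a sampled room IR.
    `wet` is a hypothetical preset parameter (0 = dry only, 1 = reverb only)."""
    reverberant = convolve(dry, impulse_response)
    dry_padded = list(dry) + [0.0] * (len(reverberant) - len(dry))
    return [(1.0 - wet) * d + wet * r for d, r in zip(dry_padded, reverberant)]
```

The output is longer than the input by the IR length minus one sample: that tail is the reverb decay after the singer stops.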

The vocal‑to‑accompaniment ratio feature dynamically balances vocal and accompaniment loudness based on reference tracks.
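One way to compute such a balance is to measure both tracks' levels and derive the vocal gain that reproduces a target level difference taken from a reference track. This sketch uses plain RMS in dB as a simplifying assumption; a production system might use a perceptual loudness measure such as ITU‑R BS.1770 (LUFS). The function names are hypothetical.

```python
import math


def rms_db(samples):
    """RMS level in dB relative to 1.0 for float samples; -inf for silence."""
    mean_sq = sum(x * x for x in samples) / len(samples) if samples else 0.0
    return 10.0 * math.log10(mean_sq) if mean_sq > 0 else float("-inf")


def vocal_gain_for_ratio(vocal, accomp, target_ratio_db):
    """Linear gain for the vocal so that (vocal level - accompaniment level)
    equals target_ratio_db, e.g. a ratio measured on a reference track.
    Returns 1.0 (no change) if either track is silent."""
    v, a = rms_db(vocal), rms_db(accomp)
    if math.isinf(v) or math.isinf(a):
        return 1.0
    return 10.0 ** ((target_ratio_db - (v - a)) / 20.0)
```

For example, a vocal at half the accompaniment's RMS sits about 6 dB below it, so reaching a 0 dB target ratio requires a vocal gain of 2.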

Vocal enhancement applies adaptive filtering, multi‑band gain, and gender‑specific parameters to improve clarity and brightness.
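A crude two‑band version of multi‑band gain with gender‑specific presets is shown below. The preset values, split frequencies, and names are hypothetical, and the filter is static rather than adaptive; it only illustrates how per‑band gains can brighten a voice (high band boosted) while taming low‑frequency buildup.

```python
import math

# Hypothetical per-gender presets; the article does not publish its values.
PRESETS = {
    "male":   {"split_hz": 300.0, "low_gain": 0.9, "high_gain": 1.3},
    "female": {"split_hz": 500.0, "low_gain": 0.8, "high_gain": 1.2},
}


def enhance(samples, sample_rate_hz, gender):
    """Split the signal into low and high bands with a one-pole low-pass
    filter, then re-weight each band with the preset's gains."""
    p = PRESETS[gender]
    # One-pole low-pass coefficient for the chosen split frequency.
    a = math.exp(-2.0 * math.pi * p["split_hz"] / sample_rate_hz)
    low = 0.0
    out = []
    for x in samples:
        low = (1.0 - a) * x + a * low   # low band
        high = x - low                  # complementary high band
        out.append(p["low_gain"] * low + p["high_gain"] * high)
    return out
```

Because the high band is defined as the input minus the low band, setting both gains to 1.0 reproduces the input exactly; any coloration comes only from the preset gains.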

Post‑processing pitch correction analyzes time‑frequency characteristics and applies AI‑driven adjustments to improve pitch, timbre, and overall sound quality.
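The core of pitch correction is moving a detected pitch toward a target note in the log‑frequency (cents) domain. As a simplifying assumption, the sketch below snaps to the nearest equal‑tempered note with an adjustable strength; the AI‑driven system described above would instead use the song's reference melody and also address timbre. Pitch detection and resynthesis are not shown, and the names are hypothetical.

```python
import math

A4_HZ = 440.0


def nearest_note_hz(f_hz):
    """Frequency of the nearest equal-tempered note (A4 = 440 Hz, MIDI 69)."""
    midi = 69 + 12 * math.log2(f_hz / A4_HZ)
    return A4_HZ * 2.0 ** ((round(midi) - 69) / 12.0)


def corrected_pitch(f_hz, strength=1.0):
    """Move a detected pitch toward the nearest note. strength=1.0 snaps
    fully, 0.0 leaves it untouched; the move is done in cents."""
    if f_hz <= 0:
        return f_hz  # unvoiced frame: nothing to correct
    target = nearest_note_hz(f_hz)
    offset_cents = 1200.0 * math.log2(target / f_hz)
    return f_hz * 2.0 ** (strength * offset_cents / 1200.0)
```

A strength below 1.0 keeps some natural pitch variation such as vibrato, which is why correction tools usually expose it rather than always snapping fully.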

In summary, after nearly a decade of research and development, Quanmin K Song has built a complete, high‑quality mobile recording technology stack that delivers industry‑leading latency, audio quality, and feature richness. The work benefits end users and also offers useful insights for the broader audio‑processing field.




