Kuaishou Audio Team Wins ICASSP 2024 SSI and PLC Challenges with Advanced Speech Enhancement and Packet Loss Concealment
The Kuaishou audio team secured first place in both the ICASSP 2024 Speech Signal Improvement and Audio Deep Packet Loss Concealment challenges by deploying a two‑stage GAN‑based speech enhancement system and a hybrid time‑frequency packet‑loss concealment model, both aimed at improving real‑time communication quality.
At the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024, the Kuaishou audio‑video technology team won the Speech Signal Improvement (SSI) Challenge and the Audio Deep Packet Loss Concealment (PLC) Challenge, outperforming numerous academic and industrial teams across multiple evaluation metrics defined by ITU‑T P.804 and P.835 standards.
Background: Real‑time communication scenarios such as voice calls and live streaming suffer from noise, reverberation, distortion, packet loss, and codec artifacts. The SSI Challenge aims to enhance degraded speech across the entire communication chain, while the PLC Challenge focuses on concealing lost audio packets under strict latency constraints.
Method Overview (SSI): The team built a data‑augmentation pipeline that simulates dozens of real‑world degradations (noise, reverberation, spectral coloring, clipping, low‑volume far‑field speech, packet loss, DC bias, codec distortion, etc.). A two‑stage generative restoration system was proposed: first, a multi‑sub‑band Generative Adversarial Network (GAN) performs coarse denoising, dereverberation, AI‑EQ, loudness balancing, bandwidth‑extension, and packet‑loss concealment; second, a Sub‑Band Fine‑Grained Speech Enhancement network refines residual noise, artifacts, and spectral details.
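The degradation simulation described above can be illustrated with a minimal sketch. The article does not publish the team's pipeline, so everything here is an assumption made for illustration: the function names, the probabilities, the parameter ranges, and the use of white Gaussian noise in place of recorded noise. It covers only five of the listed degradations (low volume, noise, clipping, DC bias, packet loss) and uses plain Python lists to stay dependency-free.

```python
import math
import random


def add_noise(x, snr_db, rng):
    """Add white Gaussian noise at the requested signal-to-noise ratio (dB)."""
    sig_power = sum(s * s for s in x) / len(x)
    sigma = math.sqrt(sig_power / (10 ** (snr_db / 10)))
    return [s + rng.gauss(0.0, sigma) for s in x]


def clip(x, threshold):
    """Hard-clip samples to the range [-threshold, threshold]."""
    return [max(-threshold, min(threshold, s)) for s in x]


def add_dc_bias(x, offset):
    """Shift every sample by a constant offset."""
    return [s + offset for s in x]


def attenuate(x, gain_db):
    """Scale the signal by a gain in dB (negative values lower the volume)."""
    gain = 10 ** (gain_db / 20)
    return [s * gain for s in x]


def drop_packets(x, frame_len, loss_rate, rng):
    """Zero out whole frames at random; return the signal and lost frame indices."""
    y = list(x)
    lost = []
    for start in range(0, len(y), frame_len):
        if rng.random() < loss_rate:
            end = min(start + frame_len, len(y))
            y[start:end] = [0.0] * (end - start)
            lost.append(start // frame_len)
    return y, lost


def degrade(x, rng, frame_len=320):
    """Apply a random subset of degradations in a fixed order.

    All probabilities and ranges below are hypothetical placeholders.
    """
    y = list(x)
    if rng.random() < 0.5:
        y = attenuate(y, rng.uniform(-30.0, -10.0))
    if rng.random() < 0.5:
        y = add_noise(y, rng.uniform(0.0, 30.0), rng)
    if rng.random() < 0.3:
        peak = max(abs(s) for s in y) or 1.0
        y = clip(y, 0.5 * peak)
    if rng.random() < 0.3:
        y = add_dc_bias(y, rng.uniform(-0.05, 0.05))
    if rng.random() < 0.3:
        y, _ = drop_packets(y, frame_len, 0.1, rng)
    return y
```

Pairing each clean utterance with `degrade(clean, random.Random(seed))` would yield (degraded, clean) training pairs. A realistic pipeline would also need reverberation, spectral coloring, and codec distortion, which are omitted here.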
Method Overview (PLC): Building on the previous champion solution, the team introduced a full‑band hybrid time‑frequency concealment system with multi‑stage training. The time‑domain network ensures continuity, while the frequency‑domain network restores high‑frequency content, trained on thousands of hours of data with combined signal‑level and adversarial losses.
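As a rough illustration of what "combined signal‑level and adversarial losses" can mean, the sketch below adds a waveform L1 term to a least‑squares GAN generator term. The article does not specify either loss, so the L1 choice, the least‑squares formulation, the `adv_weight` value, and all function names are assumptions rather than the team's actual objective.

```python
def l1_loss(pred, target):
    """Signal-level term: mean absolute error between two waveforms."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def lsgan_generator_loss(disc_scores):
    """Adversarial term (least-squares GAN): push discriminator scores toward 1."""
    return sum((d - 1.0) ** 2 for d in disc_scores) / len(disc_scores)


def combined_loss(pred, target, disc_scores, adv_weight=0.1):
    """Weighted sum of the signal-level and adversarial terms.

    adv_weight is a hypothetical hyperparameter, not a value from the article.
    """
    return l1_loss(pred, target) + adv_weight * lsgan_generator_loss(disc_scores)
```

Here `disc_scores` stands for the discriminator's outputs on the concealed audio. In a real training loop these functions would operate on tensors in an autodiff framework so that gradients reach the generator.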
Experimental Evaluation: Both real‑time and non‑real‑time tracks of the SSI Challenge were assessed using subjective MOS scores for noise, reverberation, spectral coloring, loudness, and signal quality; Kuaishou’s system achieved the highest scores and secured first place in both tracks. For the PLC Challenge, the P.804 Discontinuity and Overall scores, together with word‑error rate, also placed the PLC solution ahead of the other entries.
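Word‑error rate, mentioned above, is conventionally the word‑level edit distance between a reference transcript and a recognizer's output, divided by the number of reference words. The sketch below shows that standard definition. It is not the challenge's scoring script, and the function name and whitespace tokenization are assumptions.

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For a PLC system, the hypothesis would be the transcript produced by a speech recognizer run on the concealed audio, so a lower value suggests the concealment preserved intelligibility.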
Conclusion: The Kuaishou speech enhancement system significantly improves voice quality in adverse real‑time communication conditions and has already been deployed in Kuaishou’s live‑streaming platform, delivering high‑fidelity speech with low computational complexity.
Kuaishou Tech
Official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.