Tag

voice conversion


iQIYI Technical Product Team
Aug 26, 2022 · Artificial Intelligence

IQDubbing: AI-Powered Multi-Language, Multi-Voice Dubbing System for Film and TV

iQIYI’s IQDubbing system leverages AI‑driven voice conversion to automatically generate high‑quality, expressive dubbing in dozens of languages and over 50 character voice styles, streamlining multilingual film and TV localization, reducing reliance on scarce actors, and earning positive audience feedback, patents and industry awards.

AI Dubbing · Film Production · Multilingual Speech
13 min read
Kuaishou Tech
May 29, 2021 · Artificial Intelligence

Speaker-Aware Module for Single-Sample Voice Conversion (SAVC)

The paper presents a speaker-aware module (SAM) that enables high-quality voice conversion from a single utterance of the target speaker, addressing the small-data challenge in speech timbre transfer and reporting state-of-the-art performance on the AISHELL-1 benchmark.

LPCNet · Speech Synthesis · Deep Learning
12 min read
DataFunTalk
Jan 16, 2020 · Artificial Intelligence

Voice Conversion: Fundamentals, Methods, and iQIYI Applications

This article gives an overview of voice conversion technology: its definition, parallel and non-parallel data approaches, classic and deep-learning methods (DTW, GMM, seq2seq, PPG, VAE, flow, and GAN), and the practical applications and challenges in iQIYI’s products.

ASR · GAN · Speech Synthesis
8 min read
iQIYI Technical Product Team
Jan 9, 2020 · Artificial Intelligence

Voice Conversion (VC): Fundamentals, Progress, and Applications

Voice conversion (VC) changes a speaker’s timbre and style while keeping the spoken text unchanged. It supports one-to-one, many-to-one, and many-to-many scenarios, with applications in medical assistance and entertainment. Methods work on parallel or non-parallel data and include DTW-aligned frame mapping, attention-based neural networks, PPG-LSTM pipelines, VAEs, normalizing-flow models, and GANs. iQIYI focuses on non-parallel data, prosody preservation, and noise-robust augmentation.

Audio Processing · GAN · Speech Synthesis
12 min read