Tag

Speech Synthesis

1 view collected around this technical thread.

Python Programming Learning Circle
Mar 20, 2025 · Artificial Intelligence

Building a Python Voice Synthesis System Using Xunfei WebAPI

This tutorial explains how to create a Python-based speech synthesis tool by installing required packages, configuring Xunfei Open Platform credentials, implementing a Tkinter GUI, and using WebSocket communication to convert text into audio with selectable voice profiles.

GUI · Speech Synthesis · WebSocket
8 min read
Tencent Cloud Developer
Jun 14, 2024 · Artificial Intelligence

GPT-4o Speech Multimodal Technology: Speech Tokenization, LLM Integration, and Zero-shot TTS

GPT‑4o’s speech multimodal system discretizes audio into semantic and acoustic tokens, integrates these tokens with large language models through multi‑stage instruction tuning, and employs hierarchical zero‑shot text‑to‑speech decoding, enabling low‑latency, streaming, and prompt‑driven voice synthesis for applications like gaming.

AudioLM · GPT-4o · LLM integration
33 min read
Spring Full-Stack Practical Cases
Apr 29, 2024 · Artificial Intelligence

Build AI-Powered Spring Boot Apps with Alibaba Tongyi: A Hands‑On Guide

This tutorial walks through setting up Spring AI 0.8.1 with Spring Boot 3.1.1, configuring Alibaba Tongyi model access, and implementing chat, streaming, image, and audio generation endpoints using Java code and vector database integrations.

Alibaba AI · Chat · Java
9 min read
php中文网 Courses
Sep 1, 2023 · Artificial Intelligence

Integrating Baidu Text-to-Speech API with PHP

This tutorial demonstrates how to obtain Baidu TTS credentials, construct the required signature, send an HTTP request using PHP's cURL library, and save the returned audio data as an MP3 file, providing a complete code example for developers.

API Integration · Baidu TTS · PHP
5 min read
58 Tech
Aug 25, 2023 · Artificial Intelligence

Voice Cloning Technology in AI Sales Assistant

This article introduces the AI sales assistant from 58.com, detailing its background, a few‑shot voice cloning approach using real dialogue data, multi‑accent naturalness optimization, deployment architecture, and future plans, while evaluating performance metrics and discussing challenges in speech synthesis quality and stability.

AI sales assistant · Speech Synthesis · few-shot learning
19 min read
DataFunSummit
Aug 15, 2023 · Artificial Intelligence

AI Sales Assistant: Few‑Shot Voice Cloning and Multi‑Accent Naturalness Optimization

The article presents 58 Tongcheng AI Lab's AI sales assistant, detailing its background, a few‑shot voice‑cloning pipeline built on real dialogue data, data preprocessing, FastSpeech2‑based acoustic modeling, multi‑accent style transfer, deployment architecture, controllable synthesis parameters, and future research directions.

AI sales assistant · FastSpeech2 · Speech Synthesis
20 min read
Tencent Cloud Developer
Apr 4, 2023 · Artificial Intelligence

Step-by-Step Guide to Building Your Own Realistic AI Image Generation Website with Stable Diffusion

This step‑by‑step tutorial shows how to set up a Stable Diffusion web UI, install the required Python environment and GPU‑enabled PyTorch, add Chinese localization and optional LoRA or Deforum extensions, generate realistic human images, create animated videos, and add speech with D‑ID, all ready for deployment on your own AI website.

AI image generation · Deforum · Python
9 min read
DataFunSummit
Dec 9, 2022 · Artificial Intelligence

Volcano Engine Virtual Digital Human Technology Overview

This article provides a comprehensive overview of Volcano Engine's virtual digital human platform, detailing its definition, AI‑driven and human‑driven classifications, 2D and 3D technical architectures, multi‑modal perception, interaction capabilities, application scenarios, and future development directions.

2D avatar · 3D avatar · Speech Synthesis
15 min read
iQIYI Technical Product Team
Aug 26, 2022 · Artificial Intelligence

IQDubbing: AI-Powered Multi-Language, Multi-Voice Dubbing System for Film and TV

iQIYI’s IQDubbing system leverages AI‑driven voice conversion to automatically generate high‑quality, expressive dubbing in dozens of languages and over 50 character voice styles, streamlining multilingual film and TV localization, reducing reliance on scarce actors, and earning positive audience feedback, patents and industry awards.

AI Dubbing · Film Production · Multilingual Speech
13 min read
Xiaohongshu Tech REDtech
Aug 10, 2022 · Artificial Intelligence

Multi-Stage Multi-Codebook VQ-VAE for High-Performance Neural Text-to-Speech (MSMC‑TTS)

The MSMC‑TTS system, a multi‑stage multi‑codebook VQ‑VAE based neural text‑to‑speech solution, delivers near‑human audio quality (MOS 4.41) with a compact 3.12 MB acoustic model, substantially surpassing Mel‑Spectrogram FastSpeech baselines in naturalness and efficiency.

Compact Representation · Multi-Stage Modeling · Neural TTS
10 min read
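
To make the "multi-codebook" idea in the abstract above more concrete, here is a minimal sketch of multi-codebook vector quantization in plain Python. It assumes the common scheme in which a feature vector is split into equal-sized chunks and each chunk is replaced by the nearest entry of its own codebook; the function names, the toy codebooks, and the squared-Euclidean distance are illustrative assumptions, not details taken from the MSMC-TTS article.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_code(vec: Vector, codebook: Sequence[Vector]) -> Tuple[int, Vector]:
    """Return (index, codeword) of the codebook entry closest to vec."""
    def sqdist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    idx = min(range(len(codebook)), key=lambda k: sqdist(vec, codebook[k]))
    return idx, codebook[idx]


def multi_codebook_quantize(
    vec: Vector, codebooks: Sequence[Sequence[Vector]]
) -> Tuple[List[int], List[float]]:
    """Split vec into len(codebooks) chunks and quantize each chunk separately.

    Hypothetical helper: one codebook per chunk, all chunks the same size.
    """
    if len(vec) % len(codebooks) != 0:
        raise ValueError("vector length must be divisible by the number of codebooks")
    chunk_dim = len(vec) // len(codebooks)

    indices: List[int] = []
    quantized: List[float] = []
    for h, codebook in enumerate(codebooks):
        chunk = vec[h * chunk_dim:(h + 1) * chunk_dim]
        idx, code = nearest_code(chunk, codebook)
        indices.append(idx)      # discrete token for this chunk
        quantized.extend(code)   # reconstructed values for this chunk
    return indices, quantized


# Toy example: a 4-dimensional vector and two 2-dimensional codebooks.
toy_codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [2.0, 2.0]],
]
toy_indices, toy_quantized = multi_codebook_quantize([0.9, 1.2, 0.1, -0.2], toy_codebooks)
```

The list of indices is the compact representation; a real system would learn the codebooks during training rather than fix them by hand.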
DataFunSummit
Apr 14, 2022 · Artificial Intelligence

Advances in Alibaba's Digital Human Technology: Construction, Performance, Interaction, and the MMTK Multimodal Algorithm Library

This article reviews Alibaba's digital‑human (virtual avatar) research over the past few years, covering the product’s evolution, a six‑stage pipeline for building digital humans, solutions to key challenges in realism, multimodal interaction, and the open‑source MMTK algorithm library.

Digital Human · Emotion Modeling · Speech Synthesis
12 min read
Python Programming Learning Circle
Apr 4, 2022 · Artificial Intelligence

Building a Simple Speech Synthesis System with iFlytek WebAPI in Python

This tutorial explains how to create a lightweight speech synthesis tool using iFlytek's WebAPI, covering required environment setup, API credential acquisition, GUI design with Tkinter, and detailed Python code for WebSocket communication, audio handling, and WAV file generation.

Audio Processing · Python · Speech Synthesis
8 min read
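
The WAV file generation step mentioned in the abstract above can be sketched with Python's standard `wave` module. This is a minimal illustration, not the tutorial's own code: it assumes the synthesis service returns raw 16-bit mono PCM at 16 kHz, and `pcm_to_wav` and `sine_pcm` are hypothetical helper names. A short sine tone stands in for the audio bytes that would arrive over the WebSocket.

```python
import io
import math
import struct
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 16000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM bytes in a WAV container and return the file contents."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)   # bytes per sample (2 = 16-bit)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


def sine_pcm(freq_hz: float = 440.0, seconds: float = 0.1,
             sample_rate: int = 16000) -> bytes:
    """Generate a 16-bit little-endian mono sine tone (placeholder audio)."""
    n = round(seconds * sample_rate)
    samples = (
        int(32767 * 0.3 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n)
    )
    return b"".join(struct.pack("<h", s) for s in samples)


# In a real client, the PCM chunks received from the service would be
# concatenated and passed here instead of the generated tone.
wav_bytes = pcm_to_wav(sine_pcm())
```

Writing `wav_bytes` to disk with `open("out.wav", "wb")` produces a file that standard audio players can open, provided the sample rate and width match what the service actually sends.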
Test Development Learning Exchange
Oct 17, 2021 · Artificial Intelligence

Using pyttsx3 for Text-to-Speech in Python

This article provides a hands‑on guide to using the pyttsx3 library for offline text‑to‑speech conversion in Python, covering installation, basic playback, voice property adjustments, multilingual support, and conditional speech examples with counters.

Python · Speech Synthesis · conditional speech
7 min read
iQIYI Technical Product Team
Jun 11, 2021 · Artificial Intelligence

iQIYI M2VoC Multi‑Speaker Multi‑Style Voice Cloning Challenge at ICASSP 2021 – Overview and Results

The iQIYI M2VoC competition at ICASSP 2021, the first low‑resource multi‑speaker, multi‑style voice‑cloning challenge, attracted 153 academic and industry teams to tackle few‑shot (100 utterances) and extreme few‑shot (5 utterances) tracks, evaluated by professional listeners, yielding strong innovations and applications while confirming that single‑sample cloning remains unsolved.

AI · Audio Processing · ICASSP2021
7 min read
Kuaishou Tech
May 29, 2021 · Artificial Intelligence

Speaker-Aware Module for Single-Sample Voice Conversion (SAVC)

The paper presents a speaker‑aware module (SAM) that enables high‑quality voice conversion using only a single utterance of the target speaker, addressing the small‑data challenge in speech timbre transfer and achieving state‑of‑the‑art performance on the Aishell‑1 benchmark.

LPCNet · Speech Synthesis · deep learning
12 min read
iQIYI Technical Product Team
Nov 20, 2020 · Artificial Intelligence

iQIYI M2VoC Multi‑Speaker Multi‑Style Voice Cloning Challenge (ICASSP 2021) Overview

The iQIYI M2VoC Challenge at ICASSP 2021 invites researchers to tackle low‑resource multi‑speaker, multi‑style voice cloning by providing Mandarin datasets, few‑shot and extremely few‑shot tracks with strict data rules, MOS‑based subjective evaluation, and a $9,600 prize pool for top submissions.

AI · Challenge · ICASSP
10 min read
DataFunTalk
Mar 10, 2020 · Artificial Intelligence

Interspeech 2019 Highlights: End‑to‑End Speech AI Technologies and Key Paper Summaries

The article reviews Interspeech 2019, summarizing major trends and representative papers in end‑to‑end speech recognition, synthesis, natural language understanding, speaker recognition, and speech translation, while also highlighting best student papers and providing resources for further study.

AI · Interspeech 2019 · Natural Language Understanding
24 min read
DataFunTalk
Jan 16, 2020 · Artificial Intelligence

Voice Conversion: Fundamentals, Methods, and iQIYI Applications

This article provides a comprehensive overview of voice conversion technology, covering its definition, parallel and non‑parallel data approaches, classic and deep‑learning methods such as DTW, GMM, seq2seq, PPG, VAE, Flow, GAN, and practical applications and challenges in iQIYI’s products.

ASR · GAN · Speech Synthesis
8 min read
iQIYI Technical Product Team
Jan 9, 2020 · Artificial Intelligence

Voice Conversion (VC): Fundamentals, Progress, and Applications

Voice conversion (VC) changes a speaker’s timbre and style while keeping the spoken content unchanged. It supports one‑to‑one, many‑to‑one, and many‑to‑many scenarios, with applications in medical assistance and entertainment. Methods work on parallel or non‑parallel data and include DTW‑aligned frame mapping, attention‑based neural networks, PPG‑LSTM pipelines, VAEs, normalizing‑flow models, and GANs. iQIYI focuses on non‑parallel data, prosody preservation, and noise‑robust augmentation.

Artificial Intelligence · Audio Processing · GAN
12 min read
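
The DTW alignment that the abstract above lists for parallel-data voice conversion can be illustrated with a short sketch of classic dynamic time warping. This is the textbook algorithm under simplifying assumptions, not iQIYI's implementation: frames are single numbers rather than spectral feature vectors, the distance is absolute difference, and `dtw_align` is a hypothetical name.

```python
from math import inf
from typing import Callable, List, Sequence, Tuple


def dtw_align(
    src: Sequence[float],
    tgt: Sequence[float],
    dist: Callable[[float, float], float] = lambda a, b: abs(a - b),
) -> Tuple[float, List[Tuple[int, int]]]:
    """Return (total cost, list of aligned (src_index, tgt_index) pairs)."""
    n, m = len(src), len(tgt)
    # cost[i][j] = best cost of aligning src[:i] with tgt[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(src[i - 1], tgt[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # src frame repeats
                                 cost[i][j - 1],      # tgt frame repeats
                                 cost[i - 1][j - 1])  # both advance

    # Backtrack from the end to recover the warping path.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


# The target utterance holds its middle frame twice as long as the source.
total_cost, warp_path = dtw_align([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
```

Each pair in `warp_path` links a source frame to the target frame it should be mapped onto, which is the kind of frame-level pairing a parallel-data conversion model trains on.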
DataFunTalk
Nov 5, 2019 · Artificial Intelligence

Low-Resource Text-to-Speech: FastSpeech, LightTTS, and LightBERT Overview

This article reviews recent advances in low‑resource text‑to‑speech synthesis, covering the background of TTS, challenges in data‑ and compute‑limited scenarios, and detailed descriptions of FastSpeech, LightTTS, LightBERT, and related lightweight vocoder techniques, along with experimental results and future research directions.

Artificial Intelligence · FastSpeech · LightTTS
20 min read