Tagged articles
12 articles
Machine Heart
May 8, 2026 · Artificial Intelligence

Omni2Sound Beats Multi-Modal Audio ‘Generalist’ Dilemma via Data Alignment

Omni2Sound tackles the long‑standing “generalist” dilemma of unified audio generation by constructing a high‑quality V‑T‑A dataset (SoundAtlas), employing a three‑stage progressive training pipeline, and using a simple Diffusion Transformer backbone. It achieves state‑of‑the‑art performance on T2A, V2A and VT2A tasks and remains robust in off‑screen scenarios.

Audio Generation · Data Alignment · Multimodal Learning
16 min read
Machine Heart
Apr 21, 2026 · Artificial Intelligence

ControlAudio Enables Scripted Timing and Speech Control in Text-to-Audio Generation

ControlAudio, a progressive diffusion model presented at ACL 2026, jointly models text, timing, and phoneme information to achieve precise event timing and intelligible speech in text-to-audio generation, backed by a large mixed real‑synthetic dataset and competitive experimental results.

Audio Generation · ControlAudio · Multimodal Learning
10 min read
Meituan Technology Team
Apr 16, 2026 · Artificial Intelligence

Can End-to-End Diffusion TTS Beat Traditional Pipelines? Inside LongCat-AudioDiT

LongCat-AudioDiT introduces a wave‑VAE plus diffusion Transformer architecture that eliminates intermediate spectrograms, solves training‑inference mismatch with dual constraints, replaces classifier‑free guidance with adaptive projection guidance, and achieves state‑of‑the‑art zero‑shot voice cloning performance on multiple benchmarks.

AI research · Audio Generation · Open Source
12 min read
Xiaomi Tech
Feb 3, 2026 · Artificial Intelligence

Xiaomi’s AI Research Secures Spots on ICLR 2026 – Papers and Key Findings

The International Conference on Learning Representations (ICLR) 2026 accepted multiple Xiaomi papers covering multimodal reasoning, reinforcement learning, GUI agents, autonomous driving, audio generation and benchmark design, each presenting novel frameworks, data‑centric training tricks and strong experimental results that advance the state of the art.

Audio Generation · Autonomous Driving · ICLR 2026
17 min read
Alimama Tech
Dec 17, 2025 · Artificial Intelligence

How VeM Achieves Precise Semantic, Temporal, and Rhythmic Alignment in Video-to-Music Generation

The VeM model introduces a latent diffusion framework that leverages hierarchical video parsing, scene‑guided cross‑attention, and a transition‑beat alignment adapter to generate high‑fidelity background music perfectly synchronized with video semantics, timing, and rhythm, outperforming existing baselines on extensive quantitative and qualitative evaluations.

Audio Generation · Cross-Attention · Latent Diffusion
14 min read
ShiZhen AI
May 13, 2025 · Artificial Intelligence

Top Free Text‑to‑Speech Tools for Content Creators

This article reviews five free text‑to‑speech solutions—AI易视频, Google TTS, Natural Reader, Balabolka, and Speech2Go—detailing their features, language support, installation needs, and unique capabilities to help creators choose the right tool for narration, translation, or multi‑character audio production.

AI · Audio Generation · TTS
7 min read
Fighter's World
Sep 30, 2024 · Artificial Intelligence

Exploring Google NotebookLM: Use Cases, Interaction Experience, and Key Insights

The author reviews Google NotebookLM: how it aids deep paper reading, how guided prompts make users more willing to keep chatting, and how it maintains conversation coherence through self‑play insights. The review also highlights the audio‑overview feature and reflects on AI concepts such as the "bitter lesson" and the limits of self‑play in open scenarios.

AI research · Audio Generation · Google
22 min read
Baidu MEUX
Jul 24, 2024 · Artificial Intelligence

What’s New in AI? Video QA, Audio Generation, and Major Industry Moves

This roundup highlights the latest AI breakthroughs, including Zhipu AI's video‑understanding model for temporal Q&A, Tencent's video‑to‑audio generation system, Vimeo's AI‑content labeling policy, Apple’s Core ML inclusion of ByteDance’s depth model, AMD’s acquisition of Silo AI, Claude’s new editing features, Quark’s all‑in‑one search AI, TikTok’s VR live streaming on Vision Pro, the launch of the "Xinliu" AI search assistant, and Canva’s restrictions on political AI‑generated posters.

AI models · Artificial Intelligence · Audio Generation
8 min read
Rare Earth Juejin Tech Community
Aug 30, 2023 · Artificial Intelligence

AudioCraft: An Open‑Source PyTorch Library for Audio Generation with MusicGen, AudioGen, and EnCodec

AudioCraft is a PyTorch library that bundles state‑of‑the‑art AI models—MusicGen, AudioGen, and the EnCodec codec—to generate high‑quality audio from text or reference sounds, and the article explains its architecture, evaluation results, and how to install and run it.

AI models · Audio Generation · AudioGen
9 min read
Volcano Engine Developer Services
Feb 14, 2023 · Artificial Intelligence

How Make-An-Audio Turns Text Into Realistic Sound Effects

Make-An-Audio, a collaborative text‑to‑audio model from Zhejiang University, Peking University and Volcano Speech, uses a Distill‑then‑Reprogram strategy to generate high‑quality, controllable sound effects from any modality, showcasing impressive demos and promising future AIGC applications.

AIGC · Audio Generation · Text-to-Audio
7 min read
The Dominant Programmer
Nov 27, 2020 · Backend Development

Using Jacob in Java for Windows Speech Synthesis and Audio File Generation

This guide walks through downloading Jacob's DLL and JAR, configuring the Java environment, setting up an Eclipse project, and writing Java code that leverages the SAPI COM interfaces to synthesize Chinese text into a WAV file on Windows, complete with step‑by‑step screenshots and a full source example.

Audio Generation · COM · Jacob
5 min read