Tagged articles

123 articles

May 15, 2019 · Artificial Intelligence

AI‑Driven Audio Content Understanding and Safety for Live Streams

This article discusses using AI to automatically understand and secure audio content. It covers the challenges of manual audio analysis, outlines a four‑step pipeline (audio segmentation, speech‑to‑text, labeling, and synthesis), and describes models such as VAD, ASR, sound classification, text recognition, and behavior detection for live‑stream moderation.

AI · Audio Processing · Content Safety

0 likes · 11 min read

Hulu Beijing

Apr 22, 2019 · Artificial Intelligence

How Has Speech Recognition Evolved from Traditional Methods to Modern Deep Learning?

This article reviews the fundamentals of automatic speech recognition, compares traditional MFCC‑GMM‑HMM pipelines with modern deep neural network approaches such as DNN‑HMM, LSTM‑CTC, and attention‑based models, and illustrates each evolution step with flowchart diagrams and key references.

ASR · CTC · DNN

0 likes · 11 min read

Tencent Cloud Developer

Feb 26, 2019 · Artificial Intelligence

Tencent Cloud Intelligent Speech Technology: Development, Challenges and Practical Applications

Tencent Cloud's intelligent speech platform combines high‑accuracy ASR, advanced WaveNet‑based TTS, and solutions for noise, far‑field, and dialect challenges, enabling voice input, transcription, and customer‑service bots, with real‑world deployments in finance, museums, hotels, and other industry scenarios.

ASR · Human-Computer Interaction · Speech Recognition

0 likes · 8 min read

Ctrip Technology

Feb 21, 2019 · Artificial Intelligence

Speech Recognition and Synthesis: Principles, Challenges, Optimizations, and Tencent Cloud Use Cases

This article reviews the development roadmap, current industry status, challenges, typical deployment scenarios, and optimization methods for speech recognition (ASR) and speech synthesis (TTS), and shares several Tencent Cloud intelligent voice case studies to illustrate practical applications.

AI · Cloud Computing · Speech Recognition

0 likes · 9 min read

Alibaba Cloud Developer

Feb 12, 2019 · Artificial Intelligence

Essential AI Research Highlights to Jump‑Start Your Post‑Holiday Learning

After the Chinese New Year break, this curated collection of key AI articles—spanning computer vision, speech recognition, natural language processing, recommendation systems, and more—helps technical readers quickly regain momentum in work and study by revisiting core technologies with real‑world case studies.

AI · Speech Recognition · computer vision

0 likes · 6 min read

MaGe Linux Operations

Feb 1, 2019 · Artificial Intelligence

Master Python Speech Recognition: Install, Process Audio Files, and Capture Live Voice

This guide walks you through the fundamentals of speech recognition, explains how modern systems work, shows how to choose and install the Python SpeechRecognition package, and demonstrates processing audio files, handling noise, using offsets, and capturing live microphone input with practical code examples.

Speech Recognition · audio-processing · machine-learning

0 likes · 16 min read

JD Tech

Jan 16, 2019 · Artificial Intelligence

Technical Deep Dive of JD’s Intelligent Customer Service 2.0: AI‑Driven Intent Recognition, Emotion Analysis, and Smart Scheduling

This article presents a comprehensive technical analysis of JD’s Intelligent Customer Service 2.0, detailing AI‑based intent recognition with the ABSQ framework, hierarchical attention networks, emotion analysis via CNN, speech navigation using ASR/NLP, and machine‑learning‑driven smart dispatch that together boost accuracy and user experience.

AI · Customer Service · Speech Recognition

0 likes · 10 min read

Tencent Cloud Developer

Dec 27, 2018 · Artificial Intelligence

Overview of Speech and Semantic Recognition Technologies Presented at the Tencent Cloud+ Community Developer Conference

At the inaugural Tencent Cloud+ Community Developer Conference, experts detailed the evolution of speech and semantic recognition—from early MFCC/HMM‑GMM models to modern end‑to‑end deep‑learning architectures—and showcased WeChat Zhiling’s full‑stack platform, multilingual models, high‑accuracy cloud services, translation solutions, legal applications, and integration into smart devices.

AI · Speech Recognition · Tencent Cloud

0 likes · 9 min read

Tencent Cloud Developer

Oct 10, 2018 · Artificial Intelligence

What Are the Real Challenges and Future Trends in Intelligent Voice Technology?

This article examines the current landscape of intelligent voice technology—including speech recognition, synthesis, voiceprint identification, and acoustic event detection—highlighting technical hurdles, evaluation metrics, recent advances such as WaveNet, and a wide range of practical applications from mobile devices to smart hardware and enterprise solutions.

Audio Processing · Speech Recognition · Tencent Cloud

0 likes · 16 min read

Tencent Cloud Developer

Sep 30, 2018 · Artificial Intelligence

Smart Speaker Voice Interaction Technology: Recent Advances and Tencent's Research Progress

The article surveys Tencent’s recent advances in smart‑speaker voice interaction, detailing a full technology chain—from front‑end capture, wake‑up and enhancement, through speaker verification and short‑speech voiceprint, to TDNN/LSTM speech recognition, target speaker extraction, and end‑to‑end attention modeling for robust, personalized performance.

Speech Recognition · TTS · attention mechanism

0 likes · 18 min read

Tencent Cloud Developer

Sep 26, 2018 · Artificial Intelligence

Breakthroughs in AI: Deep Learning Applications in Speech Recognition

The talk reviews how massive speech data, faster GPUs/CPUs, and deep‑learning models such as DNN, LSTM, CNN, and end‑to‑end CTC have dramatically boosted speech‑recognition accuracy, while outlining remaining challenges like noise, accents, far‑field and multi‑speaker scenarios and describing Tencent Cloud’s related services.

AI · Speech Recognition · acoustic modeling

0 likes · 16 min read

iQIYI Technical Product Team

Sep 14, 2018 · Artificial Intelligence

Limitations of Language Models in Voice Interaction and HomeAI Solutions

iQIYI HomeAI tackles the bottleneck of static language models in voice assistants by separating phonetic and semantic processing, correcting ASR errors at the intent‑recognition layer with pinyin‑enhanced entity correction, thereby reducing error amplification in video‑on‑demand interactions and paving the way for adaptive, personalized voice experiences.

AI · Speech Recognition · intent recognition

0 likes · 7 min read

Alibaba Cloud Developer

Jul 18, 2018 · Artificial Intelligence

Inside Alibaba’s Postdoc Labs: Real‑World AI Research and Innovation

Alibaba’s post‑doctoral program connects PhDs with massive industry data and real‑world projects, showcasing how researchers like Xue Shaofei and Pei Changhua develop cutting‑edge speech‑recognition, scheduling and recommendation technologies that directly impact millions of users.

AI · Industry-Academia Collaboration · Postdoctoral Research

0 likes · 9 min read

Didi Tech

Jun 1, 2018 · Artificial Intelligence

Didi's Attention-Based End-to-End Mandarin Speech Recognition: A Detailed Review

Didi’s attention‑based end‑to‑end Mandarin speech recognizer, built on the Listen‑Attend‑Spell architecture and modeling roughly 5,000 common characters, delivers 15‑25% relative accuracy gains over its prior LSTM‑CTC system while cutting model size, latency and server requirements and simplifying training by eliminating separate acoustic, pronunciation and language components.

End-to-End · LAS · Mandarin

0 likes · 14 min read

WeChat Backend Team

May 29, 2018 · Artificial Intelligence

Build a Zero‑Setup Face‑to‑Face Translator Mini‑Program with WeChat’s AI Plugin

This guide walks developers through adding WeChat’s free AI translation plugin to a mini‑program, covering plugin installation, voice input, real‑time transcription, text translation, and speech synthesis in five straightforward steps, complete with code snippets and configuration details.

AI translation · Mini Program · Speech Recognition

0 likes · 6 min read

High Availability Architecture

May 28, 2018 · Artificial Intelligence

Interview with GIAC AI Forum Lecturer Long Mingkang on Building AI Platforms, Speech Recognition Challenges, and Future AI Trends

In this interview, Long Mingkang, Vice President of iFlytek's Cloud Computing Institute, shares his experience building large‑scale speech cloud services, discusses the technical hurdles of speech recognition and AI platform development, compares TensorFlow and MXNet, and offers insights on AutoML, industry trends, and how engineers can master AI.

AI · AI Platforms · AutoML

0 likes · 13 min read

Liulishuo Tech Team

Sep 3, 2017 · Artificial Intelligence

Report on Interspeech 2017 and SLaTE 2017: Highlights in Speech Recognition, Synthesis, and Speaker Technologies

The article reports on Liulishuo’s participation in Interspeech 2017 and the SLaTE 2017 workshop, summarizing key research papers on noise‑robust ASR, attention‑based models, TensorFlow training, modern TTS systems, and speaker identification datasets. It also includes a hiring announcement for AI engineers.

AI · Interspeech · Speech Recognition

0 likes · 7 min read

Alibaba Cloud Developer

Apr 1, 2017 · Artificial Intelligence

Boosting Online Speech Recognition with Improved Latency‑Controlled BLSTM Models

This article explains how improved latency‑controlled BLSTM acoustic models can boost online speech‑recognition accuracy while cutting decoding computation, detailing two model refinements that achieve 40‑60% speed gains with minimal loss in recognition performance.

Computational Efficiency · LC-BLSTM · Speech Recognition

0 likes · 6 min read

Alibaba Cloud Developer

Mar 17, 2017 · Artificial Intelligence

How Improved Latency‑Controlled BLSTM Models Boost Online Speech Recognition Efficiency

This article explains how latency‑controlled BLSTM acoustic models were refined to accelerate online speech recognition while preserving accuracy, detailing the training strategy, computational trade‑offs, and two model enhancements that achieve up to 60% faster decoding with modest resource savings.

Efficiency · LC-BLSTM · Speech Recognition

0 likes · 6 min read

Liulishuo Tech Team

Oct 28, 2016 · Artificial Intelligence

Open‑sourcing kaldi‑ctc: Fast GPU‑Accelerated CTC End‑to‑End Speech Recognition

The article announces the open‑source release of kaldi‑ctc, a GPU‑accelerated CTC‑based end‑to‑end speech recognition toolkit built on Kaldi, warp‑ctc and cuDNN, highlighting its 5‑6× training speedup, real‑time decoding factor of 0.02, and performance comparisons on the LibriSpeech corpus.

ASR · CTC · GPU

0 likes · 4 min read

Alibaba Cloud Developer

Oct 11, 2016 · Artificial Intelligence

What Were the Key Speech AI Breakthroughs at Interspeech 2016?

The Interspeech 2016 conference in San Francisco showcased major advances in speech recognition, synthesis, far‑field processing, and language modeling, highlighting CTC extensions, deep CNN innovations, WaveNet’s generative audio, and new techniques for multi‑microphone acoustic modeling.

CTC · Interspeech 2016 · Speech Recognition

0 likes · 7 min read

Ctrip Technology

Aug 12, 2016 · Mobile Development

Design and Development of a Siri‑Like Voice‑Controlled Music iOS App

This article walks through the design and implementation of a voice‑controlled music iOS application using Siri SDK, Sketch and Principle for UI prototyping, and Xcode with Objective‑C and SpeechKit for speech recognition, culminating in a functional prototype that searches iTunes and plays song previews.

Mobile Development · Objective‑C · Siri SDK

0 likes · 8 min read

21CTO

Dec 9, 2015 · Artificial Intelligence

iFLY Mobile Speech Platform: Enabling Voice Recognition and Synthesis

iFLY’s Mobile Speech Platform (MSP) integrates cloud‑based speech recognition and text‑to‑speech technologies to deliver high‑quality, multi‑channel voice services for Android, iOS and other devices, detailing its four‑layer architecture, core functionalities, and the role of ASR and TTS in modern human‑machine interaction.

Artificial Intelligence · Mobile Development · Speech Recognition

0 likes · 5 min read
