End‑to‑End vs Agentic Approaches for Visual Language Navigation: Pros, Cons, and a Hybrid Roadmap

End‑to‑end and agentic visual‑language‑navigation systems each have distinct strengths and weaknesses: the former is efficient within a closed distribution, while the latter offers modularity, explainability, and scalability. A hybrid design can combine fast reflexes with high‑level planning for robust navigation.

Hybrid Architecture · agentic system · end-to-end model

0 likes · 4 min read

DataFunTalk

Nov 5, 2021 · Artificial Intelligence

End-to-End Entity Extraction for Tmall Genie: Speech2Slot Model and Unsupervised Pre‑Training

This article presents the business background of Tmall Genie’s voice‑driven content‑on‑demand service, critiques the traditional pipeline for entity extraction, and details an end‑to‑end speech‑semantic model—including the Speech2Slot architecture, knowledge‑enhanced encoding, and Phoneme‑BERT unsupervised pre‑training—demonstrating significant performance gains in both generation and classification tasks.

Speech Recognition · Voice Assistant · end-to-end model

0 likes · 14 min read

DataFunSummit

Nov 3, 2021 · Artificial Intelligence

Innovations and Practices of Entity Extraction in Tmall Genie Voice Assistant

The article presents Tmall Genie’s end‑to‑end speech‑semantic understanding pipeline, detailing the limitations of traditional ASR‑NLU‑IR pipelines, introducing the Speech2Slot model with knowledge‑enhanced encoders, and describing unsupervised phoneme‑based pre‑training (Phoneme‑BERT) that improves entity extraction performance in voice‑driven content playback.

Phoneme-BERT · Speech Recognition · Tmall Genie

0 likes · 14 min read

Douyu Streaming

Oct 15, 2021 · Artificial Intelligence

How End-to-End Deep Learning Boosts Real-Time Speech Enhancement

An end‑to‑end deep‑learning framework for speech enhancement is presented, detailing dataset creation, time‑domain feature extraction, a convolutional separation network, decoding, and training strategies using SI‑SIR loss with PIT, achieving a final SI‑SIR of 13 dB.

PIT · SI-SIR · deep learning

0 likes · 9 min read

58 Tech

Jul 14, 2021 · Artificial Intelligence

Multi‑Turn Voice Bot Architecture and End‑to‑End Dialogue Jump Strategies at 58.com

This article describes the overall architecture of 58.com’s multi‑turn voice robot, explains rule‑based, intent‑based and text‑matching dialogue jump strategies, introduces an end‑to‑end classification approach using TextCNN, and reports its online performance improvements and future research directions.

AI · Speech Recognition · dialogue management

0 likes · 17 min read
