Architecture and Design of an AI‑Powered Voice Robot System
This article describes the design and implementation of a voice robot platform: its background, layered architecture, dialogue flow, intent recognition techniques, micro‑service backend, and planned improvements. It shows how AI models and telephony integration enable automated multi‑turn voice interactions for sales and service scenarios.
Background – The voice robot, developed by 58 Group’s TEG AI Lab, places automated outbound calls, holds multi‑turn voice conversations, and judges user intent. It is used for sales, product promotion, information verification, and notifications, dramatically improving sales efficiency compared with manual agents.
Overall Architecture – The system has four layers plus a web portal. The foundational service layer handles SIP signaling and voice codecs, and uses third‑party providers for speech recognition and synthesis. The logic layer covers intent recognition, dialogue interaction, and dialogue management. The editing/operation layer supports data labeling, evaluation, and analytics. The access layer exposes a real‑time outbound interface, the WMB message bus, and a result callback. The web portal manages accounts, scripts, tasks, and reports.
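Dialogue Interaction – Each call follows a script modeled as a directed acyclic graph. Nodes represent the opening, closing, and business dialogue steps, and edges represent positive or negative logic, so the next node depends on which branch the user's reply takes. Scripts can be executed as offline batch outbound tasks or as online real‑time outbound calls through APIs built on SCF, 58’s proprietary RPC framework.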
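The script graph described above can be sketched in a few lines of Python. This is a minimal illustration, not the platform's implementation: the `ScriptNode` class, the `run_script` function, the node names, and the prompts are all hypothetical, and each user reply is assumed to have already been reduced to a "positive" or "negative" branch label.

```python
from dataclasses import dataclass, field


@dataclass
class ScriptNode:
    """One node of a call script: what the robot says and where to go next."""
    name: str
    prompt: str
    # Outgoing edges keyed by branch label ("positive" or "negative").
    edges: dict = field(default_factory=dict)


def run_script(nodes, start, branches):
    """Walk the graph from `start`, consuming one branch label per user reply.

    Returns the prompts played, in order. Stops at a node with no outgoing
    edges, when the replies run out, or when a reply has no matching edge.
    """
    played = []
    current = nodes[start]
    replies = iter(branches)
    while True:
        played.append(current.prompt)
        if not current.edges:
            break
        branch = next(replies, None)
        if branch not in current.edges:
            break
        current = nodes[current.edges[branch]]
    return played


# Hypothetical three-node script: opening -> business dialogue -> closing.
SCRIPT = {
    "opening": ScriptNode("opening", "Hello, do you have a minute?",
                          {"positive": "pitch", "negative": "closing"}),
    "pitch": ScriptNode("pitch", "Would you like to book a visit?",
                        {"positive": "closing", "negative": "closing"}),
    "closing": ScriptNode("closing", "Thank you, goodbye."),
}

print(run_script(SCRIPT, "opening", ["positive", "negative"]))
```

Because the graph is acyclic, the walk always terminates. A real script would likely attach different closing nodes to the positive and negative branches.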
Intent Recognition – The robot distinguishes between single‑utterance intents (19 predefined labels, including affirmative, negative, and request) and whole‑call intents (for example, successful appointment, neutral, and explicit rejection). It also handles standard out‑of‑script questions and utterances whose meaning depends on context. Recognition relies on deep‑learning models (TextCNN, DSSM, LSTM, BERT) together with error‑mitigation techniques, and achieves about 85% accuracy.
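To make the two label levels concrete, the sketch below models them as enums and derives a whole‑call label from a sequence of utterance labels. The article's system uses trained models for this; the rule here (the last decisive utterance wins) is a deliberately simple stand‑in, and the enum names, the `summarize_call` function, and the mapping itself are assumptions made for illustration. Only three of the 19 utterance labels are shown.

```python
from enum import Enum


class UtteranceIntent(Enum):
    # Three of the 19 single-utterance labels named in the article.
    AFFIRMATIVE = "affirmative"
    NEGATIVE = "negative"
    REQUEST = "request"


class CallIntent(Enum):
    SUCCESSFUL_APPOINTMENT = "successful appointment"
    NEUTRAL = "neutral"
    EXPLICIT_REJECTION = "explicit rejection"


def summarize_call(utterances):
    """Rule-based stand-in (not the article's model): scan the call from the
    end and let the last affirmative or negative utterance decide the label."""
    for intent in reversed(utterances):
        if intent is UtteranceIntent.AFFIRMATIVE:
            return CallIntent.SUCCESSFUL_APPOINTMENT
        if intent is UtteranceIntent.NEGATIVE:
            return CallIntent.EXPLICIT_REJECTION
    # No decisive utterance anywhere in the call.
    return CallIntent.NEUTRAL


call = [UtteranceIntent.NEGATIVE, UtteranceIntent.REQUEST, UtteranceIntent.AFFIRMATIVE]
print(summarize_call(call).value)
```

A rule like this ignores context entirely, which is one reason a production system would prefer a model that sees the whole transcript.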
Service Architecture – The backend follows a micro‑service architecture built on SCF, 58’s proprietary RPC framework. Key services include real‑time outbound access, SIP resource scheduling, speech segmentation, speech recognition, dialogue management, and three intent‑recognition services. The intent models are trained offline on WPAI and then served online for prediction.
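One way to picture how such services compose for a single user turn is a pipeline in which each stage is passed in as a callable, so a local stub can later be swapped for a remote call. The sketch below is an assumption‑laden illustration: it does not use SCF or any real speech API, and `segment`, `recognize`, `classify`, and `handle_turn` are hypothetical stand‑ins (the "audio" is simply UTF‑8 text bytes).

```python
from typing import Callable, List


def segment(audio: bytes, chunk_size: int = 4) -> List[bytes]:
    """Stand-in for speech segmentation: split the input into fixed-size chunks."""
    return [audio[i:i + chunk_size] for i in range(0, len(audio), chunk_size)]


def recognize(chunk: bytes) -> str:
    """Stand-in for speech recognition: treat the bytes as ASCII text."""
    return chunk.decode("ascii")


def classify(text: str) -> str:
    """Stand-in for an intent service: keyword matching, not a trained model."""
    lowered = text.lower()
    if "yes" in lowered:
        return "affirmative"
    if "no" in lowered:
        return "negative"
    return "other"


def handle_turn(
    audio: bytes,
    segmenter: Callable[[bytes], List[bytes]],
    recognizer: Callable[[bytes], str],
    classifier: Callable[[str], str],
) -> str:
    """Run one user turn through segmentation, recognition, and intent
    classification, returning the intent label for dialogue management."""
    text = "".join(recognizer(chunk) for chunk in segmenter(audio))
    return classifier(text)


print(handle_turn(b"yes please", segment, recognize, classify))
```

Injecting the stages keeps `handle_turn` independent of how each service is deployed, which mirrors the separation the micro‑service design aims for.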
Summary – The voice robot provides a generic outbound voice capability for various business scenarios and is already integrated with sales, customer service, and operations. Future work will focus on improving outbound effectiveness, simplifying integration, raising intent‑recognition accuracy, and enhancing dialogue fluency to boost efficiency in traditionally labor‑intensive domains.
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.