
Architecture and Design of an AI‑Powered Voice Robot System

This article describes the design and implementation of a voice robot platform: its background, layered architecture, dialogue flow, intent recognition techniques, micro‑service backend, and planned improvements. It shows how AI models combined with telephony integration enable automated multi‑turn voice conversations in sales and customer service scenarios.

58 Tech

May 28, 2019


Background – The voice robot, developed by 58 Group’s TEG AI Lab, places automated outbound calls, conducts multi‑turn voice conversations, and judges the callee’s intent automatically. It is used for sales, product promotion, information verification, and notifications, and it substantially raises sales efficiency compared with human agents.

Overall Architecture – The system has five parts: a foundational service layer (SIP signaling, voice codecs, and speech recognition/synthesis provided by third‑party vendors); a logic layer (intent recognition, dialogue interaction, and dialogue management); an editing/operation layer (data labeling, evaluation, and analytics); an access layer (a real‑time outbound interface, the WMB message bus, and result callbacks); and a web portal for managing accounts, scripts, tasks, and reports.
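To make the layering concrete, here is a minimal Python sketch of a single dialogue turn passing from the foundational service layer (speech recognition) through the logic layer (intent recognition and dialogue management) and back out through speech synthesis. Everything here is hypothetical: `fake_asr` and `fake_tts` stand in for the third‑party speech providers, and `keyword_intent` stands in for the real intent models. None of these names come from the original system.

```python
from dataclasses import dataclass
from typing import Callable, Dict


# Hypothetical stand-ins for the third-party speech services in the foundational layer.
def fake_asr(audio: bytes) -> str:
    """Pretend speech recognition: the 'audio' here is just UTF-8 text."""
    return audio.decode("utf-8")


def fake_tts(text: str) -> bytes:
    """Pretend speech synthesis: encode the reply text as UTF-8 bytes."""
    return text.encode("utf-8")


@dataclass
class TurnPipeline:
    """Routes one caller utterance through the layers: ASR -> intent -> dialogue -> TTS."""
    asr: Callable[[bytes], str]       # foundational layer: speech recognition
    classify: Callable[[str], str]    # logic layer: intent recognition
    respond: Callable[[str], str]     # logic layer: dialogue management
    tts: Callable[[str], bytes]       # foundational layer: speech synthesis

    def handle(self, caller_audio: bytes) -> bytes:
        text = self.asr(caller_audio)
        intent = self.classify(text)
        reply = self.respond(intent)
        return self.tts(reply)


def keyword_intent(text: str) -> str:
    """Toy classifier standing in for the real intent models."""
    return "affirmative" if "yes" in text.lower() else "negative"


REPLIES: Dict[str, str] = {
    "affirmative": "Great, let's book a time.",
    "negative": "Understood, thank you.",
}

pipeline = TurnPipeline(fake_asr, keyword_intent, REPLIES.__getitem__, fake_tts)
print(pipeline.handle(b"Yes, sounds good").decode("utf-8"))  # Great, let's book a time.
```

Injecting each layer as a plain callable keeps the logic layer independent of whichever speech vendor sits underneath, which is one plausible motivation for separating the layers.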

Dialogue Interaction – Each call follows a script modeled as a directed acyclic graph: nodes represent opening, business, and closing dialogue, and edges represent the branch taken on a positive or negative response. Scripts run either as offline batch outbound campaigns or as online real‑time outbound calls triggered through SCF APIs.
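The script graph can be sketched as follows, assuming each node carries one branch for a positive reply and one for a negative reply. `Node`, `run_script`, and `is_acyclic` are hypothetical names for illustration; the source does not describe the platform's actual script format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """One script node: what the robot says and where each kind of reply leads."""
    prompt: str
    kind: str                          # "opening", "business" or "closing"
    on_positive: Optional[str] = None  # next node id after a positive reply
    on_negative: Optional[str] = None  # next node id after a negative reply


def is_acyclic(nodes: Dict[str, Node]) -> bool:
    """Depth-first search with three colors; reaching a GRAY node means a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {nid: WHITE for nid in nodes}

    def visit(nid: str) -> bool:
        color[nid] = GRAY
        node = nodes[nid]
        for nxt in (node.on_positive, node.on_negative):
            if nxt is None:
                continue
            if color[nxt] == GRAY:
                return False
            if color[nxt] == WHITE and not visit(nxt):
                return False
        color[nid] = BLACK
        return True

    return all(color[nid] != WHITE or visit(nid) for nid in nodes)


def run_script(nodes: Dict[str, Node], start: str, replies: List[bool]) -> List[str]:
    """Walk the graph from `start`, consuming one positive/negative reply per
    non-closing node (StopIteration if replies run out); return visited node ids."""
    path: List[str] = []
    answers = iter(replies)
    current: Optional[str] = start
    while current is not None:
        node = nodes[current]
        path.append(current)
        if node.kind == "closing":
            break
        current = node.on_positive if next(answers) else node.on_negative
    return path


script = {
    "open": Node("Hi, do you have a minute?", "opening", "pitch", "bye_no"),
    "pitch": Node("Would you like a free trial?", "business", "bye_yes", "bye_no"),
    "bye_yes": Node("Great, we will follow up.", "closing"),
    "bye_no": Node("Thanks for your time.", "closing"),
}

assert is_acyclic(script)
print(run_script(script, "open", [True, False]))  # ['open', 'pitch', 'bye_no']
```

A check like `is_acyclic` could run whenever an operator saves a script, so that no branch can send a call back to a node it has already visited.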

Intent Recognition – The robot recognizes two kinds of intent: single‑utterance intents, drawn from 19 predefined labels (for example affirmative, negative, and request), and whole‑call intents (for example successful appointment, neutral, and explicit rejection). It also handles standard questions that fall outside the script and utterances whose meaning depends on context. These tasks use deep‑learning models (TextCNN, DSSM, LSTM, BERT) together with error‑mitigation techniques, and reach about 85% accuracy.
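A whole‑call intent summarizes the entire conversation rather than one utterance. The source uses learned models for this; the sketch below swaps in a deliberately simple rule‑based roll‑up of single‑utterance labels, only to show the shape of the problem. The label strings and both rules are assumptions, not the platform's actual logic.

```python
from collections import Counter
from typing import List


def whole_call_intent(turn_labels: List[str]) -> str:
    """Hypothetical rule-based roll-up of single-utterance labels into a call-level intent.
    The source uses learned models here; these rules are illustrative only."""
    if not turn_labels:
        return "neutral"  # nothing recognized, nothing to conclude
    counts = Counter(turn_labels)
    # Assumed rule 1: the callee ends on agreement and agreed more often than refused.
    if turn_labels[-1] == "affirmative" and counts["affirmative"] > counts["negative"]:
        return "successful_appointment"
    # Assumed rule 2: two or more refusals count as an explicit rejection.
    if counts["negative"] >= 2:
        return "explicit_rejection"
    return "neutral"


print(whole_call_intent(["request", "affirmative", "affirmative"]))  # successful_appointment
print(whole_call_intent(["negative", "request", "negative"]))        # explicit_rejection
```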

Micro‑service Backend – The backend follows a micro‑service architecture built on SCF, 58’s proprietary RPC framework. Key services include real‑time outbound access, SIP resource scheduling, speech segmentation, speech recognition, dialogue management, and three intent‑recognition services. The intent models are trained offline on the WPAI platform and served online for prediction.
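SIP resource scheduling has to share a limited number of telephone lines among many outbound calls. A minimal sketch, assuming a fixed pool of lines and first‑come, first‑served queuing (both assumptions; the source does not describe the scheduling policy), might look like this. `SipLinePool` and the line identifiers are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class SipLinePool:
    """Hypothetical scheduler handing out a fixed set of SIP lines to outbound calls."""

    def __init__(self, lines: List[str]):
        self.free: Deque[str] = deque(lines)   # idle lines
        self.busy: Dict[str, str] = {}         # call_id -> line in use
        self.waiting: Deque[str] = deque()     # call_ids queued for a line

    def request(self, call_id: str) -> Optional[str]:
        """Assign a free line to the call, or queue the call and return None."""
        if self.free:
            line = self.free.popleft()
            self.busy[call_id] = line
            return line
        self.waiting.append(call_id)
        return None

    def release(self, call_id: str) -> Optional[str]:
        """Free the call's line; if a call is waiting, hand the line straight to it.
        Returns the call_id that received the line, or None."""
        line = self.busy.pop(call_id)
        if self.waiting:
            nxt = self.waiting.popleft()
            self.busy[nxt] = line
            return nxt
        self.free.append(line)
        return None


pool = SipLinePool(["line-1", "line-2"])
print(pool.request("call-a"))  # line-1
print(pool.request("call-b"))  # line-2
print(pool.request("call-c"))  # None (call-c is queued)
print(pool.release("call-a"))  # call-c (it inherits line-1)
```

Handing a released line directly to the next waiting call keeps lines busy without a separate polling loop; a production scheduler would likely also need timeouts and per‑account quotas, which are omitted here.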

Summary – The voice robot provides a generic outbound voice capability for various business scenarios and is already integrated with sales, customer service, and operations. Future work will focus on improving outbound effectiveness, simplifying integration, raising intent‑recognition accuracy, and making dialogue more fluent, with the goal of boosting efficiency in traditionally labor‑intensive domains.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

