
MISA – Ant Financial’s AI Voice Service Assistant: Architecture, Deep‑Learning Models, and the AI Competition

The article introduces MISA, Ant Financial’s AI‑driven voice service assistant that uses deep‑learning models such as CNN and RNN for problem guessing, identification, and interactive clarification, details its system components and evaluation metrics, and describes the related AI competition focused on sentence‑similarity calculation.

AntTech

With mobile payments replacing wallets in China, Ant Financial’s 95188 hotline evolved from a traditional IVR menu to an AI‑powered voice service called MISA (Machine Intelligence Service Assistant), which aims to recognize user intent and provide personalized, efficient financial support.

According to product‑operation expert Yi Ke, the original goal was to let users describe their issues in their own words and match those descriptions against business knowledge. The design later shifted to a multi‑turn dialogue in which the system asks users targeted questions, reducing the cognitive load on callers.

The system consists of three core modules: Guess Question, Problem Identification, and Clarifying Interaction. The Guess Question module predicts likely user issues using deep‑learning models; the Problem Identification module locates the exact problem from the user’s description; the Clarifying Interaction module fills in missing information through a question‑and‑answer loop.

Inputs are categorized into three types: factors (business‑defined features such as recent repayment activity), trajectories (a time series of up to 120 user actions, drawn from more than ten thousand possible action types), and text (speech‑to‑text transcriptions or historical user utterances).
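As a rough illustration of the three input types, the sketch below groups them in one container. The class name `CallerInput`, the field names, and the choice to keep the most recent 120 actions are assumptions made for this example, not details from Ant Financial’s system.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MAX_TRAJECTORY_LEN = 120  # the article's cap on actions per trajectory


@dataclass
class CallerInput:
    """Hypothetical container for the three input types described above."""
    factors: Dict[str, float] = field(default_factory=dict)  # business-defined features
    trajectory: List[str] = field(default_factory=list)      # time-ordered action names
    text: str = ""                                           # transcription or past utterances

    def __post_init__(self) -> None:
        # Assumption: when a caller has more than 120 actions, keep the newest ones.
        self.trajectory = self.trajectory[-MAX_TRAJECTORY_LEN:]


def encode_trajectory(actions: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map action names to integer ids; id 0 is reserved for unknown actions."""
    return [vocab.get(action, 0) for action in actions]
```

With more than ten thousand possible actions, an integer vocabulary like `vocab` would typically feed an embedding layer; that step is omitted here.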

Problem types number in the thousands, and responses are classified into three tiers: simple voice replies, in‑app page pushes for actionable tasks, or escalation to human agents when the issue is complex.

The system caps each conversation at four rounds before handing over to a human agent. On average, it resolves queries within 1.8–1.9 dialogue turns.
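The round limit and the three response tiers can be sketched as a small control loop. This is a minimal sketch under stated assumptions: `run_dialogue`, the `Tier` enum, and the `identify` callback are hypothetical names, and the real system’s control flow is not described in the article.

```python
from enum import Enum
from typing import Callable, List, Optional, Tuple

MAX_ROUNDS = 4  # from the article: hand over to a human after four rounds


class Tier(Enum):
    VOICE_REPLY = "voice_reply"   # simple spoken answer
    PAGE_PUSH = "page_push"       # in-app page for an actionable task
    HUMAN_AGENT = "human_agent"   # escalation for complex issues


def run_dialogue(
    utterances: List[str],
    identify: Callable[[List[str]], Optional[Tuple[str, Tier]]],
) -> Tuple[Optional[str], Tier, int]:
    """Feed caller utterances one round at a time until a problem is identified.

    `identify` is a hypothetical callback that returns (problem, tier), or
    None when more clarification is needed.
    """
    history: List[str] = []
    for rounds, utterance in enumerate(utterances[:MAX_ROUNDS], start=1):
        history.append(utterance)
        result = identify(history)
        if result is not None:
            problem, tier = result
            return problem, tier, rounds
    # Not resolved within the limit: escalate to a human agent.
    return None, Tier.HUMAN_AGENT, min(len(utterances), MAX_ROUNDS)
```

The function returns the number of rounds used, which is the quantity the 1.8–1.9 average above would be computed over.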

For natural‑language processing, the team first encodes text with custom word embeddings, then processes it with both a Convolutional Neural Network (CNN) and an LSTM‑based Recurrent Neural Network (RNN). The CNN captures keyword features, while the RNN models sequential dependencies; their outputs are summed and combined with factor and trajectory vectors for final classification.
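The data flow in that paragraph can be illustrated with a toy, dependency‑free sketch. Several simplifications are assumptions for brevity: a plain tanh recurrent cell stands in for the LSTM, the weights are passed in rather than learned, and "combined" is interpreted as concatenation. None of the function names come from the original system.

```python
import math
from typing import List

Vector = List[float]


def conv_max_pool(embeddings: List[Vector], kernel: List[Vector]) -> float:
    """Slide one filter over consecutive word vectors and max-pool the responses."""
    width = len(kernel)
    scores = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        scores.append(
            sum(w * k for vec, kvec in zip(window, kernel) for w, k in zip(vec, kvec))
        )
    return max(scores) if scores else 0.0


def matvec(matrix: List[Vector], vector: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def rnn_encode(embeddings: List[Vector], w_x: List[Vector], w_h: List[Vector]) -> Vector:
    """Plain tanh RNN (a stand-in for the LSTM); returns the final hidden state."""
    hidden = [0.0] * len(w_h)
    for x in embeddings:
        from_input = matvec(w_x, x)
        from_state = matvec(w_h, hidden)
        hidden = [math.tanh(a + b) for a, b in zip(from_input, from_state)]
    return hidden


def encode_text(embeddings: List[Vector], kernels: List[List[Vector]],
                w_x: List[Vector], w_h: List[Vector]) -> Vector:
    """Sum the CNN and RNN outputs element-wise (one filter per hidden unit)."""
    cnn_vec = [conv_max_pool(embeddings, kernel) for kernel in kernels]
    rnn_vec = rnn_encode(embeddings, w_x, w_h)
    return [c + r for c, r in zip(cnn_vec, rnn_vec)]


def build_features(text_vec: Vector, factor_vec: Vector, trajectory_vec: Vector) -> Vector:
    """Assumption: the three vectors are concatenated before classification."""
    return text_vec + factor_vec + trajectory_vec
```

For the element‑wise sum to be defined, the number of convolution filters must equal the recurrent hidden size, which is why `encode_text` takes one kernel per hidden unit.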

Multiple sub‑models (including classification, search, and intent‑tree models) feed into a GBDT‑based fusion model that selects the most confident prediction. This setup supports an update cycle of two model iterations per week.
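The selection step can be sketched as follows. This is only an illustration of the "pick the most confident candidate" idea: the article’s fusion model is a GBDT, whereas the `weighted_scorer` below is a deliberately simple, hypothetical stand‑in, and the `Candidate` fields are assumptions.

```python
from typing import Callable, Dict, List, NamedTuple


class Candidate(NamedTuple):
    source: str    # which sub-model produced the prediction
    problem: str   # predicted problem label
    score: float   # the sub-model's own confidence in [0, 1]


def fuse(candidates: List[Candidate], scorer: Callable[[Candidate], float]) -> Candidate:
    """Return the candidate the fusion scorer rates highest."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    return max(candidates, key=scorer)


def weighted_scorer(weights: Dict[str, float]) -> Callable[[Candidate], float]:
    """Stand-in for a trained GBDT: rescale each sub-model's confidence by a fixed weight."""
    return lambda candidate: weights.get(candidate.source, 1.0) * candidate.score
```

In a trained fusion model, the scorer would be learned from labeled calls rather than set by hand; passing it in as a callable keeps the selection logic unchanged when the scorer is swapped.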

The competition “ATEC Ant AI Contest – Similarity Calculation in Customer Service” invites participants to develop methods for determining whether two sentences are synonymous, a core task for matching user queries to knowledge‑base entries. Over 2,000 teams have registered, and the challenge provides a dataset of 100,000 records for training similarity models.
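To make the task concrete, here is a minimal baseline sketch: cosine similarity over character bigram counts with a decision threshold. This is a generic baseline chosen for illustration, not the contest’s reference method or any participant’s solution; the function names and the 0.5 threshold are assumptions.

```python
import math
from collections import Counter


def char_ngrams(sentence: str, n: int = 2) -> Counter:
    """Count character n-grams after removing whitespace and lowercasing."""
    chars = "".join(sentence.split()).lower()
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def cosine_similarity(a: str, b: str, n: int = 2) -> float:
    """Cosine similarity between the n-gram count vectors of two sentences."""
    va, vb = char_ngrams(a, n), char_ngrams(b, n)
    dot = sum(va[gram] * vb[gram] for gram in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def is_synonymous(a: str, b: str, threshold: float = 0.5) -> bool:
    """Label a pair as synonymous when similarity reaches the (assumed) threshold."""
    return cosine_similarity(a, b) >= threshold
```

Character n‑grams avoid the need for a word segmenter, which is convenient for Chinese text, but surface overlap cannot tell apart pairs such as a question and its negation; learned models are meant to close that gap.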

Organizers emphasize that similarity calculation is a fundamental, data‑light approach suitable for many low‑resource scenarios, and successful methods may be deployed back into Ant’s production systems to improve real‑world customer service.

Tags: AI, Deep Learning, customer service, Natural Language Processing, competition, Voice Assistant, similarity calculation