MISA – Ant Financial’s AI Voice Service Assistant: Architecture, Deep‑Learning Models, and the AI Competition

The article introduces MISA, Ant Financial’s AI‑driven voice service assistant that uses deep‑learning models such as CNN and RNN for problem guessing, identification, and interactive clarification, details its system components and evaluation metrics, and describes the related AI competition focused on sentence‑similarity calculation.

AntTech

May 10, 2018

With mobile payments replacing wallets in China, Ant Financial’s 95188 hotline evolved from a traditional IVR (interactive voice response) menu into an AI-powered voice service called MISA (Machine Intelligence Service Assistant), which aims to recognize user intent and provide personalized, efficient financial support.

According to product-operation expert Yi Ke, the original goal was to let users describe their issues in their own words and to match those descriptions against business knowledge. The design later shifted to a multi-turn dialogue in which the system asks users targeted questions, reducing the cognitive load on callers.

The system consists of three core modules: Guess Question, Problem Identification, and Clarifying Interaction. The Guess Question module predicts likely user issues using deep-learning models; the Problem Identification module locates the exact problem from the user’s description; the Clarifying Interaction module fills in missing information through a question-and-answer loop.

Inputs fall into three types: factors (business-defined features, such as recent repayment activity), trajectories (a time series of up to 120 user actions, drawn from more than ten thousand possible actions), and text (speech-to-text transcriptions or historical user utterances).
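To make the three input types concrete, here is a minimal Python sketch of how one request might be represented. The class name MisaInput, the field names, the padding id, and the choice to left-pad and keep the most recent actions are all assumptions made for illustration; the article only states the three categories and the 120-action limit.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MAX_TRAJECTORY_LEN = 120  # from the article: up to 120 user actions
PAD_ACTION_ID = 0         # assumption: id 0 is reserved for padding


@dataclass
class MisaInput:
    """Hypothetical container for the three input types."""
    factors: Dict[str, float] = field(default_factory=dict)  # business-defined features
    trajectory: List[int] = field(default_factory=list)      # action ids, oldest first
    text: str = ""                                           # transcription or past utterance

    def padded_trajectory(self) -> List[int]:
        """Keep the most recent actions and left-pad to a fixed length."""
        recent = self.trajectory[-MAX_TRAJECTORY_LEN:]
        return [PAD_ACTION_ID] * (MAX_TRAJECTORY_LEN - len(recent)) + recent


# Illustrative values only; the feature name and action ids are made up.
example = MisaInput(
    factors={"repaid_in_last_7_days": 1.0},
    trajectory=[531, 87, 9402],
    text="why was my repayment deducted twice",
)
padded = example.padded_trajectory()
```

A fixed-length trajectory is convenient because downstream models usually expect inputs of a constant shape; whether MISA pads or truncates this way is not stated in the source.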

Problem types number in the thousands, and responses fall into three tiers: simple voice replies, in-app page pushes for actionable tasks, and escalation to human agents when the issue is complex.

A dialogue is capped at four rounds before the call is handed over to a human agent. Under that cap, the system resolves queries in 1.8 to 1.9 dialogue turns on average.
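The round limit can be sketched as a simple loop. This is an illustration under stated assumptions, not MISA's actual control flow: the function names run_dialogue, identify, and ask_user are hypothetical, and one "round" is assumed to mean one identification attempt.

```python
from typing import Callable, List, Optional, Tuple

MAX_ROUNDS = 4  # from the article: at most four rounds before human handoff


def run_dialogue(
    identify: Callable[[List[str]], Optional[str]],
    ask_user: Callable[[int], str],
    first_utterance: str,
) -> Tuple[str, int]:
    """Return (outcome, rounds_used); outcome is a problem id or 'HUMAN_AGENT'."""
    utterances = [first_utterance]
    for round_no in range(1, MAX_ROUNDS + 1):
        problem = identify(utterances)
        if problem is not None:
            return problem, round_no
        if round_no < MAX_ROUNDS:
            # Not enough information yet: ask a clarifying question.
            utterances.append(ask_user(round_no))
    return "HUMAN_AGENT", MAX_ROUNDS


def toy_identify(utterances: List[str]) -> Optional[str]:
    """Toy stand-in for problem identification (keyword check only)."""
    text = " ".join(utterances)
    if "repayment" in text and "twice" in text:
        return "duplicate_repayment"
    return None


replies = iter(["it was charged twice", "yes", "no"])
result = run_dialogue(toy_identify, lambda round_no: next(replies), "question about my repayment")
```

In this toy run the first utterance is too vague, one clarifying answer supplies the missing detail, and the problem is identified in the second round.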

For natural‑language processing, the team first encodes text with custom word embeddings, then processes it with both a Convolutional Neural Network (CNN) and an LSTM‑based Recurrent Neural Network (RNN). The CNN captures keyword features, while the RNN models sequential dependencies; their outputs are summed and combined with factor and trajectory vectors for final classification.
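The final combination step described above can be sketched in plain Python. The CNN and LSTM encoders themselves are not shown; their outputs are taken as given vectors. The article says the two text encodings are summed and then combined with the factor and trajectory vectors; concatenation as the way of combining, the single linear layer, and every name below (fuse, classify) are assumptions for illustration.

```python
import math
from typing import List

Vector = List[float]


def fuse(cnn_out: Vector, rnn_out: Vector, factor_vec: Vector, traj_vec: Vector) -> Vector:
    """Sum the two text encodings element-wise, then concatenate the other inputs."""
    if len(cnn_out) != len(rnn_out):
        raise ValueError("CNN and RNN outputs must have the same size to be summed")
    text_vec = [c + r for c, r in zip(cnn_out, rnn_out)]
    return text_vec + factor_vec + traj_vec  # list concatenation (assumed)


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """One linear layer plus softmax; weights has one row per problem type."""
    logits = [sum(w * x for w, x in zip(row, features)) + b for row, b in zip(weights, bias)]
    return softmax(logits)


# Illustrative numbers only; real encodings would be much wider.
features = fuse([0.25, 0.5], [0.25, -0.5], factor_vec=[1.0], traj_vec=[0.3, 0.7])
probs = classify(
    features,
    weights=[[1.0, 0.0, 1.0, 0.0, 0.0], [0.0, 1.0, 0.0, 1.0, 1.0]],
    bias=[0.0, 0.0],
)
```

Summing requires the two text encoders to emit vectors of the same width, which is why fuse checks the lengths before adding them.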

Multiple sub-models (including classification, search, and intent-tree models) feed into a fusion model based on GBDT (gradient-boosted decision trees) that selects the most confident prediction. The models are updated in two iterations per week.
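The selection step can be illustrated as follows. This sketch covers only the "pick the most confident candidate" logic; the GBDT itself is replaced by a hypothetical scoring function (a per-source weight times the sub-model's raw score), and the Candidate type, source names, and weights are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Candidate:
    source: str       # which sub-model produced the prediction
    problem_id: str   # predicted problem type
    raw_score: float  # the sub-model's own score


def select_prediction(
    candidates: List[Candidate],
    fusion_score: Callable[[Candidate], float],
) -> Candidate:
    """Return the candidate that the fusion scorer is most confident in."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    return max(candidates, key=fusion_score)


# Stand-in for the trained fusion model: NOT a GBDT, just fixed trust weights.
SOURCE_WEIGHT = {"classification": 0.9, "search": 0.6, "intent_tree": 1.0}


def toy_fusion_score(candidate: Candidate) -> float:
    return SOURCE_WEIGHT[candidate.source] * candidate.raw_score


candidates = [
    Candidate("classification", "duplicate_repayment", 0.80),
    Candidate("search", "refund_status", 0.95),
    Candidate("intent_tree", "change_phone_number", 0.70),
]
best = select_prediction(candidates, toy_fusion_score)
```

The point of a learned fusion layer is that raw scores from different sub-models are not directly comparable; here the search model has the highest raw score, but it does not win once the scores are recalibrated.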

The related competition, “ATEC Ant AI Contest – Similarity Calculation in Customer Service”, invites participants to develop methods for determining whether two sentences are synonymous, a core task for matching user queries to knowledge-base entries. More than 2,000 teams have registered, and the challenge provides a dataset of 100,000 records for training similarity models.
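For readers unfamiliar with the task, here is a deliberately simple baseline for deciding whether two sentences are synonymous: cosine similarity over bag-of-words counts with a fixed threshold. It is not the contest's method or any participant's solution; the whitespace tokenization, the 0.5 threshold, and the English example sentences are assumptions for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the token-count vectors of two sentences."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def is_synonymous(a: str, b: str, threshold: float = 0.5) -> bool:
    """Binary decision: treat the pair as synonymous above the threshold."""
    return cosine_similarity(a, b) >= threshold


same = is_synonymous("how do i repay my loan", "how can i repay my loan")
different = is_synonymous("how do i repay my loan", "change my phone number")
```

A word-overlap baseline like this misses paraphrases that share few tokens, which is the gap that learned similarity models are meant to close.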

Organizers emphasize that similarity calculation is a fundamental, data‑light approach suitable for many low‑resource scenarios, and successful methods may be deployed back into Ant’s production systems to improve real‑world customer service.
