
Seq2Seq Approaches for Phone Number Extraction from Two‑Speaker Voice Dialogues

This article presents a practical study of extracting phone numbers from two‑speaker voice dialogues using Seq2Seq models—including LSTM, GRU with attention and feature fusion, and Transformer—detailing data characteristics, model architectures, training strategies, experimental results, and comparative analysis showing the GRU‑Attention approach achieving the best performance.


Introduction

This work presents a practical approach to extracting phone numbers from two‑speaker voice‑dialogue transcripts using Seq2Seq methods. Several model structures (LSTM, GRU, Transformer) are compared, and various embedding, attention, and beam‑search techniques are explored. The best Seq2Seq model improves F1 by about 30% over traditional NER methods.

Background

58.com, a large Chinese life‑service platform, handles massive volumes of business‑to‑customer voice conversations in which customers provide personal information such as phone numbers. Verifying the spoken number against the registered one is crucial for detecting abnormal users and improving operational efficiency. The task splits into two stages: ASR (already handled by an in‑house engine) and information extraction from the transcripts, the latter being the focus of this study.

Data Characteristics

Manual inspection of dialogue texts reveals that phone numbers often appear mixed with colloquial speech, making extraction challenging. The dataset is pre‑processed by retaining only segments that mention phone numbers, resulting in 45,376 training, 5,671 validation, and 5,673 test instances.

Solution Overview

Three methods are evaluated: (1) regular expressions (F1 ≈ 35.98 %), (2) plain NER (F1 ≈ 37.49 %), and (3) Seq2Seq models. Seq2Seq is chosen for its ability to capture long‑range and sequential patterns.
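To make the weakness of the first baseline concrete, here is a minimal regex sketch (the pattern and helper names are illustrative, not the article's actual implementation). It assumes Chinese mobile numbers: 11 digits starting with 1.

```python
import re
from typing import Optional

# Illustrative pattern only: contiguous 11-digit mobile numbers starting with 1.
PHONE_RE = re.compile(r"1\d{10}")

def extract_phone(text: str) -> Optional[str]:
    """Return the first contiguous 11-digit mobile number, if any."""
    # Drop common separators first so "138-1234-5678" still matches.
    compact = re.sub(r"[\s\-]", "", text)
    m = PHONE_RE.search(compact)
    return m.group(0) if m else None
```

This works on clean strings but fails on the colloquial patterns described above, where digits are interleaved with filler words, repetitions, and corrections ("one three eight... no wait, nine") — which is why its F1 stays near 36 % and a sequence model is needed.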

Seq2Seq Practice

The classic Encoder‑Decoder architecture is used. The Encoder encodes the input token sequence (character‑level) into a context vector via CNN/RNN/Transformer feature extractors. The Decoder generates the output token by token, constrained to 11 digits drawn from a vocabulary of 0‑9. Teacher forcing (ratio 0.5) and beam search (k = 5) are applied to improve training stability and decoding quality, respectively.
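The beam-search decoding described above can be sketched in isolation. The sketch below assumes a `step_log_probs` callback standing in for one decoder step; the constrained output space (exactly 11 digits over 0‑9) makes fixed-length beam search especially simple.

```python
from typing import Callable, List, Tuple

DIGITS = list("0123456789")  # the decoder's output vocabulary

def beam_search(step_log_probs: Callable[[str], List[float]],
                length: int = 11, k: int = 5) -> str:
    """Return the highest-scoring digit sequence of the given length.

    `step_log_probs(prefix)` stands in for one decoder step: it maps the
    digits generated so far to a log-probability for each of the 10 digits.
    """
    beams: List[Tuple[str, float]] = [("", 0.0)]
    for _ in range(length):
        candidates = []
        for prefix, score in beams:
            logps = step_log_probs(prefix)
            for digit, lp in zip(DIGITS, logps):
                candidates.append((prefix + digit, score + lp))
        # Keep only the k best partial sequences at each step.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    return beams[0][0]
```

With k = 1 this degenerates to greedy decoding; keeping five hypotheses lets a digit that looks locally unlikely survive if the rest of the number scores well.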

Model 1: LSTM‑Based Seq2Seq

The Encoder uses a two‑layer bidirectional LSTM (embedding = 256, hidden = 512); the Decoder uses a two‑layer unidirectional LSTM. Teacher forcing mitigates error propagation during training. Initial results are modest; subsequent optimizations include beam search, limiting the input vocabulary to phone‑related tokens, and adding role embeddings to distinguish the two speakers. Improvements are limited (≈1‑5 % absolute gain).
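A minimal PyTorch sketch of this architecture, assuming the dimensions stated above (class and layer names are illustrative, not the article's code). The bridge layers fold the bidirectional encoder states into the unidirectional decoder's hidden size, and teacher forcing alternates between feeding the gold digit and the model's own prediction.

```python
import random
import torch
import torch.nn as nn

class LSTMSeq2Seq(nn.Module):
    """Sketch of Model 1: BiLSTM encoder, unidirectional LSTM decoder."""
    def __init__(self, src_vocab: int, tgt_vocab: int = 12,  # 10 digits + BOS/EOS
                 emb: int = 256, hid: int = 512, layers: int = 2):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hid, layers, bidirectional=True, batch_first=True)
        self.decoder = nn.LSTM(emb, hid, layers, batch_first=True)
        # Fold the two encoder directions into the decoder's hidden size.
        self.bridge_h = nn.Linear(2 * hid, hid)
        self.bridge_c = nn.Linear(2 * hid, hid)
        self.out = nn.Linear(hid, tgt_vocab)

    def forward(self, src, tgt, teacher_forcing: float = 0.5):
        _, (h, c) = self.encoder(self.src_emb(src))
        # h: (layers * 2, B, hid); interleave order is [fwd, bwd] per layer.
        h = self.bridge_h(torch.cat([h[0::2], h[1::2]], dim=-1))
        c = self.bridge_c(torch.cat([c[0::2], c[1::2]], dim=-1))
        inp = tgt[:, :1]                      # BOS token
        logits = []
        for t in range(1, tgt.size(1)):
            step, (h, c) = self.decoder(self.tgt_emb(inp), (h, c))
            logit = self.out(step)
            logits.append(logit)
            # Teacher forcing: sometimes feed the gold digit, sometimes our guess.
            use_gold = random.random() < teacher_forcing
            inp = tgt[:, t:t + 1] if use_gold else logit.argmax(-1)
        return torch.cat(logits, dim=1)       # (B, T-1, tgt_vocab)
```

At inference time, `teacher_forcing=0.0` together with the beam-search routine above replaces the greedy `argmax` step.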

Model 2: GRU + Attention + Feature Fusion

The Encoder employs a GRU; attention is the standard alignment‑based mechanism (scoring, softmax normalization, weighted sum). In the Decoder, the context vector, previous output token, and current hidden state are concatenated before feeding into the GRU and a final fully‑connected layer, enabling richer feature interaction. Beam search (k = 5) is also used. This model yields a large accuracy boost, outperforming Model 1 by several percentage points.
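One decoder step of this design can be sketched as follows (a minimal PyTorch illustration with assumed dimensions and an additive-style scorer; the article does not specify its exact scoring function). It shows the three stages named above — scoring, softmax normalization, weighted sum — followed by the feature fusion of context, previous output embedding, and hidden state.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnGRUDecoderStep(nn.Module):
    """One decoder step of the Model 2 sketch (names are illustrative)."""
    def __init__(self, emb: int = 256, hid: int = 512, vocab: int = 12):
        super().__init__()
        self.score = nn.Linear(2 * hid, 1)          # alignment scorer
        self.gru = nn.GRUCell(emb + hid, hid)
        self.out = nn.Linear(emb + 2 * hid, vocab)  # fused features -> logits

    def forward(self, prev_emb, hidden, enc_states):
        # prev_emb: (B, emb); hidden: (B, hid); enc_states: (B, S, hid)
        S = enc_states.size(1)
        expanded = hidden.unsqueeze(1).expand(-1, S, -1)
        # 1) Score each encoder state against the current decoder state.
        scores = self.score(torch.cat([expanded, enc_states], dim=-1)).squeeze(-1)
        # 2) Normalize scores into attention weights.
        weights = F.softmax(scores, dim=-1)                               # (B, S)
        # 3) Weighted sum of encoder states -> context vector.
        context = torch.bmm(weights.unsqueeze(1), enc_states).squeeze(1)  # (B, hid)
        # Feature fusion: previous embedding + context into the GRU ...
        hidden = self.gru(torch.cat([prev_emb, context], dim=-1), hidden)
        # ... and all three features into the final fully-connected layer.
        logits = self.out(torch.cat([prev_emb, context, hidden], dim=-1))
        return logits, hidden, weights
```

The fusion at the output layer is the key difference from Model 1, which predicts from the decoder hidden state alone.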

Model 3: Transformer‑Based Seq2Seq

A standard Transformer encoder‑decoder with self‑attention is applied. After tuning, it improves over the LSTM baseline but still lags behind the GRU‑Attention model, likely due to limited training data and the relatively simple nature of the task.
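A compact sketch using PyTorch's built-in `nn.Transformer` module (layer counts, head count, and positional-embedding scheme are assumptions; the article does not state its hyperparameters). The causal mask enforces the same left-to-right digit generation as the RNN decoders.

```python
import torch
import torch.nn as nn

class TransformerExtractor(nn.Module):
    """Sketch of Model 3 with assumed hyperparameters."""
    def __init__(self, src_vocab: int, tgt_vocab: int = 12,
                 d_model: int = 256, max_len: int = 512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d_model)
        self.tgt_emb = nn.Embedding(tgt_vocab, d_model)
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))  # learned positions
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4,
            num_encoder_layers=2, num_decoder_layers=2,
            dim_feedforward=512, batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab)

    def forward(self, src, tgt):
        src_in = self.src_emb(src) + self.pos[:, :src.size(1)]
        tgt_in = self.tgt_emb(tgt) + self.pos[:, :tgt.size(1)]
        # Causal mask: each output digit attends only to earlier digits.
        T = tgt.size(1)
        mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        return self.out(self.transformer(src_in, tgt_in, tgt_mask=mask))
```

With only ~45k training instances and a short, rigid output format, the extra capacity of deeper Transformer stacks is hard to exploit, which is consistent with it trailing the GRU‑Attention model here.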

Conclusion and Outlook

The study demonstrates that a GRU encoder with alignment‑based attention and multi‑feature fusion provides the most effective solution for phone‑number extraction in dialogue text. Future work will refine preprocessing, combine rule‑based and model‑based methods, and explore larger datasets to further boost performance.

References

1. Sutskever I, Vinyals O, Le Q V. Sequence to sequence learning with neural networks. NIPS 2014.
2. Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv 2014.
3. Chung J, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv 2014.
4. Vaswani A, et al. Attention is all you need. NIPS 2017.
5. https://www.zhihu.com/question/302392659

Author

Gao Zhengjian, AI Lab Algorithm Engineer at 58.com. MSc from Capital Normal University, June 2020. Works on voice‑semantic tagging, intelligent writing, and related AI applications.

Tags: Transformer, Attention, NLP, GRU, Seq2Seq, LSTM, Phone Number Extraction
Written by 58 Tech

Official tech channel of 58.com, a platform for tech innovation, sharing, and communication.
