
DeltaLM: A Multilingual Pretrained Encoder‑Decoder Model for Neural Machine Translation and Zero‑Shot Transfer

DeltaLM is a multilingual pretrained encoder‑decoder model that pairs a pretrained encoder with a novel decoder. It trains efficiently, transfers well across languages, supports zero‑shot translation, and delivers strong results on a range of translation and summarization tasks.

DataFunTalk

Introduction

Multilingual neural machine translation (MNMT) has attracted increasing research interest because pretrained multilingual models can greatly reduce annotation and training costs while enhancing cross‑language transfer. DeltaLM is proposed as a new multilingual pretrained model built on an encoder‑decoder architecture that inherits the cross‑language abilities of a pretrained encoder.

Key Topics Covered

The presentation reviews the machine‑translation roadmap, describes the MNMT framework, introduces the DeltaLM pretrained model, explains how it integrates with NMT, and discusses zero‑shot cross‑language transfer.

Training Data and Sampling

Training corpora consist of fused multilingual sentence‑pair data, with varying scales across language directions. A sampling strategy balances the data to ensure fair representation of high‑resource and low‑resource language pairs.
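One common way to implement such balancing is temperature‑based sampling, where each language pair is drawn with probability proportional to its corpus share raised to 1/T. The article does not specify the exact scheme, so the snippet below is a sketch under that assumption; the corpus sizes are hypothetical.

```python
import random

# Hypothetical sentence-pair counts per language direction.
corpus_sizes = {"en-de": 4_500_000, "en-fr": 10_000_000, "en-ne": 50_000}

def sampling_probs(sizes, temperature=5.0):
    """Temperature-based sampling: p_i proportional to (n_i / N) ** (1/T).
    T=1 reproduces the raw data distribution; larger T flattens it,
    up-sampling low-resource pairs relative to high-resource ones."""
    total = sum(sizes.values())
    weights = {k: (n / total) ** (1.0 / temperature) for k, n in sizes.items()}
    z = sum(weights.values())
    return {k: w / z for k, w in weights.items()}

probs = sampling_probs(corpus_sizes)
# Draw the language direction for the next training batch.
pair = random.choices(list(probs), weights=list(probs.values()), k=1)[0]
```

With T=5, the low‑resource `en-ne` direction receives a far larger sampling probability than its raw 0.3% data share, while the ordering of pairs by size is preserved.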

DeltaLM Architecture

DeltaLM combines a pretrained encoder (e.g., XLM‑R) with a newly designed interleaved decoder that fully utilizes the encoder’s parameters. This design reduces training cost, preserves the encoder’s cross‑language knowledge, and decouples encoder and decoder for easier fine‑tuning.
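One reading of "interleaved" (consistent with the DeltaLM paper, though the article above does not spell it out) is that each decoder block of the form [self‑attn, FFN, cross‑attn, FFN] is initialized from two consecutive encoder layers, so every encoder parameter is reused. The sketch below illustrates that mapping with placeholder parameter labels; the block structure is an assumption.

```python
def interleave_init(encoder_layers):
    """Sketch of interleaved decoder initialization (assumed structure):
    each decoder block [self-attn, ffn, cross-attn, ffn] consumes two
    consecutive encoder layers, so all encoder parameters are reused."""
    assert len(encoder_layers) % 2 == 0, "need an even number of encoder layers"
    decoder = []
    for i in range(0, len(encoder_layers), 2):
        a, b = encoder_layers[i], encoder_layers[i + 1]
        decoder.append({
            "self_attn":  a["attn"],  # reused from encoder layer i
            "ffn_1":      a["ffn"],
            "cross_attn": b["attn"],  # reused from encoder layer i+1
            "ffn_2":      b["ffn"],
        })
    return decoder

# Placeholder "parameters" standing in for real weight tensors.
enc = [{"attn": f"A{i}", "ffn": f"F{i}"} for i in range(24)]
dec = interleave_init(enc)  # 12 decoder blocks from a 24-layer encoder
```

Because the decoder starts from encoder weights rather than random initialization, fine‑tuning converges faster and the encoder's cross‑language knowledge carries over.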

Pretraining Tasks

Two pretraining objectives are used: (1) Span Corruption (T5 style) on monolingual text, and (2) Translation‑Pair Span Corruption on bilingual data, which masks spans across language pairs to learn alignment.
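In T5‑style span corruption, masked spans are replaced by sentinel tokens in the source, and the target reconstructs each span after its sentinel; for the translation‑pair variant, the same masking is applied to a concatenated bilingual sentence pair. A minimal sketch (span positions are given explicitly here rather than sampled):

```python
def span_corrupt(tokens, spans):
    """T5-style span corruption. `spans` is a sorted list of
    non-overlapping (start, end) index pairs to mask.
    Returns (source, target) in the sentinel format."""
    source, target = [], []
    prev = 0
    for k, (s, e) in enumerate(spans):
        sentinel = f"<extra_id_{k}>"
        source += tokens[prev:s] + [sentinel]   # span replaced by sentinel
        target += [sentinel] + tokens[s:e]      # target restores the span
        prev = e
    source += tokens[prev:]
    return source, target

toks = "the quick brown fox jumps over the lazy dog".split()
src, tgt = span_corrupt(toks, [(1, 3), (6, 8)])
# src: ['the', '<extra_id_0>', 'fox', 'jumps', 'over', '<extra_id_1>', 'dog']
# tgt: ['<extra_id_0>', 'quick', 'brown', '<extra_id_1>', 'the', 'lazy']
```

For the translation‑pair objective, `toks` would be the source sentence concatenated with its translation, so a masked span in one language can be recovered from context in the other, encouraging cross‑lingual alignment.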

Two‑Stage Fine‑Tuning

Stage 1 fixes the encoder and embedding layers while fine‑tuning the decoder on bilingual data, preserving cross‑language transfer. Stage 2 unfreezes the encoder and continues fine‑tuning both encoder and decoder, optionally removing self‑attention residual connections to further improve language‑agnostic representations.
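The freeze schedule for the two stages can be sketched as a simple mapping from parameter group to trainability (group names here are illustrative, not the model's actual module names):

```python
def set_trainable(param_groups, stage):
    """Return {group: requires_grad} for the given fine-tuning stage.
    Stage 1 freezes the encoder and embeddings, tuning only the decoder;
    stage 2 unfreezes everything for joint fine-tuning."""
    frozen = {"encoder", "embeddings"} if stage == 1 else set()
    return {g: g not in frozen for g in param_groups}

groups = ["embeddings", "encoder", "decoder"]
stage1 = set_trainable(groups, 1)
# {'embeddings': False, 'encoder': False, 'decoder': True}
stage2 = set_trainable(groups, 2)
# {'embeddings': True, 'encoder': True, 'decoder': True}
```

Keeping the encoder frozen in stage 1 means the multilingual representations learned during pretraining cannot drift toward the bilingual fine‑tuning pair, which is what preserves zero‑shot transfer to unseen directions.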

Experimental Results

DeltaLM achieves competitive or superior BLEU scores on multilingual translation benchmarks (e.g., 101 languages with FB‑m2m) despite using far fewer parameters than models like mT5‑XL. It also excels in cross‑language summarization (WikiLingua) and text‑generation tasks, demonstrating strong zero‑shot capabilities.

Language Transfer Findings

Experiments show that languages within the same family transfer more effectively, suggesting that a single high‑resource language can benefit the entire language family, reducing the need for extensive parallel data.

Conclusion

DeltaLM’s pretrained encoder‑decoder architecture and novel pretraining tasks provide powerful cross‑language transfer and generation abilities, enabling efficient multilingual NMT and zero‑shot translation with significantly lower data and parameter requirements.

Tags: Zero-shot · multilingual · machine translation · DeltaLM · NMT · pretrained model
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
