
DeltaLM: A Multilingual Pretrained Encoder‑Decoder Model for Neural Machine Translation

DeltaLM is a multilingual pretrained encoder‑decoder model that builds on the cross‑lingual transfer ability of a pretrained encoder and adds a novel interleaved decoder. It is pretrained with span‑corruption and translation‑pair span‑corruption tasks and fine‑tuned in two stages, achieving strong zero‑shot and supervised translation performance across more than 100 languages.

DataFunSummit

DeltaLM is a new multilingual pretrained encoder‑decoder model designed to improve neural machine translation (NMT) by leveraging the cross‑lingual transfer ability of pretrained encoders.

The model combines a pretrained encoder (e.g., XLM‑R) with a novel interleaved decoder, enabling full reuse of encoder parameters and efficient training.
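
The parameter reuse described above can be sketched as a simple initialization plan. This is an illustrative sketch only. It assumes each decoder block has the sublayer order self‑attention, feed‑forward, cross‑attention, feed‑forward, and that consecutive encoder layers are paired so that one decoder block consumes two encoder layers. The function name `interleaved_init_plan` and the dictionary keys are hypothetical and do not come from any released code.

```python
def interleaved_init_plan(num_encoder_layers):
    """Map each decoder sublayer to the encoder layer that would initialise it.

    Assumption (not confirmed by the article): decoder block d takes its
    self-attention and first FFN from encoder layer 2d, and its
    cross-attention and second FFN from encoder layer 2d + 1 (0-based).
    """
    if num_encoder_layers % 2:
        raise ValueError("this sketch pairs encoder layers, so the count must be even")

    plan = []
    for d in range(num_encoder_layers // 2):
        lower, upper = 2 * d, 2 * d + 1
        plan.append({
            "decoder_layer": d,
            "self_attn_from": lower,   # encoder self-attention -> decoder self-attention
            "ffn1_from": lower,        # encoder FFN -> first decoder FFN
            "cross_attn_from": upper,  # encoder self-attention -> decoder cross-attention
            "ffn2_from": upper,        # encoder FFN -> second decoder FFN
        })
    return plan


if __name__ == "__main__":
    for row in interleaved_init_plan(12):
        print(row)
```

Under these assumptions, a 12‑layer encoder yields a 6‑block decoder in which every encoder layer is used exactly once, which is one way to read "full reuse of encoder parameters".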

Two pretraining tasks are used: Span Corruption (T5‑style) on monolingual data and Translation Pair Span Corruption on bilingual data, allowing the model to learn both language modeling and cross‑language alignment.
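
To make the two objectives concrete, here is a minimal sketch of how training pairs could be built. It uses the T5 sentinel naming (`<extra_id_0>`, `<extra_id_1>`, ...) and takes the masked spans as explicit `(start, length)` arguments rather than sampling them, so the output is deterministic. The helper names, the `</s>` separator, and the omission of a closing sentinel in the target are simplifying assumptions, not details taken from DeltaLM itself.

```python
def span_corrupt(tokens, spans):
    """Replace each (start, length) span with a sentinel token.

    Returns (source, target): the source keeps the unmasked tokens with one
    sentinel per span; the target lists each sentinel followed by the tokens
    it replaced. Spans are assumed to be non-overlapping.
    """
    source, target = [], []
    cursor = 0
    for sid, (start, length) in enumerate(sorted(spans)):
        sentinel = f"<extra_id_{sid}>"
        source.extend(tokens[cursor:start])
        source.append(sentinel)
        target.append(sentinel)
        target.extend(tokens[start:start + length])
        cursor = start + length
    source.extend(tokens[cursor:])
    return source, target


def translation_pair_span_corrupt(src_tokens, tgt_tokens, spans, sep="</s>"):
    """Join a translation pair into one sequence, then mask spans across it."""
    joined = src_tokens + [sep] + tgt_tokens
    return span_corrupt(joined, spans)


if __name__ == "__main__":
    mono = "thank you for inviting me to your party last week".split()
    print(span_corrupt(mono, [(2, 2), (8, 1)]))

    en = ["I", "love", "you"]
    de = ["ich", "liebe", "dich"]
    print(translation_pair_span_corrupt(en, de, [(1, 1), (5, 1)]))
```

In the bilingual case a masked span on one side can often be recovered from the other side of the pair, which is the intuition behind learning cross‑language alignment from this task.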

A two‑stage fine‑tuning strategy is proposed: first freeze the encoder and fine‑tune the decoder on bilingual data, then jointly fine‑tune encoder and decoder while removing self‑attention residual connections to enhance language‑agnostic representations.
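
The two stages can be written down as a small schedule. The sketch below is framework‑independent and only records which parameter groups are updated in each stage and whether the self‑attention residual connection is kept. The class and function names are hypothetical, parameter names are assumed to start with `encoder.` or `decoder.`, and the article does not say which layers lose the residual connection, so that detail is left as a single flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    name: str
    train_encoder: bool
    train_decoder: bool
    self_attn_residual: bool  # False means the residual connection is removed


def two_stage_schedule():
    """Stage 1: frozen encoder, decoder only. Stage 2: joint, residual removed."""
    return [
        StageConfig("stage1_decoder_only", train_encoder=False,
                    train_decoder=True, self_attn_residual=True),
        StageConfig("stage2_joint", train_encoder=True,
                    train_decoder=True, self_attn_residual=False),
    ]


def trainable_parameters(param_names, stage):
    """Return the parameter names that would be updated in the given stage."""
    selected = []
    for name in param_names:
        if name.startswith("encoder.") and stage.train_encoder:
            selected.append(name)
        elif name.startswith("decoder.") and stage.train_decoder:
            selected.append(name)
    return selected


if __name__ == "__main__":
    names = ["encoder.layers.0.self_attn.weight", "decoder.layers.0.self_attn.weight"]
    for stage in two_stage_schedule():
        print(stage.name, trainable_parameters(names, stage))
```

In a real training loop the selected names would typically be used to toggle gradient updates on the corresponding tensors; that wiring depends on the framework and is not shown here.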

Extensive experiments on more than 100 languages show that DeltaLM matches or outperforms larger models (e.g., mT5) on multilingual MT, cross‑lingual summarization, and zero‑shot translation, while using significantly fewer parameters.

Conclusions: multilingual pretrained models can greatly reduce annotation and training costs for NMT and improve zero‑shot cross‑language transfer, with DeltaLM’s architecture and training objectives providing strong cross‑lingual generation capabilities.

Zero-shot, pretrained models, neural machine translation, multilingual translation, cross-lingual transfer, DeltaLM
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
