
Bidirectional Optimization of NLLB-200 and ChatGPT for Low-Resource Language Translation

The paper proposes a bidirectional optimization framework: it fine‑tunes the NLLB‑200 translation model with LoRA on parallel data generated by ChatGPT, and in the other direction it translates low‑resource prompts with NLLB before feeding them to LLMs. This improves multilingual translation quality, though the noisy synthetic data requires careful validation.

vivo Internet Technology

This article explores the NLLB translation model and ChatGPT in the context of low‑resource language applications, proposing a two‑way optimization strategy that leverages both models.

1. NLLB Background – NLLB (No Language Left Behind) is a Meta project aiming to provide a single model capable of translating over 200 languages, including many low‑resource languages. The core goal is to break language barriers and enable equitable information access.

1.1 NLLB‑200 Data – The model is evaluated on FLORES‑200, an extension of FLORES‑101 that covers 200 languages; its training data combines human‑translated seed corpora, web‑mined bitext, and back‑translation. Example of a mined sentence‑pair entry, with LASER alignment and language‑ID confidence scores:

```json
{
  "translation": {
    "ace_Latn": "Gobnyan hana geupeukeucewa gata atawa geutinggai meunan mantong gata.",
    "ban_Latn": "Ida nenten jaga manggayang wiadin ngutang semeton."
  },
  "laser_score": 1.2499876022338867,
  "source_sentence_lid": 1.0000100135803223,
  "target_sentence_lid": 0.9991400241851807,
  "source_sentence_source": "paracrawl9_hieu",
  "target_sentence_url": "https://alkitab.mobi/tb/Ula/31/6/"
}
```
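Since the mined data is noisy, entries like the one above are typically filtered by their alignment and language-ID scores before training. A minimal sketch of such a filter follows; the threshold values are illustrative assumptions, not numbers from the article:

```python
def keep_pair(entry, min_laser=1.06, min_lid=0.9):
    """Return True if a mined sentence pair looks clean enough to train on.

    Thresholds are illustrative: min_laser bounds the LASER alignment score,
    min_lid bounds the language-identification confidence on both sides.
    """
    return (
        entry["laser_score"] >= min_laser
        and entry["source_sentence_lid"] >= min_lid
        and entry["target_sentence_lid"] >= min_lid
    )


# The Acehnese/Balinese entry above passes on all three scores.
sample = {
    "laser_score": 1.2499876022338867,
    "source_sentence_lid": 1.0000100135803223,
    "target_sentence_lid": 0.9991400241851807,
}
print(keep_pair(sample))
```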

1.2 NLLB‑200 Tokenizer – A shared SentencePiece tokenizer with a vocabulary of 256,206 tokens is used for all languages. It can be accessed via the HuggingFace Transformers library:

```python
from transformers import NllbTokenizer

tokenizer = NllbTokenizer.from_pretrained(
    "facebook/nllb-200-distilled-1.3B",
    src_lang="eng_Latn",
    tgt_lang="fra_Latn",
)

example_english_phrase = "UN Chief Says There Is No Military Solution in Syria"
expected_translation_french = (
    "Le chef de l'ONU affirme qu'il n'y a pas de solution militaire en Syrie."
)
inputs = tokenizer(
    example_english_phrase,
    text_target=expected_translation_french,
    return_tensors="pt",
)
```

1.3 NLLB‑200 Model – The model follows a standard encoder‑decoder Transformer architecture with shared parameters across languages, enabling cross‑lingual transfer, multi‑task learning, and efficient handling of low‑resource languages.

1.4 Model Distillation – The full MoE version contains 54.5 B parameters, which is impractical for most hardware. A distilled dense version with 1.3 B parameters (roughly 2.4 % of the full model's size) is provided for resource‑constrained environments.

2. NLLB vs. LLM – Both are sequence‑to‑sequence generators, but NLLB uses an encoder‑decoder architecture while many LLMs (e.g., GPT) are decoder‑only. NLLB excels at low‑resource translation, whereas LLMs are stronger in high‑resource language understanding and generation.

3. Fine‑tuning NLLB with LoRA

LoRA (Low‑Rank Adaptation) reduces the number of trainable parameters to ~0.34 % of the full model, enabling efficient fine‑tuning on downstream tasks. Example configuration:

```python
from peft import get_peft_model, LoraConfig, TaskType

peft_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    inference_mode=False,
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    # NLLB's attention projection modules; adapters are injected into every
    # encoder self-attention and decoder self-/cross-attention block.
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],
)
lora_model = get_peft_model(model, peft_config)
lora_model.print_trainable_parameters()
# trainable params: 4,718,592 || all params: 1,375,356,928 || trainable%: 0.34
```
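The printed counts can be verified by hand. Assuming adapters on the four attention projections of every attention block, with NLLB‑200‑distilled‑1.3B's hidden size of 1024 and 24 encoder plus 24 decoder layers (the decoder has both self‑ and cross‑attention), the arithmetic works out as follows:

```python
# Each adapted projection gets a LoRA A matrix (r x d_model) and B matrix
# (d_model x r), i.e. 2 * r * d_model extra trainable parameters.
d_model, r = 1024, 8
enc_proj = 24 * 4          # 4 projections per encoder self-attention block
dec_proj = 24 * (4 + 4)    # self-attention plus cross-attention per decoder layer
params_per_proj = 2 * r * d_model

trainable = (enc_proj + dec_proj) * params_per_proj
print(trainable)  # matches the 4,718,592 trainable params printed above
print(round(100 * trainable / 1_375_356_928, 2))  # ~0.34 % of all parameters
```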

Training steps include initializing the base NLLB‑200 model, applying LoRA, and saving the adapted model:

```python
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-1.3B")
# ... LoRA fine-tuning ...
lora_model.save_pretrained("nllb_lora_model")
```

During inference, both the base model and LoRA weights are loaded:

```python
from peft import PeftConfig, PeftModel

config = PeftConfig.from_pretrained("nllb_lora_model")
lora_model = PeftModel.from_pretrained(model, "nllb_lora_model")
output = lora_model.generate(
    **encoded_text,
    forced_bos_token_id=tokenizer.lang_code_to_id["eng_Latn"],
)
```

Experiments show that LoRA‑fine‑tuned NLLB produces concise, app‑name‑oriented queries, reducing unnecessary conjunctions and improving retrieval relevance.

4. Using NLLB Translations as LLM Prompts – Translating low‑resource prompts into English (or Chinese) before feeding them to LLMs like ChatGPT or Llama‑2 dramatically improves answer correctness, especially for languages with scarce training data (e.g., Māori).

Case study: a Māori prompt about fabric quantities is first translated to English, then answered correctly by ChatGPT, whereas direct processing fails.
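The translate-then-prompt flow in this case study can be sketched as a small pipeline. The function names `translate_fn` and `llm_fn` are placeholders of my own: in practice the former would wrap NLLB‑200 and the latter would call ChatGPT or Llama‑2, so the stubs below only illustrate the control flow:

```python
def translate_then_prompt(prompt, translate_fn, llm_fn, pivot="eng_Latn"):
    """Translate a low-resource prompt into a pivot language, then query the LLM."""
    pivot_prompt = translate_fn(prompt, tgt_lang=pivot)
    return llm_fn(pivot_prompt)


# Stand-in implementations, used here only to show how the pieces connect.
def fake_translate(text, tgt_lang):
    return f"[{tgt_lang}] {text}"


def fake_llm(text):
    return f"answer to: {text}"


# A Maori prompt would be routed through the pivot language before the LLM sees it.
print(translate_then_prompt("He aha te tae o te rangi?", fake_translate, fake_llm))
```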

5. Conclusion and Outlook – Leveraging LLM‑generated data to fine‑tune NLLB is effective, but generated knowledge may be noisy and requires validation. While translation can boost LLM performance on multilingual NLP tasks, native‑language prompts remain essential for culturally nuanced understanding. Continued development of multilingual LLMs is crucial.

Tags: LLM, Transformer, Fine-tuning, LoRA, machine translation, Low-Resource Languages, NLLB, PEFT
Written by

vivo Internet Technology

Sharing practical vivo Internet technology insights and salon events, plus the latest industry news and hot conferences.
