
Parameter-Efficient Sparsity Training for the PLUG Large-Scale Language Model

This article presents the PLUG 27‑billion‑parameter Chinese language model and introduces a parameter‑efficient sparsity training (PST) framework that combines unstructured and structured pruning with low‑rank decomposition to dramatically reduce model size while preserving downstream performance.


In March 2021, Alibaba DAMO Academy released PLUG, a 27‑billion‑parameter Chinese language understanding and generation model. Deploying such massive models is challenging due to memory consumption and inference latency.

The authors propose a sparse training solution, Parameter‑Efficient Sparsity Training (PST), which includes two key components:

Sparse Tuning: learns a sparse sub‑network from a small amount of data, achieving fine‑tuning quality comparable to the full model.

Sparse Serving: builds an efficient inference engine that leverages sparse matrix operations instead of dense kernels.
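The Sparse Tuning idea can be sketched as follows: each weight gets an importance score that combines its magnitude with a small, data‑driven correction stored as low‑rank factors, and only the top‑scoring weights survive. This is a minimal illustration, not the actual PST implementation; the function names, the `rank_scale` knob, and the way the low‑rank term is combined are assumptions for demonstration.

```python
import numpy as np

def pst_importance(w, a, b, rank_scale=1.0):
    """Hypothetical PST-style score: weight magnitude plus a learned
    low-rank, data-driven term (a @ b matches w's shape but adds only
    rank * (rows + cols) trainable parameters)."""
    return np.abs(w) + rank_scale * (a @ b)

def sparse_mask(scores, sparsity):
    """Keep the top (1 - sparsity) fraction of entries by score."""
    k = int(scores.size * (1.0 - sparsity))
    thresh = np.partition(scores.ravel(), -k)[-k]
    return (scores >= thresh).astype(scores.dtype)

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))          # dense pretrained weights
a = rng.normal(size=(8, 2)) * 0.1    # low-rank factors, rank r = 2
b = rng.normal(size=(2, 8)) * 0.1
mask = sparse_mask(pst_importance(w, a, b), sparsity=0.9)
w_sparse = w * mask                  # pruned weights for the forward pass
```

Because only the factors `a` and `b` (and not a full score matrix) are learned, the memory overhead of scoring stays small even for very large layers, which is the point of the "parameter‑efficient" part of the name.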

Related work on model compression is reviewed, covering weight sharing, knowledge distillation, low‑rank factorization, pruning, and quantization. Evaluation metrics such as FLOPs, inference time, speed‑up ratio, parameter count, model size, carbon footprint, fidelity, and robustness are discussed.

Experiments on NLU (GLUE/CLUE) and NLG benchmarks demonstrate that PST can prune up to 99% of parameters while maintaining or even improving accuracy compared with magnitude pruning, movement pruning, and L0 regularization baselines. Detailed analyses show smoother weight and score distributions and favorable sparsity patterns.
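At such high sparsity levels, the Sparse Serving side of the system pays off: storing only the surviving weights in a compressed format lets inference touch a small fraction of the original multiply‑adds. The sketch below hand‑rolls a CSR (compressed sparse row) matrix‑vector product to make the idea concrete; PLUG's actual engine and kernels are not described in the article, so this is illustrative only.

```python
import numpy as np

def to_csr(w):
    """Flatten a dense matrix into CSR arrays: nonzero values, their
    column indices, and per-row pointers into those arrays."""
    values, cols, indptr = [], [], [0]
    for row in w:
        nz = np.nonzero(row)[0]
        values.extend(row[nz])
        cols.extend(nz)
        indptr.append(len(values))
    return np.array(values), np.array(cols), np.array(indptr)

def csr_matvec(values, cols, indptr, x):
    """y = W @ x, touching only the stored nonzeros."""
    y = np.zeros(len(indptr) - 1)
    for i in range(len(y)):
        s, e = indptr[i], indptr[i + 1]
        y[i] = values[s:e] @ x[cols[s:e]]
    return y

rng = np.random.default_rng(1)
w = rng.normal(size=(64, 64))
w[np.abs(w) < np.quantile(np.abs(w), 0.9)] = 0.0  # 90% magnitude sparsity
values, cols, indptr = to_csr(w)
x = rng.normal(size=64)
y = csr_matvec(values, cols, indptr, x)           # matches w @ x
```

In practice the speed‑up depends heavily on hardware support for sparse kernels, which is one of the deployment concerns the article's Q&A section raises.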

The paper concludes that PST enables practical deployment of ultra‑large language models like PLUG, achieving high sparsity with minimal performance loss, and outlines future directions including faster sparsity algorithms, integration with distillation and quantization, and hardware‑aware acceleration.

A Q&A section addresses practical concerns such as model export, hardware support for sparsity, and the readiness of PST on Alibaba’s AliceMind platform.

Tags: deep learning · model compression · large language models · sparsity · parameter-efficient training · PLUG
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
