DeepSpeed Chat: An Open‑Source Framework for Scalable RLHF Training of ChatGPT‑Style Models
DeepSpeed Chat provides a fast, affordable, and scalable system for end‑to‑end RLHF training of ChatGPT‑style large language models, offering one‑click scripts, detailed performance benchmarks across GPU configurations, support for many model families, and a flexible API for custom RLHF pipelines.
DeepSpeed Chat is a general‑purpose system framework that democratizes the training of ChatGPT‑style models by providing an end‑to‑end pipeline modeled on OpenAI's three‑stage InstructGPT recipe, culminating in Reinforcement Learning from Human Feedback (RLHF). Starting from a user's preferred pretrained LLM, it can produce high‑quality ChatGPT‑style models at various scales.
The framework enables rapid, low‑cost training: a 1.3B model can be trained on a single NVIDIA A6000 (48 GB) in 1.36 hours, a 13B model on an 8‑GPU DGX node in 13.6 hours, and a 66B model on a 64‑GPU cluster in about 9 hours, achieving up to 15× speed‑up over existing RLHF systems.
DeepSpeed Chat can also serve as the training backend for other chat‑style model projects, such as LLaMA‑based models, and it interoperates with several open‑source efforts including Databricks Dolly, LMFlow, CarperAI TRLX, and Hugging Face PEFT.
Features
Complete end‑to‑end three‑stage RLHF training pipeline built on DeepSpeed’s ZeRO and inference optimizations.
Hybrid engine that delivers fast, economical, and scalable RLHF training.
One‑script experience for supervised fine‑tuning, reward model training, and PPO‑based RLHF.
System‑level support for both full RLHF pipelines and single‑step fine‑tuning of pretrained actors or reward models.
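Stage two of the pipeline trains a reward model on human preference pairs. DeepSpeed Chat handles this internally, but the pairwise ranking loss that reward‑model training typically relies on can be sketched in a few lines (a minimal pure‑Python illustration; `pairwise_reward_loss` is our name for it, not part of the DeepSpeed Chat API):

```python
import math

def pairwise_reward_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry pairwise ranking loss: -log(sigmoid(r_chosen - r_rejected)).

    The loss is small when the reward model scores the human-preferred
    response above the rejected one, and large when it disagrees with
    the human label.
    """
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss shrinks as the preferred response is scored further above
# the rejected one, pushing the model to agree with human rankings.
print(pairwise_reward_loss(2.0, 0.0))  # model agrees with the label: low loss
print(pairwise_reward_loss(0.0, 2.0))  # model disagrees: high loss
```

In practice the rewards are scalar heads on a transformer and the loss is averaged over a batch of (chosen, rejected) response pairs; the scalar version above only illustrates the objective's shape.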
Quick Start
Installation steps:
git clone https://github.com/microsoft/DeepSpeed.git
cd DeepSpeed
pip install .
git clone https://github.com/microsoft/DeepSpeedExamples.git
cd DeepSpeedExamples/applications/DeepSpeed-Chat/
pip install -r requirements.txt
Example RLHF Training Commands
1.3B model (single GPU):
python train.py --actor-model facebook/opt-1.3b --reward-model facebook/opt-350m --num-gpus 1
13B model (8‑GPU DGX node):
python train.py --actor-model facebook/opt-13b --reward-model facebook/opt-350m --num-gpus 8
66B model (64 GPUs):
python train.py --actor-model facebook/opt-66b --reward-model facebook/opt-350m --num-gpus 64
Performance Evaluation
Training time tables for various GPU SKUs and model sizes are provided, showing that DeepSpeed‑Chat can train a 1.3B model in 31 minutes on 64 × A100‑80G GPUs, and a 66B model in 7.5 hours on the same configuration, with cost estimates for Azure cloud.
Compared with Colossal‑AI Coati and HuggingFace DDP, DeepSpeed‑Chat achieves 10× higher throughput on single‑GPU setups and 6‑19× speed‑up on multi‑GPU nodes for the most expensive RLHF step.
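The "most expensive RLHF step" benchmarked above is the PPO training stage. As a conceptual illustration of what that stage optimizes, here is a minimal, framework‑free sketch of PPO's clipped surrogate objective (the function name and scalar signature are illustrative, not part of the DeepSpeed Chat API):

```python
def ppo_clipped_objective(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO's clipped surrogate objective for a single token/action.

    ratio = pi_new(a|s) / pi_old(a|s). Clipping the ratio to
    [1 - clip_eps, 1 + clip_eps] keeps each update from moving the
    policy too far from the one that generated the experience.
    Trainers maximize this quantity (i.e. minimize its negative).
    """
    clipped_ratio = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    # Taking the min makes the objective pessimistic: large ratio
    # changes cannot inflate the objective beyond the clipped value.
    return min(ratio * advantage, clipped_ratio * advantage)
```

In a real trainer this is computed per token over a batch of generated responses, with advantages estimated from the critic's values and the reward model's scores; the scalar form above only shows the clipping logic.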
Supported Model Families
Model Family    Size Range
opt             0.1B – 66B
bloom           0.3B – 176B
gpt_neox        1.3B – 20B
gptj            1.4B – 6B
gpt_neo         0.1B – 2.7B
gpt2            0.3B – 1.5B
codegen         0.35B – 16B
Custom RLHF API
# Build the hybrid engine holding the actor and critic models.
engine = DeepSpeedRLHFEngine(
    actor_model_name_or_path=args.actor_model_name_or_path,
    critic_model_name_or_path=args.critic_model_name_or_path,
    tokenizer=tokenizer,
    num_total_iters=num_total_iters,
    args=args)
trainer = DeepSpeedPPOTrainer(engine=engine, args=args)

for prompt_batch in prompt_train_dataloader:
    # Roll out the actor on a batch of prompts and score the responses.
    out = trainer.generate_experience(prompt_batch)
    # One PPO update of both the actor and the critic on that experience.
    actor_loss, critic_loss = trainer.train_rlhf(out)
References
Schulman, J. et al., “Introducing ChatGPT”, OpenAI blog, openai.com/blog/chatgpt (2022).
Ouyang, L. et al., “Training language models to follow instructions with human feedback”, arXiv:2203.02155 (2022).
Stiennon, N. et al., “Learning to summarize from human feedback”, NeurIPS 33 (2020): 3008–3021.
Hugging Face, Transformers, github.com/huggingface/transformers.
CarperAI, trlx, github.com/CarperAI/trlx.
lvwerra, trl: Train transformer language models with reinforcement learning, github.com/lvwerra/trl.
MrSyee, pg‑is‑all‑you‑need, 02.PPO.ipynb, github.com/MrSyee/pg-is-all-you-need.