
DeepSpeed Chat: An Open‑Source Framework for Scalable RLHF Training of ChatGPT‑Style Models

DeepSpeed Chat provides a fast, affordable, and scalable system for end‑to‑end RLHF training of ChatGPT‑style large language models, offering one‑click scripts, detailed performance benchmarks across GPU configurations, support for many model families, and a flexible API for custom RLHF pipelines.

IT Architects Alliance

DeepSpeed Chat is a general‑purpose system framework that democratizes the training of ChatGPT‑style models by implementing the three‑stage OpenAI InstructGPT pipeline for Reinforcement Learning from Human Feedback (RLHF). Given a user's preferred pretrained LLM, it runs the full pipeline automatically to produce high‑quality ChatGPT‑style models at a wide range of scales.
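The three stages can be outlined as supervised fine‑tuning, reward model training, and PPO‑based RLHF. A minimal conceptual sketch of how the stages chain together; the function names and toy return values are illustrative, not the DeepSpeed Chat API:

```python
# Conceptual outline of the three-stage InstructGPT-style RLHF pipeline.
# Function names and data structures are illustrative, not DeepSpeed Chat's API.

def supervised_fine_tune(base_model, demonstrations):
    """Stage 1: fine-tune the pretrained LLM on human demonstrations."""
    return {"base": base_model, "stage": "sft"}

def train_reward_model(sft_model, preference_pairs):
    """Stage 2: train a reward model on human preference rankings."""
    return {"base": sft_model["base"], "stage": "reward"}

def ppo_rlhf(sft_model, reward_model, prompts):
    """Stage 3: optimize the SFT model against the reward model with PPO."""
    return {"base": sft_model["base"], "stage": "rlhf"}

sft = supervised_fine_tune("facebook/opt-1.3b", demonstrations=[])
rm = train_reward_model(sft, preference_pairs=[])
final = ppo_rlhf(sft, rm, prompts=[])
print(final["stage"])  # rlhf
```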

The framework enables rapid, low‑cost training: a 1.3B model can be trained on a single NVIDIA A6000 (48 GB) in 1.36 hours, a 13B model on an 8‑GPU DGX node in 13.6 hours, and a 66B model on a 64‑GPU cluster in about 9 hours, achieving up to 15× speed‑up over existing RLHF systems.

DeepSpeed Chat also serves as a backend for accelerating other chat‑style model projects such as LLaMA, and it complements open‑source efforts including Databricks Dolly, LMFlow, CarperAI TRLX, and Hugging Face PEFT.

Features

Complete end‑to‑end three‑stage RLHF training pipeline built on DeepSpeed’s ZeRO and inference optimizations.

Hybrid engine that delivers fast, economical, and scalable RLHF training.

One‑script experience for supervised fine‑tuning, reward model training, and PPO‑based RLHF.

System‑level support for both full RLHF pipelines and single‑step fine‑tuning of pretrained actors or reward models.
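The hybrid engine listed above gets its speed by switching the same actor model between an inference‑optimized mode (for experience generation) and a training mode (for PPO updates) within each RLHF iteration. A toy sketch of that mode‑switching pattern; the class and method names are illustrative, not DeepSpeed's API:

```python
# Toy sketch of the hybrid-engine idea: one actor model alternates between
# an inference-optimized mode (experience generation) and a training mode
# (PPO updates) inside each RLHF iteration. Names are illustrative only.

class ToyHybridActor:
    def __init__(self):
        self.mode = None

    def eval_mode(self):
        # Stand-in for enabling inference optimizations (fused kernels, KV cache).
        self.mode = "inference"

    def train_mode(self):
        # Stand-in for restoring training state (e.g. ZeRO-sharded optimizer).
        self.mode = "training"

    def generate(self, prompt):
        assert self.mode == "inference", "generation requires inference mode"
        return prompt + " <response>"

    def update(self, experience):
        assert self.mode == "training", "updates require training mode"
        return 0.0  # stand-in for a loss value

actor = ToyHybridActor()
actor.eval_mode()
exp = actor.generate("Hello")    # fast generation phase
actor.train_mode()
loss = actor.update(exp)         # PPO update phase
```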

Quick Start

Installation steps:

# Install DeepSpeed
git clone https://github.com/microsoft/DeepSpeed.git
cd DeepSpeed
pip install .

# Install the DeepSpeed Chat example application
git clone https://github.com/microsoft/DeepSpeedExamples.git
cd DeepSpeedExamples/applications/DeepSpeed-Chat/
pip install -r requirements.txt
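After installing, a quick sanity check that the key packages are importable in the current environment can save debugging later; this small snippet only probes for the packages without importing them:

```python
import importlib.util

# Check that the packages installed by the steps above are visible to Python.
status = {
    pkg: importlib.util.find_spec(pkg) is not None
    for pkg in ("deepspeed", "transformers")
}
for pkg, ok in status.items():
    print(f"{pkg}: {'found' if ok else 'missing'}")
```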

Example RLHF Training Commands

1.3B model (single GPU):

python train.py --actor-model facebook/opt-1.3b --reward-model facebook/opt-350m --num-gpus 1

13B model (8‑GPU DGX node):

python train.py --actor-model facebook/opt-13b --reward-model facebook/opt-350m --num-gpus 8

66B model (64 GPUs):

python train.py --actor-model facebook/opt-66b --reward-model facebook/opt-350m --num-gpus 64

Performance Evaluation

Training‑time tables across GPU SKUs and model sizes show that DeepSpeed Chat can train a 1.3B model in 31 minutes on 64 A100‑80G GPUs, and a 66B model in 7.5 hours on the same configuration, with corresponding cost estimates for Azure cloud.
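Such cost estimates come from simple arithmetic: GPU count times wall‑clock hours times the per‑GPU hourly rate. A sketch using the 66B figure above; the $3/GPU‑hour rate is a placeholder assumption, not an actual Azure price:

```python
# Back-of-the-envelope training cost: GPUs x hours x per-GPU hourly rate.
# The rate below is a placeholder assumption, not an actual Azure price.
def training_cost(num_gpus, hours, rate_per_gpu_hour):
    return num_gpus * hours * rate_per_gpu_hour

# 66B model: 64 x A100-80G for 7.5 hours at an assumed $3/GPU-hour
print(training_cost(64, 7.5, 3.0))  # 1440.0
```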

Compared with Colossal‑AI Coati and HuggingFace DDP, DeepSpeed Chat achieves more than 10× higher throughput on single‑GPU setups and a 6–19× speed‑up on multi‑GPU nodes for the most expensive RLHF step.

Supported Model Families

Model Family    Size Range
opt             0.1B – 66B
bloom           0.3B – 176B
gpt_neox        1.3B – 20B
gptj            1.4B – 6B
gpt_neo         0.1B – 2.7B
gpt2            0.3B – 1.5B
codegen         0.35B – 16B
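For scripting, the table above can be restated as a lookup, e.g. to validate a requested configuration before launching a run. A small helper built directly from the table values (the function itself is illustrative, not part of DeepSpeed Chat):

```python
# Supported model families and size ranges from the table above,
# restated as a dict (sizes in billions of parameters).
SUPPORTED_FAMILIES = {
    "opt":      (0.1, 66),
    "bloom":    (0.3, 176),
    "gpt_neox": (1.3, 20),
    "gptj":     (1.4, 6),
    "gpt_neo":  (0.1, 2.7),
    "gpt2":     (0.3, 1.5),
    "codegen":  (0.35, 16),
}

def is_supported(family, size_b):
    """Return True if the family is listed and size_b falls in its range."""
    lo, hi = SUPPORTED_FAMILIES.get(family, (None, None))
    return lo is not None and lo <= size_b <= hi
```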

Custom RLHF API

For users building their own RLHF pipelines, DeepSpeed Chat exposes the RLHF engine and PPO trainer as a flexible API:

# Initialize the RLHF engine with the actor and critic (reward) models.
engine = DeepSpeedRLHFEngine(
    actor_model_name_or_path=args.actor_model_name_or_path,
    critic_model_name_or_path=args.critic_model_name_or_path,
    tokenizer=tokenizer,
    num_total_iters=num_total_iters,
    args=args)

trainer = DeepSpeedPPOTrainer(engine=engine, args=args)

for prompt_batch in prompt_train_dataloader:
    # Generate responses, rewards, and values for this batch (experience collection).
    out = trainer.generate_experience(prompt_batch)
    # Run PPO updates on the actor and critic from the collected experience.
    actor_loss, critic_loss = trainer.train_rlhf(out)
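The per‑batch actor update inside train_rlhf is a PPO step. As background, a minimal scalar sketch of PPO's clipped surrogate objective; this is the standard textbook form, not DeepSpeed's implementation, which operates on token‑level log‑probabilities in batches:

```python
import math

# Scalar sketch of PPO's clipped surrogate objective. Returns the loss to
# minimize (the negated clipped objective). Illustrative only; the real
# implementation works on batched, token-level log-probabilities.
def ppo_actor_loss(logp_new, logp_old, advantage, clip_eps=0.2):
    ratio = math.exp(logp_new - logp_old)          # importance ratio pi_new/pi_old
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps) * advantage
    return -min(unclipped, clipped)                # negate: maximize the objective
```

With an unchanged policy (ratio = 1) the loss is just the negated advantage; a large policy shift gets clipped, which limits how far one batch of experience can move the actor.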


Written by IT Architects Alliance

Discussion and exchange on system, internet, large‑scale distributed, high‑availability, and high‑performance architectures, as well as big data, machine learning, AI, and architecture evolution with internet technologies, including real‑world large‑scale architecture case studies. Open to architects who have ideas and enjoy sharing.