Overview of Prominent Large Language Models and Instruction‑Finetuned Variants
This article provides a comprehensive overview of major large language models—including GPT series, T5, LaMDA, LLaMA, BLOOM, and others—detailing their architectures, parameter scales, open‑source status, and the evolution of instruction‑fine‑tuning techniques that improve zero‑shot and few‑shot performance.
Introduction
Since the emergence of ChatGPT, a multitude of large language models (LLMs) have been released, making it difficult to track their origins, capabilities, and relationships. This article surveys notable LLMs to clarify their characteristics and aid further study.
1. Basic Language Models
Basic LLMs are pretrained on massive text corpora without instruction or downstream fine‑tuning. Most are decoder‑only (GPT‑style), though encoder‑decoder (T5‑style) and specialized structures (GLM‑style) also exist.
T5
Google’s T5 treats every NLP task as a text‑to‑text problem using an encoder‑decoder Transformer. By prefixing inputs with task‑specific prompts, the same model can handle diverse tasks, and a multilingual version (mT5) supports 101 languages.
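The text-to-text framing can be made concrete with a small sketch. The task prefixes below follow the conventions in the T5 paper; the helper function itself is hypothetical, not part of any library.

```python
# Illustrative sketch: casting heterogeneous NLP tasks into T5's
# text-to-text format by prefixing the input with a task description.

def to_text_to_text(task: str, text: str) -> str:
    """Build a T5-style input string for a given task."""
    prefixes = {
        "translate_en_de": "translate English to German: ",
        "summarize": "summarize: ",
        "cola": "cola sentence: ",  # grammatical-acceptability task
    }
    return prefixes[task] + text

print(to_text_to_text("summarize", "The quick brown fox ..."))
```

Every task, including classification, is answered as generated text, so one encoder-decoder model trained with one objective can handle them all.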
GPT‑3
OpenAI’s GPT‑3 expanded model parameters to 175 B, introducing in‑context learning that enables strong zero‑, one‑, and few‑shot performance without additional fine‑tuning.
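Mechanically, in-context learning is just prompt construction: demonstrations are concatenated ahead of the query and the model's weights are never updated. The sketch below uses a common demonstration format; the helper is an assumption for illustration, not an OpenAI API.

```python
# Illustrative sketch: few-shot "in-context learning" as prompt assembly.
# The model conditions on the k demonstrations and completes the last line.

def few_shot_prompt(examples, query):
    """Assemble k (input, output) demonstrations plus the query into one prompt."""
    lines = []
    for inp, out in examples:
        lines.append(f"Input: {inp}\nOutput: {out}")
    lines.append(f"Input: {query}\nOutput:")  # model fills in the answer
    return "\n\n".join(lines)

demos = [("great movie!", "positive"), ("waste of time", "negative")]
print(few_shot_prompt(demos, "loved every minute"))
```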
Other Notable Models
LaMDA (Google, 137 B) – dialogue‑focused, optimized for quality, safety, and groundedness.
Jurassic‑1 (AI21 Labs, 178 B / 7.5 B) – comparable to GPT‑3, excels in zero‑ and few‑shot tasks.
MT‑NLG (Microsoft + NVIDIA, 530 B) – emphasizes memory‑ and compute‑efficient parallelism.
Gopher (DeepMind, up to 280 B) – strong on reading comprehension and fact‑checking.
Chinchilla (DeepMind, 70 B) – demonstrates optimal trade‑off between model size and training tokens.
PaLM (Google, 540 B) – decoder‑only, trained on 6 144 TPU v4 chips, achieves state‑of‑the‑art few‑shot results.
U‑PaLM – UL2R‑fine‑tuned PaLM with comparable performance at half the compute.
OPT (Meta) – open‑source series from 125 M to 175 B parameters.
LLaMA (Meta) – 7 B–65 B models, competitive with larger LLMs.
BLOOM (BigScience, 176 B) – multilingual, open‑source decoder‑only model.
GLM‑130B (Tsinghua + Zhipu AI) – bilingual (English‑Chinese) dense model.
ERNIE 3.0 Titan (Baidu, 260 B) – knowledge‑enhanced Chinese LLM.
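Chinchilla's size-versus-data trade-off is often summarized as roughly 20 training tokens per model parameter. The sketch below applies that rule of thumb; note the 20× ratio is an approximation drawn from the paper's results, not an exact law.

```python
# Back-of-the-envelope sketch of the Chinchilla compute-optimal rule:
# train on roughly 20 tokens per parameter (an approximation).

def chinchilla_token_budget(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Approximate compute-optimal number of training tokens."""
    return n_params * tokens_per_param

# Chinchilla itself: 70B parameters -> ~1.4T tokens, matching the
# ~1.4 trillion tokens it was actually trained on.
print(f"{chinchilla_token_budget(70e9):.2e}")  # -> 1.40e+12
```

By this rule, a 175 B-parameter model like GPT‑3 would call for ~3.5 T training tokens, far more than the ~300 B it saw, which is the sense in which earlier models were "undertrained."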
2. Instruction‑Finetuned Language Models
Instruction tuning adds natural‑language task descriptions (prompts) to guide LLMs, improving zero‑shot generalization. The process typically involves fine‑tuning on large multi‑task datasets with human‑written prompts.
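On the data side, instruction tuning typically renders each labeled example through several human-written templates (as in FLAN and T0), turning supervised datasets into (instruction, answer) pairs. The templates below are invented for illustration.

```python
# Illustrative sketch: one supervised sentiment example becomes several
# instruction-style training pairs, one per human-written template.

TEMPLATES = [
    "Is the sentiment of this review positive or negative?\n{text}",
    "Review: {text}\nDid the reviewer like it? Answer positive or negative.",
]

def render(example):
    """Yield one (prompt, target) pair per template."""
    for t in TEMPLATES:
        yield t.format(text=example["text"]), example["label"]

pairs = list(render({"text": "An instant classic.", "label": "positive"}))
print(len(pairs))  # one training pair per template
```

Template diversity is part of why instruction-tuned models generalize: the model learns to follow the described task rather than memorize one prompt format.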
T0
Built on T5, T0 is fine‑tuned on a massive multitask collection of prompted datasets; despite being roughly 16× smaller than GPT‑3, it surpasses GPT‑3's zero‑shot performance on several benchmarks.
FLAN
Google’s FLAN fine‑tunes LaMDA‑based models with instruction data, outperforming GPT‑3 175 B on most evaluated tasks.
Flan‑T5 & Flan‑PaLM
The Flan collection extends the FLAN recipe to T5, PaLM, and U‑PaLM, scaling the instruction data to 1 836 tasks and demonstrating significant gains in few‑shot and chain‑of‑thought settings.
BLOOMZ & mT0
Apply multilingual instruction tuning to BLOOM and mT5, yielding strong zero‑shot results across languages.
GPT‑3.5 & ChatGPT
The GPT‑3.5 series incorporates progressive instruction fine‑tuning, culminating in ChatGPT, which aligns model outputs with human intent via reinforcement learning from human feedback (RLHF).
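The reward-model step at the heart of RLHF can be sketched in a few lines: given a human preference between two responses, the reward model is trained with a Bradley–Terry pairwise loss. Scalar rewards stand in for real model outputs here; this is a toy illustration, not OpenAI's implementation.

```python
import math

# Minimal sketch of RLHF's reward-model objective: drive the reward of
# the human-preferred ("chosen") response above the "rejected" one.

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """-log sigmoid(r_chosen - r_rejected): small when chosen >> rejected."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# The loss falls as the reward model ranks the preferred answer higher.
print(preference_loss(2.0, 0.0) < preference_loss(0.5, 0.0))  # True
```

The trained reward model then scores the policy's generations, and a reinforcement-learning step (PPO in the InstructGPT/ChatGPT papers) updates the policy to maximize that reward.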
GPT‑4
GPT‑4 adds multimodal (image‑text) capabilities while retaining a transformer‑based decoder‑only architecture, achieving human‑level performance on many professional benchmarks.
Alpaca & Variants
Stanford’s Alpaca fine‑tunes LLaMA‑7B on 52 K instruction‑following examples generated via self‑instruct from OpenAI’s text‑davinci‑003, approaching GPT‑3.5‑style behavior on many instructions with far fewer parameters; subsequent LoRA‑based adaptations enable rapid fine‑tuning on consumer GPUs.
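LoRA, the technique behind those consumer-GPU adaptations, freezes the pretrained weight W and adds a trainable low-rank update B·A, so only r·(d_in + d_out) parameters per layer are tuned. The shapes and scaling below follow the LoRA formulation; the numbers are random toys, not real model weights.

```python
import numpy as np

# Toy sketch of a LoRA-adapted linear layer: y = W x + (alpha/r) * B A x,
# where W is frozen and only the small matrices A and B are trained.

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 4           # rank r << d
W = rng.normal(size=(d_out, d_in))   # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01
B = np.zeros((d_out, r))             # B starts at zero: no initial change
alpha = 8.0                          # LoRA scaling hyperparameter

def lora_forward(x):
    """Forward pass through the frozen weight plus the low-rank update."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B = 0 the adapted model reproduces the base model exactly.
print(np.allclose(lora_forward(x), W @ x))  # True
```

Because only A and B (here 4 × 64 each, versus 64 × 64 for W) receive gradients, the optimizer state and checkpoint sizes shrink dramatically, which is what makes fine-tuning a 7 B model feasible on a single consumer GPU.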
ChatGLM & ERNIE Bot
ChatGLM builds on GLM‑130B with instruction fine‑tuning and RLHF, offering a Chinese‑English dialogue model; ERNIE Bot (Baidu) is a ChatGPT‑like system based on the ERNIE series.
Conclusion
The rapid proliferation of LLMs and instruction‑fine‑tuning techniques has dramatically improved zero‑shot and few‑shot capabilities across languages and tasks, while open‑source initiatives (e.g., OPT, LLaMA, BLOOM) foster broader research access.