
Overview of Prominent Large Language Models and Instruction‑Finetuned Variants

This article provides a comprehensive overview of major large language models—including GPT series, T5, LaMDA, LLaMA, BLOOM, and others—detailing their architectures, parameter scales, open‑source status, and the evolution of instruction‑fine‑tuning techniques that improve zero‑shot and few‑shot performance.

IT Architects Alliance

Introduction

Since the emergence of ChatGPT, a multitude of large language models (LLMs) have been released, making it difficult to track their origins, capabilities, and relationships. This article surveys notable LLMs to clarify their characteristics and aid further study.

1. Basic Language Models

Basic LLMs are pretrained on massive text corpora without instruction or downstream fine‑tuning. Most are decoder‑only (GPT‑style), though encoder‑decoder (T5‑style) and specialized structures (GLM‑style) also exist.

T5

Google’s T5 treats every NLP task as a text‑to‑text problem using an encoder‑decoder Transformer. By prefixing inputs with task‑specific prompts, the same model can handle diverse tasks, and a multilingual version (mT5) supports 101 languages.
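The text‑to‑text convention above can be sketched as simple prompt construction: every task is reduced to "prefix + input text", and the model emits the answer as text. The prefix strings below (e.g. "summarize:", "translate English to German:") follow the convention from the T5 paper; the exact mapping of task names to prefixes here is illustrative.

```python
def to_text_to_text(task: str, text: str) -> str:
    """Render an input in T5's text-to-text format by prepending a
    task-specific prefix, so one model handles many tasks."""
    prefixes = {
        "summarization": "summarize: ",
        "translation_en_de": "translate English to German: ",
        "sentiment": "sst2 sentence: ",
    }
    return prefixes[task] + text

# The same checkpoint serves both calls; only the prefix changes.
prompt = to_text_to_text("translation_en_de", "That is good.")
```

Because the output is also plain text, classification, translation, and summarization all share one decoding interface, which is what lets T5 (and mT5) reuse a single architecture across tasks.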

GPT‑3

OpenAI’s GPT‑3 expanded model parameters to 175 B, introducing in‑context learning that enables strong zero‑, one‑, and few‑shot performance without additional fine‑tuning.
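In‑context learning means the demonstrations live entirely in the prompt: the weights are never updated, and the model conditions on a handful of solved examples before answering the query. A minimal sketch of assembling such a few‑shot prompt (the "Input:"/"Output:" labels are an illustrative format, not a fixed GPT‑3 API):

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: a task description, k solved
    examples, then the unanswered query. The demonstrations steer
    the model purely in-context, with no gradient updates."""
    lines = [instruction, ""]
    for inp, out in examples:
        lines.append(f"Input: {inp}")
        lines.append(f"Output: {out}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")  # the model completes from here
    return "\n".join(lines)
```

Zero‑, one‑, and few‑shot settings differ only in how many `(input, output)` pairs are passed in `examples`.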

Other Notable Models

LaMDA (Google, 137 B) – dialogue‑focused, optimized for quality, safety, and groundedness.

Jurassic‑1 (AI21 Labs, 178 B/7 B) – comparable to GPT‑3, excels in zero‑ and few‑shot tasks.

MT‑NLG (Microsoft + NVIDIA, 530 B) – emphasizes memory‑ and compute‑efficient parallelism.

Gopher (DeepMind, up to 280 B) – strong on reading comprehension and fact‑checking.

Chinchilla (DeepMind, 70 B) – demonstrates optimal trade‑off between model size and training tokens.

PaLM (Google, 540 B) – decoder‑only, trained on 6 144 TPU v4 chips, achieves state‑of‑the‑art few‑shot results.

U‑PaLM – UL2R‑fine‑tuned PaLM with comparable performance at half the compute.

OPT (Meta) – open‑source series from 125 M to 175 B parameters.

LLaMA (Meta) – 7 B–65 B models, competitive with larger LLMs.

BLOOM (BigScience, 176 B) – multilingual, open‑source decoder‑only model.

GLM‑130B (Tsinghua + Zhipu AI) – bilingual (English‑Chinese) dense model.

ERNIE 3.0 Titan (Baidu, 260 B) – knowledge‑enhanced Chinese LLM.
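The Chinchilla entry above deserves a concrete illustration: its central finding is often summarized as the rule of thumb that compute‑optimal training uses roughly 20 tokens per parameter (an approximation of the fitted scaling law, not the exact formula).

```python
def chinchilla_optimal_tokens(n_params: float) -> float:
    """Rule-of-thumb from the Chinchilla analysis: a compute-optimal
    training run uses about 20 tokens per model parameter."""
    return 20.0 * n_params

# Chinchilla itself: 70B parameters -> ~1.4T training tokens,
# which is how a 70B model outperformed the 280B Gopher.
tokens = chinchilla_optimal_tokens(70e9)
```

By this heuristic, GPT‑3 (175 B parameters, ~300 B tokens) was significantly undertrained, which motivated later models to train smaller networks on far more data.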

2. Instruction‑Finetuned Language Models

Instruction tuning adds natural‑language task descriptions (prompts) to guide LLMs, improving zero‑shot generalization. The process typically involves fine‑tuning on large multi‑task datasets with human‑written prompts.
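Mechanically, this means rendering each labeled example through a natural‑language template to produce an `(input, target)` text pair. The two‑template sentiment set below is a hypothetical miniature; real collections such as T0's P3 or the FLAN mixtures use many human‑written templates per dataset.

```python
import random

# Hypothetical template set for one sentiment dataset; instruction-tuning
# corpora mix thousands of such templated tasks.
TEMPLATES = [
    {"prompt": "Review: {text}\nIs this review positive or negative?",
     "target": "{label}"},
    {"prompt": "What is the sentiment of the following review? {text}",
     "target": "{label}"},
]

def apply_template(example, templates, rng):
    """Render one labeled example into an (input, target) text pair by
    filling a randomly chosen natural-language template. Varying the
    phrasing teaches the model to follow instructions it has not seen."""
    t = rng.choice(templates)
    return t["prompt"].format(**example), t["target"].format(**example)
```

Fine‑tuning on the resulting pairs, pooled across many tasks, is what lets instruction‑tuned models generalize zero‑shot to unseen task descriptions.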

T0

Built on T5, T0 is fine‑tuned on a massive multitask collection of prompted datasets, achieving zero‑shot performance that surpasses GPT‑3 on several benchmarks despite being roughly 16× smaller.

FLAN

Google’s FLAN fine‑tunes LaMDA‑based models with instruction data, outperforming GPT‑3 175 B on most evaluated tasks.

Flan‑T5 & Flan‑PaLM

Extend the FLAN recipe to T5, PaLM, and U‑PaLM, scaling instruction data to 1 836 tasks and demonstrating significant gains in few‑shot and chain‑of‑thought settings.

BLOOMZ & mT0

Apply multilingual instruction tuning to BLOOM and mT5, yielding strong zero‑shot results across languages.

GPT‑3.5 & ChatGPT

The GPT‑3.5 series incorporates progressive instruction fine‑tuning and reinforcement learning from human feedback (RLHF), culminating in ChatGPT, which aligns model outputs with human intent.
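One core piece of the RLHF pipeline is the reward model, trained on human preference pairs: given a chosen and a rejected response, it should score the chosen one higher. The standard pairwise objective is −log σ(r_chosen − r_rejected), sketched below; the scalar rewards here stand in for the outputs of a real learned reward model.

```python
import math

def reward_ranking_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise loss for training an RLHF reward model:
    -log(sigmoid(r_chosen - r_rejected)). It is minimized when the
    model scores the human-preferred response well above the
    rejected one, and equals log(2) when the two scores tie."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

The trained reward model then supplies the scalar signal that a policy‑gradient step (PPO in the InstructGPT/ChatGPT lineage) uses to fine‑tune the language model.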

GPT‑4

GPT‑4 adds multimodal (image‑text) capabilities while retaining a transformer‑based decoder‑only architecture, achieving human‑level performance on many professional benchmarks.

Alpaca & Variants

Stanford’s Alpaca fine‑tunes LLaMA‑7B on 52K instruction‑following examples generated via the Self‑Instruct method, achieving behavior comparable to GPT‑3.5 (text‑davinci‑003) with far fewer parameters; subsequent LoRA‑based adaptations enable rapid fine‑tuning on consumer GPUs.
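LoRA makes those consumer‑GPU adaptations cheap by freezing the pretrained weight matrix W and learning only a low‑rank correction B·A, so the forward pass becomes y = Wx + scale·B(Ax). A dependency‑free sketch of that forward pass (matrices as lists of rows; the tiny shapes are illustrative):

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def lora_forward(x, W, A, B, scale):
    """LoRA-style forward pass: y = W x + scale * B (A x).
    W stays frozen; only the low-rank factors A (r x d_in) and
    B (d_out x r) are trained, so the trainable parameter count
    scales with the rank r rather than with d_in * d_out."""
    base = matvec(W, x)            # frozen pretrained path
    delta = matvec(B, matvec(A, x))  # trainable low-rank path
    return [b + scale * d for b, d in zip(base, delta)]
```

Because B·A can be merged into W after training, inference pays no extra latency, which is why LoRA variants of Alpaca-style models are practical to serve.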

ChatGLM & ERNIE Bot

ChatGLM builds on GLM‑130B with instruction fine‑tuning and RLHF, offering a Chinese‑English dialogue model; ERNIE Bot (Baidu) is a ChatGPT‑like system based on the ERNIE series.

Conclusion

The rapid proliferation of LLMs and instruction‑fine‑tuning techniques has dramatically improved zero‑shot and few‑shot capabilities across languages and tasks, while open‑source initiatives (e.g., OPT, LLaMA, BLOOM) foster broader research access.

Written by IT Architects Alliance

Discussion and exchange on systems, internet, large‑scale distributed, high‑availability, and high‑performance architectures, as well as big data, machine learning, AI, and architecture evolution with internet technologies, including real‑world large‑scale architecture case studies. Open to architects who have ideas and enjoy sharing.
