Paradigm Shifts in Large Language Models: From Pre‑training to AGI and Future Research Directions

The article reviews the evolution of large language models. It highlights two major paradigm shifts after GPT‑3, examines the role of scaling laws, knowledge acquisition, prompting techniques, and reasoning abilities, and outlines future research priorities for building more capable and efficient AI systems.

Alibaba Cloud Big Data AI Platform

Jan 10, 2023 · Artificial Intelligence

How GPT‑MoE Cuts Training Costs: Sparse Transformer Techniques and Performance Insights

This article examines the use of Mixture‑of‑Experts (MoE) sparse training for GPT models, detailing the architecture, training and inference efficiency gains, experimental comparisons with dense models, custom routing algorithms, and step‑by‑step deployment on Alibaba Cloud AI platforms.
