Tagged articles
2 articles
Architect
Feb 18, 2023 · Artificial Intelligence

Paradigm Shifts in Large Language Models: From Pre‑training to AGI and Future Research Directions

The article reviews the evolution of large language models, highlighting two major paradigm shifts after GPT‑3 and the roles of scaling laws, knowledge acquisition, prompting techniques, and reasoning abilities. It also outlines future research priorities for building more capable and efficient AI systems.

AI reasoning · In-Context Learning · Model Scaling
71 min read
Alibaba Cloud Big Data AI Platform
Jan 10, 2023 · Artificial Intelligence

How GPT‑MoE Cuts Training Costs: Sparse Transformer Techniques and Performance Insights

This article examines the use of Mixture‑of‑Experts (MoE) sparse training for GPT models, detailing the architecture, training and inference efficiency gains, experimental comparisons with dense models, custom routing algorithms, and step‑by‑step deployment on Alibaba Cloud AI platforms.

AI efficiency · GPT-MoE · Large Language Models
26 min read