Rare Earth Juejin Tech Community
Dec 8, 2023 · Artificial Intelligence
Simplifying Transformer Blocks: Removing Residual Connections, LayerNorm, and Other Components without Losing Performance
A recent paper from ETH Zurich shows that the standard Transformer block can be greatly simplified: skip (residual) connections, normalization layers, value and projection parameters, and sequential MLP sub-blocks can all be removed, cutting the parameter count by up to 16% while matching the training speed and downstream performance of standard blocks on both GPT-style autoregressive decoders and BERT-style encoder models.
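To make the structural change concrete, the following is a minimal NumPy sketch contrasting a standard pre-LN attention sub-block (LayerNorm, Q/K/V and output projections, residual connection) with a simplified sub-block in the spirit of the paper: no normalization, no residual, and value/output projections fixed to the identity, with an identity term mixed into the attention matrix standing in for the removed skip path. All names, dimensions, and the `alpha`/`beta` mixing weights here are illustrative assumptions, not the paper's exact formulation (the paper uses a "shaped attention" with a trained centering term as well):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
n, d = 8, 16  # sequence length, model dimension (toy sizes)
X = rng.standard_normal((n, d))

# Weight matrices for a single attention head (toy initialization).
Wq, Wk, Wv, Wo = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))

def standard_attn_subblock(X):
    """Pre-LN attention sub-block: LN -> QKV attention -> output proj -> residual."""
    Z = layer_norm(X)
    A = softmax((Z @ Wq) @ (Z @ Wk).T / np.sqrt(d))
    return X + (A @ (Z @ Wv)) @ Wo

def simplified_attn_subblock(X, alpha=1.0, beta=0.5):
    """Simplified sub-block: no LN, no residual, V and output proj = identity.

    The identity term alpha * I biases the attention matrix toward passing
    each token through unchanged, playing the role of the removed skip
    connection (a rough stand-in for the paper's shaped attention).
    """
    A = softmax((X @ Wq) @ (X @ Wk).T / np.sqrt(d))
    A_shaped = alpha * np.eye(n) + beta * A
    return A_shaped @ X

print(standard_attn_subblock(X).shape)    # (8, 16)
print(simplified_attn_subblock(X).shape)  # (8, 16)
```

Note that the simplified sub-block carries two fewer d-by-d matrices (no Wv, no Wo), which is where the parameter savings in the attention layers come from.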
AI · LLM · deep learning