Tagged articles
3 articles
Data Party THU
Jan 19, 2026 · Artificial Intelligence

How VersatileFFN Cuts Memory Use While Boosting LLM Performance

The article introduces Huawei's VersatileFFN, an adaptive wide-and-deep feed-forward design for large language models that reuses parameters to cut memory consumption while improving model performance. It covers the design's dual-system inspiration, technical mechanisms, experimental gains, and implications for efficient LLM deployment.

Adaptive Computation · LLM · Transformer
0 likes · 8 min read
Code DAO
Dec 5, 2021 · Artificial Intelligence

Understanding DeepMind’s PonderNet: A Thinkable Network for MNIST

This article explains DeepMind's PonderNet framework, which lets neural networks allocate computation adaptively. It demonstrates an implementation with PyTorch Lightning on the MNIST dataset, details the underlying theory, loss functions, and training procedure, and evaluates the model's pondering behavior in rotated-digit experiments.

Adaptive Computation · MNIST · PonderNet
0 likes · 27 min read