
Key Transformer Model Papers Across Language, Vision, Speech, and Time‑Series Domains

This article surveys the most influential Transformer‑based research papers—from the original Attention Is All You Need work to recent models such as Autoformer and FEDformer—covering breakthroughs in natural language processing, computer vision, speech recognition, and long‑term time‑series forecasting, and provides download links for each.


At the end of 2022, AIGC (Artificial Intelligence‑Generated Content) captured widespread attention, with ChatGPT dominating language tasks and diffusion models driving AI‑generated art, prompting both excitement and concern across media and the public.

Beyond the hype, professionals in AI need to understand the underlying principles of these models. Starting from the Transformer architecture introduced by Vaswani et al. (2017), a series of landmark papers have shaped the field.

1. Transformer Architecture Vaswani, Ashish, et al. "Attention Is All You Need." NeurIPS 2017. Download PDF
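The core operation introduced in this paper is scaled dot‑product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. As a minimal illustration (a NumPy sketch of the published formula, not any official reference implementation), it can be written as:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention from Vaswani et al. (2017):
    softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # row-wise softmax, shifted for numerical stability
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output row is a weighted sum of value rows

# toy example: 3 query positions, 4 key/value positions, d_k = 8
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 8): one output vector per query
```

The full model stacks many of these attention heads with feed‑forward layers, residual connections, and layer normalization.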

2. BERT Jacob Devlin, et al. "BERT: Pre‑training of Deep Bidirectional Transformers for Language Understanding." NAACL 2019. Download PDF

3. GPT‑3 Tom Brown, et al. "Language Models are Few‑Shot Learners." NeurIPS 2020. Download PDF

4. Vision Transformer (ViT) Alexey Dosovitskiy, et al. "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale." ICLR 2021. Download PDF
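ViT's key move is treating an image as a sequence of flattened patches, each of which becomes one token for a standard Transformer encoder. A hedged sketch of that patchification step (illustrative NumPy only; the paper additionally applies a learned linear projection and adds position embeddings):

```python
import numpy as np

def image_to_patches(img, patch=16):
    """Split an H x W x C image into flattened, non-overlapping
    patch x patch patches -- ViT's 'an image is worth 16x16 words'."""
    H, W, C = img.shape
    assert H % patch == 0 and W % patch == 0
    # reshape into a grid of patches, then flatten each patch to one row
    grid = img.reshape(H // patch, patch, W // patch, patch, C)
    patches = grid.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * C)
    return patches  # (num_patches, patch*patch*C): one token per patch

img = np.zeros((224, 224, 3))  # standard ImageNet-sized input
tokens = image_to_patches(img)
print(tokens.shape)  # (196, 768): 14x14 patches of dimension 16*16*3
```

After this step, each 768‑dimensional row is projected to the model width and processed exactly like a word embedding in BERT‑style encoders.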

5. DETR Nicolas Carion, et al. "End‑to‑End Object Detection with Transformers." ECCV 2020. Download PDF

6. DALL‑E & DALL‑E 2 Aditya Ramesh, et al. "Zero‑Shot Text‑to‑Image Generation." ICML 2021. Download PDF Follow‑up: "Hierarchical Text‑Conditional Image Generation with CLIP Latents" (2022). Download PDF

7. Stable Diffusion Robin Rombach, et al. "High‑Resolution Image Synthesis with Latent Diffusion Models." CVPR 2022. Download PDF

8. DeiT Hugo Touvron, et al. "Training data‑efficient image transformers & distillation through attention." ICML 2021. Download PDF

9. Speech‑Transformer L. Dong, S. Xu, B. Xu. "Speech‑Transformer: A No‑Recurrence Sequence‑to‑Sequence Model for Speech Recognition." ICASSP 2018. Download PDF

10. Transformer TTS Naihan Li, et al. "Neural Speech Synthesis with Transformer Network." AAAI 2019. Download PDF

11. FastSpeech & FastSpeech 2 Yi Ren, et al. "FastSpeech: Fast, Robust and Controllable Text to Speech." NeurIPS 2019. Download PDF Follow‑up: "FastSpeech 2: Fast and High‑Quality End‑to‑End Text to Speech." ICLR 2021. Download PDF

12. Whisper Alec Radford, et al. "Robust Speech Recognition via Large‑Scale Weak Supervision." 2022. Download PDF

13. Autoformer Haixu Wu, et al. "Autoformer: Decomposition Transformers with Auto‑Correlation for Long‑Term Series Forecasting." NeurIPS 2021. Download PDF

14. FEDformer Tian Zhou, et al. "FEDformer: Frequency Enhanced Decomposed Transformer for Long‑term Series Forecasting." ICML 2022. Download PDF

15. Graph Transformer Networks Seongjun Yun, et al. "Graph Transformer Networks." NeurIPS 2019. Download PDF

These papers collectively illustrate the evolution of Transformer‑based models across multiple AI sub‑fields, providing essential reading for researchers and engineers seeking to apply or innovate upon these architectures.

Tags: AI, Transformer, time series forecasting, speech recognition, language models, research survey, vision models
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
