DataFunSummit
Apr 25, 2022 · Artificial Intelligence
Token‑Level Pipeline Parallelism for Transformer‑based Language Models (TeraPipe)
The article introduces TeraPipe, a token-level pipeline-parallelism strategy that partitions the computation of Transformer-based language models along the sequence-length (token) dimension. It explains why causal attention makes this slicing feasible, presents a dynamic-programming formulation for choosing the optimal slicing, discusses the engineering challenges involved, and evaluates the resulting performance on large GPT models.
Dynamic Programming · Token-level · Large Language Models
13 min read
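As a rough illustration of the dynamic-programming formulation summarized above, the sketch below finds an optimal token slicing for a pipeline with `K` stages, approximating pipeline latency as (sum of slice times) + (K - 1) × (max slice time). It enumerates a cap on the maximum slice time and runs a prefix DP under each cap. The cost model `slice_cost` and the function names are illustrative assumptions, not the article's actual implementation; the quadratic cost mimics causal attention, where later tokens attend to a longer context.

```python
from typing import List, Optional, Tuple

def slice_cost(i: int, j: int) -> int:
    # Hypothetical cost model: processing tokens i..j-1 costs
    # (slice length) x (context length), mimicking causal attention.
    return (j - i) * j

def best_split(L: int, K: int) -> Tuple[float, Optional[List[int]]]:
    """Cut a length-L token sequence into slices so that pipeline latency,
    approximated as sum(slice times) + (K - 1) * max(slice time) for K
    pipeline stages, is minimized. Enumerates a cap on the max slice time,
    then runs a prefix DP restricted to slices under that cap."""
    caps = sorted({slice_cost(i, j) for i in range(L) for j in range(i + 1, L + 1)})
    best_total, best_cuts = float("inf"), None
    for cap in caps:
        INF = float("inf")
        dp = [INF] * (L + 1)      # dp[j]: min sum of slice costs for tokens[:j]
        parent = [-1] * (L + 1)   # previous cut point, for reconstruction
        dp[0] = 0.0
        for j in range(1, L + 1):
            for i in range(j):
                c = slice_cost(i, j)
                if c <= cap and dp[i] + c < dp[j]:
                    dp[j], parent[j] = dp[i] + c, i
        if dp[L] == INF:
            continue  # no valid slicing fits under this cap
        total = dp[L] + (K - 1) * cap
        if total < best_total:
            cuts, j = [], L       # walk parents back to recover cut points
            while j > 0:
                cuts.append(j)
                j = parent[j]
            best_total, best_cuts = total, cuts[::-1]
    return best_total, best_cuts
```

Under this toy cost model, `best_split(4, 2)` returns `(14, [1, 2, 3, 4])`: with only two pipeline stages, many small slices keep the bubble term `(K - 1) * max` cheap, which is the trade-off the DP balances.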