Tag

Token-level

1 view collected around this technical thread.

DataFunSummit
Apr 25, 2022 · Artificial Intelligence

Token‑Level Pipeline Parallelism for Transformer‑based Language Models (TeraPipe)

The article introduces a token‑level pipeline parallelism strategy that splits the sequence‑length dimension of Transformer‑based language models, explains why this approach is feasible, presents a dynamic‑programming formulation for optimal slicing, discusses engineering challenges, and evaluates its performance on large GPT models.

Dynamic Programming · Token-level · large language models
13 min read