Tag

Transformer Engine

1 view collected around this technical thread.

Alibaba Cloud Infrastructure
Sep 13, 2023 · Artificial Intelligence

Pai‑Megatron‑Patch: Design Principles, Key Features, and End‑to‑End Usage for Large Language Model Training

This article introduces Pai‑Megatron‑Patch, an open‑source tool from Alibaba Cloud. It explains the tool's non‑intrusive patch architecture, lists the supported models and features (weight conversion, Flash‑Attention 2.0, and FP8 training with Transformer Engine), and provides detailed command‑line examples for model conversion, pre‑training, supervised fine‑tuning, inference, and RLHF pipelines.

Deep Learning · FP8 · LLM
19 min read