Tagged articles
2 articles
Machine Heart
Apr 16, 2026 · Artificial Intelligence

Achieving 4.6× Faster Diffusion Model Training with FP4‑BF16 Dual‑Track Parallelism (Sol‑RL)

Sol‑RL, a framework from NVIDIA, the University of Hong Kong, and MIT, uses NVFP4 inference for large‑scale rollout exploration and BF16 precision for high‑fidelity regeneration. It converges up to 4.64× faster at equivalent reward levels while preserving BF16 training fidelity on the SANA, FLUX.1, and SD3.5‑L models.

BF16 · FP4 · GPU optimization
9 min read
Code DAO
Jan 15, 2022 · Artificial Intelligence

How Intel BF16 with IPEX and oneDNN Boosts PyTorch Performance

This article explains how the BF16 support developed by Intel and Facebook, combined with the Intel Extension for PyTorch (IPEX) and oneDNN, automates data‑type and memory‑layout conversions and adds graph‑fusion optimizations. On Xeon CPUs it reports 1.4× to 4.3× inference speedups and up to 2.4× training speedups for models such as DLRM, BERT‑Large, and ResNeXt‑101‑32x4d.

BF16 · CPU acceleration · IPEX
13 min read