Tagged articles
1 article
Machine Heart
May 25, 2026 · Artificial Intelligence

Breaking the Reward Trade‑off: Flow‑OPD Brings Multi‑Teacher OPD to Image Generation

Flow‑OPD brings on‑policy distillation (OPD) to flow‑matching diffusion models. It combines a multi‑teacher online rollout framework with manifold‑anchor regularization to resolve the seesaw effect that arises when training on single or mixed rewards, achieving stronger multi‑task performance and surpassing specialist models in image generation.

Flow-OPD · Manifold Anchor Regularization · diffusion models
9 min read