SPLAM: Sub‑Path Linear Approximation for Accelerating Diffusion Model Sampling
SPLAM (Sub‑Path Linear Approximation Model) accelerates diffusion‑model image synthesis by linearly approximating short sub‑paths of the probability‑flow ODE, which allows high‑quality generation in as few as four steps. It outperforms prior fast‑sampling methods on COCO benchmarks and has been deployed in Alimama's recommendation system.
This article introduces SPLAM (Sub‑Path Linear Approximation Model), a method designed to speed up diffusion‑model image generation by reducing the number of sampling steps required.
Background: Diffusion models achieve high‑quality image synthesis but suffer from slow inference because they typically need dozens or hundreds of denoising steps. Prior work has reduced this cost: solvers such as DDIM and DPM‑Solver bring the count down to 20‑50 steps, and Latent Consistency Models (LCM) to as few as 2‑4. Cumulative error and quality loss remain challenges at these low step counts.
Method: SPLAM builds on the probability‑flow ODE (PF‑ODE) formulation of diffusion. It approximates each short sub‑path of the PF‑ODE with a linear ODE (SL‑ODE) and inserts a drift coefficient to mitigate error accumulation. By linearly interpolating between two adjacent ODE points, SPLAM provides a progressive error estimate that yields a smoother denoising mapping. The approach also defines Sub‑Path Linear Approximation Distillation (SPLAD), a procedure that trains a student model to mimic the teacher’s SL‑ODE behavior.
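To make the idea of linearly approximating a short sub‑path concrete, here is a minimal toy sketch. It is not the SPLAM training procedure and does not include the drift coefficient. It assumes a one‑dimensional Gaussian data distribution and the noising rule x_t = x_0 + t·ε, for which the PF‑ODE trajectory has a closed form. All names (`exact_path`, `linear_sub_path`, `max_linear_error`) and the constant `S` are hypothetical and chosen only for illustration. The sketch compares the straight line between two ODE points with the exact trajectory between them.

```python
import math

S = 1.0  # assumed std of the toy data distribution N(0, S^2)

def exact_path(x_T, T, t):
    """Closed-form PF-ODE solution for Gaussian data under x_t = x_0 + t*eps.

    Solves dx/dt = t * x / (S^2 + t^2) backwards from the point (T, x_T).
    """
    return x_T * math.sqrt((S**2 + t**2) / (S**2 + T**2))

def linear_sub_path(x_a, x_b, gamma):
    """Point at fraction gamma in [0, 1] on the straight line from x_a to x_b."""
    return (1.0 - gamma) * x_a + gamma * x_b

def max_linear_error(x_T, T, t_hi, t_lo, n_probe=101):
    """Largest gap between the exact sub-path and its linear approximation."""
    x_hi = exact_path(x_T, T, t_hi)
    x_lo = exact_path(x_T, T, t_lo)
    worst = 0.0
    for i in range(n_probe):
        gamma = i / (n_probe - 1)
        t = (1.0 - gamma) * t_hi + gamma * t_lo
        approx = linear_sub_path(x_hi, x_lo, gamma)
        worst = max(worst, abs(approx - exact_path(x_T, T, t)))
    return worst

# One long path (t from 10 to 0) versus one short sub-path (t from 2 to 1).
long_err = max_linear_error(x_T=3.0, T=10.0, t_hi=10.0, t_lo=0.0)
short_err = max_linear_error(x_T=3.0, T=10.0, t_hi=2.0, t_lo=1.0)
print("long path error:", long_err)
print("short sub-path error:", short_err)
```

In this toy setting the straight line tracks the short sub‑path far more closely than it tracks the full trajectory, which is the intuition behind approximating sub‑paths rather than the whole path.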
During inference, a small step count (e.g., 4) is enough to generate images while preserving quality, bringing sampling cost close to that of one‑step generation. The method is compatible with pretrained Stable Diffusion checkpoints and can be combined with multi‑step strategies for further improvement.
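The few‑step inference loop can be sketched as follows. The source does not spell out the sampler, so this assumes the predict‑then‑re‑noise multi‑step scheme used by consistency models and LCM. `toy_denoiser` is a hypothetical stand‑in for the distilled student (here the exact map for one‑dimensional Gaussian data under x_t = x_0 + t·ε), and the four noise levels are arbitrary illustrative values.

```python
import math
import random

S = 1.0  # assumed std of the toy data distribution N(0, S^2)

def toy_denoiser(x, t):
    """Hypothetical student: maps a noisy x_t directly to an x_0 estimate."""
    return x * S / math.sqrt(S**2 + t**2)

def few_step_sample(denoiser, timesteps, rng):
    """Consistency-style multi-step sampling (an assumption, see lead-in).

    timesteps: decreasing noise levels; one denoiser call per entry.
    """
    # Start from pure noise at the largest noise level.
    x = timesteps[0] * rng.gauss(0.0, 1.0)
    x0 = x
    for i, t in enumerate(timesteps):
        # Jump from the current noise level straight to a clean estimate.
        x0 = denoiser(x, t)
        if i + 1 < len(timesteps):
            # Re-noise the estimate to the next, smaller noise level.
            x = x0 + timesteps[i + 1] * rng.gauss(0.0, 1.0)
    return x0

rng = random.Random(0)
steps = [10.0, 5.0, 2.0, 0.5]  # 4-step regime, illustrative values
samples = [few_step_sample(toy_denoiser, steps, rng) for _ in range(2000)]
print("sample mean:", sum(samples) / len(samples))
```

Each sample costs exactly as many model evaluations as there are entries in `steps`, which is where the speed‑up over samplers needing dozens of steps comes from.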
Experiments: On the COCO‑30k and COCO‑5k benchmarks, SPLAM achieves FID scores (lower is better) of 10.06 and 20.77, respectively, under a 4‑step regime, outperforming LCM and other acceleration techniques. Qualitative comparisons show better color fidelity, realism, and detail preservation. The technique also generalizes to higher‑resolution models (e.g., SD‑2.1), delivering clearer and richer images.
Applications: SPLAM has been deployed in Alimama's “Inspiration Recommendation” feature, where it generates preview creatives within 3‑5 seconds, substantially reducing user wait time and improving the user experience.
Conclusion: By introducing a sub‑path linear approximation, SPLAM reduces cumulative denoising error, enabling fast, high‑quality image synthesis with few inference steps. The code, model, and paper are open‑sourced.
Alimama Tech
Official Alimama tech channel, showcasing all of Alimama's technical innovations.