Hyper‑SD: Trajectory‑Segmented Consistency Model for Accelerating Diffusion Image Generation
Hyper‑SD introduces a trajectory‑segmented consistency distillation framework that combines trajectory‑preserving and trajectory‑reconstruction strategies, integrates human‑feedback learning and score distillation, and achieves state‑of‑the‑art low‑step image generation performance on both SD1.5 and SDXL models.
Introduction
Recent diffusion models have achieved impressive results in image and video generation, but their multi‑step denoising process incurs high computational cost. Existing acceleration methods fall into two camps: trajectory‑preserving distillation, which hits a performance ceiling at very low step counts, and trajectory‑reconstruction distillation, which can drift away from the teacher's output domain.
To overcome these issues, ByteDance’s research team proposes Hyper‑SD, a trajectory‑segmented consistency model that blends the advantages of both strategies; the work has also been highlighted by Hugging Face’s CEO.
Method
1. Trajectory‑Segmented Consistency Distillation
The approach divides the full time range [0, T] into k segments and performs consistency distillation within each segment, progressively reducing k (8 → 4 → 2 → 1) across training stages until consistency holds over the entire trajectory. Distilling segment by segment keeps the student close to the teacher's ODE trajectory at every stage, avoiding the error accumulation of enforcing full‑range consistency in one shot. The training loss combines adversarial and MSE components, with dynamic weighting between them and noise perturbation for stability.
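The segment schedule can be sketched in a few lines. This is an illustrative toy, not the authors' code: the timestep count, the helper names, and the choice of the segment start as the consistency target are all assumptions for demonstration.

```python
# Sketch of trajectory-segmented consistency distillation (illustrative).

def segment_bounds(num_train_steps: int, k: int) -> list[tuple[int, int]]:
    """Split the timestep range [0, T) into k equal segments."""
    step = num_train_steps // k
    return [(i * step, (i + 1) * step) for i in range(k)]

def consistency_target(t: int, num_train_steps: int, k: int) -> int:
    """Map each timestep to the start of its segment -- the point toward
    which the student must be self-consistent within that segment."""
    step = num_train_steps // k
    return (t // step) * step

# Progressive schedule: fewer, longer segments as training proceeds,
# ending with a single segment (full-trajectory consistency).
for k in (8, 4, 2, 1):
    bounds = segment_bounds(1000, k)
    # ... run consistency distillation on each segment in `bounds` ...
```

With k = 1 every timestep maps to 0, recovering standard full‑trajectory consistency training as the final stage.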
2. Human‑Feedback Learning
Human aesthetic preferences and visual perception models (e.g., LAION aesthetic predictor, ImageReward, and instance‑segmentation models such as SOLO) are used as reward signals to guide the accelerated model toward more visually pleasing and structurally coherent outputs.
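A minimal sketch of how such reward signals could enter training is shown below. The scalar scores, the hinge form, and the margin value are assumptions for illustration; `aesthetic_score` and `structure_score` stand in for the reward models named above (aesthetic predictor, ImageReward, segmentation feedback), not their real APIs.

```python
# Illustrative reward-feedback objective (a sketch, not the paper's loss).

def feedback_loss(aesthetic_score: float, structure_score: float,
                  margin: float = 0.5, w_struct: float = 1.0) -> float:
    """Penalize generations whose reward falls below a target margin.
    The hinge form zeroes out already-good samples, so gradients focus
    on low-reward generations."""
    aesthetic_term = max(0.0, margin - aesthetic_score)
    structure_term = max(0.0, margin - structure_score)
    return aesthetic_term + w_struct * structure_term
```

In practice this term would be added, with a small weight, to the distillation loss so that acceleration does not come at the cost of aesthetics or structural coherence.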
3. One‑Step Generation Enhancement
Score distillation (via Distribution‑Matching Distillation) is applied to improve one‑step generation, aligning the student's one‑step output distribution with the teacher's and combining an MSE term with score‑based losses, while also incorporating the previously described human‑feedback signals.
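The combined one‑step objective can be sketched as an MSE anchor plus a distribution‑matching surrogate. This is a toy, assuming stand‑in score functions and a stop‑gradient‑style surrogate; it is not the paper's implementation, and the weighting `lam` is an assumption.

```python
import numpy as np

# Toy sketch of a one-step objective: MSE toward the teacher's multi-step
# output plus a DMD-style distribution-matching term (illustrative only).

def one_step_loss(student_x0, teacher_x0, score_real, score_fake, lam=0.25):
    """MSE anchor + score-distillation surrogate. In DMD the difference of
    the two scores gives the update direction; here it is folded into a
    surrogate whose gradient w.r.t. student_x0 (with the scores treated
    as stopped gradients) is exactly that direction."""
    mse = np.mean((student_x0 - teacher_x0) ** 2)
    # Minimizing x * (s_fake - s_real) moves samples toward regions
    # of higher real-data density relative to the student's own density.
    grad = score_fake(student_x0) - score_real(student_x0)
    dm = np.mean(student_x0 * grad)
    return mse + lam * dm
```

The MSE term stabilizes early training while the score term lets the one‑step student escape the teacher trajectory where doing so improves sample quality.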
Experiments
Quantitative comparisons on SD1.5 and SDXL show Hyper‑SD significantly outperforms current state‑of‑the‑art acceleration algorithms at every step count from 1 to 8. Visualizations confirm superior low‑step inference quality, and extensive user studies corroborate its advantage.
Hyper‑SD’s LoRA adapters are compatible with diverse style backbones and can be combined with ControlNet for controllable low‑step generation.
Conclusion
The paper presents Hyper‑SD, a unified diffusion‑model acceleration framework that delivers SOTA low‑step generation for both SD1.5 and SDXL by leveraging trajectory‑segmented consistency distillation, human‑feedback learning, and score distillation. The authors release open‑source code, LoRA plugins, and a one‑step SDXL model to foster community progress.
Rare Earth Juejin Tech Community