XPSR: Cross‑modal Priors for Diffusion‑based Image Super‑Resolution
The paper introduces XPSR, a diffusion-based image super-resolution method that incorporates cross-modal semantic priors generated by a multimodal large language model. It reports state-of-the-art results on both reference and no-reference quality metrics across synthetic and real-world datasets.
XPSR was presented at ECCV 2024 by Kuaishou Audio-Video Technology and Tsinghua University.
Video and image restoration are increasingly important; previous GAN‑based methods struggle with fine texture and subjective quality, while diffusion models have shown impressive generative capabilities.
The XPSR framework consists of two stages: (1) a multimodal LLM produces semantic descriptions of the low‑resolution image; (2) the low‑resolution image and the semantic information are fed into a diffusion UNet, where a novel Semantic‑Fusion Attention (SFA) merges parallel cross‑attention streams to balance object and quality cues.
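To make the idea of merging parallel cross-attention streams concrete, here is a minimal NumPy sketch. It is not the paper's SFA module: the learned projections are omitted, and the fixed scalar `gate`, the function names, and the token shapes are all assumptions chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context):
    # Single-head scaled dot-product attention with no learned
    # projections (a simplification made for this sketch).
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)   # (n_queries, n_context)
    return softmax(scores) @ context            # (n_queries, d)

def fuse_semantic_streams(image_tokens, high_level_tokens, low_level_tokens, gate=0.5):
    # Two parallel streams: one attends to object-level text tokens,
    # the other to quality-level text tokens. The hypothetical scalar
    # `gate` balances them; a real module would likely learn this merge.
    high = cross_attention(image_tokens, high_level_tokens)
    low = cross_attention(image_tokens, low_level_tokens)
    return image_tokens + gate * high + (1.0 - gate) * low

rng = np.random.default_rng(0)
image_tokens = rng.normal(size=(16, 8))       # 16 spatial tokens, width 8
high_level_tokens = rng.normal(size=(5, 8))   # object description tokens
low_level_tokens = rng.normal(size=(3, 8))    # quality description tokens

fused = fuse_semantic_streams(image_tokens, high_level_tokens, low_level_tokens)
print(fused.shape)  # (16, 8)
```

Setting `gate=1.0` reduces the sketch to ordinary cross-attention on the object description alone, which is a convenient sanity check when experimenting with the balance.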
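The paper details four components: semantic description generation, semantic fusion, the degradation-free constraint, and the optimization objective. Its notation includes the low-resolution input $x_{lr}$, the noisy high-resolution latent $z_{hr}^t$ at timestep $t$, and the high-level and low-level semantic conditions $c_h$ and $c_l$.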
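One plausible way to combine a diffusion objective with a degradation-free constraint is sketched below. This summary does not give the paper's exact loss, so the sketch assumes a standard noise-prediction MSE plus a per-scale feature-matching penalty; `align_weight` and all function names are hypothetical.

```python
import numpy as np

def noise_prediction_loss(predicted_noise, true_noise):
    # Standard diffusion training target: MSE between predicted and true noise.
    return float(np.mean((predicted_noise - true_noise) ** 2))

def feature_alignment_loss(lr_features, hr_features):
    # Assumed form of the constraint: MSE between LR-branch and HR
    # features at each scale, averaged over scales.
    per_scale = [np.mean((f_lr - f_hr) ** 2)
                 for f_lr, f_hr in zip(lr_features, hr_features)]
    return float(np.mean(per_scale))

def total_loss(predicted_noise, true_noise, lr_features, hr_features, align_weight=1.0):
    # `align_weight` is a hypothetical balancing coefficient.
    return (noise_prediction_loss(predicted_noise, true_noise)
            + align_weight * feature_alignment_loss(lr_features, hr_features))

# Toy inputs: the noise error is 1.0 everywhere, and the two feature
# scales have squared errors of 4.0 and 0.0 respectively.
predicted_noise = np.zeros((4, 4))
true_noise = np.ones((4, 4))
lr_features = [np.zeros((2, 2)), np.zeros((4, 4))]
hr_features = [np.full((2, 2), 2.0), np.zeros((4, 4))]

loss = total_loss(predicted_noise, true_noise, lr_features, hr_features, align_weight=0.5)
print(loss)  # 2.0
```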
During training, a degradation-free constraint aligns LR and HR features at multiple scales. At sampling time, classifier-free guidance with negative prompts (e.g., “blurry, dotted, noise, unclear, low-res, over-smoothed”) improves visual fidelity.
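The guidance step with a negative prompt can be sketched as follows. This is the common classifier-free guidance formulation in which the negative-prompt prediction replaces the unconditional one; the guidance scale and the toy noise values are assumptions, and the denoiser that would produce the two predictions is not shown.

```python
import numpy as np

# Negative prompt quoted in the article.
NEGATIVE_PROMPT = "blurry, dotted, noise, unclear, low-res, over-smoothed"

def guided_noise(eps_positive, eps_negative, guidance_scale):
    # Move the prediction away from the negative-prompt output and
    # toward the positive-prompt output.
    return eps_negative + guidance_scale * (eps_positive - eps_negative)

# Toy noise predictions standing in for two denoiser forward passes.
eps_positive = np.array([1.0, 2.0])   # conditioned on the semantic description
eps_negative = np.array([0.0, 1.0])   # conditioned on NEGATIVE_PROMPT

print(guided_noise(eps_positive, eps_negative, guidance_scale=2.0))  # [2. 3.]
```

With `guidance_scale=1.0` the result equals the positive-prompt prediction, so values above 1 are what push samples away from the artifacts the negative prompt names.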
Extensive experiments on synthetic and real‑world datasets show that XPSR outperforms existing GAN‑based and diffusion‑based baselines on both reference metrics (PSNR, SSIM, LPIPS, DISTS, FID) and no‑reference metrics (MANIQA, CLIPIQA, MUSIQ), as illustrated in Tables 1 and 2 and the accompanying visual comparisons.
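Of the reference metrics listed, PSNR is the simplest to reproduce. The sketch below uses the standard definition for 8-bit images; the function name and the toy images are illustrative and are not taken from the paper's evaluation code.

```python
import numpy as np

def psnr(reference, restored, max_value=255.0):
    # Cast to float first so uint8 subtraction cannot wrap around.
    diff = reference.astype(np.float64) - restored.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))

# Toy example: every pixel is off by 10 grey levels, so the MSE is 100.
reference = np.full((4, 4), 100, dtype=np.uint8)
restored = np.full((4, 4), 110, dtype=np.uint8)
print(round(psnr(reference, restored), 2))
```

Higher PSNR means lower pixel-wise error, which is why it is usually reported alongside perceptual metrics such as LPIPS and the no-reference scores rather than on its own.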
The authors conclude that XPSR achieves state‑of‑the‑art performance and will continue to support Kuaishou’s video enhancement pipeline, with future work aimed at broader applications.
Kuaishou Tech
Official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.