Tagged articles
1 article
Data Party THU
Jul 31, 2025 · Artificial Intelligence

How LaVin-DiT Revolutionizes Vision Generation with ST‑VAE and Joint Diffusion Transformer

The LaVin-DiT paper introduces a large‑scale vision diffusion transformer that combines a spatiotemporal variational auto‑encoder, a joint diffusion transformer with full‑sequence joint attention, and 3D rotary position encoding to enable unified, efficient generation across diverse visual tasks such as segmentation and video prediction.

3D RoPE · Generative AI · Vision Transformer
11 min read