Ximalaya Technology Team
Oct 10, 2023 · Artificial Intelligence
MiniGPT-5: A Novel Multimodal Generation Model for Coherent Text-Image Synthesis
MiniGPT-5 is a multimodal generation model that uses generative vokens to produce interleaved text and images. It couples a large language model with Stable Diffusion and is trained in two stages without domain-specific annotations, achieving state-of-the-art coherence and quality on benchmarks such as CC3M, VIST, and MMDialog.
AI Research · Multimodal Generation · Stable Diffusion