HiCo: A Hierarchical Controllable Diffusion Model for Layout‑to‑Image Generation
The 360 AI Research Institute introduces HiCo, a hierarchical controllable diffusion model that enables fine‑grained layout control across eight or more image regions, integrates seamlessly with the existing Stable Diffusion ecosystem, and demonstrates superior performance on the GRIT‑VAL benchmark for layout‑aware image synthesis.
In early 2024 the 360 AI Research Institute released a new image‑generation model called HiCo (Hierarchical Controllable Diffusion model for layout‑to‑image generation). Compared with conventional text‑to‑image models, HiCo adds fine‑grained layout control: users can assign different content to each of multiple regions (eight or more), and the model produces more natural, coherent multi‑region compositions.
Background – Recent research in AI image generation has focused on improving controllability, exploring prompts, ControlNet, Cross‑Attention, and other conditioning methods to achieve control over shape, color, style, and layout.
Existing models on the market, however, offer only coarse‑grained layout control, integrate poorly with open‑source community resources (alternative base models, LoRA adapters), and lack concept‑injection capabilities.
Method Overview – HiCo builds on diffusion‑based controllable generation works such as ControlNet and IP‑Adapter. It introduces external conditioning via sub‑bounding‑box and sub‑caption inputs, as well as a global caption for background description. A lightweight plug‑in module (HiCo‑Net) is trained on top of a frozen UNet, enabling layout‑aware generation without retraining the entire diffusion backbone. This design allows seamless compatibility with Stable Diffusion 1.5, XL, and various LoRA/LCM extensions.
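To make the conditioning inputs concrete, here is an illustrative sketch of how a layout condition might be structured — a global caption plus per‑region sub‑captions with normalized bounding boxes. The class and field names are our own assumptions, not HiCo's actual API:

```python
# Illustrative only (not HiCo's real interface): a layout condition bundles
# a global background caption with per-region sub-captions and boxes.
from dataclasses import dataclass, field


@dataclass
class RegionCondition:
    caption: str           # sub-caption describing this region's content
    bbox: tuple            # (x0, y0, x1, y1), normalized to [0, 1]


@dataclass
class LayoutCondition:
    global_caption: str    # background / whole-scene description
    regions: list = field(default_factory=list)

    def add_region(self, caption, bbox):
        # Sanity-check the normalized box before accepting it.
        x0, y0, x1, y1 = bbox
        assert 0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0
        self.regions.append(RegionCondition(caption, bbox))


layout = LayoutCondition(global_caption="a sunny park with a lake")
layout.add_region("a golden retriever running", (0.05, 0.55, 0.45, 0.95))
layout.add_region("a red kite in the sky", (0.60, 0.05, 0.90, 0.35))
```

Each region pair would then be encoded and fed to the plug‑in branch, while the global caption conditions the background.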
The architecture (see Fig. 1) adds a HiCo‑Net that generates region‑specific content and a global background, which are fused with the UNet output to produce a coherent image.
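The fusion idea can be sketched in a few lines of NumPy. This is our simplified illustration, not HiCo's exact formulation: each region branch contributes a feature map inside its bounding box, and overlaps with the global background branch are resolved by averaging the contributing branches:

```python
# Minimal fusion sketch (our assumption, not HiCo's exact operator):
# paste each region's features into its box, average where branches overlap.
import numpy as np


def fuse_features(global_feat, region_feats, bboxes):
    """global_feat: (H, W, C); region_feats: list of (H, W, C) maps;
    bboxes: list of (x0, y0, x1, y1) in pixel coordinates."""
    H, W, _ = global_feat.shape
    acc = global_feat.copy()
    count = np.ones((H, W, 1))  # the global branch contributes everywhere
    for feat, (x0, y0, x1, y1) in zip(region_feats, bboxes):
        acc[y0:y1, x0:x1] += feat[y0:y1, x0:x1]
        count[y0:y1, x0:x1] += 1
    return acc / count  # average overlapping contributions


# Toy usage: a 4x4 background of zeros with one 2x2 region of ones.
g = np.zeros((4, 4, 1))
r = np.ones((4, 4, 1))
fused = fuse_features(g, [r], [(0, 0, 2, 2)])
```

Inside the region the two branches average to 0.5; outside it, only the global branch remains.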
HiCo supports multi‑target controllable generation, demonstrated with three‑region and five‑region layouts (Fig. 2‑1, Fig. 2‑2) and with various base models such as Midjourney‑Papercut, RealisticVision, DisneyPixarCartoon, and Flat2DAnimerge (Fig. 3). It also works with LoRA adapters (Blindbox_v3, Shrek) for concept‑level control (Fig. 4).
Integration with LCM/LCM‑LoRA enables fast generation at 4, 6, and 8 diffusion steps, as shown in Fig. 5.
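The speedup comes from running the denoiser on only a handful of timesteps. As a rough illustration (an evenly spaced schedule of our own choosing, not the exact skipping rule LCM uses), a few‑step schedule over a 1000‑step training range might look like this:

```python
# Illustrative only: LCM-style samplers denoise at a few timesteps instead
# of hundreds. This evenly spaced schedule is a simplification of ours.
def step_schedule(num_steps, train_steps=1000):
    stride = train_steps // num_steps
    # Walk from the noisiest timestep down toward zero in large strides.
    return list(range(train_steps - 1, -1, -stride))[:num_steps]


for n in (4, 6, 8):
    print(n, step_schedule(n))
```

With 4, 6, or 8 entries, the denoiser is invoked only that many times, which is what makes near‑real‑time generation possible.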
Evaluation – The authors built the GRIT‑VAL dataset (300 k natural‑scene images with detailed position and caption annotations) based on GRIT‑20, and quantitatively assessed HiCo's layout control against English Stable Diffusion, English SD+HiCo, Chinese BDM, and Chinese BDM+HiCo. The HiCo variants achieve superior layout accuracy, though Chinese BDM+HiCo scores slightly lower than its English counterpart due to translation errors.
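The post does not spell out the exact layout metric, but a common way to score layout fidelity is detection‑based: run an object detector on the generated image and count a requested box as satisfied when some detection overlaps it with sufficient IoU. A hedged sketch of that idea (our illustration, with a hypothetical 0.5 threshold):

```python
# Illustrative layout-accuracy check (our assumption, not HiCo's metric):
# a requested box counts as a hit if a detected box matches it with
# intersection-over-union at or above a threshold.
def iou(a, b):
    # Boxes are (x0, y0, x1, y1).
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0


def layout_accuracy(requested, detected, thresh=0.5):
    # Fraction of requested boxes matched by at least one detection.
    hits = sum(any(iou(r, d) >= thresh for d in detected) for r in requested)
    return hits / len(requested)


# Toy example: one of two requested boxes is recovered by the detector.
acc = layout_accuracy([(0, 0, 10, 10), (20, 20, 30, 30)],
                      [(1, 1, 10, 10), (50, 50, 60, 60)])
```

Here the first requested box is matched (IoU ≈ 0.81) and the second is not, giving an accuracy of 0.5.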
Additional qualitative results on GRIT‑VAL (Fig. 6) and COCO‑based coarse‑grained prompts (Fig. 7) illustrate HiCo’s ability to generate realistic images from fine‑grained or coarse textual descriptions.
Conclusion and Outlook – HiCo provides a strong layout‑control capability for AI drawing, trained on natural scenes with multi‑scale data. While it already improves the playability of AI art, future work will focus on image editing, multi‑style concept injection, and further performance gains. The model will be available for hands‑on experience on the 360 AI platform (aigc.360.com).
360 Tech Engineering