SenseNova-U1-8B-MoT-Infographic: Academic Charts, Posters, Recipes

The SenseNova-U1-8B-MoT-Infographic model improves on its U1‑8B‑MoT base in dense‑text rendering, layout stability, and chart accuracy through targeted data, extended mid‑training, and reinforcement‑learning fine‑tuning. It rises from 39.8 to 46.6 on BizGenEval Hard and from 51.3 to 69.5 on IGenBench Q‑ACC, where it scores above the commercial models GPT‑Image‑1.5 and Qwen‑Image‑2.0.

SuanNi

May 29, 2026

Infographic Generation Challenges

An infographic has to get three things right at once: accurate text, a clean layout, and charts that match their data. Common failure modes include blurry dense text, mis‑aligned modules, incorrect list numbering, unreadable footnotes, and chart elements (bars, axis ticks, labels, legends) that do not match the underlying numbers.

Text Accuracy Improvements

Dense small‑font rendering (paragraphs, table annotations, footnotes, list numbers) was previously blurry or mis‑numbered. Specialized data and reinforcement‑learning (RL) fine‑tuning produce clear small‑font text, exact list numbering, and legible footnotes and table annotations.
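As an illustration of what "exact list numbering" means in practice, the sketch below checks whether numbered items in recognized text run 1, 2, 3, and so on without gaps or repeats. It is not the model's evaluation code: the function name `list_numbering_is_exact` is hypothetical, and it assumes the text has already been extracted from the image, for example by an OCR step.

```python
import re


def list_numbering_is_exact(lines):
    """Return True when numbered items run 1, 2, 3, ... with no gaps or repeats.

    `lines` is assumed to be text already extracted from the image
    (hypothetical input); lines that do not start with "N." or "N)" are ignored.
    """
    numbers = []
    for line in lines:
        match = re.match(r"\s*(\d+)[.)]\s+", line)
        if match:
            numbers.append(int(match.group(1)))
    # An empty list of numbers means there was nothing to verify.
    return bool(numbers) and numbers == list(range(1, len(numbers) + 1))
```

For a recipe card with steps "1. Preheat", "2. Mix", "3. Bake" the check passes; if the second label were rendered as "3." it would fail.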

Precise Chart Generation

Generating charts whose bar heights, axis ticks, labels, and legends agree with the data requires the model to understand data semantics. RL training on chart‑related data teaches it to match these visual elements to the numerical values.
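One way to make "bars match the data" concrete is to compare normalized bar heights with normalized data values. The sketch below is an illustrative check only, not the reward or metric used for this model. The function name, the pixel measurements, and the 5% tolerance are all assumptions.

```python
def bar_height_errors(values, bar_heights_px, tolerance=0.05):
    """Compare rendered bar heights with the data they should encode.

    values          -- the underlying numbers (hypothetical input)
    bar_heights_px  -- measured bar heights in pixels (hypothetical input,
                       e.g. from a detector or manual annotation)
    tolerance       -- allowed absolute error after normalization (assumed)
    """
    if len(values) != len(bar_heights_px):
        raise ValueError("one bar height is needed per data value")
    if not values or max(values) <= 0 or max(bar_heights_px) <= 0:
        raise ValueError("inputs must contain at least one positive entry")

    # Normalize both series by their maximum so the pixel scale drops out.
    v_max = max(values)
    h_max = max(bar_heights_px)
    errors = [abs(h / h_max - v / v_max) for v, h in zip(values, bar_heights_px)]
    return errors, all(e <= tolerance for e in errors)
```

With data `[10, 20, 40]`, bars measured at `[50, 100, 200]` pixels pass, while `[50, 150, 200]` fails because the middle bar is drawn at 75% of the tallest bar instead of 50%.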

Layout Stability and Visual Appeal

Multi‑module infographics often suffer from cramped modules, misalignment, noisy backgrounds, and unstable element relationships. Dedicated layout data and an extended Mid‑Training (MT) phase teach the model grid structures, whitespace, and hierarchy, resulting in cleaner backgrounds and coherent composition.

Three‑Step Training Pipeline

The enhancement builds on the base U1‑8B‑MoT model with three stages:

MT (Mid‑Training): Use high‑quality infographic data and longer training to master structure and dense‑text patterns.

SFT (Supervised Fine‑Tuning): Adjust the data mix between understanding tasks and generation tasks to avoid bias toward either side.

RL (Reinforcement Learning): Refine the reward function to penalize common faults such as black backgrounds, while keeping both understanding and generation abilities intact.
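The RL stage's penalty on black backgrounds can be pictured as simple reward shaping. The sketch below is a minimal illustration under stated assumptions, not the actual reward function, which the source does not describe. The names `dark_fraction` and `shaped_reward`, the grayscale input format, and every threshold are hypothetical.

```python
def dark_fraction(gray_pixels, dark_threshold=30):
    """Fraction of pixels darker than dark_threshold (0-255 grayscale).

    gray_pixels is a list of rows of integer intensities (assumed format).
    """
    flat = [p for row in gray_pixels for p in row]
    if not flat:
        raise ValueError("image is empty")
    return sum(1 for p in flat if p < dark_threshold) / len(flat)


def shaped_reward(base_reward, gray_pixels, max_dark=0.6, penalty=1.0):
    """Subtract a fixed penalty when the image is mostly near-black.

    base_reward stands in for whatever quality score the RL stage already
    uses; max_dark and penalty are illustrative values, not published ones.
    """
    if dark_fraction(gray_pixels) > max_dark:
        return base_reward - penalty
    return base_reward
```

A fixed penalty keeps the example short. A real setup might instead scale the penalty with the dark fraction, or exempt prompts that explicitly ask for a dark theme.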

Benchmark Results

Evaluation on BizGenEval shows:

Hard set: 39.8 → 46.6 (+6.8)

Easy set: 61.1 → 65.4 (+4.3)

On IGenBench:

Q‑ACC: 51.3 → 69.5 (+18.2)

I‑ACC: 4.2 → 17.0 (+12.8)

OneIG visual‑understanding scores remain stable (~55), indicating no degradation in general perception.
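The BizGenEval and IGenBench gains above can be recomputed from the before and after scores. This short sketch uses only the numbers reported in this article. The relative‑gain column is an added convenience, not a figure from the source.

```python
# Scores copied from the results above: (before, after) for each metric.
scores = {
    "BizGenEval Hard": (39.8, 46.6),
    "BizGenEval Easy": (61.1, 65.4),
    "IGenBench Q-ACC": (51.3, 69.5),
    "IGenBench I-ACC": (4.2, 17.0),
}


def summarize(scores):
    """Return {metric: (absolute gain, relative gain in percent)}."""
    rows = {}
    for name, (before, after) in scores.items():
        delta = round(after - before, 1)
        relative = round(100 * (after - before) / before, 1)
        rows[name] = (delta, relative)
    return rows


if __name__ == "__main__":
    for name, (delta, relative) in summarize(scores).items():
        print(f"{name}: +{delta} points ({relative}% relative)")
```

The absolute gains match the reported +6.8, +4.3, +18.2, and +12.8. The relative view shows that I‑ACC starts from a low baseline of 4.2, so its gain of 12.8 points is roughly a fourfold increase.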

Other open‑source models (Z‑Image, Qwen‑Image, Bagel) score below 10 on BizGenEval Hard, whereas the enhanced model reaches 46.6. On IGenBench Q‑ACC, its 69.5 also exceeds the commercial models GPT‑Image‑1.5 (55.0) and Qwen‑Image‑2.0 (50.0).

Broad Applicability

The model reliably generates over 100 styles and layouts, covering posters, product introductions, recipe cards, game cards, encyclopedia tutorials, comics, tarot draws, and more.

References

ModelScope summary: https://modelscope.cn/models/SenseNova/SenseNova-U1-8B-MoT-Infographic/summary

HuggingFace repository: https://huggingface.co/sensenova/SenseNova-U1-8B-MoT-Infographic
