SenseNova-U1-8B-MoT-Infographic: Academic Charts, Posters, Recipes
SenseNova-U1-8B-MoT-Infographic improves AI‑generated infographics in three areas: dense‑text rendering, layout stability, and chart accuracy. It gets there through targeted data, an extended mid‑training phase, and reinforcement‑learning fine‑tuning. The model scores 46.6 on the BizGenEval Hard set and 69.5 on IGenBench Q‑ACC, ahead of the commercial models GPT‑Image‑1.5 and Qwen‑Image‑2.0 on the latter.
Infographic Generation Challenges
An infographic has to get three things right at once: accurate text, a clean layout, and charts that reflect the underlying data. Common failure modes include blurry dense text, mis‑aligned modules, incorrect list numbering, unreadable footnotes, and chart elements (bars, axis ticks, labels, legends) that do not match the underlying numbers.
Text Accuracy Improvements
Previously, dense small‑font text (paragraphs, table annotations, footnotes, list numbers) came out blurry or mis‑numbered. Specialized data and reinforcement‑learning (RL) fine‑tuning now produce clear small‑font text, correct list numbering, and legible footnotes and table annotations.
Precise Chart Generation
To draw a chart whose bar heights, axis ticks, labels, and legends agree with the data, the model has to understand what the numbers mean. Reinforcement learning on chart‑related data trains it to match each visual element to its numerical value.
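One way to picture what "bars match the data" means is a proportionality check. The sketch below is illustrative only and is not taken from the model's training code: the function names, the pixel heights, and the 5% tolerance are all assumptions. It normalizes both the data values and the measured bar heights by their maxima and compares the resulting proportions.

```python
def bar_height_errors(values, bar_heights_px):
    """Return per-bar absolute errors between data and rendered proportions.

    Hypothetical helper: both sequences are divided by their own maximum,
    so only the relative proportions are compared, not absolute pixels.
    """
    if len(values) != len(bar_heights_px):
        raise ValueError("each data value needs exactly one bar")
    max_value = max(values)
    max_height = max(bar_heights_px)
    if max_value <= 0 or max_height <= 0:
        raise ValueError("this sketch expects positive values and heights")
    return [
        abs(v / max_value - h / max_height)
        for v, h in zip(values, bar_heights_px)
    ]


def chart_matches_data(values, bar_heights_px, tolerance=0.05):
    """True if every bar is within `tolerance` of its expected proportion."""
    errors = bar_height_errors(values, bar_heights_px)
    return all(e <= tolerance for e in errors)


# Bars drawn at 50, 100 and 200 px faithfully represent 10, 20 and 40.
faithful = chart_matches_data([10, 20, 40], [50, 100, 200])
# A middle bar drawn at 150 px overstates the value 20.
distorted = chart_matches_data([10, 20, 40], [50, 150, 200])
```

A signal like this could in principle feed a chart-accuracy reward, but the source does not describe how the actual reward is computed.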
Layout Stability and Visual Appeal
Multi‑module infographics often suffer from cramped modules, misalignment, noisy backgrounds, and unstable element relationships. Dedicated layout data and an extended Mid‑Training (MT) phase teach the model grid structures, whitespace, and hierarchy, resulting in cleaner backgrounds and coherent composition.
Three‑Step Training Pipeline
The enhancement builds on the base U1‑8B‑MoT model with three stages:
MT (Mid‑Training): Use high‑quality infographic data and a longer training run to master structure and dense‑text patterns.
SFT (Supervised Fine‑Tuning): Adjust the data mix between understanding tasks and generation tasks to avoid bias toward either side.
RL (Reinforcement Learning): Refine the reward function to penalize common faults such as black backgrounds, while keeping both understanding and generation abilities intact.
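The RL stage above mentions penalizing faults such as black backgrounds. A minimal sketch of that idea follows. It is an assumption about how such a penalty could look, not the model's real reward: the function names, the darkness threshold of 30, the 60% cutoff, and the penalty size are all hypothetical. Images are represented as nested lists of RGB tuples so the example needs no external libraries.

```python
def dark_fraction(pixels, threshold=30):
    """Fraction of pixels whose R, G and B values are all below `threshold`.

    `pixels` is a list of rows, each row a list of (r, g, b) tuples (0-255).
    """
    total = 0
    dark = 0
    for row in pixels:
        for r, g, b in row:
            total += 1
            if r < threshold and g < threshold and b < threshold:
                dark += 1
    return dark / total if total else 0.0


def shaped_reward(base_reward, pixels, max_dark=0.6, penalty=1.0):
    """Subtract a fixed penalty when the image is mostly near-black.

    `base_reward` stands in for whatever quality score the RL setup uses;
    the cutoff and penalty values are illustrative assumptions.
    """
    if dark_fraction(pixels) > max_dark:
        return base_reward - penalty
    return base_reward


white_image = [[(255, 255, 255)] * 4 for _ in range(4)]
black_image = [[(0, 0, 0)] * 4 for _ in range(4)]

reward_white = shaped_reward(0.8, white_image)  # no penalty applied
reward_black = shaped_reward(0.8, black_image)  # penalty applied
```

A fixed penalty is the simplest choice; a penalty that grows with the dark fraction would be an equally plausible design.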
Benchmark Results
Evaluation on BizGenEval shows:
Hard set: 39.8 → 46.6 (+6.8)
Easy set: 61.1 → 65.4 (+4.3)
On IGenBench:
Q‑ACC: 51.3 → 69.5 (+18.2)
I‑ACC: 4.2 → 17.0 (+12.8)
OneIG visual‑understanding scores remain stable (~55), indicating no degradation in general perception.
Other open‑source models (Z‑Image, Qwen‑Image, Bagel) score below 10 on BizGenEval Hard, whereas the enhanced model reaches 46.6. On IGenBench Q‑ACC, its 69.5 also exceeds the commercial models GPT‑Image‑1.5 (55.0) and Qwen‑Image‑2.0 (50.0).
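The gains quoted above can be reproduced from the before and after scores. The short script below uses only the numbers reported in this section; the variable and function names are illustrative.

```python
# (before, after) scores as reported in this section.
scores = {
    "BizGenEval Hard": (39.8, 46.6),
    "BizGenEval Easy": (61.1, 65.4),
    "IGenBench Q-ACC": (51.3, 69.5),
    "IGenBench I-ACC": (4.2, 17.0),
}


def improvement(before, after):
    """Absolute gain in points, rounded to one decimal like the source."""
    return round(after - before, 1)


gains = {name: improvement(b, a) for name, (b, a) in scores.items()}

# Margin of the enhanced model over the commercial models on Q-ACC.
q_acc = scores["IGenBench Q-ACC"][1]
margins = {
    "GPT-Image-1.5": round(q_acc - 55.0, 1),
    "Qwen-Image-2.0": round(q_acc - 50.0, 1),
}

for name, (before, after) in scores.items():
    print(f"{name}: {before} -> {after} (+{gains[name]})")
```

Note that these are absolute point differences. In relative terms the I-ACC change is the largest, since 17.0 is roughly four times 4.2.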
Broad Applicability
The model reliably generates over 100 styles and layouts, covering posters, product introductions, recipe cards, game cards, encyclopedia tutorials, comics, tarot draws, and more.
References
ModelScope summary: https://modelscope.cn/models/SenseNova/SenseNova-U1-8B-MoT-Infographic/summary
HuggingFace repository: https://huggingface.co/sensenova/SenseNova-U1-8B-MoT-Infographic
SuanNi
A community for AI developers that aggregates large-model development services, models, and compute power.