AIGC‑Assisted Marketing Material Generation at Shujia Technology
This article describes how Shujia Technology uses artificial intelligence to generate marketing images and videos, covering the business background, the challenges of high-volume content production, the image-side and video-side solutions (layout models, diffusion models, and digital-human synthesis), and directions for future research.
Introduction – Shujia Technology, an internet fintech company, relies heavily on social media channels such as WeChat Moments, Douyin, and official-account placements for advertising. The article shares how the company applies AIGC (Artificial Intelligence Generated Content) to assist marketing material production.
Background – The company needs to produce massive amounts of creative assets: over 5,000 video clips and 7,000 images per month across multiple platforms, requiring high freshness, low repetition, and strong engagement.
Challenges – The main difficulties are scaling content production while maintaining quality, meeting platform requirements for novelty, and handling diverse formats (single images, collages, videos of various types).
Solution Overview – The approach is split into image‑side and video‑side pipelines, each leveraging large‑model generation, layout optimization, and post‑processing.
Image Material – Prompt engineering feeds a Stable Diffusion (SD) model to create base images, which a U2‑Net matting model then processes to extract icons. LayoutDM generates layout configurations from the ad copy, and the system assembles icons, logos, and backgrounds into final templates. About 90% of image assets are now AIGC‑generated.
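To make the image pipeline concrete, below is a minimal Python sketch of the three stages, assuming the Hugging Face diffusers library, Pillow for compositing, and a precomputed U2‑Net mask; the checkpoint name, helper functions, and layout dict are illustrative stand‑ins, not the company's actual code.

```python
import torch
from diffusers import StableDiffusionPipeline
from PIL import Image

def generate_base_image(prompt: str) -> Image.Image:
    """Step 1: prompt -> base image via a Stable Diffusion checkpoint."""
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    return pipe(prompt).images[0]

def matte_icon(image: Image.Image, mask: Image.Image) -> Image.Image:
    """Step 2: use a salient-object mask (e.g. U2-Net output) as the
    alpha channel so the icon composites cleanly onto any background."""
    cutout = image.convert("RGBA")
    cutout.putalpha(mask.convert("L"))
    return cutout

def assemble(background: Image.Image, icon: Image.Image,
             logo: Image.Image, layout: dict) -> Image.Image:
    """Step 3: place the icon and logo at the (x, y) anchors proposed
    by a layout model such as LayoutDM."""
    canvas = background.convert("RGBA")
    canvas.alpha_composite(icon, dest=layout["icon"])
    canvas.alpha_composite(logo.convert("RGBA"), dest=layout["logo"])
    return canvas.convert("RGB")
```

A production layout model would emit full bounding boxes (position plus size) per element; simple (x, y) anchors keep the sketch short.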
Video Material – Video assets are categorized into animation, real‑person street interviews, and scenario dramas.
• Animation – Each video is divided into four segments (pre‑roll, middle, post‑roll, tail). The pre‑roll is AIGC‑driven: prompts are optimized, a diffusion model generates images, platform‑specific backgrounds are applied via masks, and motion effects are added (see the zoom sketch after this list).
• Real‑person street interviews – Currently produced with mixed‑cut techniques, using scene‑recognition and semantic‑similarity models to select and splice clips (see the clip‑ranking sketch after this list); AIGC usage is still limited.
• Scenario drama – Uses style‑transfer (StyleGAN) on existing footage to apply new visual styles; still experimental.
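As referenced in the animation item above, here is a minimal sketch of the motion‑effect step: a simple Ken Burns style zoom‑in applied to a generated pre‑roll image with OpenCV. The frame rate, duration, and zoom factor are illustrative assumptions; the article does not specify which motion effects are used.

```python
import cv2

def ken_burns_zoom(image_path: str, out_path: str, seconds: float = 3.0,
                   fps: int = 25, max_zoom: float = 1.2) -> None:
    """Turn a still pre-roll image into a short clip with a zoom-in."""
    img = cv2.imread(image_path)
    h, w = img.shape[:2]
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"),
                             fps, (w, h))
    total = int(seconds * fps)
    for i in range(total):
        zoom = 1.0 + (max_zoom - 1.0) * i / max(total - 1, 1)
        ch, cw = int(h / zoom), int(w / zoom)   # crop size at this zoom
        y0, x0 = (h - ch) // 2, (w - cw) // 2   # keep the crop centred
        writer.write(cv2.resize(img[y0:y0 + ch, x0:x0 + cw], (w, h)))
    writer.release()
```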
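And for the street‑interview item, a minimal sketch of semantic‑similarity clip selection, assuming the sentence‑transformers library; the model name and the idea of matching a script line against clip descriptions (e.g. ASR transcripts or scene tags) are assumptions about how such a mixed‑cut ranker could be wired up.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def rank_clips(script_line: str, clip_descriptions: list[str],
               top_k: int = 3) -> list[int]:
    """Return the indices of the clips whose descriptions best match
    the script line, for splicing into the final video."""
    query = model.encode(script_line, convert_to_tensor=True)
    corpus = model.encode(clip_descriptions, convert_to_tensor=True)
    hits = util.semantic_search(query, corpus, top_k=top_k)[0]
    return [hit["corpus_id"] for hit in hits]
```

The top‑ranked clips would then be spliced in script order by the mixed‑cut pipeline.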
Digital Human (Voice‑over) – Strategy teams write the text scripts, which are converted to speech via TTS and rendered as talking‑head video with SyncTalk, a NeRF‑based digital‑human model. Each avatar is fine‑tuned on roughly 30 minutes of footage to improve lip‑sync and visual quality.
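A minimal sketch of the voice‑over step, using the open‑source gTTS library as a stand‑in TTS engine (the article does not name its TTS system); the sample script and file name are illustrative.

```python
from gtts import gTTS

def synthesize_voiceover(script: str, out_path: str = "voiceover.mp3") -> str:
    """Convert a strategy-team script into speech audio."""
    gTTS(text=script, lang="en").save(out_path)
    return out_path

# The resulting audio would then drive the fine-tuned SyncTalk avatar
# to render a lip-synced talking-head clip; SyncTalk's own inference
# entry point is documented in its repository.
audio_path = synthesize_voiceover("New-customer perks this month: tap to learn more.")
```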
Results – AIGC now contributes over 40% of video material and more than 90% of image material. Conversion efficiency of AI‑generated assets matches that of manually created assets, justifying continued investment.
Future Outlook – Planned research includes end‑to‑end text‑to‑video pipelines, better scenario‑drama generation via script synthesis and digital‑human integration, and rapid hot‑topic detection for timely content creation.
Q&A – The article concludes with a Q&A covering layout evaluation, the role of AI in copywriting, model customization for digital humans, and lip‑sync optimization techniques.