One-Click Ad Video from Assets + Brief, plus Baidu’s 8B Text-to-Image – An AI Toolbox

This article introduces three open-source AI tools: a video editor that turns raw footage and a creative brief into a finished ad, Baidu's 8-billion-parameter text-to-image model that runs on a single 24 GB GPU, and a weekly AI-developer digest that produces Chinese-language reports. It covers their workflows, benchmarks, and usage commands.

Geek Labs

Apr 24, 2026

01 | agentic-video-editor

Editing raw footage into a 30-second advertisement by hand typically takes half a day of selecting shots, setting the pacing, exporting, and reviewing.

The tool replaces this workflow with an AI Agent pipeline consisting of:

Original footage + creative brief
    ↓
[Pre‑processing] – scene detection, speech‑to‑text, shot indexing
    ↓
[Director Agent] – AI searches footage, selects shots, creates an edit plan
    ↓
[Refinement Agent] – fine‑tunes start/end of each shot
    ↓
[Edit Agent] – FFmpeg renders MP4
    ↓
[Review Agent] – scores relevance, rhythm, visual quality, viewing experience, overall (0‑1 each)
    ↓
If overall score < threshold → feedback to Director Agent (max 3 retries)
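
The review-and-retry loop at the end of the pipeline can be sketched as follows. This is a minimal illustration, not the project's actual code: `run_with_review`, the three callables, and the 0.7 threshold are hypothetical, and "max 3 retries" is read here as one initial attempt plus up to three more. Only the five score dimensions, the 0-1 range, and the retry cap come from the description above.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

MAX_RETRIES = 3   # from the pipeline description
THRESHOLD = 0.7   # hypothetical; the article does not state the threshold


@dataclass
class Review:
    """The five scores the Review Agent reports, each in the range 0-1."""
    relevance: float
    rhythm: float
    visual_quality: float
    viewing_experience: float
    overall: float
    feedback: str = ""


def run_with_review(
    plan_edit: Callable[[Optional[str]], dict],  # stands in for the Director Agent
    render: Callable[[dict], str],               # stands in for the Edit Agent
    review: Callable[[str], Review],             # stands in for the Review Agent
    threshold: float = THRESHOLD,
    max_retries: int = MAX_RETRIES,
) -> Tuple[str, Review, int]:
    """Return (video_path, last_review, attempts_used)."""
    feedback = None
    for attempt in range(1, max_retries + 2):  # 1 initial attempt + retries
        plan = plan_edit(feedback)   # feedback is None on the first pass
        video = render(plan)
        result = review(video)
        if result.overall >= threshold:
            break                    # good enough: stop retrying
        feedback = result.feedback   # hand the critique back to the director
    return video, result, attempt
```

If every attempt scores below the threshold, the sketch returns the last render together with its review, so the caller can decide whether to keep or discard it.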

Running the editor requires a single command:

ave edit \
  --footage-dir /path/to/your/footage \
  --brief '{"product": "My Product", "audience": "Women 25-45", "tone": "authentic", "duration_seconds": 30}' \
  --pipeline pipelines/ugc-ad.yaml \
  --style styles/dtc-testimonial.yaml

The built‑in DTC template follows the hook → problem → solution → social proof → CTA structure; custom YAML pipelines can be authored to combine agents differently.
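
Because the brief is passed as a JSON string on the command line, it is easy to mis-quote by hand. The sketch below builds the same `ave edit` invocation from a Python dict. The flags and brief keys are the ones shown in the command above; `build_ave_command` is a hypothetical helper, and actually running the result assumes `ave` is installed and on the PATH.

```python
import json
import shlex


def build_ave_command(footage_dir, brief, pipeline, style):
    """Build the argument list for the `ave edit` call shown above."""
    return [
        "ave", "edit",
        "--footage-dir", footage_dir,
        "--brief", json.dumps(brief),  # serialize the brief as one JSON argument
        "--pipeline", pipeline,
        "--style", style,
    ]


brief = {
    "product": "My Product",
    "audience": "Women 25-45",
    "tone": "authentic",
    "duration_seconds": 30,
}
cmd = build_ave_command(
    "/path/to/your/footage",
    brief,
    "pipelines/ugc-ad.yaml",
    "styles/dtc-testimonial.yaml",
)
print(shlex.join(cmd))  # shell-quoted form, for inspection
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the list directly to `subprocess.run` avoids shell quoting altogether, which matters once the brief contains apostrophes or other special characters.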

02 | ERNIE‑Image

ERNIE-Image is Baidu’s open-source diffusion-transformer (DiT) model with 8 B parameters. It is reported to achieve state-of-the-art results among open-weight text-to-image models.

Benchmark scores:

GenEval overall: 0.8856 (higher than Qwen-Image at 0.8683 and FLUX.2-klein-9B at 0.8481)

LongTextBench (Chinese long-text): 0.9733, slightly below Seedream 4.5 at 0.9882
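
To put the gaps between these reported scores in perspective, the short sketch below computes the margins. The numbers are the ones quoted above; the `margin` helper is only for illustration.

```python
# Scores as reported in the article.
geneval = {"ERNIE-Image": 0.8856, "Qwen-Image": 0.8683, "FLUX.2-klein-9B": 0.8481}
long_text = {"ERNIE-Image": 0.9733, "Seedream 4.5": 0.9882}


def margin(scores, model, baseline):
    """Score difference (model minus baseline), rounded to 4 decimals."""
    return round(scores[model] - scores[baseline], 4)


for other in ("Qwen-Image", "FLUX.2-klein-9B"):
    print(f"GenEval, ERNIE-Image vs {other}: {margin(geneval, 'ERNIE-Image', other):+.4f}")
print(f"LongTextBench, ERNIE-Image vs Seedream 4.5: "
      f"{margin(long_text, 'ERNIE-Image', 'Seedream 4.5'):+.4f}")
```

The GenEval leads work out to roughly 0.017 and 0.038, and the LongTextBench gap to Seedream 4.5 is about 0.015.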

Key strengths identified in the source:

Text rendering – long paragraphs, dense typography, layout‑rich images (posters, infographics, UI mockups)

Complex instruction compliance – accurate handling of multi‑object, relational, knowledge‑intensive prompts

Structured generation – posters, comics, storyboards, multi‑panel graphics

Consumer‑grade deployment – runs on a single GPU with 24 GB VRAM
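
A rough back-of-the-envelope check of the 24 GB figure: at bfloat16 precision each parameter takes 2 bytes, so the 8 B transformer weights alone come to about 15 GiB. The sketch counts only those weights; the text encoder, VAE, and activations add to the total and are not quantified in the source.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed for the model weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


params = 8e9  # 8 B parameters, as stated for ERNIE-Image

bf16 = weight_memory_gib(params, 2)  # bfloat16: 2 bytes per parameter
fp32 = weight_memory_gib(params, 4)  # float32: 4 bytes per parameter

print(f"bf16 weights: {bf16:.1f} GiB")
print(f"fp32 weights: {fp32:.1f} GiB")
```

Under these assumptions the bf16 weights fit on a 24 GB card with headroom for the other components, while fp32 weights alone would not.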

Two released variants:

ERNIE‑Image (SFT version) – 50 inference steps, guidance scale 4.0

ERNIE‑Image‑Turbo (DMD+RL accelerated) – 8 inference steps, guidance scale 1.0
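
One way to keep the two sets of sampling settings from getting mixed up is a small lookup table. This is a sketch: the step counts and guidance scales are the ones listed above, while `VARIANT_SETTINGS` and `sampling_kwargs` are hypothetical names, and the dictionary keys are the variant names, not necessarily the Hugging Face repository IDs.

```python
# Recommended sampling settings per variant, as listed in the article.
VARIANT_SETTINGS = {
    "ERNIE-Image": {"num_inference_steps": 50, "guidance_scale": 4.0},
    "ERNIE-Image-Turbo": {"num_inference_steps": 8, "guidance_scale": 1.0},
}


def sampling_kwargs(variant: str) -> dict:
    """Return a copy of the sampling settings for the given variant."""
    try:
        return dict(VARIANT_SETTINGS[variant])
    except KeyError:
        known = ", ".join(sorted(VARIANT_SETTINGS))
        raise ValueError(f"unknown variant {variant!r}; expected one of: {known}") from None


turbo = sampling_kwargs("ERNIE-Image-Turbo")
base = sampling_kwargs("ERNIE-Image")
print(base["num_inference_steps"] / turbo["num_inference_steps"])  # ratio of denoising steps
```

The returned dict can be unpacked into a pipeline call, for example `pipe(prompt=..., **sampling_kwargs("ERNIE-Image-Turbo"))`. The Turbo variant uses 6.25 times fewer denoising steps than the SFT version.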

Example usage via the Hugging Face diffusers library:

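import torch
from diffusers import ErnieImagePipeline

# Load the SFT checkpoint in bfloat16 and move it to the GPU.
pipe = ErnieImagePipeline.from_pretrained(
    "baidu/ERNIE-Image",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Settings for the SFT variant: 50 steps, guidance scale 4.0.
image = pipe(
    prompt="a black-and-white Chinese countryside dog",
    height=1024, width=1024,
    num_inference_steps=50,
    guidance_scale=4.0,
    use_pe=True,  # flag kept as given in the source
).images[0]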

03 | ai-influence-digest

The tool monitors the public activity of more than 65 AI developers, keeps the posts that are immediately useful to content creators, and turns them into a structured Chinese-language weekly briefing, all without relying on the X (Twitter) API.

Core features:

No X API dependency – described as compliant and as avoiding the risk of account bans

Coverage of tools, workflows, tutorials, prompts across 65+ developers

Automatic rendering of Xiaohongshu‑style long‑image screenshots for easy sharing

Markdown‑formatted Chinese summary output

Three‑step workflow:

# Step 1: Scan candidate posts
python3 scripts/scan_x_weekly.py \
  --accounts references/accounts_65.txt \
  --days 7 \
  --outdir ./output/ai-influence-digest

# Step 2: Human review and assemble Markdown weekly report
# (filter criteria in references/filters.md)

# Step 3: Render Xiaohongshu‑style report screenshot
bash scripts/render_weekly_screenshots.sh \
  ./output/ai-influence-digest/weekly_report.md \
  ./output/ai-influence-digest/weekly_report.png \
  "2026-04-18"

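Step 2 is manual, but the assembly half of it lends itself to a small script. The sketch below groups reviewed items by category and writes a Markdown report. Everything here is an assumption for illustration: the item fields (`category`, `author`, `summary`, `url`), the heading text, and `build_weekly_report` are hypothetical, and the project's real report format (which is in Chinese) may differ. The category names follow the coverage list above.

```python
from collections import defaultdict

# Category order follows the coverage list: tools, workflows, tutorials, prompts.
CATEGORIES = ["tools", "workflows", "tutorials", "prompts"]


def build_weekly_report(items, report_date):
    """Assemble reviewed items into a Markdown report, grouped by category.

    Items whose category is not in CATEGORIES are skipped.
    """
    grouped = defaultdict(list)
    for item in items:
        grouped[item["category"]].append(item)

    lines = [f"# AI weekly digest ({report_date})", ""]
    for category in CATEGORIES:
        if not grouped[category]:
            continue  # omit empty sections
        lines.append(f"## {category.capitalize()}")
        lines.append("")
        for item in grouped[category]:
            lines.append(
                f"- **{item['author']}**: {item['summary']} ([link]({item['url']}))"
            )
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


items = [
    {"category": "prompts", "author": "@dev_b", "summary": "Prompt template for storyboards",
     "url": "https://example.com/2"},
    {"category": "tools", "author": "@dev_a", "summary": "New CLI for batch rendering",
     "url": "https://example.com/1"},
]
report = build_weekly_report(items, "2026-04-18")
print(report)
```

Writing `report` to `./output/ai-influence-digest/weekly_report.md` would give step 3 the input file it expects.
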
Summary

agentic-video-editor – automates raw footage editing into ads via an AI Agent pipeline with automatic review and up to three retry cycles.

ERNIE‑Image – 8 B diffusion‑transformer delivering state‑of‑the‑art text‑to‑image generation on a single 24 GB GPU; excels at Chinese text rendering and structured graphics.

ai-influence-digest – continuously tracks 65+ AI developers, filters high‑value updates, and produces a ready‑to‑share Chinese weekly briefing.

All projects are open source. Repository URLs: https://github.com/poseljacob/agentic-video-editor, https://github.com/baidu/ERNIE-Image, https://github.com/koffuxu/ai-influence-digest.
