
Fin2.0 AI‑Powered Design Assistant: Text‑to‑Image Generation, Prompt Engineering, and Practical Case Study

Fin2.0, NetEase Cloud Music’s AI‑driven design assistant, combines text‑to‑image, text‑to‑icon, and text‑to‑copy generation with an internal Stable Diffusion engine and streamlined prompt templates, enabling non‑designers to create high‑quality promotional banners within hours while avoiding external‑service costs and data‑security risks.

NetEase Cloud Music Tech Team

Fin2.0 is an AI‑driven design assistant developed by NetEase Cloud Music's Public Technology Department. Its vision is to empower the design process with AIGC, lower design thresholds and costs, and simplify business innovation.

Background

The article presents a real‑world case where a business colleague, Jersey, needed promotional banner images for a new song but could not secure a designer. Using Fin2.0’s text‑to‑image feature, Jersey generated two high‑quality banner images within half a day, helping the song reach #2 on the daily rise chart.

AIGC Capability Matrix

Fin2.0 integrates three AIGC capabilities—text‑to‑image, text‑to‑icon, and text‑to‑copy—to reconstruct the entire design workflow, improving efficiency, reducing communication cost, and avoiding data‑security risks associated with external services.

Challenges with Existing Tools

Internal Dreammaker (Stable Diffusion) requires complex configuration (model, LoRA, prompts, negative prompts, ControlNet, sampler, VAE, etc.).

Midjourney incurs external costs and requires multiple accounts for team usage.

External tools raise data‑security concerns for confidential projects.

Fin2.0 partnered with Dreammaker, keeping all generated data inside the company and benefiting from Dreammaker’s abundant compute resources.

Three‑Step Image Generation Process

Using Stable Diffusion, a single text‑to‑image operation involves more than 30 configuration parameters, grouped into three categories:

1. Mandatory Parameters

model_name: Base model name
prompt: Positive prompt

2. Basic Parameters

negative_prompt: Negative prompt
sampler_name: Sampling method
steps: Number of sampling steps
width: Image width
height: Image height
cfg_scale: Prompt relevance scale
n_iter: Iteration count (number of images)
seed: Random seed

3. Auxiliary Parameters

enable_hr: Enable high‑resolution generation
hr_scale: High‑resolution upscale factor
denoising_strength: Redraw (denoising) strength
hr_upscaler: High‑resolution upscaler algorithm
hr_resize_x: Target width after resize
hr_resize_y: Target height after resize

Additional modules such as LoRA (for style or subject‑specific fine‑tuning) and ControlNet (for special scene control) are also supported, with their own parameter tables provided in the source.
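To make the three groupings concrete, here is a minimal sketch of how these parameters might be assembled into a single text‑to‑image request. The field names follow the AUTOMATIC1111 Stable Diffusion webui API; Fin2.0’s internal Dreammaker endpoint is not public, so the defaults and the helper itself are illustrative assumptions.

```python
def build_txt2img_payload(model_name: str, prompt: str, **overrides) -> dict:
    """Combine mandatory, basic, and auxiliary parameters with common defaults.

    Field names follow the AUTOMATIC1111 webui txt2img API; the default
    values here are illustrative, not Fin2.0's actual configuration.
    """
    payload = {
        # 1. Mandatory
        "model_name": model_name,
        "prompt": prompt,
        # 2. Basic
        "negative_prompt": "lowres, bad anatomy, watermark",
        "sampler_name": "DPM++ 2M Karras",
        "steps": 25,
        "width": 512,
        "height": 512,
        "cfg_scale": 7.0,
        "n_iter": 1,
        "seed": -1,  # -1 means a random seed
        # 3. Auxiliary (high-resolution fix)
        "enable_hr": False,
        "hr_scale": 2.0,
        "denoising_strength": 0.5,
        "hr_upscaler": "Latent",
    }
    payload.update(overrides)  # caller overrides any default per request
    return payload

banner = build_txt2img_payload(
    "sd_xl_base_1.0",
    "far desert, large lake, yurt, children's watercolor",
    width=1024, height=1024, enable_hr=True,
)
```

Wrapping the raw parameter set this way is how an assistant like Fin2.0 can expose only a prompt box to users while still driving the full 30‑parameter API underneath.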

Prompt Template

The recommended prompt formula is:

Subject + Subject Modifiers + Camera & Lighting + Style Settings

Four components:

Subject: Main visual element (e.g., teenager, vinyl record, lake).

Subject Modifiers: Attributes like facial features, expressions, clothing, actions, environment.

Camera & Lighting: Angle, perspective, lighting conditions, image quality descriptors.

Style Settings: Artistic style (e.g., Ghibli, Pixar), image type (photo, illustration, Chinese‑style).

Example prompt: far desert, nearby poplar forest, large lake, Gobi, sheep, yurt, rich details, close‑up, landscape, children’s watercolor
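The formula above can be sketched as a small helper that joins the four component groups into one comma‑separated prompt; the component names are this article’s framework, not a Stable Diffusion requirement.

```python
def build_prompt(subject, modifiers=(), camera_lighting=(), style=()):
    """Join the four prompt components:
    subject + subject modifiers + camera & lighting + style settings."""
    parts = [subject, *modifiers, *camera_lighting, *style]
    return ", ".join(p for p in parts if p)

# Reproduces the example prompt from the article:
prompt = build_prompt(
    "far desert",
    modifiers=["nearby poplar forest", "large lake", "Gobi", "sheep", "yurt"],
    camera_lighting=["rich details", "close-up"],
    style=["landscape", "children's watercolor"],
)
# → "far desert, nearby poplar forest, large lake, Gobi, sheep, yurt,
#    rich details, close-up, landscape, children's watercolor"
```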

Advanced Settings, History, and Asset Library

For power users, Fin2.0 offers advanced controls such as resolution selection (512×512 for most models, 1024×1024 for SDXL), iteration count, prompt strength, and seed. Generated history and an internal asset gallery allow users to bookmark and reuse high‑quality outputs.
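The resolution rule of thumb (512×512 for most models, 1024×1024 for SDXL) can be captured in a small helper. Detecting SDXL from the model name is a heuristic assumed here for illustration, not a documented rule.

```python
def default_resolution(model_name: str) -> tuple:
    """Pick a base resolution matching the model family's training size.

    SD 1.x-era models were trained around 512x512; SDXL around 1024x1024.
    Matching the substring "xl" in the name is a heuristic, not a guarantee.
    """
    if "xl" in model_name.lower():
        return (1024, 1024)
    return (512, 512)
```

Keeping the generation size near the model’s training resolution is also the first defense against the artifacts discussed in the next section.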

Practical Experience

Common pitfalls include using mismatched models (e.g., a landscape‑oriented model for portrait generation) or inappropriate image sizes, leading to artifacts. The recommended workflow is to preview model capabilities, select the appropriate model, keep image size consistent with training data, and optionally provide reference images for image‑to‑image generation.

Understanding the Diffusion Model

Stable Diffusion converts textual prompts into latent image representations via a text encoder and a noise predictor. Repeated prediction‑and‑denoise steps gradually transform pure noise into an image that aligns with the semantic vector derived from the prompt.
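The predict‑and‑denoise loop can be sketched abstractly as follows. The toy `predict_noise` stands in for the U‑Net noise predictor (which in reality conditions on the text embedding through cross‑attention); everything here is illustrative, not a working diffusion model.

```python
import numpy as np

def predict_noise(latent, text_embedding, t):
    """Toy stand-in for the U-Net noise predictor: estimates the noise
    component of the latent, pulled toward the prompt's semantic vector."""
    return latent - 0.1 * text_embedding  # illustrative only

def denoise(text_embedding, steps=20, shape=(4, 64, 64), seed=0):
    """Start from pure Gaussian noise and repeatedly subtract predicted
    noise, nudging the latent toward the prompt's semantics each step."""
    rng = np.random.default_rng(seed)
    latent = rng.standard_normal(shape)  # pure noise in latent space
    for t in range(steps, 0, -1):
        noise = predict_noise(latent, text_embedding, t)
        latent = latent - noise / t      # one prediction-and-denoise step
    return latent  # a real pipeline would decode this to pixels via the VAE
```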

Understanding Prompts

Prompts are tokenized strings that guide the diffusion process. While older models relied heavily on tag‑based prompts, newer SDXL models accept natural language descriptions, reducing the need for rigid token structures.

Creating Complex Images

When a single prompt cannot achieve the desired composition, a two‑stage approach is suggested: first generate partial elements (e.g., character heads) using text‑to‑image, then assemble them in a design tool (MasterGo or Figma) and finally refine the composite with a second text‑to‑image pass.

Conclusion

Fin2.0’s text‑to‑image feature has been applied to various business scenarios such as promotional banners, H5 hero images, and live‑stream assets. Continuous user feedback drives iterative improvements, aiming to make AI‑assisted design more accessible and efficient.

References

https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/text2img

https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#prompt-matrix

https://stablediffusionxl.com/

https://github.com/CompVis/latent-diffusion

https://zhuanlan.zhihu.com/p/628714183

https://www.uisdc.com/lora-model

Tags: Prompt Engineering, Stable Diffusion, AIGC, AI-generated images, design automation, Fin2.0