Overview of AI-Generated Art: GANs, Diffusion Models, and Stable Diffusion Applications
The article surveys AI‑generated art, explaining how GANs’ limitations gave way to diffusion models and the open‑source Stable Diffusion platform, which offers text‑to‑image, img2img, inpainting, DreamBooth fine‑tuning, and widespread commercial and DIY deployments via cloud or local WebUI setups.
2022 can be regarded as the year of AIGC (AI-generated content): searches for AI painting and AI-generated art surged dramatically.
The explosion of AI painting was largely driven by the open-source release of Stable Diffusion, which built on the rapid progress of diffusion models and on mature text encoders such as OpenAI's CLIP, making text-to-image generation much easier.
GAN (Generative Adversarial Network) bottlenecks – Since the architecture's introduction in 2014 and the release of StyleGAN in 2018, GANs have made great strides in image synthesis. A GAN pits a generator against a discriminator: the generator tries to produce images the discriminator cannot tell apart from real ones. However, GANs still suffer from limited output diversity, mode collapse, and unstable training, which has hampered practical AI-generated-art products.
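The adversarial game described above can be sketched numerically. The toy example below is purely illustrative (the one-parameter discriminator, the data distributions, and all names are hypothetical assumptions, not any real GAN): it computes the loss each player minimizes when the discriminator can easily separate real from fake samples, the regime an early, untrained generator finds itself in.

```python
import numpy as np

rng = np.random.default_rng(0)

def discriminator(x, w):
    # Toy linear discriminator followed by a sigmoid: D(x) in (0, 1).
    return 1.0 / (1.0 + np.exp(-(x * w)))

def gan_losses(real, fake, w):
    # Discriminator loss: push D(real) -> 1 and D(fake) -> 0.
    d_loss = -np.mean(np.log(discriminator(real, w)) +
                      np.log(1.0 - discriminator(fake, w)))
    # Non-saturating generator loss: push D(fake) -> 1.
    g_loss = -np.mean(np.log(discriminator(fake, w)))
    return d_loss, g_loss

real = rng.normal(loc=2.0, scale=0.5, size=256)   # "real" data samples
fake = rng.normal(loc=-2.0, scale=0.5, size=256)  # early generator output
d_loss, g_loss = gan_losses(real, fake, w=1.0)
# With real and fake well separated, the discriminator's loss is small
# while the generator's loss is large: the generator gets weak gradients,
# one ingredient of the training instability mentioned above.
```

When the two distributions barely overlap, the generator is heavily penalized while the discriminator wins cheaply, a simple numerical view of why balancing the two networks is delicate.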
Diffusion Model breakthroughs – After years of GAN bottlenecks, researchers turned to diffusion models: a Markov chain progressively adds Gaussian noise to an image until it is indistinguishable from random noise, and a neural network learns to reverse this process step by step, enabling generation from pure noise. By conditioning the denoising network on text embeddings, diffusion models can generate images directly from textual descriptions.
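The forward (noising) half of that Markov chain has a convenient closed form. The numpy sketch below is illustrative only: the linear schedule, step count, and 8×8 "image" are assumptions for demonstration, not Stable Diffusion's actual settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_diffuse(x0, t, betas):
    # Closed-form sample from q(x_t | x_0):
    #   x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    alpha_bar = np.prod(1.0 - betas[: t + 1])
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

T = 1000
betas = np.linspace(1e-4, 0.02, T)    # linear noise schedule (assumed)
x0 = rng.standard_normal((8, 8))      # stand-in for an "image"

x_early = forward_diffuse(x0, t=10, betas=betas)
x_late = forward_diffuse(x0, t=T - 1, betas=betas)

# Early steps stay close to the original; by the last step the signal is
# essentially destroyed and x_T is close to pure Gaussian noise.
corr_early = np.corrcoef(x0.ravel(), x_early.ravel())[0, 1]
corr_late = np.corrcoef(x0.ravel(), x_late.ravel())[0, 1]
```

Generation runs this chain in reverse: a network trained to predict the added noise `eps` denoises a pure-noise sample step by step until an image emerges.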
Diffusion models are easier to train than GANs (a simple denoising objective over a large image dataset, with no adversarial game), produce high-quality results, and offer great output diversity, which explains the remarkable imagination of the new generation of AI image generators.
Although NVIDIA’s upgraded StyleGAN‑T can generate a 512×512 image in 0.1 s (vs. 3 s for Stable Diffusion on the same hardware), StyleGAN‑T excels at low‑resolution images, while diffusion models dominate high‑resolution generation. This article focuses on Stable Diffusion.
Stable Diffusion – After its open-source release, Stable Diffusion became the strongest publicly available AI-painting model, sparking community enthusiasm. The AUTOMATIC1111 WebUI makes deployment on cloud or local machines extremely simple, and extensions such as DreamBooth and Deforum enable fine-tuning and animation generation.
AI painting capabilities – The following functions are supported by Stable Diffusion:
text2img: generate images from textual prompts (e.g., “a beautiful girl with a flowered shirt, by Greg Rutkowski”).
img2img: transform an input image guided by a text prompt.
inpainting: modify only masked regions of an image.
DreamBooth: fine‑tune the model on a small personal dataset.
NovelAI models for anime‑style generation (with licensing considerations).
User‑photo models: train a subject‑specific model from a few personal photos.
Style‑specific fine‑tuned models for consistent artistic output.
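Programmatically, the text2img function above is commonly driven through Hugging Face's diffusers library rather than the WebUI. The sketch below is a hedged illustration: `GenRequest` and `build_request` are hypothetical helpers invented here, and the commented-out pipeline call assumes the `runwayml/stable-diffusion-v1-5` checkpoint (a multi-gigabyte download), so it is left inert.

```python
from dataclasses import dataclass

@dataclass
class GenRequest:
    # Hypothetical container for one text2img request.
    prompt: str
    negative_prompt: str = ""
    steps: int = 30
    guidance_scale: float = 7.5

def build_request(subject, style=None, negatives=()):
    # Assemble a prompt in the common "subject, style tags" convention.
    prompt = subject if style is None else f"{subject}, {style}"
    return GenRequest(prompt=prompt, negative_prompt=", ".join(negatives))

if __name__ == "__main__":
    req = build_request("a beautiful girl with a flowered shirt",
                        "by Greg Rutkowski")
    # Actual generation with diffusers (requires the model weights):
    # from diffusers import StableDiffusionPipeline
    # pipe = StableDiffusionPipeline.from_pretrained(
    #     "runwayml/stable-diffusion-v1-5")
    # image = pipe(req.prompt,
    #              negative_prompt=req.negative_prompt or None,
    #              num_inference_steps=req.steps,
    #              guidance_scale=req.guidance_scale).images[0]
```

img2img and inpainting follow the same pattern with their own pipelines, taking an input image (and, for inpainting, a mask) alongside the prompt.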
Current applications – Services such as Meitu, Douyin, 6pen, and Yijian provide AI‑painting APIs; commercial platforms like Midjourney and DALL·E 2 offer premium image generation; tools like Lensa and Manga‑Mirror allow personal model training; the Hugging Face community hosts many fine‑tuned Stable Diffusion checkpoints.
Setting up your own Stable Diffusion WebUI
6.1 Cloud version – Use AutoDL or similar cloud providers (Google Colab, Baidu Paddle) to launch a GPU instance, pull a pre‑built Docker/conda image, and start the service.
cd stable-diffusion-webui/
rm -rf outputs && ln -s /root/autodl-tmp outputs
python launch.py --disable-safe-unpickle --port=6006 --deepdanbooru
6.2 Local version (Windows)
Install Python 3.10.6 and add it to PATH.
Install Git.
Clone the Stable Diffusion WebUI repository.
Place model files in models/Stable-diffusion (downloaded from Hugging Face).
Run webui-user.bat and access the UI at http://localhost:7860.
Conclusion – This article introduced the background, technical foundations, and practical deployment of AI painting with Stable Diffusion. As AIGC continues to rise, mastering AI‑assisted creation will become an essential skill for work and daily life.
References
From the origin to controversy: AI in the art era – https://sspai.com/post/76277
Neural network learning notes 6 – GAN and Diffusion basics – https://blog.csdn.net/qq_45848817/article/details/127808815
How diffusion models work: the math from scratch – https://theaisummer.com/diffusion-models/
GAN structure overview – https://developers.google.com/machine-learning/gan/gan_structure
Beginner’s guide to Midjourney – https://www.entrogames.com/2022/08/absolute-beginners-guide-to-midjourney-magical-introduction-to-ai-art/
The viral AI avatar app Lensa undressed me—without my consent – https://www.technologyreview.com/2022/12/12/1064751/the-viral-ai-avatar-app-lensa-undressed-me-without-consent/
instruct-pix2pix – https://huggingface.co/timbrooks/instruct-pix2pix
DeWu Technology