Tagged articles

68 articles

May 12, 2026 · Artificial Intelligence

How DreamLite Enables Real-Time Text-to-Image Generation and Editing on Mobile Devices

DreamLite, a 0.39 B‑parameter diffusion model from ByteDance, unifies text‑to‑image generation and text‑guided editing in a single on‑device network, delivering 1024×1024 results in about three seconds on an iPhone 17 Pro while surpassing existing mobile and even many server‑side baselines.

DreamLite · Model Compression · RLHF

0 likes · 9 min read

Machine Heart

May 6, 2026 · Artificial Intelligence

ICLR 2026: How LiveMoments Restores Live Photo Cover Frames Without Blur

The paper "LiveMoments: Reselected Key Photo Restoration in Live Photos via Reference‑guided Diffusion" introduces a new task and a diffusion‑based method that uses the original high‑resolution cover frame to dramatically improve the visual quality of reselected cover frames in Live Photos, outperforming existing reference‑based super‑resolution and single‑frame approaches.

ICLR 2026 · Live Photo · Mobile Imaging

0 likes · 8 min read

Meituan Technology Team

Apr 16, 2026 · Artificial Intelligence

Can End-to-End Diffusion TTS Beat Traditional Pipelines? Inside LongCat-AudioDiT

LongCat-AudioDiT introduces a wave‑VAE plus diffusion Transformer architecture that eliminates intermediate spectrograms, solves training‑inference mismatch with dual constraints, replaces classifier‑free guidance with adaptive projection guidance, and achieves state‑of‑the‑art zero‑shot voice cloning performance on multiple benchmarks.

AI research · Audio Generation · Open Source

0 likes · 12 min read

Old Zhang's AI Learning

Apr 14, 2026 · Artificial Intelligence

Qwen3.5-27B-DFlash Delivers Up to 5× Faster Inference Without Quality Loss

The DFlash approach replaces speculative decoding’s autoregressive drafter with a block diffusion model and injects target‑model hidden features into every KV‑cache layer, achieving up to 5× speed‑up for Qwen3.5‑27B on a single GPU and 1.5–1.9× on high‑concurrency workloads while preserving output quality.

DFlash · Inference Acceleration · SGLang

0 likes · 12 min read

AI Explorer

Apr 11, 2026 · Artificial Intelligence

VoxCPM2: Tokenizer‑Free Multilingual TTS that Creates New Voices from Text

VoxCPM2, an open‑source 2‑billion‑parameter TTS model from OpenBMB, eliminates tokenizers and uses a diffusion‑autoregressive architecture to generate high‑fidelity, controllable speech in 30 languages, supporting voice design from natural‑language prompts and high‑quality voice cloning with just a short reference clip.

AudioVAE · Open Source · TTS

0 likes · 8 min read

Machine Heart

Apr 3, 2026 · Artificial Intelligence

Capture Character Animation from Any Object Using Just a Phone – CHI 2026 Best Paper Nominee

DancingBox demonstrates that a single RGB camera, a flat calibration board, and any handheld object can be used to capture realistic character animation by first estimating coarse 3D bounding‑box motion with visual foundation models and then refining it with a diffusion‑based motion generation model, validated by a user study.

AI · DancingBox · Human-Computer Interaction

0 likes · 9 min read

Bighead's Algorithm Notes

Apr 2, 2026 · Artificial Intelligence

Diffolio: A Diffusion‑Model Framework for Risk‑Aware Portfolio Optimization

Diffolio introduces a diffusion‑model‑based approach that directly learns a pseudo‑optimal portfolio distribution conditioned on user risk preferences, generating diverse high‑quality portfolios and outperforming classic and recent baselines on six real‑world market datasets, with annualized returns improving by up to 12.1 percentage points.

Financial AI · Generative Modeling · Quantitative Finance

0 likes · 22 min read

Bighead's Algorithm Notes

Mar 14, 2026 · Artificial Intelligence

Quantitative Finance Paper Digest: AI‑Driven Market Prediction Studies (Mar 7‑13 2026)

This digest summarizes four recent research papers that apply advanced AI techniques—node‑transformer graphs with BERT sentiment analysis, a quantum‑classical LSTM‑Born machine hybrid, large‑language‑model benchmarking for portfolio optimization, and a conditional diffusion model—to improve stock market prediction, volatility forecasting, and investment decision making, providing detailed experimental results and statistical validation.

BERT · Large Language Model · Quantum Computing

0 likes · 10 min read

Amap Tech

Feb 11, 2026 · Artificial Intelligence

Can Diffusion Models Turn Noisy GPS into Sub‑Meter Visual Localization?

The DiffVL framework redefines visual localization as a diffusion‑based GPS denoising task, using BEV‑conditioned visual cues and standard SD maps to achieve sub‑meter accuracy without high‑definition maps, and demonstrates its superiority through extensive autonomous‑driving experiments.

Autonomous Driving · BEV · GPS denoising

0 likes · 11 min read

HyperAI Super Neural

Feb 9, 2026 · Artificial Intelligence

MIT and Partners Use 23k+ Recipes and Diffusion Models to Create Zeolites with Si/Al = 19

The study introduces DiffSyn, a generative diffusion model trained on 23,961 zeolite synthesis recipes spanning over 50 years, which outperforms regression and other generative baselines, accurately predicts synthesis routes, and experimentally validates a novel UFI zeolite with a record Si/Al ratio of 19.

Chemical Guidance · Machine Learning · Materials Synthesis

0 likes · 17 min read

Network Intelligence Research Center (NIRC)

Jan 17, 2026 · Artificial Intelligence

DiffNBR: A Spatiotemporal Diffusion and Information‑Bottleneck Approach for Next‑Basket Recommendation

DiffNBR introduces a dual‑path diffusion framework combined with an information‑bottleneck mechanism to jointly model spatial co‑occurrence and temporal evolution in next‑basket recommendation, achieving state‑of‑the‑art performance and effectively disentangling repetitive and exploratory purchase patterns.

DiffNBR · diffusion model · information bottleneck

0 likes · 8 min read

Kuaishou Tech

Nov 19, 2025 · Artificial Intelligence

Can a Single Number Create a Whole New Visual Style? Inside CoTyle’s Code‑to‑Style Generation

CoTyle introduces a novel open‑source framework that generates unique image styles from a numeric style code, eliminating the need for reference images, lengthy prompts, or LoRA modules, and demonstrates superior style consistency compared to existing solutions like Midjourney.

Transformer · diffusion model · generative AI

0 likes · 8 min read

HyperAI Super Neural

Nov 10, 2025 · Artificial Intelligence

Columbia & Stanford Launch Squidiff: Diffusion Model for Transcriptome Simulation

Squidiff, a conditional diffusion framework co‑developed by Columbia and Stanford, predicts transcriptional responses across cell differentiation, gene and drug perturbations, and radiation exposure, outperforming prior models and enabling more precise and spatially aware biomedical research.

AI for Biology · Perturbation Prediction · Single-Cell Transcriptomics

0 likes · 16 min read

HyperAI Super Neural

Oct 27, 2025 · Artificial Intelligence

MIT’s Open‑Source BoltzGen Achieves nM‑Level Affinity for 66% of Targets Across Molecular Types

BoltzGen, an all‑atom generative model released by MIT and collaborators, unifies protein folding and binder design with a geometric continuous representation and a flexible design language, training on multimodal datasets and demonstrating nM‑level affinity for 66% of 26 diverse targets including proteins, nanobodies, peptides and small molecules.

BoltzGen · diffusion model · generative AI

0 likes · 12 min read

Data Party THU

Oct 24, 2025 · Artificial Intelligence

BREEZE: Enhancing Zero‑Shot Reinforcement Learning with Behavioral Regularization

The paper introduces BREEZE, a behavior‑regularized zero‑shot RL framework that improves stability, policy extraction, and representation quality by combining in‑sample learning, task‑conditioned diffusion models, and expressive attention‑based architectures, achieving near‑state‑of‑the‑art performance on benchmarks like ExORL and D4RL Kitchen.

behavioral regularization · diffusion model · offline RL

0 likes · 3 min read

Kuaishou Tech

Sep 23, 2025 · Artificial Intelligence

How Generative Reinforcement Learning is Revolutionizing Real-Time Bidding

This article explains the core challenges of real‑time bidding, traces the evolution from PID to MPC to reinforcement learning, and details how generative reinforcement‑learning techniques such as GAVE and CBD combine diffusion models, value‑guided exploration, and score‑based return‑to‑go to dramatically improve ad‑bid efficiency and revenue.

CBD · GAVE · advertising algorithms

0 likes · 15 min read

Bilibili Tech

Sep 19, 2025 · Artificial Intelligence

How TextFlux Enables OCR‑Free Multi‑Language Scene Text Editing with Diffusion Models

TextFlux introduces an OCR‑free diffusion‑based framework that seamlessly inserts multilingual text into real‑world images using only glyph images and minimal training data, offering high visual fidelity, zero‑shot character rendering, and efficient multi‑line and single‑line generation on consumer GPUs.

OCR-free · Text Editing · diffusion model

0 likes · 8 min read

AI Frontier Lectures

Sep 8, 2025 · Artificial Intelligence

How DynamicFace Achieves High‑Quality, Consistent Face Swaps in Images and Video

DynamicFace introduces a novel face‑swapping framework that combines diffusion models with composable 3D facial priors, explicitly decoupling identity, pose, expression, lighting and background, achieving superior identity preservation and motion consistency across images and videos, as demonstrated by extensive qualitative and quantitative comparisons with SOTA methods.

3D facial priors · diffusion model · face swapping

0 likes · 10 min read

Architects Research Society

Sep 4, 2025 · Artificial Intelligence

Choosing the Right Generative AI Model: Transformers, Diffusion, GANs & RNNs Explained

This article outlines the four dominant generative AI architectures—Transformers, diffusion models, GANs, and RNNs—explaining their core mechanisms, key capabilities, and typical application domains such as chatbots, image creation, deep‑fake media, and time‑series analysis, helping readers choose the right model for their needs.

AI applications · GAN · RNN

0 likes · 3 min read

JD Cloud Developers

Aug 27, 2025 · Artificial Intelligence

How AI Virtual Try‑On Boosted Fashion Sales by 80%: A Technical Deep‑Dive

This article details how JD.com’s AI‑driven virtual fitting solution, integrated with an A/B testing platform, transformed fashion e‑commerce by generating realistic model images and videos, cutting production costs to zero, accelerating design cycles, and increasing conversion rates by over 80% during major sales events.

A/B testing · AI · Fashion E‑commerce

0 likes · 14 min read

Xiaohongshu Tech REDtech

Aug 18, 2025 · Artificial Intelligence

DynamicFace: Composable 3D Facial Priors for High‑Quality, Consistent Face Swaps

DynamicFace introduces a controllable face‑swapping framework that leverages composable 3D facial priors, dual‑stream identity injection, and a FusionTVO module to achieve superior image and video quality, identity preservation, and temporal consistency, outperforming existing state‑of‑the‑art methods on benchmark datasets.

3D facial priors · AI · controllable generation

0 likes · 13 min read

Data Party THU

Aug 17, 2025 · Artificial Intelligence

How BioEmu Generates Protein Conformational Ensembles Faster Than MD

Microsoft Research’s AI for Science team released the open‑source BioEmu model, a generative diffusion architecture that leverages AlphaFold’s Evoformer and extensive MD and stability data to efficiently sample protein conformational ensembles, achieving near‑MD accuracy in free‑energy and mutation stability predictions while dramatically reducing computational cost.

AlphaFold · bioinformatics · diffusion model

0 likes · 6 min read

AIWalker

Aug 13, 2025 · Artificial Intelligence

One‑Model‑For‑All: Inception‑Level AI Try‑On/Off with Arbitrary Poses and No Masks

The paper presents OMFA, a diffusion‑based unified framework for virtual try‑on and try‑off that removes the need for garment templates, segmentation masks, and fixed poses by leveraging a novel partial‑diffusion mechanism and SMPL‑X pose conditioning, achieving state‑of‑the‑art results on VITON‑HD and DeepFashion‑MultiModal datasets.

AI try-on · SMPL-X · computer vision

0 likes · 15 min read

Data Party THU

Jul 31, 2025 · Artificial Intelligence

How LaVin-DiT Revolutionizes Vision Generation with ST‑VAE and Joint Diffusion Transformer

The LaVin-DiT paper introduces a large‑scale vision diffusion transformer that combines a spatiotemporal variational auto‑encoder, a joint diffusion transformer with full‑sequence joint attention, and 3D rotary position encoding to enable unified, efficient generation across diverse visual tasks such as segmentation and video prediction.

3D RoPE · Vision Transformer · computer vision

0 likes · 11 min read

Kuaishou Tech

Jul 9, 2025 · Artificial Intelligence

How ResULIC Achieves Ultra‑Low‑Rate Image Compression with Semantic Residual Coding and Diffusion

The paper introduces ResULIC, a residual‑guided ultra‑low‑bitrate image compression framework that combines semantic residual coding, a compression‑aware diffusion model, and perceptual fidelity optimization to dramatically improve visual quality and outperform prior diffusion‑based methods on standard benchmarks.

Machine Learning · ResULIC · diffusion model

0 likes · 12 min read

JD Cloud Developers

Jul 4, 2025 · Artificial Intelligence

How AI-Powered Virtual Try-On Boosted Fashion Sales by 80%+

This article details JD Retail's AI virtual try‑on system, its technical challenges, innovations such as a large‑scale diffusion model and adaptive masking, and how the solution dramatically cut costs, accelerated image production, and increased conversion rates for fashion e‑commerce during a major promotion.

AI · Fashion E‑commerce · diffusion model

0 likes · 14 min read

Huolala Tech

Jul 2, 2025 · Artificial Intelligence

Can Diffusion Models Revolutionize Salient Object Detection?

This article introduces a diffusion‑based framework for salient object detection, discusses its background, challenges, and motivations, details the model architecture and training, presents extensive experiments and ablation studies, and outlines limitations and future research directions.

computer vision · deep learning · diffusion model

0 likes · 11 min read

AI Frontier Lectures

Mar 29, 2025 · Artificial Intelligence

How MMGDreamer Achieves Precise Geometry Control in 3D Indoor Scene Generation

MMGDreamer introduces a mixed‑modality graph and a dual‑branch diffusion model that combine text, image, and relational cues to generate highly realistic, geometrically controllable 3D indoor scenes, outperforming prior methods on multiple quantitative and qualitative benchmarks.

3D scene generation · AI research · computer vision

0 likes · 11 min read

AI Frontier Lectures

Mar 25, 2025 · Artificial Intelligence

Can Mixed‑Modality Graphs Unlock Precise 3D Indoor Scene Generation?

MMGDreamer introduces a mixed‑modality graph and a dual‑branch diffusion model that jointly enhance geometric control and realism in 3D indoor scene synthesis, outperforming state‑of‑the‑art methods across multiple quantitative and qualitative benchmarks.

3D scene generation · AI · computer vision

0 likes · 12 min read

AIWalker

Feb 23, 2025 · Artificial Intelligence

U‑ViT: How a ViT‑Based Diffusion Model Beats DiT and Redefines Image Generation

U‑ViT replaces the convolutional U‑Net backbone of diffusion models with a Vision Transformer, treats time, condition and noisy patches as tokens, adds long skip connections and a lightweight 3×3 convolution, and through extensive ablations and scaling studies achieves state‑of‑the‑art FID scores on unconditional, class‑conditional and text‑to‑image generation tasks.

AdaLN · FID · Long Skip Connections

0 likes · 16 min read

Kuaishou Tech

Feb 20, 2025 · Artificial Intelligence

Second Short-Form Video Quality Assessment and Enhancement Challenge (CVPR NTIRE 2025)

The second short-form video quality assessment and enhancement challenge, co‑organized by Kuaishou's audio‑video team and the Intelligent Media Computing Lab, invites global researchers to develop efficient quality assessment models and diffusion‑based super‑resolution methods using the new KwaiSR dataset, with prize money and potential CVPR workshop paper invitations.

AI competition · CVPR NTIRE · dataset

0 likes · 9 min read

AntTech

Dec 19, 2024 · Artificial Intelligence

Framer: Interactive Video Frame Interpolation Using Diffusion Models

Framer is an interactive video frame interpolation method that leverages large pretrained video diffusion models, allowing users to define custom motion trajectories or use an automatic mode, and demonstrates strong performance in image deformation, video generation, and cartoon‑to‑video applications.

AI · Framer · diffusion model

0 likes · 4 min read

DaTaobao Tech

Nov 27, 2024 · Artificial Intelligence

FuseAnyPart: Diffusion‑Driven Facial Parts Swapping via Multiple Reference Images

FuseAnyPart is a diffusion‑model‑based facial part swapping technique that fuses features from multiple reference images via mask‑based fusion and additive injection modules, delivering high‑fidelity, consistent face edits with lower computational cost, outperforming prior methods on CelebA‑HQ and FaceForensics++ and already boosting commercial AIGC applications.

computer vision · diffusion model · facial part swapping

0 likes · 9 min read

360 Tech Engineering

Oct 31, 2024 · Artificial Intelligence

HiCo: Hierarchical Controllable Diffusion Model for Layout-to-Image Generation

The paper introduces HiCo, a hierarchical controllable diffusion model that enables precise layout‑to‑image generation by decoupling object and background features through weight‑shared branches and a fusion module, achieving high‑quality results and efficient inference as demonstrated on the HiCo‑7K benchmark.

AI painting · HiCo · NeurIPS2024

0 likes · 9 min read

Alimama Tech

Oct 17, 2024 · Artificial Intelligence

FLUX ControlNet Inpainting and 8-Step Turbo Acceleration Models

Alimama’s Intelligent Creation team has open‑sourced a FLUX‑based ControlNet inpainting model that leverages a DiT‑backed Interleave design for superior repair quality, and an 8‑step LoRA‑Turbo model that cuts inference time three‑fold while preserving near‑original image fidelity, both now available on Hugging Face and ModelScope.

AI · ControlNet · Flux

0 likes · 9 min read

Model Perspective

Sep 29, 2024 · Fundamentals

What Do Broadcast, Diffusion, and SIR Models Reveal About Article Virality?

This article explores how the broadcast, diffusion, and SIR mathematical models explain the rise, spread, and eventual decline of online articles, offering practical insights for boosting initial reach and sustaining reader interest through strategic sharing and content design.

SIR model · broadcast model · diffusion model

0 likes · 7 min read

Kuaishou Tech

Sep 27, 2024 · Artificial Intelligence

XPSR: Cross‑modal Priors for Diffusion‑based Image Super‑Resolution

The paper introduces XPSR, a diffusion‑based image super‑resolution method that incorporates cross‑modal semantic priors from a large multimodal language model, achieving state‑of‑the‑art performance on both reference and no‑reference quality metrics across synthetic and real‑world video restoration tasks.

AI research · ECCV2024 · cross‑modal priors

0 likes · 8 min read

Ops Development & AI Practice

Jul 9, 2024 · Artificial Intelligence

How AnimateDiff-Lightning Elevates Open-Source AI Animation

AnimateDiff-Lightning, an open‑source diffusion model released by ByteDance on Hugging Face, delivers high‑resolution image and video generation with versatile integration, showcasing how community‑driven AI tools can accelerate creative and commercial applications.

AI · Animation · ByteDance

0 likes · 5 min read

Baidu Tech Salon

May 24, 2024 · Artificial Intelligence

HelixDock: A Large-Scale Pretrained Full-Atom Diffusion Model for Protein–Small Molecule Docking

HelixDock, a full‑atom diffusion model pretrained on a billion‑scale simulated docking dataset covering ~200,000 protein targets, delivers state‑of‑the‑art docking accuracy—85.6% success on PoseBusters and strong generalization on cross‑docking benchmarks—showing that massive data and model scaling dramatically improve AI‑driven drug discovery, and its code and data are fully open‑source.

AI for drug discovery · HelixDock · deep learning

0 likes · 6 min read

21CTO

Apr 17, 2024 · Artificial Intelligence

How Sora Generates High‑Quality Text‑to‑Video: A Deep Dive into Its Architecture

This article breaks down OpenAI's Sora text‑to‑video model, exploring its overall structure, visual encoder‑decoder, Spacetime Latent Patch, transformer‑based diffusion, long‑time consistency strategies, training techniques, and the technical choices that enable variable resolution, aspect ratios, and up to 60‑second video generation.

AI video generation · Latent Diffusion · Sora

0 likes · 50 min read

360 Tech Engineering

Apr 17, 2024 · Artificial Intelligence

HiCo: A Hierarchical Controllable Diffusion Model for Layout‑to‑Image Generation

The 360 AI Research Institute introduces HiCo, a hierarchical controllable diffusion model that enables fine‑grained layout control across up to eight image regions, integrates seamlessly with existing Stable Diffusion ecosystems, and demonstrates superior performance on the GRIT‑VAL benchmark for layout‑aware image synthesis.

AI drawing · HiCo · controllable generation

0 likes · 8 min read

Architect

Apr 16, 2024 · Artificial Intelligence

Unraveling Sora: How OpenAI Might Build a 60‑Second Video Generator

This article dissects the possible architecture of OpenAI's Sora video model, tracing its visual encoder‑decoder, Spacetime Latent Patch, transformer‑based diffusion backbone, long‑time consistency strategies, and training pipeline, while comparing alternatives such as MAGVIT‑v2, TECO, NaViT, and FDM to reveal why each design choice may have been made.

AI Architecture · Latent Diffusion · Sora

0 likes · 51 min read

JD Retail Technology

Apr 10, 2024 · Artificial Intelligence

AI-Generated E-commerce Advertising Images: Relationship-Aware Diffusion Models for Layout, Background, and Poster Generation

This article analyzes the challenges of manual e‑commerce ad image creation and presents JD's innovative AI solutions—including a relationship‑aware diffusion model for poster layout, a category‑common and personalized background generator, and an end‑to‑end planning‑and‑rendering framework—that achieve high‑quality automatic ad creative generation and boost advertising revenue.

AI · diffusion model · image generation

0 likes · 21 min read

Architect

Mar 28, 2024 · Artificial Intelligence

Understanding OpenAI's Sora Video Generation Model: Architecture, Workflow, and Core Technologies

This article explains OpenAI's Sora video generation model, detailing its latent diffusion foundation, video compression network, spacetime patch representation, Diffusion Transformer processing, and decoding pipeline, while also reviewing related Stable Diffusion and Transformer concepts that enable high‑quality text‑to‑video synthesis.

AI · Latent Diffusion · Sora

0 likes · 17 min read

Alipay Experience Technology

Mar 28, 2024 · Artificial Intelligence

How OpenAI’s Sora Revolutionizes Text‑to‑Video Generation: Capabilities & Comparisons

This article introduces OpenAI’s Sora video‑generation model, compares it with other leading solutions, explains its underlying diffusion‑based architecture, showcases sample outputs, outlines its diverse generation abilities, and discusses current limitations and future implications for AI‑driven video creation.

AI video generation · OpenAI · Sora

0 likes · 13 min read

DaTaobao Tech

Mar 27, 2024 · Artificial Intelligence

Building a Simple Diffusion Model with Python

This tutorial walks through implementing a basic Denoising Diffusion Probabilistic Model in Python, explaining the forward noise schedule, reverse denoising training, and providing complete code for noise schedules, diffusion functions, residual and attention blocks, a UNet architecture, loss computation, and a training loop.

DDPM · Python · U-Net

0 likes · 26 min read

DevOps

Mar 26, 2024 · Artificial Intelligence

OpenAI’s Sora: A One‑Minute Text‑to‑Video Diffusion Transformer Model

OpenAI’s newly released Sora model demonstrates one‑minute text‑to‑video generation using a diffusion‑based transformer architecture that operates on spatiotemporal patches, compresses visual data into latent codes, and builds on a wide range of prior video generation research, while the article also advertises a DevOps certification program.

AI · OpenAI · Sora

0 likes · 8 min read

NewBeeNLP

Mar 22, 2024 · Artificial Intelligence

Unraveling Sora: How OpenAI Might Build Its Text‑to‑Video Engine

This article provides a step‑by‑step technical analysis of OpenAI’s Sora model, examining its possible overall architecture, video encoder‑decoder design, Spacetime Latent Patch mechanism, transformer‑based diffusion process, training strategies, and long‑term consistency techniques, while grounding each speculation in publicly available reports and related research.

AI analysis · Sora · Transformer

0 likes · 50 min read

DataFunTalk

Mar 21, 2024 · Artificial Intelligence

A Detailed Technical Analysis of Sora: Architecture, Key Components, and Potential Implementation

This article provides a comprehensive, easy‑to‑understand breakdown of Sora’s possible architecture, covering its visual encoder‑decoder, Spacetime Latent Patch, transformer‑based diffusion model, long‑time consistency strategies, and training techniques, as well as how it supports video generation at variable resolutions and durations.

AI Architecture · Sora · Spacetime Patch

0 likes · 49 min read

DataFunTalk

Mar 18, 2024 · Artificial Intelligence

High-Fidelity Image-to-Video Generation for E‑commerce Product Motion with AtomoVideo and Noise Rectification

This article presents Alibaba's research on using diffusion‑based AIGC techniques, including a training‑free Noise Rectification module and the AtomoVideo model, to automatically convert static product images into high‑quality, detail‑preserving video motions for e‑commerce advertising.

AIGC · AtomoVideo · Noise Rectification

0 likes · 15 min read

Alimama Tech

Mar 14, 2024 · Artificial Intelligence

High-Fidelity Image-to-Video Generation for E-commerce with AtomoVideo and Noise Rectification

Alibaba’s AI team introduced AtomoVideo, a diffusion‑based image‑to‑video generator enhanced by a training‑free Noise Rectification module that adds and corrects controlled noise to eliminate first‑frame errors, enabling merchants to automatically create high‑fidelity 4‑second 720p product videos with strong temporal consistency for e‑commerce advertising.

AI · AIGC · Video Generation

0 likes · 10 min read

DeWu Technology

Mar 11, 2024 · Artificial Intelligence

Understanding OpenAI's Sora Video Generation Model: Diffusion, Transformers, and Latent Space

OpenAI's Sora video generation model builds on latent diffusion: a video compression encoder‑decoder maps video to and from a latent space, the latents are tokenized into spatio‑temporal patches and processed by a diffusion‑trained Transformer conditioned on DALL·E‑style text annotations, and the output is decoded into high‑resolution videos up to a minute long.

AI · Latent Diffusion · Sora

0 likes · 18 min read

Xiaohongshu Tech REDtech

Feb 27, 2024 · Artificial Intelligence

InstantID: Zero-shot Identity-Preserving Generation in Seconds

InstantID, an open‑source tool released by Xiaohongshu in early 2024, generates multiple stylized portraits that preserve a person’s facial identity from a single reference photo in seconds, eliminating fine‑tuning, large storage needs, and multi‑image requirements while seamlessly working with popular diffusion models like Stable Diffusion 1.5 and SDXL.

AI · InstantID · diffusion model

0 likes · 6 min read

High Availability Architecture

Feb 22, 2024 · Artificial Intelligence

Understanding OpenAI’s Sora: A Breakthrough Text-to-Video Model

OpenAI’s newly released Sora text‑to‑video model demonstrates unprecedented high‑resolution, long‑duration video generation by encoding videos into latent space, applying diffusion with a transformer conditioned on text, and decoding back to pixels, marking a major leap in AI video synthesis and its potential applications.

AI video generation · Latent Diffusion · Sora

0 likes · 14 min read

Architects' Tech Alliance

Feb 22, 2024 · Artificial Intelligence

OpenAI’s Sora: A Breakthrough Text‑to‑Video Generation Model – Capabilities, Architecture, and Research Insights

OpenAI’s Sora model demonstrates unprecedented text‑to‑video generation, producing high‑fidelity clips of up to 60 seconds with consistent multi‑character scenes, multi‑camera motion, and world‑simulation abilities. It is built on a diffusion transformer trained on compressed latent video patches, and the article draws its technical analysis from the accompanying research paper.

AI video generation · Artificial Intelligence · OpenAI

0 likes · 11 min read

CSS Magic

Feb 20, 2024 · Artificial Intelligence

OpenAI’s Sora Video Model Is Hyped—But Here Are the Flaws OpenAI Itself Acknowledges

The article walks through the shortcomings of Sora that OpenAI itself acknowledges, such as unrealistic physics, misplaced spatial details, and erratic object behavior, using concrete demo failures, additional observations, and technical notes on its diffusion‑based transformer architecture and metadata embedding.

AI limitations · OpenAI · Sora

0 likes · 7 min read

DevOps

Feb 18, 2024 · Artificial Intelligence

OpenAI's Sora: In‑Depth Analysis of the First Text‑to‑Video Model and Its Technical Foundations

OpenAI's Sora, the company's first text‑to‑video model, demonstrates unprecedented video quality and length by leveraging massive high‑quality training data, novel video‑patch representations, a diffusion‑based transformer architecture, and precise video captioning, reshaping both AI research and media production.

OpenAI · Sora · diffusion model

0 likes · 9 min read

Architects' Tech Alliance

Feb 18, 2024 · Artificial Intelligence

How OpenAI’s Sora Redefines Video Generation with 3‑D Consistency and World Simulation

OpenAI’s Sora model introduces a diffusion‑transformer approach that generates high‑fidelity, 60‑second videos with consistent 3‑D camera motion, long‑term object persistence, and the ability to simulate interactive digital worlds, backed by a detailed technical report and research paper.

Artificial Intelligence · OpenAI · Sora

0 likes · 9 min read

Rare Earth Juejin Tech Community

Feb 4, 2024 · Artificial Intelligence

Understanding Stable Diffusion Architecture and Implementing It with the Diffusers Library

This article reviews the evolution from GANs to diffusion models, explains the components of Stable Diffusion—including the CLIP text encoder, VAE, and UNet—and provides step‑by‑step Python code using HuggingFace's Diffusers library to generate images from text prompts.

AI painting · Python · Stable Diffusion

0 likes · 12 min read

Ximalaya Technology Team

Feb 1, 2024 · Artificial Intelligence

Understanding AI Image Generation: Diffusion Models, CLIP, and Control Techniques

This guide explains how AI image generators such as Stable Diffusion and DALL·E 3 turn text prompts into pictures by using diffusion models, CLIP‑aligned embeddings, and optional controls like negative prompts, fine‑tuned LoRA checkpoints and ControlNet conditioning, highlighting their differences, workflow, and practical customization.

AI image generation · CLIP · ControlNet

0 likes · 18 min read

Xiaohongshu Tech REDtech

Jan 23, 2024 · Artificial Intelligence

Controllable Mind Visual Diffusion Model (CMVDM) for Reconstructing Visual Stimuli from fMRI Signals

The Controllable Mind Visual Diffusion Model (CMVDM) decodes fMRI signals into semantic vectors and silhouette maps, feeds them into a latent diffusion framework with a ControlNet‑style encoder, and reconstructs high‑fidelity images that surpass existing baselines in both structural similarity and semantic accuracy across multiple brain‑imaging datasets.

AI · brain decoding · diffusion model

0 likes · 13 min read

JD Cloud Developers

Aug 17, 2023 · Artificial Intelligence

Master Stable Diffusion: From Prompt Basics to Advanced ControlNet Techniques

This guide walks you through the fundamentals of Stable Diffusion AI painting, covering diffusion theory, prompt crafting, parameter tuning, model selection, high‑resolution upscaling, inpainting, and the powerful ControlNet extension for precise image control.

AI art · ControlNet · diffusion model

0 likes · 24 min read

Alibaba Cloud Big Data AI Platform

Jul 13, 2023 · Artificial Intelligence

Rapid Diffusion: Fast, Domain‑Specific Text‑to‑Image Generation for Chinese

Rapid Diffusion introduces a knowledge‑enhanced, high‑speed Chinese text‑to‑image diffusion model with one‑click deployment, achieving superior image quality and up to 1.73× faster inference through FlashAttention and BladeDISC optimizations, and demonstrates strong performance across e‑commerce, traditional painting, and food datasets.

Chinese NLP · Knowledge Enhancement · diffusion model

0 likes · 12 min read

Tencent Cloud Developer

May 25, 2023 · Artificial Intelligence

QQGC: A Two-Stage Text-to-Image Model with Prior and Decoder Architectures for Efficient AI Painting

QQGC is Tencent’s two‑stage text‑to‑image model, which separates a CLIP‑based Prior from a Stable Diffusion Decoder. It uses T5‑enhanced text embeddings and a suite of efficiency techniques (FP16, flash attention, ZeRO, and GPU‑RDMA) to train models with more than 2 B parameters on 64 GPUs, achieving state‑of‑the‑art FID and CLIP scores. The model supports image variation, semantic img2img, precise CLIP‑vector edits, and unsafe‑content filtering, and now powers the company’s Magic Painting Room.

AI painting · CLIP embedding · Training Acceleration

0 likes · 12 min read

Nightwalker Tech

Apr 13, 2023 · Artificial Intelligence

Fundamentals of AI‑Generated Image Creation: Diffusion Models and Stable Diffusion

This article provides a comprehensive overview of AI‑generated content (AIGC) for image creation, explaining the role of diffusion models, the architecture of Stable Diffusion—including CLIP, UNet, and VAE—and the underlying mathematical concepts such as Markov chains, Langevin dynamics, and Gaussian distributions.

AI · AIGC · Stable Diffusion

0 likes · 31 min read

58UXD

Mar 7, 2023 · Artificial Intelligence

How Diffusion Models Power AI Image Generation: From Prompts to Pictures

This article explains how modern AI image generators like Midjourney and Stable Diffusion use diffusion models, large training datasets, deep learning, latent spaces, and CLIP to transform textual prompts into high‑quality images, while also discussing the impact on designers and future collaboration opportunities.

CLIP · Latent Space · Midjourney

0 likes · 7 min read

Laiye Technology Team

Feb 17, 2023 · Artificial Intelligence

Understanding Diffusion Models, Autoencoders, and VAEs for AIGC with Code Examples

This article introduces the hot AIGC field by explaining diffusion‑based image generation, detailing the principles and mathematics of AutoEncoder and Variational AutoEncoder models, and providing complete TensorFlow code examples to help readers master these generative techniques step by step.

AIGC · Autoencoder · TensorFlow

0 likes · 8 min read

DeWu Technology

Feb 13, 2023 · Artificial Intelligence

Overview of AI-Generated Art: GAN, Diffusion Models, and Stable Diffusion Applications

The article surveys AI‑generated art, explaining how GANs’ limitations gave way to diffusion models and the open‑source Stable Diffusion platform, which offers text‑to‑image, img2img, inpainting, DreamBooth fine‑tuning, and widespread commercial and DIY deployments via cloud or local WebUI setups.

AI art · GAN · Stable Diffusion

0 likes · 13 min read
