Tagged articles

164 articles

May 29, 2026 · Artificial Intelligence

WaDi: One‑Step Image Generation with LoRA Meets RoPE

This work analyzes weight‑direction changes in diffusion‑model distillation, proposes a low‑rank rotation adapter (LoRaD) to model those changes, and integrates it into Variational Score Distillation as WaDi, achieving state‑of‑the‑art FID on COCO with only ~10% trainable parameters while generalizing to multiple downstream tasks.

LoRA · RoPE · diffusion models

0 likes · 20 min read
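The summary turns on how distillation changes the direction of weight vectors. The NumPy sketch below is a rough illustration only. It applies a plain LoRA update W + alpha * B @ A (standard LoRA, not the paper's LoRaD rotation adapter) to a random matrix and measures how far each row turns. The shapes, rank and initialization scale are assumptions. Treating rows as the unit of "direction" is also an assumption.

```python
import numpy as np

def lora_update(W, A, B, alpha=1.0):
    """Standard LoRA: W' = W + alpha * B @ A, with A (r x d_in), B (d_out x r)."""
    return W + alpha * (B @ A)

def row_direction_change(W, W_new):
    """Angle in radians between each row of W and the same row of W_new."""
    cos = np.sum(W * W_new, axis=1) / (
        np.linalg.norm(W, axis=1) * np.linalg.norm(W_new, axis=1)
    )
    return np.arccos(np.clip(cos, -1.0, 1.0))

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 64, 4            # illustrative sizes, not from the paper
W = rng.standard_normal((d_out, d_in))
A = 0.1 * rng.standard_normal((r, d_in))
B = 0.1 * rng.standard_normal((d_out, r))

W_new = lora_update(W, A, B)
angles = row_direction_change(W, W_new)
trainable_fraction = (A.size + B.size) / W.size   # 512 / 4096 = 0.125

print(f"trainable fraction: {trainable_fraction:.3f}")
print(f"mean row rotation: {angles.mean():.4f} rad")
```

A row-wise cosine is one simple way to quantify "direction change". The paper's own metric may differ.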

Machine Heart

May 29, 2026 · Artificial Intelligence

DiffusionOPD: A New Online Policy Distillation Paradigm for Multi‑Task Diffusion Models

DiffusionOPD introduces a unified on‑policy distillation framework for diffusion models that decouples single‑task online policy exploration from multi‑task capability integration, training expert teachers per task and distilling their skills into a single student model, achieving faster convergence and higher performance across composition, OCR, and aesthetic tasks.

KL divergence · PPO · diffusion models

0 likes · 8 min read

Machine Heart

May 25, 2026 · Artificial Intelligence

VeRL-Omni: Universal RL Post‑Training for Diffusion and Multimodal Models

VeRL-Omni is an open-source RL post-training framework built on verl and vLLM-Omni. It enables efficient, high-throughput rollout and flexible reward computation for diffusion, AR-DiT, and unified multimodal generation models. It also supports diverse hardware and modular trainers. In benchmark experiments it shows up to a 14% latency reduction and high training throughput.

FlowGRPO · RL · VeRL-Omni

0 likes · 9 min read

Machine Heart

May 25, 2026 · Artificial Intelligence

Breaking the Reward Trade‑off: Flow‑OPD Brings Multi‑Teacher OPD to Image Generation

Flow‑OPD introduces on‑policy distillation into flow‑matching diffusion models, using a multi‑teacher online rollout framework and manifold‑anchor regularization to resolve the seesaw effect of single and mixed rewards, achieving superior multi‑task performance and surpassing specialist models in image generation.

Flow-OPD · Manifold Anchor Regularization · diffusion models

0 likes · 9 min read

Machine Heart

May 24, 2026 · Artificial Intelligence

How Hallo‑Live Achieves Real‑Time Streaming Text‑Driven Audio‑Video Avatar Generation

Hallo-Live introduces an asynchronous dual-stream diffusion framework combined with human-centric, preference-guided distillation. This lets text-driven audio-video avatars run at 20.38 FPS with 0.94 s latency. That is over 16× faster than the teacher Ovi model, with 99.3% lower latency, while preserving visual quality and lip-sync.

Hallo-Live · NVIDIA H200 · asynchronous dual-stream diffusion

0 likes · 9 min read

Machine Heart

May 21, 2026 · Artificial Intelligence

RAEv2: How a Simple Extra Operation Makes Image Generation Train Ten Times Faster

The RAEv2 framework replaces traditional VAEs by summing multiple layers of pretrained vision encoders, combines RAE with REPA for complementary semantic and spatial gains, and leverages free guidance, achieving up to ten‑fold faster convergence, higher image quality, and lower compute on ImageNet‑256 diffusion training.

RAEv2 · Representation Autoencoder · Training efficiency

0 likes · 11 min read

Machine Heart

May 15, 2026 · Artificial Intelligence

D-OPSD: On‑Policy Self‑Distillation Lets Few‑Step Diffusion Models Learn While Running

D-OPSD presents the first online self‑distillation framework for step‑distilled diffusion models, allowing them to continuously fine‑tune with only image‑text pairs, retain their fast few‑step sampling, and acquire new concepts, styles, or domain preferences without reward models.

LoRA · diffusion models · few-step sampling

0 likes · 10 min read

Machine Heart

May 11, 2026 · Artificial Intelligence

UniVidX Sets New SOTA on Multiple Video Tasks – A Unified Multimodal Framework Presented at SIGGRAPH 2026

UniVidX, a unified multimodal framework for video generation and understanding accepted at SIGGRAPH 2026, reformulates diverse video graphics tasks as conditional generation, achieving or surpassing state‑of‑the‑art performance while demonstrating strong data efficiency and cross‑domain generalization.

SIGGRAPH 2026 · UniVidX · data efficiency

0 likes · 10 min read

Machine Heart

May 8, 2026 · Artificial Intelligence

Omni2Sound Overcomes the ‘Generalist’ Dilemma in Multimodal Audio Generation via Data Alignment

Omni2Sound tackles the long‑standing “generalist” dilemma of unified audio generation by constructing a high‑quality V‑T‑A dataset (SoundAtlas), employing a three‑stage progressive training pipeline, and using a simple Diffusion Transformer backbone, ultimately achieving state‑of‑the‑art performance on T2A, V2A and VT2A tasks and strong robustness on off‑screen scenarios.

Audio Generation · Data Alignment · Multimodal Learning

0 likes · 16 min read

Machine Learning Algorithms & Natural Language Processing

Apr 27, 2026 · Artificial Intelligence

From Parameter Tuning to Control: CFG‑Ctrl Boosts Stability and Precision in Text‑to‑Image Generation

The paper introduces CFG‑Ctrl, a control‑theoretic redesign of classifier‑free diffusion guidance that treats the generation process as a dynamic system, achieving more stable and accurate text‑to‑image results across multiple model scales and evaluation metrics.

CFG-Ctrl · Classifier-Free Guidance · control theory

0 likes · 15 min read

Kuaishou Tech

Apr 24, 2026 · Artificial Intelligence

ICLR 2026: Kuaishou Tech Team’s Cutting‑Edge AI Research Highlights

This article reviews eight Kuaishou‑authored papers accepted at ICLR 2026, summarizing their problem statements, novel methods such as front‑door causal attribution, visual table retrieval, denoising rerankers, difficulty‑adaptive reasoning, diffusion code infilling, generative ordinal regression, multimodal video retrieval, e‑commerce dialogue benchmarks, and a new LLM creativity evaluator, together with reported experimental gains.

Artificial Intelligence · ICLR 2026 · Kuaishou

0 likes · 19 min read

SuanNi

Apr 21, 2026 · Artificial Intelligence

Why AI Video Generation Is Leaving the Silent Era: Architecture, Alignment, and Evaluation Insights

This article analyzes the rapid evolution of multimodal video generation models from separated visual‑audio pipelines to unified diffusion Transformers, detailing VAE compression, MoE scaling, cross‑modal alignment techniques, comprehensive evaluation metrics, real‑world applications, and the remaining technical challenges.

Large Models · Video Generation · audio-visual alignment

0 likes · 15 min read

AI Explorer

Apr 16, 2026 · Artificial Intelligence

AI Tech Daily: Top AI Research and Industry Updates on April 16 2026

This roundup highlights recent AI breakthroughs such as NVIDIA‑MIT’s Sol‑RL framework for faster diffusion model training, Peking University’s CPL++ visual localization improvement, DeepMind’s TIPSv2 for image recognition, Boston Dynamics Spot’s AI upgrade, Anthropic’s safety paper, a major MCP protocol vulnerability, OpenAI’s GPT‑5.4 release, and the shifting AI video landscape.

AI · AI safety · Machine Learning

0 likes · 5 min read

AI Explorer

Apr 16, 2026 · Artificial Intelligence

How NVIDIA, HKU, and MIT’s Sol‑RL Framework Supercharges Diffusion Model Training

NVIDIA, the University of Hong Kong (HKU), and MIT introduced the Sol-RL framework. It uses reinforcement-learning-guided sampling to cut diffusion model training time several-fold without sacrificing image quality. This could lower the entry barrier for small teams and shift the AIGC industry toward efficiency-driven competition.

AIGC · NVIDIA · Sol-RL

0 likes · 6 min read

Machine Heart

Apr 16, 2026 · Artificial Intelligence

Achieving 4.6× Faster Diffusion Model Training with FP4‑BF16 Dual‑Track Parallelism (Sol‑RL)

Sol-RL, a framework from NVIDIA, the University of Hong Kong, and MIT, uses NVFP4 inference for large-scale rollout exploration and BF16 precision for high-fidelity regeneration. It delivers up to 4.64× faster convergence at equivalent reward levels while preserving BF16 training fidelity across the SANA, FLUX.1, and SD3.5-L models.

BF16 · FP4 · GPU optimization

0 likes · 9 min read
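Sol-RL runs its rollouts in NVFP4. To give a feel for what that format can represent, here is a NumPy simulation of block-scaled FP4 (E2M1) quantize-then-dequantize. It assumes the commonly described NVFP4 layout, in which 16-element blocks share one scale. For simplicity the block scale stays in float32 here, whereas the real format stores it in FP8 (E4M3) alongside a per-tensor scale. This is an illustrative sketch, not Sol-RL code.

```python
import numpy as np

# Magnitudes representable by FP4 E2M1; the sign is handled separately.
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_quant_fp4_blocks(x, block_size=16):
    """Simulated block-scaled FP4 quantize -> dequantize of a 1-D array.
    Simplifications (assumptions): float32 block scales, and ties round to
    the smaller magnitude."""
    x = np.asarray(x, dtype=np.float32)
    pad = (-x.size) % block_size
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)
    # Map each block's largest magnitude onto the largest FP4 value (6.0).
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    scale = np.where(amax > 0, amax / E2M1_GRID[-1], 1.0)
    scaled = np.abs(blocks) / scale
    # Snap every scaled magnitude to the nearest grid point.
    nearest = np.abs(scaled[..., None] - E2M1_GRID).argmin(axis=-1)
    dequant = np.sign(blocks) * E2M1_GRID[nearest] * scale
    return dequant.reshape(-1)[: x.size]

values = np.random.default_rng(0).standard_normal(20).astype(np.float32)
print(np.abs(fake_quant_fp4_blocks(values) - values).max())
```

With only eight magnitudes per sign, the worst-case error in a block is half the widest gap (between 4 and 6) times that block's scale. This is one reason such formats pair FP4 with fine-grained scaling.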

SuanNi

Apr 12, 2026 · Artificial Intelligence

How TDM‑R1 Achieves 4‑Step Image Generation that Beats 80‑Step Models

Researchers from HKUST, CUHK and XiaoHongShu introduced TDM‑R1, a reinforcement‑learning‑based method that enables 4‑step diffusion image generation to surpass 80‑step models in speed, fidelity, and complex instruction adherence, as demonstrated on the GenEval benchmark and multiple quality metrics.

AI image synthesis · benchmarking · diffusion models

0 likes · 9 min read

Bighead's Algorithm Notes

Apr 9, 2026 · Artificial Intelligence

WSDM 2026 Quantitative Research Papers: Summaries and Insights

This article presents concise summaries of three recent AI‑driven finance papers—Diffolio’s diffusion‑based risk‑aware portfolio optimization, STORM’s dual‑vector‑quantized VAE factor model, and AutoHypo‑Fin’s autonomous web‑mined hypothesis generation—highlighting their motivations, methods, and experimental gains.

AI for finance · VQ-VAE · diffusion models

0 likes · 9 min read

HyperAI Super Neural

Apr 7, 2026 · Artificial Intelligence

MIT’s DRiffusion Achieves 1.4–3.7× Faster Diffusion Sampling via Draft‑and‑Refine Parallelism

MIT researchers introduce DRiffusion, a draft‑and‑refine parallel framework that uncovers intrinsic parallelism in diffusion models, delivering 1.4–3.7× speedup on three GPUs while preserving near‑lossless image quality across Stable Diffusion 2.1, SDXL and SD3 evaluated on MS‑COCO.

AI acceleration · DRiffusion · MS-COCO

0 likes · 14 min read

vivo Internet Technology

Apr 1, 2026 · Artificial Intelligence

Why Fixed CFG Fails and How Time‑Adaptive C²FG Boosts Diffusion Image Generation

This article introduces C²FG, a training‑free, plug‑and‑play time‑adaptive exponential control function that replaces the fixed classifier‑free guidance scale, theoretically justifies its superiority with score discrepancy bounds, and demonstrates significant FID and IS improvements across multiple diffusion architectures on ImageNet.

CVPR 2026 · Classifier-Free Guidance · Plug-and-Play

0 likes · 7 min read
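Classifier-free guidance combines an unconditional and a conditional noise prediction as eps_u + w * (eps_c - eps_u). The sketch below lets the scale w vary with the timestep. The exponential schedule and its constants (w_min, w_max, k) are hypothetical placeholders that show where a time-adaptive scale plugs in. They are not the C²FG control function from the paper.

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, w):
    """Classifier-free guidance: move from the unconditional prediction
    towards (and, for w > 1, past) the conditional one."""
    return eps_uncond + w * (eps_cond - eps_uncond)

def hypothetical_scale(t, w_min=1.0, w_max=7.5, k=3.0):
    """Hypothetical exponential schedule over normalized time t in [0, 1],
    where t = 1 is pure noise. Equals w_max at t = 1 and decays towards w_min."""
    return w_min + (w_max - w_min) * np.exp(-k * (1.0 - t))

# Toy usage: random arrays stand in for the model's two noise predictions.
rng = np.random.default_rng(0)
eps_u, eps_c = rng.standard_normal((2, 4, 8, 8))
for t in np.linspace(1.0, 0.0, 5):
    w = hypothetical_scale(t)
    guided = cfg_combine(eps_u, eps_c, w)
    print(f"t={t:.2f}  w={w:.2f}")
```

A fixed-scale baseline corresponds to returning a constant from the schedule. The direction chosen here (strong early, weak late) is arbitrary, and tuning that shape is exactly what time-adaptive schemes address.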

Bighead's Algorithm Notes

Mar 31, 2026 · Artificial Intelligence

Top AI-Driven Quantitative Finance Papers from AAAI 2026

This article curates and summarizes recent AI research papers presented at AAAI 2026 that advance quantitative finance, covering controllable market generation, LLM‑powered alpha factor mining, risk‑aware multi‑agent portfolio management, foundation models for market data, and reinforcement‑learning trading policies.

AI · Financial Market Simulation · Meta Learning

0 likes · 12 min read

PaperAgent

Mar 28, 2026 · Artificial Intelligence

How ACCORD Breaks Concept Coupling in Custom Text‑to‑Image Generation

The ACCORD framework formalizes the concept‑coupling issue in text‑to‑image diffusion models as a statistical dependency problem and resolves it with two plug‑and‑play regularization losses, dramatically improving fidelity and text control without altering model architecture.

ACCORD · AI research · concept coupling

0 likes · 7 min read

HyperAI Super Neural

Mar 25, 2026 · Artificial Intelligence

Low‑Barrier Deployment of NVIDIA’s Latest Physical AI Models for Humanoid Robots, Motion Generation, and Diffusion Fine‑Tuning

The article introduces NVIDIA’s Physical AI suite announced at GTC 2026—including Isaac GR00T, SOMA‑X, Kimodo, and FDFO—explains each model’s architecture and purpose, and provides one‑click online tutorials that let developers experiment with humanoid robotics, human‑body modeling, motion generation, and diffusion model fine‑tuning at minimal cost.

Embodied AI · FDFO · Isaac GR00T

0 likes · 8 min read

Bighead's Algorithm Notes

Mar 22, 2026 · Artificial Intelligence

DigMA: Controllable Generation of Financial Market Data – A Deep Dive

This article reviews the DigMA model, which uses a diffusion‑guided meta‑agent to generate high‑fidelity, controllable order‑flow data for financial markets, details its problem formulation, architecture, training on Chinese stock datasets, extensive experiments—including reinforcement‑learning‑based high‑frequency trading evaluation—and demonstrates its superior accuracy and ultra‑low latency generation.

Financial Market Simulation · Meta-Agent · controllable generation

0 likes · 16 min read

Bighead's Algorithm Notes

Mar 13, 2026 · Artificial Intelligence

Paper Reading: STABLE – A Robust Portfolio Allocation Method Using Conditional Diffusion Estimates

The STABLE framework integrates a conditional diffusion generator with a Black‑Litterman mean‑variance optimizer to produce style‑aware return forecasts and risk‑aware portfolio weights, achieving up to a 122.9% Sharpe‑ratio boost, lower drawdowns, and a 15.7% MSE reduction across major equity markets.

Black-Litterman · Financial AI · conditional diffusion

0 likes · 17 min read

AIWalker

Mar 10, 2026 · Artificial Intelligence

MIGM-Shortcut: Learning Controlled Latent Dynamics to Speed Up Masked Image Generation

The paper introduces MIGM-Shortcut, a self‑supervised method that learns controlled latent‑state dynamics to bypass redundant bidirectional attention in Masked Image Generation Models, achieving over 4× speed‑up on state‑of‑the‑art multimodal diffusion models like Lumina‑DiMOO while preserving image quality.

AI · MIGM · diffusion models

0 likes · 8 min read

AI Explorer

Mar 9, 2026 · Industry Insights

AI Daily Highlights March 9 2026: Breakthrough Math Solver, Embodied AGI, Chip Hacks, and New Models

On March 9 2026, AI breakthroughs ranged from Claude Opus solving a 30‑year math problem and Tesla unveiling embodied AGI to Apple’s M4 chip limit being cracked, a new 30B open‑source model surpassing Gemini, and advances in diffusion and multimodal research, reflecting rapid industry evolution.

AI · Apple M4 · Claude Opus

0 likes · 6 min read

SuanNi

Feb 23, 2026 · Artificial Intelligence

How FireRed-Image-Edit Sets New Standards for AI-Powered Image Editing

FireRed-Image-Edit, an open‑source instruction‑driven diffusion model, combines massive high‑quality data, a dual‑stream multimodal architecture, progressive training, and a comprehensive multi‑dimensional benchmark to achieve unprecedented pixel‑level control and human‑like editing performance across diverse visual tasks.

AI · Training Strategies · data engineering

0 likes · 12 min read

Machine Learning Algorithms & Natural Language Processing

Feb 14, 2026 · Artificial Intelligence

Latent Forcing: Reordering Diffusion Steps Boosts Pixel‑Level Image Quality

The new Latent Forcing technique from Fei-Fei Li's team reorders the diffusion trajectory. It first generates a latent structural sketch and then refines pixel details. This restores the efficiency of latent-space models while preserving 100% pixel fidelity, and it achieves state-of-the-art FID scores on ImageNet-256.

AI research · ImageNet · diffusion models

0 likes · 6 min read

AI Frontier Lectures

Feb 3, 2026 · Artificial Intelligence

Pixel Mean Flow: One‑Step Diffusion Beats Multi‑Step Models on ImageNet

The Pixel Mean Flow (pMF) method eliminates multi‑step sampling and latent‑space encoding, generating high‑quality images in a single step and achieving state‑of‑the‑art FID scores on ImageNet while drastically reducing computational cost.

ImageNet · diffusion models · perceptual loss

0 likes · 7 min read

Design Hub

Jan 17, 2026 · Artificial Intelligence

FLUX.2 Klein Generates Images in Under a Second and Unlocks Midjourney‑Style Prompts

The article reviews Black Forest Labs' FLUX.2 Klein model, highlighting its sub‑second 1024×1024 image generation, low‑VRAM requirements, four‑step inference speedups, and competitive quality versus SD3 and Midjourney V6, while also sharing Midjourney‑style prompt examples for creative design.

AI image generation · FLUX.2 · GPU Acceleration

0 likes · 8 min read

Design Hub

Dec 22, 2025 · Artificial Intelligence

Open‑Source AI Photoshop: Alibaba’s Qwen‑Image‑Layered Enables One‑Click Smart Layering

Alibaba’s Qwen‑Image‑Layered model, now fully open‑source, automatically separates a single image into editable RGBA layers using diffusion, offering Photoshop‑level editing, prompt‑controlled layer counts, and deep decomposition, with applications ranging from PPT de‑construction to game asset extraction, while noting limitations on realistic photos.

AI image segmentation · ComfyUI · Figma plugin

0 likes · 8 min read
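Qwen-Image-Layered outputs a stack of RGBA layers, and recombining such a stack is ordinary alpha compositing. The sketch below rests on several assumptions that the summary does not confirm for the model's actual output: straight (non-premultiplied) alpha, float values in [0, 1], and bottom-to-top ordering. It uses the Porter-Duff "over" operator.

```python
import numpy as np

def composite_over(layers):
    """Flatten straight-alpha RGBA layers of shape (H, W, 4), floats in [0, 1],
    ordered bottom to top, with the Porter-Duff 'over' operator."""
    h, w, _ = layers[0].shape
    color = np.zeros((h, w, 3))   # premultiplied colour accumulator
    alpha = np.zeros((h, w, 1))   # accumulated coverage
    for layer in layers:          # bottom layer first
        rgb, a = layer[..., :3], layer[..., 3:4]
        color = rgb * a + color * (1.0 - a)
        alpha = a + alpha * (1.0 - a)
    # Un-premultiply where anything is visible; fully transparent pixels stay 0.
    safe_alpha = np.where(alpha > 0, alpha, 1.0)
    return np.concatenate([color / safe_alpha, alpha], axis=-1)

# Usage: an opaque red background under a half-transparent blue layer.
background = np.zeros((2, 2, 4))
background[..., 0] = 1.0
background[..., 3] = 1.0
overlay = np.zeros((2, 2, 4))
overlay[..., 2] = 1.0
overlay[..., 3] = 0.5
result = composite_over([background, overlay])
print(result[0, 0])  # R = 0.5, G = 0, B = 0.5, alpha = 1
```

Editing a single layer and re-running the composite is what makes a layered decomposition useful. The other layers are left untouched.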

Data Party THU

Dec 18, 2025 · Artificial Intelligence

How Diffusion Models and Transformers Power the Next Generation of AI Video Generation

AI video generation now turns textual prompts into high‑quality clips using diffusion models and transformer‑based architectures; this article explains the underlying mathematics, training objectives, spatio‑temporal encoding, breakthroughs like consistent motion and physical realism, and discusses the technology’s opportunities and inherent risks.

AI video generation · Spatio-temporal modeling · Transformers

0 likes · 11 min read

Alibaba Cloud Developer

Dec 18, 2025 · Artificial Intelligence

How to Build a Real‑Time AI‑Powered Anime‑Style Video Generator for Social Apps

This technical report details the end‑to‑end workflow for integrating an AIGC video generation module into a social app, covering requirement analysis, model and hardware selection, dataset construction, LoRA and full‑parameter training, multiple acceleration techniques such as Sage Attention, TeaCache, XDiT, gradient‑checkpointing offload, tiled VAE, and quantization, followed by extensive performance evaluation and metric‑based ranking of the final models.

AI video generation · LoRA fine-tuning · Sage Attention

0 likes · 38 min read

Kuaishou Tech

Dec 3, 2025 · Artificial Intelligence

Can Diffusion Models Be Their Own Reward Model? Latent Reward Modeling & Step-Level Preference Optimization

This article presents a novel paradigm—Latent Reward Model (LRM) and Latent Preference Optimization (LPO)—that repurposes diffusion models as noise‑aware latent reward models for step‑level preference optimization, addressing the shortcomings of pixel‑level reward models, introducing multi‑preference consistent filtering, and demonstrating significant performance and efficiency gains on benchmarks such as PickScore and T2I‑CompBench++.

AI alignment · Preference Optimization · diffusion models

0 likes · 9 min read

HyperAI Super Neural

Nov 19, 2025 · Artificial Intelligence

LocDiff: Achieving Global-Scale Precise Image Geolocation Without Grids or Reference Libraries

The LocDiff framework introduces a spherical‑harmonics Dirac‑delta encoding and a conditional Siren‑UNet diffusion model that enables accurate worldwide image geolocation without relying on predefined grids or external image libraries, outperforming prior methods in precision, generalization, and computational efficiency.

AI research · LocDiff · diffusion models

0 likes · 16 min read

Bighead's Algorithm Notes

Nov 15, 2025 · Artificial Intelligence

Quantitative Finance Paper Digest: Nov 8‑14 2025 Highlights

This article summarizes five recent arXiv papers that apply advanced AI techniques such as diffusion models, hierarchical attention, and stochastic differential equations to multivariate financial time‑series forecasting, portfolio selection, volatility surface generation, and gold‑futures alpha strategies, presenting their core methods and experimental results.

diffusion models · equilibrium portfolio · financial time series

0 likes · 10 min read

Kuaishou Tech

Nov 13, 2025 · Artificial Intelligence

Unlocking Unusual Concept Combinations in Generative AI with IMBA Loss

The paper identifies imbalanced concept distributions as the main obstacle to arbitrary concept‑combination in text‑to‑image/video generation, proposes the token‑level IMBA Distance and a lightweight IMBA Loss that adaptively re‑weights training tokens, and demonstrates through extensive experiments and a new Inert‑CompBench benchmark that this loss dramatically improves compositional ability without extra data.

IMBA Loss · benchmark · concept combination

0 likes · 9 min read

AI Frontier Lectures

Nov 4, 2025 · Artificial Intelligence

How DiffPathV2 Achieves Zero‑Shot Image Anomaly Detection with 94.9% AUROC

This article breaks down the ICCV 2025 paper "Zero‑Shot Image Anomaly Detection Using Generative Foundation Models," explaining how DiffPathV2 leverages diffusion model denoising trajectories, six‑dimensional score errors, and SSIM weighting to detect out‑of‑distribution images without any task‑specific training, achieving state‑of‑the‑art AUROC scores across multiple benchmarks.

AUROC · DiffPathV2 · SSIM

0 likes · 10 min read

HyperAI Super Neural

Oct 20, 2025 · Artificial Intelligence

How Nvidia’s ERDM Model Beats EDM in Long‑Term Weather Forecasting (NeurIPS 2025)

The paper introduces ERDM, an enhanced rolling diffusion model that integrates progressive noise scheduling and time‑loss weighting from EDM, demonstrates superior CRPS scores on Navier‑Stokes and ERA5 mid‑term weather forecasts, and achieves comparable accuracy with far lower computational cost.

AI · ERDM · NeurIPS 2025

0 likes · 14 min read
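The ERDM comparison is reported in CRPS (continuous ranked probability score), which rewards probabilistic forecasts that are both accurate and appropriately spread. Below is a minimal NumPy sketch of the standard ensemble estimator CRPS = E|X - y| - 0.5 E|X - X'| for one scalar observation. It is a generic metric implementation, not the paper's evaluation code.

```python
import numpy as np

def crps_ensemble(members, obs):
    """Empirical CRPS of an ensemble forecast for a scalar observation.
    Lower is better; with a single member it reduces to the absolute error."""
    x = np.asarray(members, dtype=np.float64)
    skill = np.mean(np.abs(x - obs))                          # E|X - y|
    spread = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))   # 0.5 * E|X - X'|
    return skill - spread

print(crps_ensemble([2.0], 5.0))        # 3.0
print(crps_ensemble([0.0, 1.0], 0.0))   # 0.25
```

Some works use a "fair" variant that divides the pairwise sum by m(m - 1) instead of m^2, so small ensembles are not penalized. Which variant a given paper uses has to be checked in the paper itself. For gridded weather fields the score is typically averaged over locations and lead times.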

Data Party THU

Oct 15, 2025 · Artificial Intelligence

Designing Safe, Sample-Efficient, and Robust Reinforcement Learning for Ranking and Diffusion Models

This paper proposes a reinforcement‑learning framework that simultaneously ensures safety, sample efficiency, and robustness, applying a contextual‑bandit perspective to ranking/recommendation systems and text‑to‑image diffusion models, and introduces novel algorithms for safe deployment, variance‑reduced off‑policy estimation, and a LOOP method for generative RL.

Robustness · Safety · contextual bandits

0 likes · 5 min read

AI Algorithm Path

Oct 15, 2025 · Artificial Intelligence

Building a Flow Matching Model from Scratch: Theory Explained

This article walks through the theory behind flow‑matching generative models, contrasting them with diffusion models, detailing the velocity‑field formulation, training objective, and sampling procedure, and includes visual illustrations of the core concepts.

Generative Models · ODE · diffusion models

0 likes · 8 min read
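The velocity-field training objective and sampling procedure fit in a few lines. The NumPy sketch below assumes the common linear (rectified-flow style) interpolation path, with t = 0 at noise and t = 1 at data. The literature also uses other probability paths and the opposite time convention. A real model would replace the velocity function with a trained network; here a toy case with a known exact velocity checks the pieces.

```python
import numpy as np

def fm_training_pair(x1, rng):
    """One conditional flow-matching training sample on the linear path.
    Convention (assumed here): t = 0 is Gaussian noise, t = 1 is data."""
    x0 = rng.standard_normal(x1.shape)           # noise sample
    t = rng.uniform(size=(x1.shape[0], 1))       # one time per example
    xt = (1.0 - t) * x0 + t * x1                 # point on the straight path
    v_target = x1 - x0                           # d(xt)/dt along that path
    return xt, t, v_target

def fm_loss(v_pred, v_target):
    """Mean squared error between predicted and target velocities."""
    return np.mean((v_pred - v_target) ** 2)

def euler_sample(velocity_fn, x0, num_steps=10):
    """Integrate dx/dt = v(x, t) from t = 0 (noise) to t = 1 (data)."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = np.full((x.shape[0], 1), i * dt)
        x = x + dt * velocity_fn(x, t)
    return x

# Toy check: if all data sit at one point mu, the exact velocity field is
# (mu - x) / (1 - t), and Euler integration lands on mu.
rng = np.random.default_rng(0)
mu = np.array([[2.0, -1.0]])

def exact_velocity(x, t):
    return (mu - x) / (1.0 - t)

xt, t, v_target = fm_training_pair(np.repeat(mu, 8, axis=0), rng)
samples = euler_sample(exact_velocity, rng.standard_normal((8, 2)))
print(samples[:2])
```

Training would minimize fm_loss(model(xt, t), v_target) over many such pairs. Sampling then calls euler_sample with the trained model in place of exact_velocity.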

Data Party THU

Oct 13, 2025 · Artificial Intelligence

How BranchGRPO Accelerates and Stabilizes Diffusion Model Alignment

BranchGRPO introduces a tree‑structured branching, reward‑fusion, and lightweight pruning framework that dramatically speeds up diffusion and flow model training while delivering denser, more stable reward signals, achieving up to five‑fold faster convergence and higher alignment scores on image and video generation benchmarks.

BranchGRPO · Efficiency · RLHF

0 likes · 10 min read
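BranchGRPO builds on GRPO, whose core step turns a group of rewards for the same prompt into relative advantages. The NumPy sketch below shows only that standard normalization. It does not reproduce BranchGRPO's tree branching, reward fusion, or pruning. Implementations also differ on details such as the standard-deviation estimator, and the population estimator used here is an assumption.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within each group of samples
    generated for the same prompt. rewards has shape (num_prompts, group_size)."""
    r = np.asarray(rewards, dtype=np.float64)
    mean = r.mean(axis=1, keepdims=True)
    std = r.std(axis=1, keepdims=True)   # population std (ddof=0), an assumption
    return (r - mean) / (std + eps)

rewards = np.array([
    [1.0, 2.0, 3.0, 4.0],   # varied rewards -> informative advantages
    [0.5, 0.5, 0.5, 0.5],   # identical rewards -> zero advantage, no signal
])
adv = group_relative_advantages(rewards)
print(np.round(adv, 3))
```

The second group shows why reward density matters. When every sample in a group scores the same, the advantages vanish and the group contributes nothing to the policy update.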

AI Algorithm Path

Oct 12, 2025 · Artificial Intelligence

Flow Matching vs Diffusion Models: Key Differences and Connections

This technical article provides a comprehensive comparison of diffusion models and flow matching, covering their intuitive explanations, underlying mathematics, training objectives, sampling efficiency, theoretical guarantees, practical examples, and code implementations to illustrate how each generative approach works.

diffusion models · flow matching · generative AI

0 likes · 12 min read

Data Party THU

Oct 6, 2025 · Artificial Intelligence

Why Data, Not Architecture, Drives Locality in Diffusion Models

A recent MIT‑Toyota study shows that the locality observed in image diffusion models emerges from the statistical structure of training data rather than from architectural biases, and a simple linear denoiser can replicate this behavior, reshaping how we think about model design.

Data Statistics · U-Net · diffusion models

0 likes · 10 min read
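The textbook Gaussian case illustrates the claim that a simple linear denoiser reproduces this locality. Suppose the data are zero-mean Gaussian with covariance cov and are observed as y = x + sigma * noise. The MMSE-optimal linear denoiser is then cov (cov + sigma^2 I)^{-1}, so its receptive field is set entirely by the data covariance and the noise level. The 1-D covariance below, which decays exponentially with pixel distance, is an assumed stand-in for image statistics and not the study's setup.

```python
import numpy as np

def linear_mmse_denoiser(cov, sigma):
    """Optimal linear denoiser for zero-mean data with covariance `cov`
    observed as y = x + sigma * noise: x_hat = cov (cov + sigma^2 I)^{-1} y.
    cov and (cov + sigma^2 I) commute, so solve() yields the same matrix."""
    n = cov.shape[0]
    return np.linalg.solve(cov + sigma**2 * np.eye(n), cov)

# Toy 1-D "image": pixel correlation decays with distance (an assumption).
n, length_scale = 64, 3.0
idx = np.arange(n)
cov = np.exp(-np.abs(idx[:, None] - idx[None, :]) / length_scale)

D = linear_mmse_denoiser(cov, sigma=0.5)
center = n // 2
weights = np.abs(D[center])
local_share = weights[center - 5 : center + 6].sum() / weights.sum()
print(f"share of filter weight within 5 pixels: {local_share:.3f}")
```

No architectural prior appears anywhere in this computation. Changing cov to a longer-range correlation widens the filter, which is the data-driven locality the summary describes.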

Amap Tech

Oct 2, 2025 · Artificial Intelligence

How FantasyWorld Unifies Video Generation and 3D Geometry for Consistent Virtual Worlds

FantasyWorld introduces a geometry‑enhanced framework that augments a frozen video diffusion model with a trainable geometry branch, enabling simultaneous video representation and implicit 3D field generation, achieving spatially consistent, high‑quality virtual worlds and outperforming recent baselines in multi‑view coherence and geometric fidelity.

3D modeling · Video Generation · computer vision

0 likes · 11 min read

AI2ML AI to Machine Learning

Sep 30, 2025 · Artificial Intelligence

Dynamic Multimodal Video Generation: Prioritizing Stability and High Quality

The article surveys the evolution of video generation models—from early GANs and DCGAN to diffusion‑based approaches like Stable Diffusion and DiT—highlighting how stability, high quality, massive compute, and multimodal data pipelines are shaping the current and future paths of dynamic multimodal video generation.

Latent Diffusion · Stable Diffusion · Transformer

0 likes · 7 min read

Bighead's Algorithm Notes

Sep 27, 2025 · Artificial Intelligence

Weekly Time-Series Paper Digest (Sep 20‑26, 2025)

This digest summarizes three recent arXiv papers that propose novel diffusion‑based generation, a channel‑independent convolution for multivariate forecasting, and a style‑guided diffusion framework, each demonstrating improved realism, coherence, and diversity of synthetic time‑series data through extensive experiments.

DS-Diffusion · IConv · MMD loss

0 likes · 8 min read

Kuaishou Large Model

Sep 24, 2025 · Artificial Intelligence

How Generative Reinforcement Learning is Revolutionizing Real-Time Bidding

The article explains the core challenges of real‑time bidding, reviews Kuaishou's evolution from PID to MPC to reinforcement learning, and introduces generative reinforcement‑learning methods (GAVE and CBD) that combine decision transformers or diffusion models with value‑guided exploration and score‑based RTG, achieving significant offline and online performance gains.

advertising algorithms · diffusion models · generative reinforcement learning

0 likes · 15 min read

AIWalker

Sep 17, 2025 · Artificial Intelligence

InfGen Enables Arbitrary-Resolution Image Generation: 4K Images in 7 Seconds, 10× Faster

InfGen introduces a resolution‑agnostic generation paradigm that replaces the VAE decoder in diffusion models, allowing any‑size image synthesis with up to ten‑fold speed gains, achieving 4K outputs in under 7 seconds while improving visual quality.

InfGen · arbitrary resolution · diffusion models

0 likes · 15 min read

Bighead's Algorithm Notes

Sep 12, 2025 · Artificial Intelligence

AI for Finance: Quantum Asset Clustering, Causal Market Troughs, Multimodal Forecasting, Diffusion SDEs

This article summarizes four recent AI‑driven finance papers: a quantum‑annealing asset clustering algorithm, a causal machine‑learning model for predicting market troughs, a multimodal large‑model approach to financial time‑series forecasting, and a diffusion‑model method for generating stochastic‑differential‑equation sample paths.

Asset Clustering · Causal ML · Finance

0 likes · 7 min read

Sohu Smart Platform Tech Team

Sep 12, 2025 · Artificial Intelligence

How AI is Revolutionizing Video Creation: From Text‑to‑Video to Real‑Time Editing

This article systematically explores the technical evolution, core principles, and emerging innovations of AI‑generated video, covering generation methods, GAN and diffusion models, transformer‑based DiT architectures, efficiency‑boosting NCR, audio‑visual V2A integration, and real‑world applications across media, education, and commerce.

AI video generation · GAN · NCR

0 likes · 25 min read

Bighead's Algorithm Notes

Sep 5, 2025 · Artificial Intelligence

Weekly Quantitative Finance Paper Digest (Aug 30 – Sep 5, 2025)

This digest reviews four recent AI‑driven finance papers: a robust MCVaR portfolio optimizer with ellipsoidal support and RKHS uncertainty, a PPO‑based adaptive weighting system for LLM‑generated alphas, an empirical comparison of price‑based, GICS‑based, and LLM‑embedding stock clustering, and a diffusion‑model approach that generates future financial chart images from current charts and text prompts.

Quantitative Finance · diffusion models · large language models

0 likes · 9 min read

Data Party THU

Sep 3, 2025 · Artificial Intelligence

Exploring Multimodal Generative AI: A Tsinghua Tutorial at IJCAI 2025

This article introduces a 1.5‑hour tutorial presented by Tsinghua researchers at IJCAI 2025, covering the latest advances in multimodal generative AI, including multimodal large language models, diffusion models, post‑training generalization techniques, and unified understanding‑generation frameworks.

Generative Models · IJCAI 2025 · Tutorial

0 likes · 5 min read

AIWalker

Aug 19, 2025 · Artificial Intelligence

DynamicFace: Controllable High‑Quality Face Swapping for Images and Video

DynamicFace introduces a diffusion‑based framework that explicitly decouples identity, pose, expression, illumination and background using composable 3D facial priors, achieving superior identity preservation, motion consistency and visual fidelity in both image and video face‑swapping tasks.

3D facial priors · controllable generation · diffusion models

0 likes · 13 min read

Xiaohongshu Tech REDtech

Aug 19, 2025 · Artificial Intelligence

How Single Trajectory Distillation Boosts Diffusion Model Speed and Style Quality

The paper introduces Single Trajectory Distillation (STD), a novel training framework that aligns full PF‑ODE trajectories from a fixed noisy state, uses a Trajectory Bank to cut training cost, and adds an Asymmetric Adversarial Loss to markedly improve style consistency and aesthetic quality while accelerating image and video style‑transfer diffusion models.

AI acceleration · Style Transfer · consistency models

0 likes · 14 min read

Data Party THU

Aug 15, 2025 · Artificial Intelligence

What’s Next for Visual Reinforcement Learning? A Comprehensive 2024‑2025 Survey

This article provides a critical, up‑to‑date overview of visual reinforcement learning, formalizes the problem, traces policy‑optimization evolution, categorizes over 200 recent works into four pillars, analyzes algorithms, reward design, benchmarks, and highlights open challenges and future research directions.

RLHF · diffusion models · multimodal AI

0 likes · 7 min read

Baidu Geek Talk

Aug 11, 2025 · Artificial Intelligence

FLUX-Lightning Slashes Diffusion Inference to 4 Steps, Doubling Speed

FLUX-Lightning, introduced by PaddleMIX, combines phased consistency distillation, adversarial learning, distribution‑matching distillation, and reflow loss to reduce diffusion model inference to just four steps while preserving image quality, and leverages the CINN compiler to achieve over 30% speed gains on A800 GPUs, surpassing existing SOTA acceleration methods.

AI inference · CINN · Flux

0 likes · 21 min read

Data Party THU

Aug 9, 2025 · Artificial Intelligence

How SADA Boosts Diffusion Model Sampling Speed by Up to 1.8× Without Losing Quality

The paper introduces SADA (Stability‑guided Adaptive Diffusion Acceleration), a novel paradigm that dynamically allocates sparsity per token using a unified stability criterion, enabling efficient ODE‑based sampling for diffusion and flow‑matching models, achieving up to 1.8× speedup with negligible fidelity loss across SD‑2, SDXL, Flux, ControlNet and MusicLDM.

ODE · diffusion models · generative AI

0 likes · 5 min read

AI Frontier Lectures

Jul 30, 2025 · Artificial Intelligence

DualReal: Seamless Identity and Motion Customization for Video Generation

DualReal introduces a novel adaptive joint training framework that simultaneously customizes subject identity and motion dynamics in video generation, overcoming the conflicts of traditional isolated approaches by using a dual-domain perception adapter and stage-fusion controller, achieving up to 31.8% improvement on CLIP‑I and DINO‑I metrics.

Video Generation · diffusion models · dual-domain adaptation

0 likes · 13 min read

Alimama Tech

Jul 23, 2025 · Artificial Intelligence

How Differentiable Solver Search Accelerates Diffusion Model Sampling

This article presents a differentiable solver search method that quickly finds high‑quality sampling paths for diffusion models, demonstrating significant FID improvements across Rectified‑Flow, DDPM/VP, and text‑to‑image models while requiring no model parameter changes.

AI · differentiable solver · diffusion models

0 likes · 20 min read

Kuaishou Tech

Jul 22, 2025 · Artificial Intelligence

How Orthus Achieves Lossless Multimodal Generation with a Unified Autoregressive Transformer

Orthus, a new unified multimodal model presented at ICML 2025, leverages an autoregressive Transformer backbone with separate language and diffusion heads to enable lossless image‑text interleaved generation, outperforming existing models on both understanding and generation benchmarks while remaining computationally efficient.

AI research · autoregressive transformer · diffusion models

0 likes · 11 min read

AI Frontier Lectures

Jul 13, 2025 · Artificial Intelligence

How HarmoniCa Boosts Diffusion Model Speed with Joint Training‑Inference Caching

HarmoniCa, a new feature‑caching framework co‑designed by HKUST, Beihang University, and SenseTime, tackles diffusion model inference bottlenecks by aligning training and inference through Step‑Wise Denoising Training and an Image Error Proxy Objective, achieving up to 2× speedup while preserving image quality.

Performance Acceleration · diffusion models · feature caching

0 likes · 9 min read

Amap Tech

Jul 11, 2025 · Artificial Intelligence

Unified Self‑Supervised Pretraining Accelerates Image Generation and Improves Understanding

The USP framework introduces masked latent modeling within a VAE space to pre‑train ViT encoders, enabling seamless weight transfer to image classification, segmentation, and diffusion‑based generation tasks and dramatically speeding up the training of DiT and SiT models while preserving strong visual representations.

VAE · ViT · diffusion models

0 likes · 13 min read

Amap Tech

Jul 11, 2025 · Artificial Intelligence

Unified Self‑Supervised Pretraining Boosts Image Generation and Understanding

The USP framework introduces masked latent modeling within a VAE space to pretrain ViT encoders, enabling seamless weight transfer to both image classification and diffusion‑based generation tasks, dramatically accelerating training while preserving strong performance across multiple benchmarks.

Vision Transformer · diffusion models · image generation

0 likes · 10 min read

Amap Tech

Jul 9, 2025 · Artificial Intelligence

VMBench: Perception-Aligned Motion Benchmark & LD‑RPS Zero‑Shot Restoration

This article introduces VMBench, the first perception‑aligned video motion generation benchmark that defines a five‑dimensional metric suite and a meta‑guided prompt generation pipeline, and presents LD‑RPS, a zero‑shot unified image restoration framework based on latent diffusion recurrent posterior sampling, together with extensive experiments validating both systems.

Video Generation · benchmark · diffusion models

0 likes · 14 min read

Amap Tech

Jul 9, 2025 · Artificial Intelligence

Bridging Human Perception and Video Motion Generation: VMBench & LD‑RPS

This article introduces VMBench, a perception‑aligned video motion generation benchmark with a five‑dimensional metric suite and meta‑guided prompt generation, and LD‑RPS, a zero‑shot unified image restoration framework using latent diffusion and recurrent posterior sampling, detailing their motivations, innovations, experiments, and future directions.

AI research · Video Generation · diffusion models

0 likes · 14 min read

Kuaishou Large Model

Jul 3, 2025 · Artificial Intelligence

How EvoSearch Boosts Image & Video Generation with Test‑Time Evolutionary Search

The EvoSearch method, introduced by HKUST and Kuaishou’s Kling team, leverages test‑time scaling to dramatically improve diffusion‑based image and video generation without any additional training: it runs an evolutionary search along the denoising trajectory and achieves state‑of‑the‑art results on SD2.1, Flux‑1‑dev and other models.

Evolutionary Search · Test-Time Scaling · Video Generation

0 likes · 8 min read

Tencent Technical Engineering

Jul 3, 2025 · Artificial Intelligence

Winning the NTIRE 2025 UGC Video Enhancement Challenge: A Progressive AI Framework

Tencent’s TEG team secured first place in the NTIRE 2025 UGC Video Enhancement competition with a progressive three‑stage AI framework that decomposes the enhancement task into expert models for color correction, denoising, and temporal stability and trains them with advanced loss functions; the article also covers extensive hardware‑level optimizations and INT8 quantization techniques, and outlines future diffusion‑based generative enhancements.

AI · diffusion models · hardware optimization

0 likes · 17 min read

Kuaishou Tech

Jul 2, 2025 · Artificial Intelligence

How EvoSearch Supercharges Image and Video Generation with Test‑Time Evolutionary Search

EvoSearch, a test‑time evolutionary search method, dramatically improves image and video generation by increasing inference compute without extra training, outperforming existing scaling techniques on diffusion and flow models while maintaining robustness and diversity across multiple benchmarks.

AI research · Evolutionary Search · Test-Time Scaling

0 likes · 8 min read
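
The evolutionary idea behind test‑time search can be sketched in a few lines. This is a generic elitist evolutionary search over the initial Gaussian noise, not EvoSearch's actual algorithm (which, per the summaries above, also searches along the denoising trajectory); `evolve_noise`, `toy_reward`, and the mutation rule are hypothetical stand‑ins. In real use the reward function would run the sampler from each noise and score the generated image or video with a reward model.

```python
import numpy as np


def evolve_noise(reward_fn, dim, pop_size=16, elite=4, generations=10, mutation=0.3, seed=0):
    """Generic elitist evolutionary search over initial Gaussian noise (illustrative only)."""
    rng = np.random.default_rng(seed)
    pop = rng.standard_normal((pop_size, dim))  # candidate starting noises
    history = []
    for _ in range(generations):
        scores = np.array([reward_fn(x) for x in pop])
        order = np.argsort(scores)[::-1]  # best first
        parents = pop[order[:elite]]
        history.append(float(scores[order[0]]))
        # Children: mix a random elite with fresh noise. For a standard-normal parent,
        # sqrt(1 - m) * x + sqrt(m) * eps is again standard normal, so children remain
        # plausible starting points for a diffusion sampler.
        idx = rng.integers(0, elite, size=pop_size - elite)
        eps = rng.standard_normal((pop_size - elite, dim))
        children = np.sqrt(1.0 - mutation) * parents[idx] + np.sqrt(mutation) * eps
        # Elites are carried over unchanged, so the best score never decreases.
        pop = np.concatenate([parents, children], axis=0)
    return pop[0], history  # pop[0] is the best candidate of the last scored generation


# Hypothetical toy reward; a real one would sample an image from x and score it.
target = np.linspace(-1.0, 1.0, 8)


def toy_reward(x):
    return -float(np.sum((x - target) ** 2))


best, history = evolve_noise(toy_reward, dim=8)
print(history)
```

Spending more generations or a larger population is the "more inference compute, no extra training" knob: the generator's weights are never touched, only the inputs it is run on.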

Kuaishou Large Model

Jun 11, 2025 · Artificial Intelligence

12 Kuaishou Breakthrough Papers at CVPR 2025: Video Generation, Diffusion & Multimodal AI

CVPR 2025 in Nashville will feature 12 Kuaishou papers spanning large‑scale video datasets, quality assessment, 3D/4D reconstruction, controllable generation, diffusion scaling laws, multimodal simulation, and novel benchmarks, highlighting the company's cutting‑edge contributions to video AI research.

diffusion models · large-scale datasets

0 likes · 21 min read

DataFunTalk

Jun 8, 2025 · Artificial Intelligence

Why Autoregressive Video Models Like MAGI-1 May Outperform Diffusion Approaches

The article examines the current dominance of diffusion models in commercial video generation, contrasts them with autoregressive methods, and details how the open‑source MAGI‑1 model combines both paradigms to achieve longer, more controllable video synthesis while addressing scalability and quality challenges.

AI research · Autoregressive Models · MAGI-1

0 likes · 70 min read

Amap Tech

Jun 5, 2025 · Artificial Intelligence

How MVPainter Achieves Accurate, High‑Detail 3D Texture Generation with Multi‑View Diffusion

MVPainter introduces a fully open‑source pipeline that generates high‑quality, PBR‑compatible 3D textures from a single reference image and an untextured ("white") 3D model by leveraging multi‑view diffusion, geometric control, and a human‑aligned evaluation framework, dramatically improving texture fidelity, alignment, and detail.

3D texture generation · AI · PBR

0 likes · 10 min read

AntTech

Jun 4, 2025 · Artificial Intelligence

LLaDA and LLaDA‑V: Large Language Diffusion Models and Their Multimodal Extensions

This article presents the LLaDA series of diffusion‑based large language models, explains how their generative‑modeling principle yields language intelligence comparable to autoregressive models, and details the multimodal LLaDA‑V architecture, training methods, experimental results, and broader implications for AI research.

Generative Modeling · diffusion models · instruction following

0 likes · 10 min read

AI Frontier Lectures

May 23, 2025 · Artificial Intelligence

How SuperEdit Boosts Instruction-Based Image Editing with Rectified Supervision

SuperEdit introduces rectified instruction generation and contrastive supervision to fix noisy supervision in instruction‑based image editing, achieving up to 9.19% performance gains on the Real‑Edit benchmark without extra model parameters or pre‑training, and releases all data and code publicly.

diffusion models · image editing · visual-language models

0 likes · 15 min read

AI Frontier Lectures

May 19, 2025 · Artificial Intelligence

How SuperEdit Boosts Instruction-Based Image Editing with Rectified Supervision

SuperEdit introduces rectified instruction generation and contrastive supervision to fix noisy training signals in instruction‑based image editing, achieving up to 9.19% performance gains without extra parameters or pre‑training, as demonstrated on the Real‑Edit benchmark.

diffusion models · image editing · supervision

0 likes · 13 min read

AIWalker

May 16, 2025 · Artificial Intelligence

GPDiT Sets New SOTA in Video Generation with Faster, Unified Diffusion‑Autoregressive Framework

GPDiT, a novel autoregressive diffusion transformer, unifies diffusion and autoregressive modeling for video generation, introducing lightweight causal attention and a parameter‑free rotation‑based time conditioning that boost temporal consistency and cut training/inference costs, achieving state‑of‑the‑art results on multiple benchmarks.

Video Generation · autoregressive modeling · causal attention

0 likes · 16 min read

AI Algorithm Path

May 15, 2025 · Artificial Intelligence

Understanding Diffusion Models: Core Principles Explained

This article explains the fundamental principles of diffusion models, using physics and machine‑learning analogies to describe forward and reverse diffusion, the role of Gaussian noise, iteration trade‑offs, U‑Net architecture, and shared‑weight training for image generation.

U-Net · diffusion models · forward diffusion

0 likes · 8 min read
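
The forward (noising) process described above has a closed form, so training can jump to any step directly instead of adding noise one step at a time. A minimal NumPy sketch, assuming the common DDPM linear schedule (beta from 1e‑4 to 0.02 over 1000 steps); the function names are illustrative, not taken from the article:

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_t, linearly spaced (assumed DDPM-style values)."""
    return np.linspace(beta_start, beta_end, num_steps)


def q_sample(x0, t, betas, rng):
    """Jump straight to step t of the forward process.

    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,  eps ~ N(0, I),
    where alpha_bar_t = prod_{s<=t} (1 - beta_s).
    """
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.standard_normal(x0.shape)  # Gaussian noise
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    return xt, eps  # eps is what an epsilon-prediction U-Net is trained to recover


rng = np.random.default_rng(0)
betas = linear_beta_schedule(1000)
x0 = np.ones((8, 8))  # stand-in for a tiny image
for t in (0, 500, 999):
    xt, _ = q_sample(x0, t, betas, rng)
    print(t, round(float(xt.mean()), 3))  # signal fades as t grows
```

Because `alpha_bar_t` shrinks toward zero, the signal term vanishes at large `t` and `x_t` approaches pure Gaussian noise. More steps make each reverse (denoising) step a smaller, easier problem but cost more network evaluations at sampling time; the same network weights are shared across all steps, with `t` passed in as an input.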

AI Frontier Lectures

May 13, 2025 · Artificial Intelligence

How Diffusion Policy is Transforming Vision‑Based Robot Motion Learning

This article provides a comprehensive, step‑by‑step analysis of Diffusion Policy for robot visuomotor control, covering its motivation, task characteristics, model design, dataset preparation, training pipeline, inference procedure, experimental results, and open research questions.

Machine Learning · diffusion models · policy learning

0 likes · 63 min read

AIWalker

May 13, 2025 · Artificial Intelligence

PixelHacker: Diffusion‑Based Image Inpainting with Latent Class Guidance Beats SOTA

PixelHacker introduces a latent class guidance (LCG) paradigm that injects foreground and background embeddings into a diffusion model, training on 14 million image‑mask pairs and achieving state‑of‑the‑art structural and semantic consistency across Places2, CelebA‑HQ and FFHQ benchmarks.

PixelHacker · SOTA · computer vision

0 likes · 16 min read

AIWalker

May 11, 2025 · Artificial Intelligence

Unified Multimodal Understanding and Generation: A 30K‑Word Survey of Recent Advances

This comprehensive survey reviews the rapid progress of multimodal understanding and text‑to‑image generation models, categorises existing unified architectures into diffusion‑based, autoregressive, and hybrid paradigms, analyses their tokenisation strategies, datasets and benchmarks, and highlights current challenges and future research directions.

Autoregressive Models · Survey · benchmarks

0 likes · 64 min read

AI Frontier Lectures

Apr 28, 2025 · Artificial Intelligence

How DP-Recon Uses Diffusion Models to Reconstruct 3D Scenes from Sparse Photos

DP-Recon leverages generative diffusion priors and a visibility‑guided SDS loss to achieve high‑fidelity, compositional 3D scene reconstruction from extremely sparse images, delivering superior geometry, texture, and text‑driven editing capabilities demonstrated on benchmark datasets and real‑world indoor scenarios.

3D Reconstruction · AI · diffusion models

0 likes · 10 min read

AI Frontier Lectures

Apr 18, 2025 · Artificial Intelligence

DiffDenoise: Conditional Diffusion Transforms Medical Image Denoising

DiffDenoise introduces a three‑stage self‑supervised pipeline that combines a blind‑spot network, conditional diffusion modeling, and stabilized reverse diffusion sampling to dramatically improve medical image denoising performance on both synthetic and real datasets, while also offering a fast distilled version for practical deployment.

diffusion models · image processing · medical imaging

0 likes · 10 min read

Alimama Tech

Apr 17, 2025 · Artificial Intelligence

PosterMaker: High-Quality Product Poster Generation with Accurate Text Rendering

PosterMaker leverages a ControlNet‑based TextRenderNet with character‑level visual features and a reward‑driven foreground‑extension detector to generate high‑quality product posters that accurately render Chinese text (over 90% sentence accuracy) while preserving product fidelity, and is already deployed in Alibaba’s AI creative tool.

E-commerce AI · character-level features · diffusion models

0 likes · 18 min read

AIWalker

Apr 16, 2025 · Artificial Intelligence

Fast and Precise: FloED Sets New State‑of‑the‑Art in Video Restoration Over All Diffusion Models

FloED introduces a dual‑branch, flow‑guided diffusion framework that dramatically improves spatio‑temporal consistency and computational efficiency for video restoration, outperforming existing text‑guided diffusion methods on both object removal and background repair benchmarks.

Efficiency · FloED · diffusion models

0 likes · 16 min read

AIWalker

Apr 14, 2025 · Artificial Intelligence

Breaking the Binary: FlexIP Enables Both Identity Preservation and Personalized Editing

FlexIP introduces a dual‑adapter architecture and a dynamic weight‑gating mechanism that decouple identity preservation from personalized editing, allowing continuous control over image generation and outperforming prior SOTA methods in both fidelity and flexibility.

AI · diffusion models · dual-adapter

0 likes · 16 min read

AIWalker

Apr 10, 2025 · Artificial Intelligence

DCEdit: Precise Text-Guided Image Editing that Preserves Backgrounds

DCEdit introduces a precise semantic localization strategy and a dual-level control mechanism for text‑guided image editing, delivering superior background preservation and editing quality, as demonstrated on the new RW‑800 benchmark and extensive comparisons with state‑of‑the‑art diffusion models.

AI · benchmark · diffusion models

0 likes · 16 min read

AIWalker

Apr 7, 2025 · Artificial Intelligence

TurboFill: High‑Quality Image Inpainting in Just 4 Steps

TurboFill introduces a fast image‑inpainting model that trains an inpainting adapter on top of a few‑step text‑to‑image diffusion backbone, achieving state‑of‑the‑art results with only four diffusion steps while dramatically reducing computational cost.

TurboFill · computer vision · diffusion models

0 likes · 17 min read

DaTaobao Tech

Apr 7, 2025 · Artificial Intelligence

Flow Matching for Generative Modeling

Flow Matching reformulates generative modeling by learning a time‑dependent vector field that deterministically transports Gaussian noise to data, using a neural network trained with an analytically derived L2 loss, yielding simpler training, faster convergence, and deterministic sampling that matches or exceeds diffusion model quality.

AI · Generative Modeling · continuous normalizing flow

0 likes · 13 min read
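
The training objective and deterministic sampler summarized above can be sketched with NumPy. This assumes the simplest linear path x_t = (1 − t)·x0 + t·x1 from noise x0 to data x1 (i.e. sigma_min = 0, a rectified‑flow‑style choice) and a plain Euler integrator; `v_exact` is a hypothetical closed‑form velocity field for a one‑point toy dataset, standing in for the trained neural network a real model would use.

```python
import numpy as np


def interpolate(x0, x1, t):
    """Linear probability path x_t = (1 - t) * x0 + t * x1 (sigma_min = 0 assumed)."""
    t = t.reshape(-1, *([1] * (x0.ndim - 1)))  # broadcast one t per sample
    return (1.0 - t) * x0 + t * x1


def flow_matching_loss(v_fn, x0, x1, t):
    """Conditional flow matching L2 loss: || v(x_t, t) - (x1 - x0) ||^2, averaged."""
    xt = interpolate(x0, x1, t)
    target = x1 - x0  # time derivative of the linear path
    return float(np.mean(np.sum((v_fn(xt, t) - target) ** 2, axis=-1)))


def euler_sample(v_fn, x0, num_steps):
    """Deterministic sampling: integrate dx/dt = v(x, t) from t = 0 (noise) to t = 1 (data)."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = np.full(x.shape[0], i * dt)
        x = x + dt * v_fn(x, t)
    return x


rng = np.random.default_rng(0)
mu = np.array([2.0, -1.0])  # toy "dataset": a single point


def v_exact(x, t):
    # Exact marginal velocity toward a point-mass target (valid for t < 1)
    return (mu - x) / (1.0 - t)[:, None]


x0 = rng.standard_normal((256, 2))  # noise samples
x1 = np.tile(mu, (256, 1))  # data samples
t = rng.uniform(0.0, 0.99, size=256)
loss = flow_matching_loss(v_exact, x0, x1, t)
samples = euler_sample(v_exact, x0, num_steps=8)
print(loss, samples[:2])
```

The ideal field drives the loss to essentially zero, and the Euler sampler maps every noise sample onto the data point with no randomness during sampling, which is the "deterministic transport" property the summary refers to.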

AIWalker

Mar 27, 2025 · Artificial Intelligence

MagicColor: First Multi‑Instance AI Sketch‑Coloring System for Professional‑Grade Comics

MagicColor introduces a novel multi‑instance sketch‑coloring framework that uses a two‑stage self‑play training strategy, instance guidance, and edge‑aware pixel‑level color matching to automatically produce high‑quality, consistent colors for multiple line‑art instances, outperforming prior GAN and diffusion‑based methods.

AI · Multi-Instance · Sketch Colorization

0 likes · 16 min read

Tencent Cloud Developer

Mar 25, 2025 · Artificial Intelligence

Knowledge Distillation in Diffusion Models: Techniques and Applications

The article explains how knowledge distillation transfers capabilities from large to smaller diffusion models: it covers hard and soft labels and temperature scaling, contrasts knowledge distillation with data distillation, and details techniques such as consistency models, progressive distillation, adversarial distillation, and adversarial post‑training for model compression and step reduction.

Knowledge Distillation · Model Compression · adversarial post-training

0 likes · 19 min read
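
Hard labels, soft labels, and temperature come from classic (classification‑style) knowledge distillation. A minimal sketch of the Hinton‑style loss, with an assumed temperature T = 4 and mixing weight alpha = 0.5; diffusion‑specific methods such as progressive or consistency distillation use different regression targets, so treat this only as an illustration of the concepts:

```python
import numpy as np


def softmax(z, T=1.0):
    """Temperature-scaled softmax; larger T gives a flatter ("softer") distribution."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """alpha * CE(hard labels) + (1 - alpha) * T^2 * KL(teacher_T || student_T)."""
    p_t = softmax(teacher_logits, T)  # soft labels from the teacher
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1).mean()
    p_s_hard = softmax(student_logits, 1.0)
    ce = -np.log(p_s_hard[np.arange(len(labels)), labels]).mean()  # hard-label term
    return alpha * ce + (1.0 - alpha) * (T ** 2) * kl


teacher = np.array([[4.0, 1.0, 0.0]])
student = np.array([[2.0, 1.5, 0.5]])
labels = np.array([0])
print(softmax(teacher, T=1.0).round(3), softmax(teacher, T=4.0).round(3))
print(kd_loss(student, teacher, labels))
```

The T² factor keeps the soft‑label gradients on roughly the same scale as the hard‑label term when the temperature changes.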

AIWalker

Mar 23, 2025 · Artificial Intelligence

One-Click Removal & Seamless Integration: CycleFlow + Diffusion Prior Power OmniPaint

OmniPaint introduces a unified diffusion‑based framework that achieves physically consistent object removal and insertion by leveraging a pre‑trained FLUX‑1 diffusion prior, a progressive CycleFlow training pipeline, and a novel reference‑free CFD metric for high‑fidelity image editing.

CFD Metric · CycleFlow · Object Insertion

0 likes · 17 min read

AIWalker

Mar 18, 2025 · Artificial Intelligence

How ImageRAG Boosts Text‑to‑Image Generation with Retrieval‑Augmented Generation

ImageRAG introduces a retrieval‑augmented generation framework that dynamically fetches relevant images to guide diffusion models, dramatically improving the synthesis of rare and fine‑grained concepts across multiple text‑to‑image systems, as demonstrated by extensive quantitative and user studies.

AI Generation · ImageRAG · Retrieval-Augmented Generation

0 likes · 17 min read

AI Frontier Lectures

Mar 17, 2025 · Artificial Intelligence

Can Diffusion Models Outrun Traditional LLMs? Mercury Coder’s Speed & Architecture

The article analyzes Mercury Coder, a diffusion‑based language model that generates text and code in parallel, compares its speed and quality against traditional autoregressive LLMs like GPT‑4o‑mini using a ball‑collision benchmark, and discusses the underlying score‑entropy training, current limitations, and future multimodal potential.

AI Performance · Mercury · Text Generation

0 likes · 8 min read

AIWalker

Mar 16, 2025 · Artificial Intelligence

VideoPainter: Plug‑and‑Play Video Inpainting and Editing Sets 8 SOTA Benchmarks

VideoPainter introduces a plug‑and‑play dual‑branch framework for video inpainting and editing, featuring a lightweight context encoder, ID‑consistent resampling, and the large VPData/VPBench datasets, and achieves state‑of‑the‑art results across eight quantitative and qualitative metrics.

Dual-Branch Architecture · ID resampling · Plug-and-Play

0 likes · 15 min read

AI Frontier Lectures

Mar 11, 2025 · Artificial Intelligence

How Stochastic Differential Equations Power Modern Generative AI Models

This article explains how recent MIT research uses stochastic differential equations to model diffusion and flow processes, defines training objectives, explores conditional guidance, compares U‑Net and diffusion transformers, addresses memory challenges with latent diffusion, and surveys applications ranging from robotics to protein design.

Latent Diffusion · diffusion models · flow matching

0 likes · 26 min read
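
To make the SDE view concrete, the sketch below simulates the forward variance‑preserving SDE dx = −½·β·x dt + √β dW with the Euler‑Maruyama scheme. It assumes a constant β = 1 (real schedules let β vary with t), and the helper names are illustrative. Every sample starts at 3.0; the process gradually forgets its starting point and approaches a standard normal, which is the property a reverse‑time sampler relies on when it starts from pure noise.

```python
import numpy as np


def euler_maruyama(x0, drift, diffusion, t0, t1, num_steps, rng):
    """Simulate dx = f(x, t) dt + g(t) dW with the Euler-Maruyama scheme."""
    dt = (t1 - t0) / num_steps
    x = np.array(x0, dtype=float)
    t = t0
    for _ in range(num_steps):
        dw = rng.standard_normal(x.shape) * np.sqrt(dt)  # Brownian increment ~ N(0, dt)
        x = x + drift(x, t) * dt + diffusion(t) * dw
        t += dt
    return x


BETA = 1.0  # constant noise rate (assumption)


def vp_drift(x, t):
    return -0.5 * BETA * x  # pulls samples toward 0


def vp_diffusion(t):
    return np.sqrt(BETA)  # injects Gaussian noise


rng = np.random.default_rng(0)
x0 = np.full(20000, 3.0)  # every "data point" starts at 3.0
xT = euler_maruyama(x0, vp_drift, vp_diffusion, 0.0, 10.0, 1000, rng)
print(xT.mean(), xT.var())  # theory: mean near 0, variance near 1 for large t1
```

With a constant β this is an Ornstein‑Uhlenbeck process whose stationary distribution is N(0, 1), so the drift and diffusion terms together decide how fast data is turned into noise.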

AIWalker

Mar 8, 2025 · Artificial Intelligence

IMAGPose: A Unified Conditional Framework for Photo‑Realistic Pose‑Guided Person Generation (NeurIPS 2024)

IMAGPose introduces a unified conditional diffusion framework that combines feature‑level, image‑level, and cross‑view attention modules to generate high‑fidelity, photo‑realistic person images under diverse pose and multi‑view scenarios, outperforming prior SOTA methods on DeepFashion and Market‑1501.

AI · computer vision · diffusion models

0 likes · 22 min read

AIWalker

Mar 5, 2025 · Artificial Intelligence

Attention Distillation in Diffusion Models: CVPR 2025 Technique Outperforms Traditional Image Generation

The paper introduces a novel attention‑distillation loss and a guided‑sampling scheme that together enable diffusion models to faithfully transfer visual features from reference images, dramatically speeding synthesis and surpassing prior plug‑and‑play attention methods across style transfer, text‑to‑image generation, and texture synthesis tasks.

AI research · Style Transfer · attention distillation

0 likes · 15 min read

AIWalker

Mar 3, 2025 · Artificial Intelligence

ByteDance’s Diffusion Restoration Adapter Achieves State‑of‑the‑Art Real‑World Image Recovery

This paper introduces a lightweight Diffusion Restoration Adapter that integrates into pre‑trained diffusion priors such as StableDiffusion XL and StableDiffusion 3, dramatically reduces parameter overhead compared with ControlNet, and delivers superior quantitative and visual results on real‑world image restoration benchmarks through a novel sampling strategy.

AI · Adapter · StableDiffusion

0 likes · 17 min read

JD Retail Technology

Feb 25, 2025 · Artificial Intelligence

How JD’s “JingDianDian” AI Platform Revolutionizes E‑commerce Content Creation

JD Retail’s self‑built AIGC platform ‘JingDianDian’ leverages multimodal diffusion models, ControlNet, RAG and reinforcement learning to automatically generate high‑quality product images, videos and marketing copy, cutting production time from days to seconds and costs by over 99% for more than 350,000 merchants.

AIGC · content generation · diffusion models

0 likes · 15 min read

AIWalker

Feb 22, 2025 · Artificial Intelligence

DC‑AE: A 128× Downsampling Autoencoder that Super‑Charges High‑Resolution Diffusion Models

DC‑AE introduces Residual Autoencoding and Decoupled High‑Resolution Adaptation to achieve up to 128× spatial compression in autoencoders, preserving reconstruction quality while delivering roughly 19× inference and 18× training speedups for high‑resolution diffusion models, as demonstrated on ImageNet and other benchmarks.

Autoencoder · compression · diffusion models

0 likes · 13 min read
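
The payoff of a higher downsampling factor is easiest to see by counting latent tokens, since diffusion‑transformer attention cost grows roughly quadratically with token count. A back‑of‑the‑envelope sketch; the 1024×1024 resolution and the patch sizes (2 for a typical f8 autoencoder such as Stable Diffusion's VAE, 1 for the higher‑compression settings) are assumptions for illustration, and the roughly 19× and 18× speedups quoted above come from the article, not from this arithmetic:

```python
def latent_tokens(image_size: int, downsample: int, patch_size: int = 1) -> int:
    """Tokens a diffusion transformer sees for a square image_size x image_size input."""
    if image_size % (downsample * patch_size):
        raise ValueError("image_size must be divisible by downsample * patch_size")
    side = image_size // (downsample * patch_size)  # latent grid side after patchifying
    return side * side


configs = {
    "f8 autoencoder, patch 2": (8, 2),
    "f32, patch 1": (32, 1),
    "f64, patch 1": (64, 1),
    "f128, patch 1": (128, 1),
}
for name, (f, p) in configs.items():
    print(f"{name}: {latent_tokens(1024, f, p)} tokens")
```

At 128× spatial downsampling each latent position summarizes a 128×128 pixel block, which is why reconstruction quality at such ratios is the hard part the Residual Autoencoding design targets.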
